Percy Liang


Updated Jun 23, 2026


Percy Liang is an associate professor of computer science at Stanford University and the founding director of the Stanford Center for Research on Foundation Models (CRFM), the lab that coined the term "foundation models" in 2021, built the HELM holistic evaluation benchmark, and produced the Foundation Model Transparency Index. ^[1]^[2] He is also a co-founder of the open-model cloud company Together AI, valued at $3.3 billion in 2025. ^[8]^[32] Liang is one of the most influential academic voices on rigorous AI measurement and open, reproducible model development, with widely cited contributions to natural language processing, machine learning, semantic parsing, and the empirical study of large-scale foundation models. ^[3]

Liang led the team that produced the HELM benchmark (Holistic Evaluation of Language Models), released in November 2022 as one of the first comprehensive, multi-metric, multi-scenario evaluations of contemporary language models, benchmarking 30 prominent models across 42 scenarios; he has continued to expand the framework through successor releases that target safety, multimodality, and frontier capabilities. ^[4]^[5] He also co-authored the influential 2021 report "On the Opportunities and Risks of Foundation Models", a 200-plus-page collaborative paper that introduced the term "foundation model" to the research literature and helped frame discussion of the new generation of broadly applicable AI systems. ^[6]^[7]

In June 2022, Liang co-founded the AI infrastructure company Together AI with Vipul Ved Prakash, Ce Zhang, and Christopher Ré. ^[8]^[9] He has received the Presidential Early Career Award for Scientists and Engineers (PECASE), the IJCAI Computers and Thought Award, an NSF CAREER Award, a Sloan Research Fellowship, and a Microsoft Research Faculty Fellowship. ^[10]^[11]^[12]

Key facts


Full name	Percy Shuo Liang ^[13]
Field	Computer science, natural language processing, machine learning
Institution	Stanford University
Position	Associate Professor of Computer Science (with courtesy appointment in Statistics); Director, Center for Research on Foundation Models (CRFM) ^[1]^[13]
Education	B.S., MIT (2004); M.Eng., MIT (2005); Ph.D., UC Berkeley (2011) ^[3]
Doctoral advisors	Michael I. Jordan and Dan Klein ^[3]
Notable projects	HELM, CRFM, Foundation Models report, Foundation Model Transparency Index, SQuAD (co-author), Alpaca (co-author), CodaLab Worksheets, Marin ^[4]^[6]^[14]^[15]^[16]^[33]
Company	Co-founder of Together AI (2022) ^[8]^[9]
Major awards	PECASE (2019); IJCAI Computers and Thought Award (2016); NSF CAREER (2016); Sloan Research Fellowship (2015); Microsoft Research Faculty Fellowship (2014) ^[10]^[11]

Who is Percy Liang?

Percy Liang is an American computer scientist and an associate professor of computer science at Stanford University, where he holds a courtesy appointment in the Department of Statistics. He is the founding director of the Stanford Center for Research on Foundation Models (CRFM), launched in 2021 as part of the Stanford Institute for Human-Centered Artificial Intelligence (HAI). ^[1]^[2] Liang is widely recognized for his contributions to natural language processing, machine learning, and the empirical study of large-scale foundation models, including semantic parsing, robustness, weakly supervised learning, and rigorous evaluation. ^[3]

Early life and education

Details of Liang's birth date, place of birth, and family background are not documented in widely available reputable sources, so this article omits them. As a high school student, Liang represented the United States at the International Olympiad in Informatics (IOI), where he earned bronze and silver medals in successive years. The IOI is an annual computer-programming competition for secondary-school students, and a medal at the international final is widely regarded as a mark of early strength in algorithms and theoretical computer science. ^[3]

Liang attended MIT for his undergraduate and master's studies, earning a Bachelor of Science in 2004 and a Master of Engineering in 2005, both in electrical engineering and computer science. His master's thesis was advised by the statistical NLP researcher Michael Collins, who was then at MIT and later moved to Columbia University and Google. MIT's NLP group at the time was an active center for probabilistic and structured-prediction approaches to language understanding, and Liang's exposure to these methods shaped his subsequent doctoral work. ^[11]^[3]

He went on to UC Berkeley for doctoral study in computer science, where he was jointly advised by Michael I. Jordan (statistical machine learning) and Dan Klein (natural language processing). Both advisors are widely recognized figures in their fields, and the joint arrangement gave Liang training that explicitly bridged statistical theory and applied NLP. He completed his Ph.D. in 2011 with a dissertation titled "Learning Dependency-Based Compositional Semantics", which developed a new semantic formalism (DCS) for learning semantic parsers from question-answer pairs rather than from expensive annotated logical forms. ^[17]^[18] Closely related work appeared at the 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011) in Portland, Oregon, and was subsequently published in the journal Computational Linguistics. ^[18] After receiving his Ph.D., Liang held a short postdoctoral position at Google Research before joining the Stanford faculty. ^[3]

Stanford faculty and research program

Liang joined the Stanford Computer Science Department as an assistant professor in fall 2012. ^[11] In 2019 he was promoted to associate professor, the same year he received the PECASE. ^[12] He holds a courtesy appointment in the Department of Statistics. ^[1]

Liang's research has spanned several interrelated themes in machine learning and natural language processing, including: (1) semantic parsing and question answering; (2) weak and indirect supervision; (3) robustness, generalization, and uncertainty quantification; (4) the empirical study of foundation models and their evaluation; and (5) tools and infrastructure for reproducible research. ^[3]^[11] At Stanford he has supervised a large group of graduate students who have produced influential work in NLP, including contributions to widely used datasets, benchmarks, and frameworks.

Liang teaches several flagship courses at Stanford, including CS221 (Artificial Intelligence: Principles and Techniques), the statistical learning theory course CS229T/STATS231, CS324 (Advances in Foundation Models), and CS336 (Language Models from Scratch). ^[1]

Major research themes

Semantic parsing and question answering

Liang's doctoral work and much of his early Stanford research focused on semantic parsing: mapping natural language utterances to logical forms that can be executed against a knowledge base or database to produce an answer. The motivating problem is conceptually old, often associated with question-answering systems of the 1960s and 1970s, but Liang's contribution was to recast it as a statistical learning problem trainable from indirect supervision. His paper "Semantic Parsing on Freebase from Question-Answer Pairs" (Berant, Chou, Frostig, and Liang, EMNLP 2013) established a popular paradigm in which the logical form is treated as a latent variable trained from question-answer pairs alone, sidestepping the need for expensive logical-form annotation. The accompanying WebQuestions dataset became a widely used benchmark for open-domain factoid question answering against the Freebase knowledge graph. ^[19] Liang summarized this body of work in the 2016 Communications of the ACM article "Learning Executable Semantic Parsers for Natural Language Understanding", which has served as a frequently cited introduction to the area. ^[20] He also developed and released SEMPRE, an open-source toolkit for training semantic parsers that map natural language utterances to denotations via intermediate logical forms; SEMPRE supports several formalisms, including lambda calculus, lambda DCS, and Java expressions, and is agnostic to whether logical-form construction proceeds via combinatory categorial grammar or simpler chart-based approaches. ^[21]
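The latent-variable idea described above can be illustrated with a toy example. The following is a minimal sketch, not the method of the EMNLP 2013 paper or the SEMPRE API: the knowledge base, the candidate generator, the indicator features, and all function names are hypothetical, and a simple log-linear model stands in for the real parser. The logical form is never observed; training only rewards candidate forms whose execution result matches the given answer.

```python
import math
from collections import defaultdict

# Hypothetical toy knowledge base: (entity, relation) -> set of answers.
KB = {
    ("france", "capital"): {"paris"},
    ("france", "currency"): {"euro"},
    ("japan", "capital"): {"tokyo"},
    ("japan", "currency"): {"yen"},
}
RELATIONS = ["capital", "currency"]

def tokenize(question):
    return question.lower().replace("?", "").split()

def candidates(question):
    """Enumerate candidate logical forms (entity, relation) for a question."""
    entities = [t for t in tokenize(question) if any(t == e for e, _ in KB)]
    return [(e, r) for e in entities for r in RELATIONS]

def execute(logical_form):
    """Run a logical form against the KB to obtain its denotation."""
    return KB.get(logical_form, set())

def features(question, logical_form):
    """Indicator features pairing each question word with the relation."""
    _, relation = logical_form
    return {(tok, relation): 1.0 for tok in tokenize(question)}

def distribution(weights, question, forms):
    """Softmax over candidate logical forms under a log-linear model."""
    scores = [sum(weights[f] * v for f, v in features(question, z).items())
              for z in forms]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(examples, epochs=20, lr=0.5):
    """Gradient ascent on the log marginal likelihood of the correct answer."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for question, answer in examples:
            forms = candidates(question)
            if not forms:
                continue
            probs = distribution(weights, question, forms)
            correct = [execute(z) == answer for z in forms]
            mass = sum(p for p, ok in zip(probs, correct) if ok)
            if mass == 0.0:
                continue  # no candidate reproduces the answer
            # Gradient: E[features | denotation is correct] - E[features].
            for z, p, ok in zip(forms, probs, correct):
                posterior = p / mass if ok else 0.0
                for f, v in features(question, z).items():
                    weights[f] += lr * (posterior - p) * v
    return weights

def parse(weights, question):
    """Return the highest-probability logical form for a question."""
    forms = candidates(question)
    probs = distribution(weights, question, forms)
    return max(zip(probs, forms))[1]
```

Trained on two question-answer pairs such as ("What is the capital of France?", {"paris"}) and ("What currency does Japan use?", {"yen"}), this toy model can pick the right relation for unseen combinations of entity and relation, even though no logical form was ever annotated.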

Liang's work on question answering culminated in the co-authorship of the SQuAD dataset. With his student Pranav Rajpurkar and collaborators Jian Zhang and Konstantin Lopyrev, he introduced the Stanford Question Answering Dataset at EMNLP 2016: a collection of more than 100,000 crowd-sourced reading-comprehension questions over Wikipedia passages, paired with span-based answers. The paper analyzed the kinds of reasoning required to answer SQuAD questions using dependency and constituency parses, reported a strong logistic-regression baseline achieving an F1 of 51.0 (compared with a simple baseline at roughly 20.0 and human performance at 86.8), and made the dataset and an associated leaderboard publicly available. ^[14] SQuAD became one of the most influential benchmarks of the pre-foundation-model era and was used to train and evaluate a long succession of architectures, including bidirectional attention flow networks, QA-specific variants of ELMo, and BERT and its successors. A follow-on version, SQuAD 2.0 (Rajpurkar, Jia, and Liang, ACL 2018), added unanswerable questions to make the task significantly harder.
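The F1 figures quoted above measure token overlap between a predicted answer span and a reference span. The following is a simplified sketch of that style of metric, not the official SQuAD evaluation script; the normalization rules and the function names `normalize`, `span_f1`, and `best_f1` are assumptions chosen for illustration.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation and English articles, then tokenize."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()

def span_f1(prediction, reference):
    """Token-level F1 between a predicted span and one reference span."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def best_f1(prediction, references):
    """Score against several human references and keep the best match."""
    return max(span_f1(prediction, r) for r in references)
```

Under these assumptions, a prediction of "in Paris, France" against the reference "Paris" has precision 1/3 and recall 1, giving an F1 of 0.5, while "the Eiffel Tower" against "Eiffel Tower" scores 1.0 because articles are dropped during normalization.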

Robustness, generalization, and reliable machine learning

Liang has been a leading voice on the question of when machine-learning models actually generalize, and what their failure modes reveal about their internal representations. The 2017 paper "Adversarial Examples for Evaluating Reading Comprehension Systems" with his student Robin Jia showed that state-of-the-art SQuAD models could be fooled by appending semantically irrelevant distractor sentences to passages: the inserted sentence resembled the question syntactically but did not change the correct answer, yet caused systems to flip to the distractor's content. The paper exposed brittle pattern-matching behavior in ostensibly strong reading-comprehension systems and was awarded the Best Long Paper Award at EMNLP 2017, helping catalyze a broader literature on adversarial NLP. ^[22]

Subsequent work in Liang's group developed methods for certified robustness to adversarial word substitutions using interval-bound propagation through deep models, distributionally robust optimization for protecting performance on underrepresented subpopulations, and analyses of how training data and pretraining objectives shape downstream model behavior. ^[23] More broadly, Liang's group has contributed to the study of calibration in neural networks, the analysis of memorization versus generalization in large models, the construction of test suites designed to detect spurious correlations, and the use of conformal prediction and selective classification for uncertainty quantification. This body of work has been highly influential in shaping the discussion of "reliable" machine learning that accompanied the deployment of language models into safety-sensitive applications.

Foundation models and open development

Beginning around 2020, Liang's research has increasingly focused on the empirical and methodological study of large-scale pretrained models, the class of systems that he and his collaborators would soon dub "foundation models". His group at Stanford has contributed to work on instruction tuning, in-context learning, retrieval augmentation, calibration, faithfulness of generated text, and the scaling behavior of large language models, as well as on transparent benchmarking and reporting. A representative example is Stanford Alpaca, an instruction-following model released by CRFM in March 2023 that fine-tuned Meta's LLaMA 7B on 52,000 instruction-following demonstrations generated in the style of self-instruct; the team reported that Alpaca behaved qualitatively similarly to OpenAI's text-davinci-003 while costing less than $600 to reproduce, and Liang is listed among its co-authors. ^[33] His group has also studied the effects of pretraining data composition on downstream behavior, and Liang has collaborated on foundational papers in retrieval-augmented language modeling and on the DSPy line of programmatic LM composition work led by his Stanford colleagues. ^[30]^[31]

Liang has been a vocal advocate for open and reproducible AI research, arguing that "open weight" releases such as Llama or Gemma are not enough if the training data, training code, and developmental decisions are kept proprietary. To put that view into practice he leads Marin, an open-development laboratory for building foundation models in which experiments, including failed ones, are pre-registered on GitHub and the full training pipeline (code, data, weights, and logs) is released publicly. Marin's announcement post characterized this approach as "a radically new way of doing model development, inspired by true open-source software, where every experiment is done in the open and anyone can suggest ideas, review, and even run experiments through GitHub". ^[15] Stanford and collaborators announced Marin's initial 8-billion-parameter model in May 2025, accompanied by a detailed retrospective ("Marin 8B retrospective") that documented the training process in the spirit of Meta's earlier OPT logbook, and the project has subsequently released larger 32-billion-parameter models trained from scratch. ^[15]^[24]

Evaluation and benchmarks

Throughout his career Liang has argued that progress in AI requires careful, standardized measurement, and that ad hoc reporting practices have systematically distorted the public understanding of what models can and cannot do. This conviction motivated CodaLab Worksheets, an open-source platform for managing reproducible computational experiments, which Liang began in 2013 with support from Microsoft Research and Evelyne Viegas. CodaLab Worksheets allows researchers to capture the full provenance of an experiment, from raw data through preprocessing and model training to final results, and to share that provenance with collaborators. The platform has been adopted as a submission target for software resources at several Association for Computational Linguistics conferences, including ACL 2016. ^[16] More recently, the same conviction has motivated the HELM benchmark and its successors at CRFM, the Foundation Model Transparency Index project, and the Ecosystem Graphs initiative, which tracks which foundation models depend on which datasets, assets, and providers.

CRFM (2021-present)

In August 2021, Stanford's HAI launched the Center for Research on Foundation Models with Liang as its founding director. CRFM was framed as an interdisciplinary effort spanning more than ten Stanford departments and dedicated to making "fundamental advances in the study, development, and deployment of foundation models". ^[2]^[25] Its mission areas, as articulated on the center's website, include technical research on data, systems, architecture, training, adaptation, inference, interpretability, and evaluation of foundation models; specialized applications in domains such as law, music, robotics, and biomedicine; analysis of societal considerations such as transparency, supply chains, openness, copyright, privacy, and systemic risks; and engagement with policymakers on evidence-based AI policy across multiple jurisdictions. ^[2]

Who coined the term "foundation model"?

CRFM's launch in 2021 was accompanied by the release of "On the Opportunities and Risks of Foundation Models", a 200-plus-page collaborative report authored by Rishi Bommasani, Liang, and more than 100 additional researchers at Stanford and other institutions. The report introduced the term "foundation model" as a label for "any model that is trained on broad data... and can be adapted (e.g., fine-tuned) to a wide range of downstream tasks", and provided a comprehensive survey of their capabilities (language, vision, robotics, reasoning, human interaction), technical underpinnings (model architectures, training procedures, data, systems, security, evaluation, theory), applications (law, healthcare, education), and societal implications (inequity, misuse, economic and environmental impact, legal and ethical considerations). Liang initiated and conceptualized the overall framing of the report and, together with Bommasani, led the decentralized writing effort while providing guidance on individual sections. ^[6]^[7] The report has been widely cited in academic and policy contexts. Its central terminological proposal, that the new class of broadly applicable pretrained models be called "foundation models", was deliberately chosen to avoid the narrower implications of "pretrained model" or the more loaded "large language model"; the term has since entered standard use across the AI research community and in regulatory documents in the United States and Europe.

Under Liang's leadership, CRFM has produced a steady stream of research outputs, ranging from open-weight model releases and infrastructure projects (such as Levanter, the JAX-based training framework, and Mistral, an earlier CRFM training framework not to be confused with the company of the same name) to influential reports on the foundation-model ecosystem, including the Foundation Model Transparency Index (FMTI) and the Ecosystem Graphs project that catalogues dependencies among models, datasets, and providers. ^[2] CRFM has also organized workshops, working groups, and conferences such as the Stanford Foundation Models workshops, bringing together academic researchers, industrial labs, and policy actors.

HELM benchmark

In November 2022, CRFM released "Holistic Evaluation of Language Models" (HELM), a benchmark and accompanying paper authored by Liang, Bommasani, Tony Lee, and many co-authors. ^[4]^[5] The paper argued that prior evaluation practice was fragmented and incomplete: different models were typically reported on different subsets of tasks, and important properties such as calibration, robustness, fairness, and toxicity were rarely measured at all.

HELM's central contribution was a taxonomy of scenarios (use cases such as question answering, summarization, and sentiment analysis) paired with seven metrics evaluated when applicable: accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency. The first release benchmarked 30 prominent language models, spanning open, limited-access, and fully closed systems, across 42 scenarios. According to the authors, prior to HELM, models were on average evaluated on just 17.9 percent of the core HELM scenarios, with many pairs of widely cited systems having no scenario in common; HELM raised that coverage to 96.0 percent under uniform conditions. ^[4]
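The coverage statistic above can be read as the fraction of (model, scenario) pairs that have actually been evaluated. The following is a minimal sketch of that arithmetic using invented model and scenario names; it is not code from the HELM framework, and the numbers it produces are unrelated to the published 17.9 and 96.0 percent figures.

```python
from itertools import product

def coverage(evaluated, models, scenarios):
    """Fraction of (model, scenario) pairs that have an evaluation."""
    pairs = list(product(models, scenarios))
    done = sum(1 for pair in pairs if pair in evaluated)
    return done / len(pairs)

def shared_scenarios(evaluated, model_a, model_b, scenarios):
    """Scenarios on which both models were evaluated, enabling comparison."""
    return [s for s in scenarios
            if (model_a, s) in evaluated and (model_b, s) in evaluated]

# Hypothetical models and scenarios, for illustration only.
models = ["model_a", "model_b", "model_c"]
scenarios = ["question_answering", "summarization", "sentiment", "toxicity"]

# Fragmented reporting: each model was tested on a different subset.
fragmented = {
    ("model_a", "question_answering"),
    ("model_a", "summarization"),
    ("model_b", "sentiment"),
    ("model_c", "question_answering"),
}

# Uniform reporting: every model is run on every scenario.
uniform = set(product(models, scenarios))
```

In this toy setup the fragmented record covers 4 of 12 pairs, and `model_a` and `model_b` share no scenario at all, so they cannot be compared directly; the uniform record covers every pair.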

HELM was designed as a "living benchmark" that would be continuously updated as new models, scenarios, and metrics emerged. Subsequent releases have introduced or extended specialized variants for particular evaluation concerns, including HELM Lite (a smaller, frequently updated leaderboard), HELM Capabilities, HELM Instruct, HELM Safety, HELM MMLU, HELM AIR-Bench, and multimodal extensions such as VHELM and HEIM (for image generation). The HELM source code is maintained as an open-source framework on GitHub. ^[5]^[26]

HELM has been widely adopted by researchers, model developers, and policymakers as a reference point for comparing model behavior. Liang has frequently cited the project in talks and articles as an example of how transparent, holistic measurement can shape both research and the public conversation around frontier AI. ^[25]^[27]

How transparent are foundation models? The Foundation Model Transparency Index

In October 2023, CRFM published the first Foundation Model Transparency Index (FMTI), a study led by Rishi Bommasani and Kevin Klyman with Liang as senior author and CRFM director. The FMTI defined 100 fine-grained transparency indicators spanning the resources used to build a model (such as data, labor, and compute), the model itself, and its downstream use, then scored 10 major foundation model developers against those indicators. ^[34] The developers assessed were OpenAI, Anthropic, Google, Meta, Amazon, Inflection, AI21 Labs, Cohere, Hugging Face, and Stability AI.

The results documented widespread opacity across the ecosystem: the average score was just 37 out of 100, the top scorer (Meta's Llama 2) reached 54, and the lowest scorer reached 12. ^[34]^[35] The index found that open-weight developers tended to score higher, with the gap driven largely by upstream indicators such as disclosure of training data, the labor used to build the model, and the compute consumed. Liang and his co-authors argued that this lack of transparency makes it harder for businesses, researchers, policymakers, and the public to understand the limitations and risks of the systems on which society increasingly depends. CRFM published an updated 2024 edition (and later a 2025 edition) that expanded the developer set and tracked changes over time. ^[35]
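An indicator-based score of this kind can be tallied as follows. This is a minimal sketch that assumes each indicator is scored as satisfied or not satisfied and that indicators are grouped into upstream, model, and downstream domains as described above; the indicator names, the sample data, and the `score_developer` function are hypothetical and do not reproduce the FMTI's actual indicator list or results.

```python
from collections import defaultdict

def score_developer(indicators):
    """Tally satisfied indicators overall and per domain.

    `indicators` is a list of (domain, name, satisfied) tuples, where
    `satisfied` is a boolean (an assumption made for this sketch).
    Returns (total_satisfied, {domain: (satisfied, out_of)}).
    """
    by_domain = defaultdict(lambda: [0, 0])
    for domain, _name, satisfied in indicators:
        by_domain[domain][0] += int(satisfied)
        by_domain[domain][1] += 1
    total = sum(satisfied for satisfied, _ in by_domain.values())
    return total, {d: tuple(counts) for d, counts in by_domain.items()}

# Hypothetical indicators for one developer, for illustration only.
example = [
    ("upstream", "training data sources disclosed", False),
    ("upstream", "compute usage disclosed", False),
    ("upstream", "data labor practices disclosed", True),
    ("model", "model architecture disclosed", True),
    ("model", "evaluation results published", True),
    ("downstream", "usage policy published", True),
    ("downstream", "affected users reported", False),
]

total, breakdown = score_developer(example)
```

Keeping the per-domain breakdown alongside the total makes it possible to see where a gap between developers comes from, for example in the upstream domain that the index identified as the main driver of differences.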

Together AI co-founding

In June 2022, Liang co-founded Together AI, a company building a cloud platform for training, fine-tuning, and running open-source foundation models. The other co-founders were Vipul Ved Prakash, a co-founder of the social-analytics company Topsy, which Apple acquired in 2013; the systems-and-ML researcher Ce Zhang, then at ETH Zurich; and Stanford computer scientist Christopher Ré. ^[8]^[9] Together AI's stated goal has been to make open-weight model training and inference broadly accessible at substantially lower cost than the dominant hyperscale cloud vendors.

Together AI announced a $20 million seed financing round led by Lux Capital in May 2023, followed by an additional $102.5 million Series A round later that year. ^[28]^[9] In February 2025, the company raised a $305 million Series B led by General Catalyst and co-led by Prosperity7, with participation from NVIDIA, Salesforce Ventures, Kleiner Perkins, Lux Capital, and others, valuing Together AI at $3.3 billion. ^[32] Researcher Tri Dao, known for the FlashAttention work and other systems-level ML contributions, subsequently joined as chief scientist; the company lists Prakash (CEO), Zhang (CTO), Ré, Dao, and Liang as founders. ^[29] Liang remains a founder of the company while continuing in his primary capacity as a Stanford faculty member.

The company has released or partnered on a number of widely used open-source resources, including the RedPajama datasets and language models (a community reproduction of the LLaMA training data and model series), and it provides serving and fine-tuning infrastructure used by a wide range of customers. ^[9]

Awards and honors

Liang has received numerous recognitions for his research, including: ^[11]^[12]^[10]

Microsoft Research Faculty Fellowship (2014)
Sloan Research Fellowship (2015)
NSF CAREER Award (2016)
IJCAI Computers and Thought Award (2016), recognizing his contributions to semantic parsing and to methods for learning latent-variable models, sometimes with weak supervision
Presidential Early Career Award for Scientists and Engineers (PECASE, 2019), nominated by the Department of Defense and the National Science Foundation

He has also been recognized through Best Paper awards at leading NLP and machine-learning venues. The 2017 EMNLP paper "Adversarial Examples for Evaluating Reading Comprehension Systems" (Jia and Liang) received the Best Long Paper Award. ^[22] Other works from his group have received outstanding-paper or best-paper recognition at conferences including ACL, EMNLP, and NeurIPS.

Selected publications

The following list highlights a selection of Liang's most widely cited and influential publications. Many are co-authored with students and collaborators.

Liang, P. (2011). Learning Dependency-Based Compositional Semantics. Ph.D. dissertation, UC Berkeley. ^[17]
Liang, P., Jordan, M. I., and Klein, D. (2011). "Learning Dependency-Based Compositional Semantics". Proceedings of ACL 2011. (Later expanded in Computational Linguistics, 2013.) ^[18]
Berant, J., Chou, A., Frostig, R., and Liang, P. (2013). "Semantic Parsing on Freebase from Question-Answer Pairs". EMNLP 2013. ^[19]
Liang, P. (2016). "Learning Executable Semantic Parsers for Natural Language Understanding". Communications of the ACM 59(9). ^[20]
Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P. (2016). "SQuAD: 100,000+ Questions for Machine Comprehension of Text". EMNLP 2016. ^[14]
Jia, R., and Liang, P. (2017). "Adversarial Examples for Evaluating Reading Comprehension Systems". EMNLP 2017. Best Long Paper. ^[22]
Bommasani, R., Hudson, D. A., Adeli, E., ... Liang, P., et al. (2021). "On the Opportunities and Risks of Foundation Models". arXiv:2108.07258. ^[7]
Liang, P., Bommasani, R., Lee, T., et al. (2022). "Holistic Evaluation of Language Models". arXiv:2211.09110. ^[4]
Bommasani, R., Klyman, K., ... Liang, P., et al. (2023). "The Foundation Model Transparency Index". arXiv:2310.12941. ^[34]
Khattab, O., Santhanam, K., Li, X. L., Hall, D., Liang, P., Potts, C., and Zaharia, M. (2022). "Demonstrate-Search-Predict: Composing Retrieval and Language Models for Knowledge-Intensive NLP". arXiv:2212.14024. ^[30]
Khattab, O., Singhvi, A., Maheshwari, P., Zhang, Z., Santhanam, K., Vardhamanan, S., Haq, S., Sharma, A., Joshi, T. T., Moazam, H., Miller, H., Zaharia, M., and Potts, C., with contributions from Liang, P., et al. (2023). "DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines". arXiv:2310.03714. ^[31]

References

1. Stanford Computer Science, "Percy Liang". https://cs.stanford.edu/~pliang/
2. Stanford Center for Research on Foundation Models, "About CRFM". https://crfm.stanford.edu/
3. Wikipedia, "Percy Liang". https://en.wikipedia.org/wiki/Percy_Liang
4. Liang, P., Bommasani, R., Lee, T., et al., "Holistic Evaluation of Language Models", arXiv:2211.09110 (November 2022). https://arxiv.org/abs/2211.09110
5. Stanford CRFM, "Holistic Evaluation of Language Models (HELM)". https://crfm.stanford.edu/helm/
6. Stanford Center for Research on Foundation Models, "On the Opportunities and Risks of Foundation Models" report page. https://crfm.stanford.edu/report.html
7. Bommasani, R., et al., "On the Opportunities and Risks of Foundation Models", arXiv:2108.07258 (August 2021). https://arxiv.org/abs/2108.07258
8. Together AI, "About Us". https://www.together.ai/about-us
9. TechCrunch, "Together raises $20M to build open source generative AI models" (15 May 2023). https://techcrunch.com/2023/05/15/together-raises-20m-to-build-open-source-generative-ai-models/
10. The White House (archived), "President Donald J. Trump Announces Recipients of the Presidential Early Career Award for Scientists and Engineers". https://trumpwhitehouse.archives.gov/briefings-statements/president-donald-j-trump-announces-recipients-presidential-early-career-award-scientists-engineers/
11. Stanford Profiles, "Percy Liang". https://profiles.stanford.edu/percy-liang
12. Stanford Daily, "Twelve Stanford researchers receive Presidential Early Career Award for Scientists and Engineers" (26 July 2019). https://stanforddaily.com/2019/07/26/twelve-stanford-researchers-receive-presidential-early-career-award-for-scientists-and-engineers/
13. Stanford Department of Statistics, "Percy Shuo Liang". https://statistics.stanford.edu/people/percy-shuo-liang
14. Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P., "SQuAD: 100,000+ Questions for Machine Comprehension of Text", EMNLP 2016. https://arxiv.org/abs/1606.05250
15. Marin, "Introducing Marin: An Open Lab for Building Foundation Models" (19 May 2025). http://marin.community/blog/2025/05/19/announcement/
16. CodaLab Worksheets Documentation, "About". https://codalab-worksheets.readthedocs.io/en/latest/About/
17. Liang, P., "Learning Dependency-Based Compositional Semantics", Ph.D. dissertation, UC Berkeley (August 2011). https://www2.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-90.html
18. Liang, P., Jordan, M. I., and Klein, D., "Learning Dependency-Based Compositional Semantics", ACL 2011 / Computational Linguistics. https://aclanthology.org/P11-1060/
19. Berant, J., Chou, A., Frostig, R., and Liang, P., "Semantic Parsing on Freebase from Question-Answer Pairs", EMNLP 2013. https://aclanthology.org/D13-1160/
20. Liang, P., "Learning Executable Semantic Parsers for Natural Language Understanding", arXiv:1603.06677. https://arxiv.org/abs/1603.06677
21. Stanford NLP Group, "SEMPRE: Semantic Parsing with Execution". https://nlp.stanford.edu/software/sempre/
22. Jia, R., and Liang, P., "Adversarial Examples for Evaluating Reading Comprehension Systems", arXiv:1707.07328 (EMNLP 2017). https://arxiv.org/abs/1707.07328
23. Jia, R., Raghunathan, A., Goksel, K., and Liang, P., "Certified Robustness to Adversarial Word Substitutions", arXiv:1909.00986. https://arxiv.org/abs/1909.00986
24. PyTorch Conference 2025, "Keynote: Marin: An Open Lab for Frontier AI - Percy Liang". https://pytorchconference.sched.com/event/27SII/
25. Stanford HAI, "Introducing the Center for Research on Foundation Models (CRFM)". https://hai.stanford.edu/news/introducing-center-research-foundation-models-crfm
26. Stanford CRFM, "HELM" GitHub repository. https://github.com/stanford-crfm/helm
27. Stanford HAI, "Percy Liang on the Center for Research on Foundation Models: The First and Next 30 Years". https://hai.stanford.edu/news/percy-liang-center-research-foundation-models-first-and-next-30-years
28. SiliconANGLE, "Another generative AI startup, Together AI, secures $100M+ in funding" (29 November 2023). https://siliconangle.com/2023/11/29/another-generative-ai-startup-together-ai-secures-millions-series-funding/
29. Together AI, "About Us" (leadership team listing). https://www.together.ai/about-us
30. Khattab, O., Santhanam, K., Li, X. L., Hall, D., Liang, P., Potts, C., and Zaharia, M., "Demonstrate-Search-Predict", arXiv:2212.14024. https://arxiv.org/abs/2212.14024
31. Khattab, O., et al., "DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines", arXiv:2310.03714. https://arxiv.org/abs/2310.03714
32. Together AI, "Together AI Announces $305M Series B to Scale AI Acceleration Cloud for Open Source and Enterprise AI" (20 February 2025). https://www.together.ai/blog/together-ai-announcing-305m-series-b
33. Taori, R., Gulrajani, I., Zhang, T., Dubois, Y., Li, X., Guestrin, C., Liang, P., and Hashimoto, T. B., "Stanford Alpaca: An Instruction-following LLaMA Model", Stanford CRFM (13 March 2023). https://crfm.stanford.edu/2023/03/13/alpaca.html
34. Bommasani, R., Klyman, K., et al., "The Foundation Model Transparency Index", arXiv:2310.12941 (October 2023). https://arxiv.org/abs/2310.12941
35. Stanford HAI, "Introducing The Foundation Model Transparency Index". https://hai.stanford.edu/news/introducing-foundation-model-transparency-index
