# Jacob Steinhardt

> Source: https://aiwiki.ai/wiki/jacob_steinhardt
> Updated: 2026-06-08
> Categories: People
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

Jacob Steinhardt is an American statistician and computer scientist who is an associate professor of statistics, with a joint appointment in electrical engineering and computer sciences (EECS), at the [University of California, Berkeley](/wiki/uc_berkeley), and the co-founder and chief executive of [Transluce](/wiki/transluce), a nonprofit research lab building open tools for understanding and overseeing advanced AI systems. He is known for foundational work in [AI safety](/wiki/ai_safety) and the robustness of [machine learning](/wiki/machine_learning) systems, for helping create the widely used [MMLU](/wiki/mmlu) and MATH benchmarks, for research on reward hacking and scalable oversight, and for empirical studies of AI forecasting. [1][2]

Steinhardt runs an influential research group at Berkeley whose alumni include several prominent figures in AI safety, among them [Dan Hendrycks](/wiki/dan_hendrycks), director of the [Center for AI Safety](/wiki/center_for_ai_safety), and Collin Burns, a researcher at [OpenAI](/wiki/openai). He also writes the blog "Bounded Regret," known for its analysis of how quickly AI capabilities advance. [1][10]

## Education

Steinhardt earned a Bachelor of Science in mathematics from the [Massachusetts Institute of Technology](/wiki/massachusetts_institute_of_technology) in 2012, the same year he was named a Hertz Fellow. He completed a PhD in computer science at [Stanford University](/wiki/stanford_university) in 2018, advised by [Percy Liang](/wiki/percy_liang). His dissertation, "Learning from Untrusted Data," developed algorithms for statistical estimation that stay reliable when a fraction of the input is arbitrarily corrupted, a theme of robustness that runs through much of his later work. [3][9]

Around the time of his doctorate he became closely involved with the emerging field of AI safety. He conducted research at OpenAI and served as a technical advisor to [Open Philanthropy](/wiki/open_philanthropy). In the fall of 2019 he joined the UC Berkeley Department of Statistics as an assistant professor and was later promoted to associate professor; he holds a joint appointment in EECS and is affiliated with the Berkeley Artificial Intelligence Research (BAIR) lab. [1][3]

## Research contributions

Steinhardt describes his research as aimed at ensuring that machine learning systems are understood by, and aligned with, the people who use them. His group has worked across robustness, reward specification, [interpretability](/wiki/interpretability), and the empirical science of how AI systems change as they scale. [1]

### Robustness and distribution shift

Early in his career Steinhardt worked on the theory and practice of robust machine learning. With Moses Charikar and Gregory Valiant he published "Learning from Untrusted Data" at STOC 2017, giving algorithms that recover accurate estimates even when most of the data is adversarially chosen. He extended these ideas to security in "Certified Defenses for Data Poisoning Attacks" (NeurIPS 2017), which provides provable bounds on how much an attacker who inserts malicious training examples can degrade a model. His group later studied how image classifiers degrade under natural [distribution shift](/wiki/distribution_shift), the gap between curated benchmarks and the messier data a model meets in deployment. [9]
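The intuition behind robust estimation can be illustrated with a minimal Python sketch. It is not the algorithm from "Learning from Untrusted Data"; it only contrasts the sample mean with the median on synthetic data in which a minority of points are corrupted. The function name `contaminated_sample`, the 10 percent corruption rate, and the outlier value are all assumptions chosen for illustration.

```python
import random
import statistics


def contaminated_sample(n_clean, n_bad, true_mean=0.0, bad_value=1000.0, seed=0):
    """Build a hypothetical dataset: clean Gaussian draws plus planted outliers."""
    rng = random.Random(seed)
    clean = [rng.gauss(true_mean, 1.0) for _ in range(n_clean)]
    bad = [bad_value] * n_bad  # stand-in for arbitrarily corrupted inputs
    return clean + bad


# 900 clean points around 0 and 100 corrupted points (10 percent corruption).
data = contaminated_sample(n_clean=900, n_bad=100)

mean_estimate = statistics.fmean(data)     # dragged far from 0 by the outliers
median_estimate = statistics.median(data)  # stays close to the true mean of 0

print(f"mean:   {mean_estimate:.2f}")
print(f"median: {median_estimate:.2f}")
```

The median tolerates corruption only while the bad points are a minority. The regime described above, in which most of the data may be adversarial, is harder and is not attempted in this sketch.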

### Reward hacking and specification

A recurring concern in Steinhardt's work is reward hacking, in which a system optimizes a proxy objective in ways that diverge from what its designers intended. In 2016 he was one of six co-authors, led by [Dario Amodei](/wiki/dario_amodei), of "Concrete Problems in AI Safety," an influential agenda that framed safe exploration, robustness to distributional shift, scalable oversight, and avoiding reward hacking as concrete engineering challenges rather than distant speculation. With his students Alexander Pan and Kush Bhatia he published "The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models" (ICLR 2022), which showed empirically that more capable agents often exploit a misspecified reward more aggressively, sometimes through sharp phase transitions in which a small increase in capability produces a large drop in true performance. [7][8]
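The gap between a proxy objective and the intended one can be shown with a toy Python sketch. It is not taken from the paper: the functions `true_reward`, `proxy_reward`, and `optimize` are hypothetical, and the `search_limit` parameter is an assumed stand-in for how capable the optimizer is.

```python
def true_reward(x):
    """What the designers actually want: x close to 1."""
    return -(x - 1.0) ** 2


def proxy_reward(x):
    """A misspecified proxy: larger x always scores higher."""
    return x


def optimize(reward, search_limit, steps=101):
    """Pick the best candidate in [0, search_limit]; a wider range models a stronger optimizer."""
    candidates = [search_limit * i / (steps - 1) for i in range(steps)]
    return max(candidates, key=reward)


for limit in (0.5, 1.0, 2.0, 4.0):
    x = optimize(proxy_reward, limit)
    print(f"limit={limit}: proxy={proxy_reward(x):.2f}, true={true_reward(x):.2f}")
```

In this toy the proxy score keeps rising as the search range widens, while the true reward peaks at a limit of 1.0 and then falls. The decline here is smooth; the sketch does not reproduce the sharp phase transitions reported in the paper.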

### Scalable oversight and emergent behavior

As models grew larger, Steinhardt's group turned to scalable oversight, the problem of supervising systems whose behavior humans cannot fully check by hand. This included work on eliciting a model's latent knowledge and on automatically discovering failures in machine learning systems, lines of research that fed directly into the founding of Transluce. He also argued that scale produces qualitative, not merely quantitative, change. In a widely read 2022 essay series titled "More Is Different for AI," he contended that emergent capabilities make future systems hard to predict from present ones, and that this unpredictability is itself a safety problem. Separately, with Zachary Lipton, he co-wrote "Troubling Trends in Machine Learning Scholarship" (2018), a critique of common methodological and communication failures in the field. [1][11]

## Benchmarks

Steinhardt is closely associated with two benchmarks that became standard yardsticks for [large language models](/wiki/large_language_models). Both were led by Dan Hendrycks, then a Berkeley PhD student co-advised by Steinhardt and Dawn Song, with Steinhardt as a senior author.

MMLU, short for Measuring Massive Multitask Language Understanding, was introduced in a 2020 paper (published at ICLR 2021). It poses roughly 16,000 multiple-choice questions across 57 subjects, from elementary mathematics to professional law and medicine, and quickly became one of the most cited measures of broad knowledge and reasoning in language models. The MATH benchmark, presented at NeurIPS 2021, consists of 12,500 competition mathematics problems with full step-by-step solutions and was designed to be far harder than existing math datasets. The same collaboration also produced the ETHICS benchmark for testing models against shared human values. [5][6]

These benchmarks later played a central role in Steinhardt's forecasting work, because their difficulty made them natural test cases for predicting the pace of AI progress.

## Transluce

In July 2024 Steinhardt took leave from Berkeley to co-found Transluce with Sarah Schwettmann, a research scientist from MIT CSAIL who had built MAIA, an early pipeline of interpretability agents. The organization launched publicly on October 23, 2024, as a nonprofit whose stated mission is to build open, scalable technology for understanding AI systems and steering them in the public interest. Steinhardt serves as co-founder and chief executive. [2][4]

Transluce's launch centered on AI agents that help people investigate other AI systems. It released an automated pipeline that generates human-readable descriptions of the internal features of a model, producing a database of descriptions for all 458,752 MLP neurons in Meta's Llama 3.1 8B model; an observability interface called Monitor for inspecting and steering those features; and behavior-elicitation agents that automatically search for specified behaviors in frontier models such as Llama 405B and [GPT-4](/wiki/gpt_4). The code and explainer models were open-sourced on GitHub and Hugging Face at the end of October 2024. The lab subsequently released Docent, a system for rapidly analyzing AI agent transcripts to surface bugs, scaffolding problems, and unexpected behaviors, and continued work on investigator agents for eliciting pathological behaviors and jailbreaks from frontier models. [4][13]
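The figure of 458,752 neurons is consistent with the model's architecture. The short Python check below assumes the publicly documented Llama 3.1 8B configuration of 32 transformer layers and an MLP intermediate size of 14,336; these two values are not stated in the Transluce sources cited here and are supplied as assumptions.

```python
# Assumed values from the public Llama 3.1 8B model configuration.
NUM_LAYERS = 32                 # transformer blocks
MLP_INTERMEDIATE_SIZE = 14336   # MLP neurons per block

total_mlp_neurons = NUM_LAYERS * MLP_INTERMEDIATE_SIZE
print(total_mlp_neurons)
```

Under those assumptions the product matches the number of neuron descriptions reported at launch.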

Steinhardt is also listed among the team of LawZero, the nonprofit AI safety lab launched by [Yoshua Bengio](/wiki/yoshua_bengio) in June 2025 to pursue safe-by-design "Scientist AI." [12]

## Forecasting

Steinhardt has run a multi-year program of empirical AI forecasting, commissioning predictions about future benchmark performance and then scoring how accurate they were. Beginning in August 2021, he and collaborators used the crowd-forecasting platform Hypermind, along with professional superforecasters and the Metaculus community, to predict state-of-the-art results on MMLU and MATH up to several years out. [10]

The consistent finding was that forecasters substantially underestimated the pace of progress. In the first round, forecasters expected MMLU accuracy to rise only modestly, to about 57 percent by mid-2022, whereas the actual result reached roughly 68 percent, an outcome the forecasters had rated as highly unlikely. The pattern repeated the next year: for performance by mid-2023, the Hypermind crowd predicted a median MMLU score of 72.5 percent against an actual 86.4 percent, while MATH reached 69.6 percent, again far above expectations. Across the exercise, Steinhardt's own forecasts and the Metaculus community tended to be the most accurate, AI domain experts came next, and the broad crowd platform fared worst. He documents these results on his blog "Bounded Regret" (published at bounded-regret.ghost.io), where he argues that the field systematically under-forecasts capability gains and that this bias is a reason to take the possibility of rapid future advances seriously. [10]
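The size of the misses can be worked out directly from the MMLU figures quoted above. The Python sketch below does only that arithmetic; the helper name `forecast_gap` is hypothetical, and the mid-2022 inputs are the approximate values given in the text ("about 57" and "roughly 68"), so that gap is approximate too.

```python
def forecast_gap(predicted, actual):
    """Return how far the outcome exceeded the forecast, in percentage points."""
    return round(actual - predicted, 1)


# (label, forecast accuracy in percent, actual accuracy in percent)
rounds = [
    ("MMLU by mid-2022", 57.0, 68.0),   # approximate figures
    ("MMLU by mid-2023", 72.5, 86.4),
]

gaps = {label: forecast_gap(predicted, actual) for label, predicted, actual in rounds}

for label, gap in gaps.items():
    print(f"{label}: outcome exceeded forecast by {gap} points")
```

On these numbers the outcome exceeded the forecast by about 11 points in the first round and 13.9 points in the second. The MATH forecast is not given in the text, so no gap is computed for it.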

## Selected milestones

| Year | Milestone |
| --- | --- |
| 2012 | BS in mathematics, MIT; named a Hertz Fellow |
| 2016 | Co-authors "Concrete Problems in AI Safety" |
| 2017 | "Learning from Untrusted Data" (STOC); "Certified Defenses for Data Poisoning Attacks" (NeurIPS) |
| 2018 | PhD in computer science, Stanford (advisor Percy Liang) |
| 2019 | Joins UC Berkeley statistics faculty |
| 2020 to 2021 | Senior author on the MMLU and MATH benchmarks |
| 2021 | Launches Hypermind AI forecasting study on MMLU and MATH |
| 2022 | "The Effects of Reward Misspecification" (ICLR); "More Is Different for AI" essays |
| July 2024 | Takes leave from Berkeley to co-found Transluce with Sarah Schwettmann |
| October 23, 2024 | Transluce launches publicly with neuron-description tools, Monitor, and investigator agents |
| 2025 | Transluce releases Docent; Steinhardt listed on LawZero team |

As of 2026 Steinhardt leads Transluce while holding his associate professorship at UC Berkeley, where his group continues to work on interpretability, oversight, and the measurement of AI progress. [1][2][4]

## References

1. "Jacob Steinhardt." UC Berkeley faculty homepage. https://jsteinhardt.stat.berkeley.edu/
2. "Steinhardt Announces Co-founding of Transluce, a Non-profit AI research lab." UC Berkeley Department of Statistics, October 23, 2024. https://statistics.berkeley.edu/about/news/steinhardt-announces-co-founding-transluce-non-profit-ai-research-lab
3. "Jacob Steinhardt." Hertz Foundation. https://www.hertzfoundation.org/people/jacob-steinhardt/
4. "Releasing AI-driven tools for understanding AI systems." Transluce, October 2024. https://transluce.org/releasing-ai-tools
5. Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt. "Measuring Massive Multitask Language Understanding." ICLR 2021; arXiv:2009.03300. https://arxiv.org/abs/2009.03300
6. Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt. "Measuring Mathematical Problem Solving With the MATH Dataset." NeurIPS 2021; arXiv:2103.03874. https://arxiv.org/abs/2103.03874
7. Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané. "Concrete Problems in AI Safety." arXiv:1606.06565, 2016. https://arxiv.org/abs/1606.06565
8. Alexander Pan, Kush Bhatia, Jacob Steinhardt. "The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models." ICLR 2022; arXiv:2201.03544. https://arxiv.org/abs/2201.03544
9. Moses Charikar, Jacob Steinhardt, Gregory Valiant. "Learning from Untrusted Data." STOC 2017; arXiv:1611.02315. https://arxiv.org/abs/1611.02315
10. Jacob Steinhardt. "AI Forecasting: Two Years In." Bounded Regret. https://bounded-regret.ghost.io/scoring-ml-forecasts-for-2023/
11. Jacob Steinhardt. "More Is Different for AI." Bounded Regret, 2022. https://bounded-regret.ghost.io/more-is-different-for-ai/
12. "Jacob Steinhardt." LawZero team page. https://lawzero.org/en/team/jacob-steinhardt
13. "Monitor: An AI-Driven Observability Interface." Transluce. https://transluce.org/observability-interface