Sam McCandlish
Last reviewed: May 31, 2026
Sources: 12 citations
Review status: Source-backed
Revision: v1 · 1,658 words
Sam McCandlish is an American AI researcher and a co-founder of Anthropic, the artificial intelligence company behind the Claude family of language models. [1][2] Trained as a theoretical physicist, he moved into machine learning research at OpenAI, where he led work on neural scaling laws and co-authored the 2020 paper that described how language model performance improves predictably with model size, dataset size, and compute. [3][4] He helped found Anthropic in early 2021 and has held several senior technical titles there, serving as chief architect since October 2025 with a focus on pre-training and large-scale model training. [2][5]
McCandlish is best known for empirical research showing that the behavior of large neural networks can be forecast from a small number of measurable quantities. That line of work shaped the development of GPT-3 and influenced how the wider field reasons about the returns to spending more on computation and data. [3][6]
McCandlish studied mathematics and physics at Brandeis University, earning bachelor's and master's degrees there. [1] He then completed a Ph.D. in theoretical physics at Stanford University. [1][7] His doctoral research sat in the area of string theory and holography, the study of how gravity in a region of space can be described by a theory living on its boundary. His dissertation, titled "Depth Perception in Holography," was defended in May 2017 under the supervision of the physicist Eva Silverstein. [7]
Like several other early Anthropic researchers, McCandlish came to artificial intelligence from a background in physics rather than through a conventional computer science route. The habit of looking for simple quantitative laws that hold across many orders of magnitude, common in physics, carried over into his later work on the scaling behavior of neural networks. [3][6]
McCandlish joined OpenAI as a research scientist and became a research lead there. [1] His work centered on the empirical science of training large models efficiently and on understanding how their performance changes with scale.
In 2018 he was the first author of "An Empirical Model of Large-Batch Training," written with Jared Kaplan, Dario Amodei, and the OpenAI Dota team. [8] The paper introduced a statistic the authors called the gradient noise scale, a measure of the variation in gradients across training examples, and argued that it predicts the largest batch size that remains useful before adding more parallel examples stops speeding up training. The result held across supervised learning, reinforcement learning, and generative modeling, and it gave practitioners a way to reason about how far they could parallelize a training run. [8]
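As a rough illustration of the statistic, the sketch below computes a naive plug-in estimate of the simplified noise scale the paper works with: the summed per-parameter variance of the gradient divided by the squared norm of the mean gradient. The function name `simple_noise_scale` and the synthetic gradients are assumptions made for illustration. The paper uses a more careful estimator to reduce bias, which is not reproduced here.

```python
import random
from statistics import fmean, variance


def simple_noise_scale(per_example_grads):
    """Naive estimate of tr(Sigma) / |G|^2 from a list of per-example gradients.

    per_example_grads: list of equal-length lists, one gradient per example.
    """
    columns = list(zip(*per_example_grads))             # one tuple per parameter
    mean_grad = [fmean(col) for col in columns]         # estimate of the true gradient G
    trace_cov = sum(variance(col) for col in columns)   # estimate of tr(Sigma)
    grad_sq_norm = sum(g * g for g in mean_grad)        # |G|^2
    return trace_cov / grad_sq_norm


# Synthetic example (hypothetical numbers): a fixed "true" gradient plus
# independent Gaussian noise with standard deviation 3 on each coordinate.
random.seed(0)
true_grad = [1.0, -2.0, 0.5]
grads = [[g + random.gauss(0.0, 3.0) for g in true_grad] for _ in range(20000)]

noise_scale = simple_noise_scale(grads)
print(f"estimated noise scale: {noise_scale:.2f}")
```

In this synthetic setup the true ratio is 27 / 5.25, about 5.1, so the estimate should land near that value. Read in the paper's terms, a larger noise scale suggests that larger batches remain useful, while a small one suggests that adding parallel examples will soon stop helping.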
McCandlish's most cited contribution came in January 2020, when he and Jared Kaplan led the team behind "Scaling Laws for Neural Language Models." [3][4] The paper studied how the cross-entropy loss of Transformer language models changes as researchers vary model size, dataset size, and the amount of compute used in training. It reported that the loss falls as a smooth power law in each of those three quantities, with some trends holding across more than seven orders of magnitude. [3] The authors also found that larger models are more sample-efficient, so that the compute-optimal strategy is to train very large models on a moderate amount of data and to stop well before the model has fully converged. [3] The other authors included Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. [4]
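The forecasting idea behind a power law can be sketched in a few lines. Assuming loss follows a form like L(N) = (N_c / N)^alpha in model size N, the relationship is a straight line in log-log space, so it can be fitted on small models and extrapolated to larger ones. The constants `ILLUSTRATIVE_ALPHA` and `ILLUSTRATIVE_SCALE` and both function names are made up for this example and are not the paper's fitted values.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = intercept - alpha * log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        / sum((x - x_mean) ** 2 for x in xs)
    )
    intercept = y_mean - slope * x_mean
    return -slope, intercept  # alpha is the negated log-log slope


def predict_loss(size, alpha, intercept):
    """Extrapolate the fitted line to a model size outside the fitted range."""
    return math.exp(intercept - alpha * math.log(size))


# Hypothetical constants, chosen only to generate noise-free synthetic data.
ILLUSTRATIVE_ALPHA = 0.08
ILLUSTRATIVE_SCALE = 1e13

small_sizes = [1e6, 1e7, 1e8, 1e9]
small_losses = [(ILLUSTRATIVE_SCALE / n) ** ILLUSTRATIVE_ALPHA for n in small_sizes]

alpha, intercept = fit_power_law(small_sizes, small_losses)
forecast = predict_loss(1e11, alpha, intercept)  # 100x beyond the largest fitted model
print(f"fitted alpha: {alpha:.3f}, forecast loss at 1e11 parameters: {forecast:.3f}")
```

Real training runs are noisy and the fitted trend need not hold indefinitely, so an extrapolation like this is a planning estimate rather than a guarantee.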
The scaling laws gave OpenAI a way to estimate, before committing to an expensive training run, how a much larger model was likely to perform. That reasoning fed directly into GPT-3, the 175 billion parameter model introduced in the May 2020 paper "Language Models are Few-Shot Learners," on which McCandlish is listed as a co-author. [6][9] GPT-3 showed that a single large model could perform many language tasks from a few examples given in its prompt, without task-specific fine-tuning, and it became one of the most influential systems in the recent history of the field. [6]
In early 2021 a group of OpenAI researchers left to start Anthropic, a new company focused on AI safety research. The company was founded in January 2021, with the siblings Dario Amodei and Daniela Amodei among its leaders, and McCandlish was one of the co-founders. [2][10] Public accounts and the company's own materials list the founding group as including Dario Amodei, Daniela Amodei, Jared Kaplan, McCandlish, Tom Brown, Jack Clark, Chris Olah, and Benjamin Mann. [2][10] Several of them, including McCandlish and Kaplan, had worked together on the scaling laws research at OpenAI. [3][10] The founders raised an initial round of about $124 million in May 2021. [10]
Anthropic described itself as an AI safety and research company and set out to build large models while studying how to make them more reliable and interpretable. [10] Its main public product is Claude, a family of conversational and reasoning models that compete with systems from OpenAI, Google, and others. [2]
McCandlish has held a series of senior technical positions at Anthropic. Reporting and company sources describe him as a co-founder who has served in roles including chief scientist and chief technology officer, leading the organization responsible for building the company's large language models. [1][2] In these roles his work continued the themes of his earlier research: training models at scale, understanding their behavior, and predicting their capabilities in advance.
In October 2025 Anthropic restructured its technical leadership. The company hired Rahul Patil, formerly chief technology officer of Stripe, as its new CTO with responsibility for infrastructure, inference, and engineering, while McCandlish moved into the title of chief architect. [5] In the chief architect role he focuses on pre-training and large-scale model training, work that the reporting described as an extension of much of what he had done before. [5] Under the new structure both Patil and McCandlish report to Anthropic's president, Daniela Amodei. [5] The change came as demand for Claude strained the company's systems and as competition over computing capacity intensified across the industry. [5]
The predictability of model behavior, the subject of McCandlish's scaling research, connects to Anthropic's stated interest in safety. If the capabilities of a future system can be estimated before it is built, then developers can plan tests and precautions for those capabilities in advance rather than discovering them only after training. Anthropic has built this idea into commitments such as its responsible scaling framework, which ties safety measures to model capability levels. [11]
The scaling laws that McCandlish helped establish changed how many researchers and companies plan their work. By framing the relationship between compute, data, model size, and loss as smooth and quantifiable, the research turned the question of how to improve a model into something closer to an engineering forecast. [3][6] Organizations could weigh the expected gains from a larger training run against its cost, and the prospect of reliable returns helped justify the large capital investments that followed across the field. [6][12]
Later research refined the picture. In 2022 a team at DeepMind published the Chinchilla scaling results, which argued that the earlier work had used too little data for the size of its models and that compute should be split more evenly between making models larger and training them on more tokens. [12] The Chinchilla finding adjusted the recommended balance of model size and data, while keeping the central idea that performance follows predictable power laws. McCandlish's 2020 paper remains one of the most cited works in the area and is widely treated as the origin of the modern study of neural scaling laws. [3][12]
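The trade-off between model size and data can be made concrete with a back-of-the-envelope sketch. It assumes the commonly used approximation that training compute is about 6 × parameters × tokens, and it treats the tokens-per-parameter ratio as an input. The default of 20 is a widely quoted rule of thumb associated with the Chinchilla results, not a figure stated in this article, and the function name `split_compute` is hypothetical.

```python
import math


def split_compute(compute_flops, tokens_per_param=20.0):
    """Return (parameters, tokens) such that 6 * parameters * tokens = compute_flops.

    Assumes tokens = tokens_per_param * parameters; both are rough approximations.
    """
    if compute_flops <= 0 or tokens_per_param <= 0:
        raise ValueError("compute_flops and tokens_per_param must be positive")
    params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


# Two hypothetical budgets, 100x apart.
small_params, small_tokens = split_compute(1e21)
large_params, large_tokens = split_compute(1e23)

print(f"1e21 FLOPs: {small_params:.2e} parameters, {small_tokens:.2e} tokens")
print(f"1e23 FLOPs: {large_params:.2e} parameters, {large_tokens:.2e} tokens")
```

Under a fixed tokens-per-parameter ratio, both outputs grow with the square root of the budget, so a 100-fold increase in compute raises parameters and tokens by about 10 times each. That is the even split the Chinchilla results argued for, in contrast to the earlier recommendation to put most additional compute into model size.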
McCandlish is recognized within the AI research community mainly through the influence and citation record of his papers. "Scaling Laws for Neural Language Models" and "Language Models are Few-Shot Learners" are both among the most cited machine learning papers of their period, and his role as co-founder of Anthropic has placed him among the senior technical figures of a leading AI company. [2][3][6] He keeps a relatively low public profile compared with some of his co-founders, and reliable sources offer little detailed biographical information about him beyond his research and corporate roles. [1]
| Item | Detail |
|---|---|
| Full name | Samuel McCandlish |
| Nationality | American |
| Known for | Neural scaling laws; co-founding Anthropic |
| Education | B.S. and M.S. in mathematics and physics, Brandeis University; Ph.D. in theoretical physics, Stanford University (2017) |
| Doctoral dissertation | "Depth Perception in Holography" (advisor Eva Silverstein) |
| Field | Theoretical physics, then machine learning |
| Former employer | OpenAI (research scientist and research lead) |
| Company co-founded | Anthropic (January 2021) |
| Role at Anthropic | Co-founder; chief architect (from October 2025); previously chief scientist and chief technology officer |
| Focus areas | Pre-training, large-scale model training, model scaling |
| Notable papers | "An Empirical Model of Large-Batch Training" (2018); "Scaling Laws for Neural Language Models" (2020); "Language Models are Few-Shot Learners" (2020) |