Sam McCandlish

AI Companies Machine Learning People

Updated May 31, 2026

Sam McCandlish is an American AI researcher and a co-founder of Anthropic, the artificial intelligence company behind the Claude family of language models. ^[1]^[2] Trained as a theoretical physicist, he moved into machine learning research at OpenAI, where he led work on neural scaling laws and co-authored the 2020 paper that described how language model performance improves predictably with model size, dataset size, and compute. ^[3]^[4] He helped found Anthropic in early 2021 and has held several senior technical titles there, serving as chief architect since October 2025 with a focus on pre-training and large-scale model training. ^[2]^[5]

McCandlish is best known for empirical research showing that the behavior of large neural networks can be forecast from a small number of measurable quantities. That line of work shaped the development of GPT-3 and influenced how the wider field reasons about the returns to spending more on computation and data. ^[3]^[6]

Background and education

McCandlish studied mathematics and physics at Brandeis University, earning bachelor's and master's degrees there. ^[1] He then completed a Ph.D. in theoretical physics at Stanford University. ^[1]^[7] His doctoral research sat in the area of string theory and holography, the study of how gravity in a region of space can be described by a theory living on its boundary. His dissertation, titled "Depth Perception in Holography," was defended in May 2017 under the supervision of the physicist Eva Silverstein. ^[7]

Like several other early Anthropic researchers, McCandlish came to artificial intelligence from physics rather than through a conventional computer science route. The physicist's habit of looking for simple quantitative laws that hold across many orders of magnitude carried over into his later work on the scaling behavior of neural networks. ^[3]^[6]

Work at OpenAI

McCandlish joined OpenAI as a research scientist and became a research lead there. ^[1] His work centered on the empirical science of training large models efficiently and on understanding how their performance changes with scale.

In 2018 he was the first author of "An Empirical Model of Large-Batch Training," written with Jared Kaplan, Dario Amodei, and the OpenAI Dota team. ^[8] The paper introduced a statistic the authors called the gradient noise scale. It measures how much gradients vary across training examples relative to the size of the average gradient. The authors argued that it predicts the critical batch size, the point beyond which adding more examples to each batch gives sharply diminishing reductions in the number of training steps. The result held across supervised learning, reinforcement learning, and generative modeling, and it gave practitioners a way to estimate how far they could parallelize a training run. ^[8]
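The noise scale idea can be illustrated with a short sketch. The paper defines a "simple" noise scale as B_simple = tr(Σ)/|G|², where G is the mean gradient and Σ is the covariance of per-example gradients. It also models the number of optimization steps S needed to reach a fixed loss at batch size B as roughly S = S_min(1 + B_crit/B). Here S_min is the minimum possible number of steps and B_crit is the critical batch size that the noise scale is meant to predict. The function names and toy gradient values below are hypothetical. The naive plug-in estimate is a simplification that is biased for small samples, so a real measurement would need more care.

```python
"""Sketch of the 'simple' gradient noise scale, B_simple = tr(Sigma) / |G|^2.

Hypothetical helper names and toy numbers; the plug-in estimate below is
biased for small samples and is meant purely as an illustration.
"""
from typing import List, Sequence


def mean_gradient(per_example_grads: List[Sequence[float]]) -> List[float]:
    """Average the per-example gradient vectors coordinate by coordinate."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] for g in per_example_grads) / n for i in range(dim)]


def simple_noise_scale(per_example_grads: List[Sequence[float]]) -> float:
    """Estimate tr(Sigma) / |G|^2 from a list of per-example gradients."""
    n = len(per_example_grads)
    if n < 2:
        raise ValueError("need at least two per-example gradients")
    g = mean_gradient(per_example_grads)
    # tr(Sigma) is the sum of the per-coordinate sample variances.
    trace_sigma = sum(
        sum((grad[i] - g[i]) ** 2 for grad in per_example_grads) / (n - 1)
        for i in range(len(g))
    )
    g_norm_sq = sum(x * x for x in g)
    return trace_sigma / g_norm_sq


def steps_to_target(batch_size: float, s_min: float, b_crit: float) -> float:
    """Steps-versus-batch-size model: S = S_min * (1 + B_crit / B)."""
    return s_min * (1.0 + b_crit / batch_size)


if __name__ == "__main__":
    grads = [[3.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [0.0, -2.0]]
    b_noise = simple_noise_scale(grads)
    print(f"estimated noise scale: {b_noise:.3f}")
    # Use the noise scale as the prediction of B_crit; gains shrink past it.
    for batch in (1, 4, 16, 64):
        print(batch, steps_to_target(batch, s_min=1000.0, b_crit=b_noise))
```

In the usage loop, batches well below the estimated noise scale need many more steps than S_min. Batches well above it approach S_min, so the extra parallel examples buy little.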

McCandlish's most cited contribution came in January 2020, when he and Jared Kaplan led the team behind "Scaling Laws for Neural Language Models." ^[3]^[4] The paper studied how the cross-entropy loss of Transformer language models changes as researchers vary model size, dataset size, and the amount of compute used in training. It reported that the loss falls as a smooth power law in each of those three quantities, with some trends holding across more than seven orders of magnitude. ^[3] The authors also found that larger models are more sample-efficient, so that the compute-optimal strategy is to train very large models on a moderate amount of data and to stop well before the model has fully converged. ^[3] The other authors included Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. ^[4]
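The paper's single-variable fits take the form L(N) = (N_c/N)^α_N, L(D) = (D_c/D)^α_D, and L(C_min) = (C_c/C_min)^α_C, with loss measured in nats per token. It also gives a joint fit, L(N, D) = [(N_c/N)^(α_N/α_D) + D_c/D]^α_D. The sketch below plugs in the approximate constants reported in the paper:

- α_N ≈ 0.076 and N_c ≈ 8.8 × 10^13 non-embedding parameters
- α_D ≈ 0.095 and D_c ≈ 5.4 × 10^13 tokens
- α_C ≈ 0.050 and C_c ≈ 3.1 × 10^8 PF-days

These values are specific to the paper's dataset and tokenization, so they are quoted here only to show the functional form. The function names are hypothetical.

```python
"""Illustrative power-law fits in the form used by Kaplan et al. (2020).

Constants are the approximate values reported for the paper's own setup;
treat them as illustrative, not universal. Loss is in nats per token.
"""

ALPHA_N, N_C = 0.076, 8.8e13  # non-embedding parameters
ALPHA_D, D_C = 0.095, 5.4e13  # training tokens
ALPHA_C, C_C = 0.050, 3.1e8   # compute in PF-days (compute-efficient frontier)


def loss_vs_params(n: float) -> float:
    """L(N) = (N_c / N)^alpha_N, for models trained to convergence on ample data."""
    return (N_C / n) ** ALPHA_N


def loss_vs_data(d: float) -> float:
    """L(D) = (D_c / D)^alpha_D, for large models trained with early stopping."""
    return (D_C / d) ** ALPHA_D


def loss_vs_compute(c_pf_days: float) -> float:
    """L(C_min) = (C_c / C_min)^alpha_C along the compute-efficient frontier."""
    return (C_C / c_pf_days) ** ALPHA_C


def loss_vs_params_and_data(n: float, d: float) -> float:
    """Joint fit L(N, D) = [(N_c/N)^(alpha_N/alpha_D) + D_c/D]^alpha_D."""
    return ((N_C / n) ** (ALPHA_N / ALPHA_D) + D_C / d) ** ALPHA_D


if __name__ == "__main__":
    for n in (1e6, 1e8, 1e10):
        print(f"N={n:.0e}: L(N)={loss_vs_params(n):.3f}")
    # Each tenfold increase in N multiplies the predicted loss by 10**-0.076.
    print(loss_vs_params(1e9) / loss_vs_params(1e8))
```

The joint fit reduces to L(N) when data is effectively unlimited. A finite dataset adds a penalty that grows as D shrinks.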

The scaling laws gave OpenAI a way to estimate, before committing to an expensive training run, how a much larger model was likely to perform. That reasoning fed directly into GPT-3, the 175-billion-parameter model introduced in the May 2020 paper "Language Models are Few-Shot Learners," on which McCandlish is listed as a co-author. ^[6]^[9] GPT-3 showed that a single large model could perform many language tasks from a few examples given in its prompt, without task-specific fine-tuning, and it became one of the most influential systems in the recent history of the field. ^[6]
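The kind of forecast described above can be sketched as a straight-line fit to small-model results in log-log space, extended to a larger size. This illustrates the general idea only and is not OpenAI's actual procedure. The loss values are synthetic, generated from an assumed power law, and the helper names are hypothetical.

```python
"""Sketch: fit log L = log a - alpha * log N on small models, then extrapolate.

The "measurements" are synthetic, generated from an assumed power law, so the
fit should recover it. Real forecasts must also handle noise, irreducible
loss, and hyperparameter choices, which this sketch ignores.
"""
import math
from typing import List, Tuple


def fit_power_law(sizes: List[float], losses: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log(loss) against log(size); returns (a, alpha)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    slope = cov / var
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


def predict_loss(a: float, alpha: float, size: float) -> float:
    """Evaluate the fitted power law L = a * N^-alpha at a new model size."""
    return a * size ** (-alpha)


if __name__ == "__main__":
    # Hypothetical small-model sweep drawn from L = 20 * N^-0.08.
    small_sizes = [1e6, 3e6, 1e7, 3e7, 1e8]
    small_losses = [20.0 * n ** -0.08 for n in small_sizes]
    a, alpha = fit_power_law(small_sizes, small_losses)
    print(f"fitted a={a:.3f}, alpha={alpha:.4f}")
    # Extrapolate three orders of magnitude beyond the largest model in the sweep.
    print(f"forecast at 1e11 params: {predict_loss(a, alpha, 1e11):.3f}")
```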

Co-founding Anthropic

In early 2021 a group of OpenAI researchers left to start a new company focused on AI safety research. Anthropic was founded in January 2021, with the siblings Dario Amodei and Daniela Amodei among its leaders, and McCandlish was one of the co-founders. ^[2]^[10] Public accounts and the company's own materials list the founding group as including Dario Amodei, Daniela Amodei, Jared Kaplan, McCandlish, Tom Brown, Jack Clark, Chris Olah, and Benjamin Mann. ^[2]^[10] Several of them, including McCandlish and Kaplan, had worked together on the scaling laws research at OpenAI. ^[3]^[10] The founders raised an initial round of about 124 million dollars in May 2021. ^[10]

Anthropic described itself as an AI safety and research company and set out to build large models while studying how to make them more reliable and interpretable. ^[10] Its main public product is Claude, a family of conversational and reasoning models that compete with systems from OpenAI, Google, and others. ^[2]

Role and focus areas at Anthropic

McCandlish has held a series of senior technical positions at Anthropic. Reporting and company sources describe him as a co-founder who has served in roles including chief scientist and chief technology officer, leading the organization responsible for building the company's large language models. ^[1]^[2] In these roles his work continued the themes of his earlier research: training models at scale, understanding their behavior, and predicting their capabilities in advance.

In October 2025 Anthropic restructured its technical leadership. The company hired Rahul Patil, formerly chief technology officer of Stripe, as its new CTO with responsibility for infrastructure, inference, and engineering, while McCandlish moved into the title of chief architect. ^[5] In the chief architect role he focuses on pre-training and large-scale model training, work that the reporting described as an extension of much of what he had done before. ^[5] Under the new structure both Patil and McCandlish report to Anthropic's president, Daniela Amodei. ^[5] The change came as demand for Claude strained the company's systems and as competition over computing capacity intensified across the industry. ^[5]

The predictability of model behavior, the subject of McCandlish's scaling research, connects to Anthropic's stated interest in safety. If the capabilities of a future system can be estimated before it is built, developers can plan tests and precautions for those capabilities in advance rather than discovering them only after training. Anthropic has built this idea into commitments such as its Responsible Scaling Policy, which ties required safety measures to the capability levels of its models. ^[11]

Significance of scaling laws

The scaling laws that McCandlish helped establish changed how many researchers and companies plan their work. By framing the relationship between compute, data, model size, and loss as smooth and quantifiable, the research turned the question of how to improve a model into something closer to an engineering forecast. ^[3]^[6] Organizations could weigh the expected gains from a larger training run against its cost, and the prospect of reliable returns helped justify the large capital investments that followed across the field. ^[6]^[12]

Later research refined the picture. In 2022 a team at DeepMind published the Chinchilla scaling results. They argued that many large models of the time had been trained on too little data for their size. They also argued that, as compute grows, model size and the number of training tokens should be scaled up in roughly equal proportion. ^[12] The Chinchilla finding adjusted the recommended balance of model size and data but kept the central idea that performance follows predictable power laws. McCandlish's 2020 paper remains one of the most cited works in the area and is widely treated as the origin of the modern study of neural scaling laws. ^[3]^[12]
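A common back-of-the-envelope reading of the Chinchilla result combines two approximations. The first is that training compute is C ≈ 6ND FLOPs, for N parameters and D training tokens. The second is a rule of thumb of about 20 tokens per parameter, the ratio of the 70-billion-parameter Chinchilla model trained on 1.4 trillion tokens. Both are rough approximations rather than the paper's full fitted model, and the function name is hypothetical.

```python
"""Back-of-the-envelope compute-optimal allocation in the Chinchilla spirit.

Assumes C ~= 6 * N * D training FLOPs and a fixed ratio of about 20 tokens per
parameter; both are rough approximations, not the paper's full fitted model.
"""
import math
from typing import Tuple


def chinchilla_allocation(
    compute_flops: float, tokens_per_param: float = 20.0
) -> Tuple[float, float]:
    """Return (params, tokens) satisfying C = 6*N*D with D = ratio * N."""
    # C = 6 * N * (ratio * N)  =>  N = sqrt(C / (6 * ratio))
    params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


if __name__ == "__main__":
    # Compute budget implied by 70e9 parameters trained on 1.4e12 tokens.
    budget = 6.0 * 70e9 * 1.4e12
    n, d = chinchilla_allocation(budget)
    print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
    # Under this rule, 4x the compute means 2x the parameters and 2x the tokens.
    n4, d4 = chinchilla_allocation(4 * budget)
    print(n4 / n, d4 / d)
```

Under this rule, extra compute is split evenly between parameters and data, since both grow as the square root of C. By contrast, the compute-efficient frontier in the 2020 paper put most additional compute into model size, with the optimal N growing roughly as C^0.73.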

Recognition

McCandlish is recognized within the AI research community mainly through the influence and citation record of his papers. "Scaling Laws for Neural Language Models" and "Language Models are Few-Shot Learners" are both among the most cited machine learning papers of their period, and his role as co-founder of Anthropic has placed him among the senior technical figures of a leading AI company. ^[3]^[6]^[2] He keeps a relatively low public profile compared with some of his co-founders, and detailed biographical information about him outside his research and corporate roles is limited in reliable sources. ^[1]

Facts

Item	Detail
Full name	Samuel McCandlish
Nationality	American
Known for	Neural scaling laws; co-founding Anthropic
Education	B.S. and M.S. in mathematics and physics, Brandeis University; Ph.D. in theoretical physics, Stanford University (2017)
Doctoral dissertation	"Depth Perception in Holography" (advisor Eva Silverstein)
Field	Theoretical physics, then machine learning
Former employer	OpenAI (research scientist and research lead)
Company co-founded	Anthropic (January 2021)
Role at Anthropic	Co-founder; chief architect (from October 2025); previously chief scientist and chief technology officer
Focus areas	Pre-training, large-scale model training, model scaling
Notable papers	"An Empirical Model of Large-Batch Training" (2018); "Scaling Laws for Neural Language Models" (2020); "Language Models are Few-Shot Learners" (2020)

References

1. "Sam McCandlish." Longterm Wiki. https://www.longtermwiki.com/wiki/E1259
2. "Anthropic." Wikipedia. https://en.wikipedia.org/wiki/Anthropic
3. Kaplan, Jared, Sam McCandlish, et al. "Scaling Laws for Neural Language Models." arXiv, January 2020. https://arxiv.org/abs/2001.08361
4. "Scaling laws for neural language models." OpenAI. https://openai.com/index/scaling-laws-for-neural-language-models/
5. "Anthropic hires new CTO with focus on AI infrastructure." TechCrunch, October 2, 2025. https://techcrunch.com/2025/10/02/anthropic-hires-new-cto-with-focus-on-ai-infrastructure/
6. Brown, Tom B., et al. "Language Models are Few-Shot Learners." arXiv, May 2020. https://arxiv.org/abs/2005.14165
7. "Physics Ph.D. Dissertation Defense: Sam McCandlish." Stanford University Physics Department. https://physics.stanford.edu/events/physics-phd-dissertation-defense-sam-mccandlish
8. McCandlish, Sam, Jared Kaplan, Dario Amodei, and the OpenAI Dota Team. "An Empirical Model of Large-Batch Training." arXiv, December 2018. https://arxiv.org/abs/1812.06162
9. "GPT-3: Language Models are Few-Shot Learners." GitHub. https://github.com/openai/gpt-3
10. "Anthropic Business Breakdown & Founding Story." Contrary Research. https://research.contrary.com/company/anthropic
11. "Anthropic's Responsible Scaling Policy." Anthropic. https://www.anthropic.com/news/anthropics-responsible-scaling-policy
12. Hoffmann, Jordan, et al. "Training Compute-Optimal Large Language Models." arXiv, March 2022. https://arxiv.org/abs/2203.15556