Tom B. Brown
Last reviewed
May 24, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,304 words
Tom B. Brown is an American artificial intelligence engineer and researcher best known as the lead author of "Language Models are Few-Shot Learners," the 2020 paper that introduced GPT-3.[1][2] Brown joined OpenAI in the late 2010s and led the engineering of the 175-billion-parameter language model that established large-scale in-context learning as a central paradigm in modern natural language processing.[1][3] In 2021 he co-founded Anthropic with Dario Amodei, Daniela Amodei, Jared Kaplan, Sam McCandlish, Chris Olah, Jack Clark, Ben Mann, and other former OpenAI colleagues; at Anthropic he leads compute infrastructure as Chief Compute Officer.[4][5][6] Earlier in his career he co-founded the Y Combinator-backed social startup Grouper and worked as a research engineer at Google Brain, where he co-authored the 2017 "Adversarial Patch" paper.[7][8]
Brown earned a Master of Engineering at the Massachusetts Institute of Technology, studying at the intersection of computer science and brain and cognitive sciences.[6][9] In a 2025 Y Combinator podcast appearance, he recounted scoring a B-minus in his linear algebra course before he eventually transitioned into machine learning research, and described himself as having taken a self-taught route into AI rather than entering through a traditional PhD program.[6][9][10]
After MIT, Brown worked as an early engineer at the mobile advertising company MoPub, joining in early 2011.[11] He then co-founded Grouper, a social club that arranged group dates between sets of friends. The company was funded by Y Combinator in the Winter 2012 batch and became a profile-raising venture in the early-2010s New York technology scene.[7][9] Through Grouper, Brown built a personal relationship with Greg Brockman, an early user of the app and later the president and co-founder of OpenAI; this connection would prove decisive for Brown's subsequent move into AI research.[9][10]
After Grouper, Brown took on a self-directed retraining program to enter machine learning. He has publicly described studying Sheldon Axler's textbook Linear Algebra Done Right, completing online courses on Coursera, working through Kaggle competitions, and buying a graphics processing unit using credit available to Y Combinator alumni in order to run experiments.[9][10] By his own account this period lasted roughly six months before he pitched himself for an AI role.[9][10]
Brown joined Google's Brain team as a research engineer in the mid-2010s, working primarily on robustness, security, and adversarial machine learning.[8][12] On December 27, 2017 he posted "Adversarial Patch" to arXiv with Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer.[8] The paper showed that small printable image patches could be crafted to reliably fool image classifiers into reporting an attacker-chosen class, even when the patches are physically photographed in a real-world scene; the work has become a standard reference in research on physical-world adversarial attacks.[8]
Brown remained at Google Brain into 2018. On September 13, 2018 he and Catherine Olsson, both then identified as Research Engineers on the Google Brain Team, announced the "Unrestricted Adversarial Examples Challenge" on the Google Research blog.[12] The challenge was designed to push image classifiers toward zero confident misclassifications under broader threat models, and Brown was a co-author of the accompanying NeurIPS workshop paper "Unrestricted Adversarial Examples" with Nicholas Carlini, Chiyuan Zhang, Catherine Olsson, Paul Christiano, and Ian Goodfellow.[12][13] In a Y Combinator interview Brown noted that this Google Brain period followed his initial OpenAI involvement and gave him deeper experience with modern deep-learning frameworks before he returned to OpenAI to work on scaling language models.[9][10]
Brown was an early employee at OpenAI, joining among the lab's first roughly twenty staff.[9][10] His initial work at OpenAI focused on reinforcement learning for games rather than language modeling, and he transitioned to language research after several months.[9][10] By 2017 he was a co-author on the foundational paper "Deep Reinforcement Learning from Human Preferences," published at NIPS 2017, where his collaborators included Paul Christiano, Jan Leike, Miljan Martic, Shane Legg, and Dario Amodei.[14] The paper showed that a deep reinforcement learning agent could learn complex behaviors, including Atari games and simulated robot locomotion, from human preference labels covering less than one percent of the agent's interactions; it is widely treated as a foundational citation for what later became Reinforcement Learning from Human Feedback.[14]
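The idea of learning a reward signal from pairwise human comparisons can be illustrated with a small sketch. The code below assumes a Bradley-Terry style comparison model, in which the probability that one trajectory segment is preferred depends on the difference in summed predicted rewards, and a cross-entropy loss against the human label. The function names and the list-of-floats representation are hypothetical choices made for illustration, not code from the paper.

```python
import math

def preference_probability(rewards_a, rewards_b):
    """Probability that segment A is preferred over segment B.

    Each argument is a list of per-step predicted rewards for one segment.
    """
    sum_a = sum(rewards_a)
    sum_b = sum(rewards_b)
    # Subtract the larger sum before exponentiating for numerical stability.
    shift = max(sum_a, sum_b)
    exp_a = math.exp(sum_a - shift)
    exp_b = math.exp(sum_b - shift)
    return exp_a / (exp_a + exp_b)

def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy loss for one comparison.

    label = 1.0 means the human preferred A, 0.0 means B,
    and 0.5 can encode "equally good".
    """
    p_a = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p_a + eps) + (1.0 - label) * math.log(1.0 - p_a + eps))

# Hypothetical example: the reward model currently scores segment A higher.
print(preference_probability([0.5, 1.5], [0.2, 0.3]))
print(preference_loss([0.5, 1.5], [0.2, 0.3], label=1.0))
```

In a full system the per-step rewards would come from a trainable network, and minimizing this loss over many labeled comparisons would fit that network to the human judgments.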
Brown returned to OpenAI after his Google Brain interval. In 2019 he was an author on "Fine-Tuning Language Models from Human Preferences," with Daniel Ziegler, Nisan Stiennon, Jeffrey Wu, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving.[15] During this period he was also involved in OpenAI's work on GPT-2 and on emerging scaling research; his Google Scholar profile lists him as a co-author on the 2020 paper "Scaling Laws for Neural Language Models," led by Jared Kaplan with Sam McCandlish and Tom Henighan, which became a central empirical reference for how loss scales with model size, dataset size, and compute.[16][17]
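A scaling law of the kind described here is usually written as a power law, loss = coefficient × size^(−alpha), which becomes a straight line on log-log axes. The sketch below fits such a line by ordinary least squares in log space. The data are synthetic values generated from a made-up exponent and coefficient purely for illustration; they are not measurements from the paper, and `fit_power_law` is a hypothetical helper.

```python
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(coefficient) - alpha * log(size).

    Returns (alpha, coefficient).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(value) for value in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)

# Synthetic, illustrative data generated from loss = 20 * size ** -0.08.
TRUE_ALPHA = 0.08
TRUE_COEFF = 20.0
model_sizes = [1e6, 1e7, 1e8, 1e9, 1e10]
synthetic_losses = [TRUE_COEFF * n ** -TRUE_ALPHA for n in model_sizes]

alpha, coeff = fit_power_law(model_sizes, synthetic_losses)
print(f"fitted exponent: {alpha:.3f}, fitted coefficient: {coeff:.2f}")
```

With real training runs the points scatter around the line, and the fitted exponent is what allows loss at a larger model size to be extrapolated before the run is launched.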
Brown's most influential contribution at OpenAI was his lead role on GPT-3. The paper "Language Models are Few-Shot Learners" was posted to arXiv on May 28, 2020, with Brown listed as first author alongside Benjamin Mann, Nick Ryder, and Melanie Subbiah, who are credited with equal contribution, and twenty-seven additional collaborators.[1][2] The work introduced GPT-3, an autoregressive Transformer language model with 175 billion parameters, an order of magnitude larger than any prior non-sparse language model.[1][2] The paper presented evidence that scaling up model size substantially improves what the authors called in-context learning, a setting in which the model performs novel tasks at inference time from a natural-language prompt containing a few examples, with no gradient updates.[1][2]
Architecturally, GPT-3 is described in the paper as using the same Transformer architecture as GPT-2 with one main modification: alternating dense and locally banded sparse attention patterns in the layers of the model, similar to the Sparse Transformer.[1] The largest model in the GPT-3 family, dubbed GPT-3 175B (later commonly known simply as GPT-3), has 96 layers, a model dimension of 12,288, and 96 attention heads, and was trained with a context window of 2,048 tokens on a filtered version of Common Crawl supplemented with WebText2, Books1, Books2, and English Wikipedia.[1] The paper documents a family of eight models ranging from a 125-million-parameter GPT-3 Small through GPT-3 13B and culminating in the 175B model, all trained on the same data distribution to enable controlled scaling studies.[1] The training compute is reported in the paper in PetaFLOP/s-days; for the 175B model, the training run consumed roughly 3,640 PetaFLOP/s-days on a high-bandwidth cluster of NVIDIA V100 GPUs provided by Microsoft, in line with the predictions of the scaling laws Brown's colleagues had published earlier in 2020.[1][16]
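The figures above can be sanity-checked with two common rules of thumb, neither of which is stated in this article. The first approximates the non-embedding parameter count of a Transformer as 12 × layers × d_model². The second approximates training compute as 6 × parameters × training tokens. The sketch also assumes a training set of 300 billion tokens, a figure taken from the GPT-3 paper rather than from the text above. The result is a back-of-the-envelope estimate, not a reproduction of the paper's accounting.

```python
# Architecture figures quoted in the article for GPT-3 175B.
N_LAYERS = 96
D_MODEL = 12_288

# Rule of thumb (assumption): non-embedding parameters ~ 12 * layers * d_model^2.
approx_params = 12 * N_LAYERS * D_MODEL ** 2
print(f"approximate parameters: {approx_params / 1e9:.1f} billion")

# Rule of thumb (assumption): training FLOPs ~ 6 * parameters * tokens.
NOMINAL_PARAMS = 175e9
TRAINING_TOKENS = 300e9  # assumption: token count as given in the GPT-3 paper
training_flops = 6 * NOMINAL_PARAMS * TRAINING_TOKENS

# One PetaFLOP/s-day is 10^15 operations per second sustained for 86,400 seconds.
FLOPS_PER_PF_DAY = 1e15 * 86_400
pf_days = training_flops / FLOPS_PER_PF_DAY
print(f"approximate training compute: {pf_days:.0f} PetaFLOP/s-days")
```

Both estimates land close to the headline numbers: about 174 billion parameters against the nominal 175 billion, and a compute figure in the thousands of PetaFLOP/s-days.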
The paper reported strong results on dozens of benchmarks spanning translation, question answering, cloze, completion, and reasoning. On TriviaQA, GPT-3 reached 64.3% accuracy in the zero-shot setting, 68.0% in the one-shot setting, and 71.2% in the few-shot setting, the last matching or exceeding the then state-of-the-art fine-tuned closed-book systems.[1][2] On the LAMBADA cloze task GPT-3 reached 76.2% in the zero-shot setting and 86.4% in the few-shot setting, well above prior state-of-the-art numbers.[1] The paper also documents weaker GPT-3 performance on tasks that involve comparing two sentences or snippets, including natural language inference, where the authors note that scaling alone does not close the gap with fine-tuned models.[1] Taken together, these results were among the first systematic demonstrations that very large dense Transformers exhibit emergent capabilities that smaller models trained on the same data distribution lack.[1][2]
The work also explicitly discussed broader impacts, including the difficulty human evaluators had in distinguishing short GPT-3-generated news articles from human-written ones in the authors' user studies, biases in race, gender, and religion in model outputs, energy and compute considerations, and risks of misuse such as automated misinformation.[1] At NeurIPS 2020, "Language Models are Few-Shot Learners" was one of the three Best Paper Award winners, alongside a Politecnico di Milano paper on no-regret learning and a UC Berkeley paper on column subset selection and the Nyström method.[18] As of 2026, Brown's few-shot learning paper has been cited more than 80,000 times according to his Google Scholar profile, making it one of the most cited works in the machine-learning literature.[17]
Beyond authorship order, contemporaneous descriptions credit Brown specifically with serving as the lead engineer of the GPT-3 effort, responsible for assembling the large training run and the engineering pipeline that made the 175-billion-parameter run feasible.[3][6][19] In his Y Combinator podcast appearance he described his role as the engineering lead bringing the model to convergence at unprecedented scale, working closely with the scaling-laws team under Jared Kaplan and Sam McCandlish to estimate the compute, data, and model sizes that would be needed to reach the run's target loss.[6][28] Brown's own framing on his GitHub profile reads "I work on robust and aligned AI at Anthropic. Previously at @openai and @brain-research."[20]
In addition to GPT-3, Brown was a co-author on several other influential OpenAI papers during 2020 and 2021. He is listed on "Measuring the Algorithmic Efficiency of Neural Networks" with Danny Hernandez (2020), which quantified the rate of algorithmic progress in image classification independent of compute and hardware improvements.[16] He was also a co-author on "Scaling Laws for Autoregressive Generative Modeling" with Tom Henighan, Jared Kaplan, Sam McCandlish, and other collaborators (2020), which extended the empirical scaling-law framework from language to vision, multimodal, and video domains and showed similar power-law behavior across modalities.[16] In 2021 he was a co-author on "Extracting Training Data from Large Language Models," a USENIX Security paper led by Nicholas Carlini that demonstrated practical attacks for retrieving training-set text from GPT-2, including verbatim recovery of personally identifying information; the paper became a foundational reference for memorization studies in subsequent language model literature.[16][21]
In late 2020, Dario Amodei, then OpenAI's vice president of research, departed the lab together with his sister Daniela Amodei, who had been OpenAI's vice president of safety and policy, in what multiple news outlets and company histories describe as a dispute over the pace of commercialization and the relative emphasis on safety research.[4][22] A group of senior OpenAI researchers, including Brown, left with them to form Anthropic as a Delaware public-benefit corporation.[4][5][22] Public timelines place the Amodeis' departure in late 2020 and the company's formal founding and start of operations in early 2021; Anthropic's first public announcement, covering its Series A round, followed on May 28, 2021.[4][5][22]
The widely cited founders list comprises Dario Amodei (CEO), Daniela Amodei (President), Tom Brown, Chris Olah (interpretability research lead), Sam McCandlish (chief architect), Jared Kaplan (chief science officer), Jack Clark, and Ben Mann.[4][5] Multiple sources describe the founding group as drawn from people who had been close collaborators on GPT-2, GPT-3, scaling laws, and RLHF work at OpenAI.[4][5][22] Brown has publicly stated in interviews that he left OpenAI together with the Amodeis to co-found Anthropic; in a 2023 conversation with Salesforce Ventures he discussed Anthropic's safety-first approach, the use of Constitutional AI for alignment, and how compute scaling correlates with model capability, noting that "as we add more compute, the model gets smarter. The model's IQ goes up."[19]
Reports describe the founding team as initially small and operating remotely during the COVID-19 pandemic, gradually assembling research and engineering teams from former OpenAI staff and others through 2021.[4][5][22] Anthropic was incorporated as a public-benefit corporation rather than a conventional for-profit corporation, a structure that allows directors to balance shareholder financial interests with the company's stated public benefit purpose of "the responsible development and maintenance of advanced AI for the long-term benefit of humanity."[4][22] Subsequent governance additions included the establishment of a Long-Term Benefit Trust to hold a class of voting shares with the power to elect a portion of the company's board on the basis of mission rather than financial interest.[4][22]
On May 28, 2021, Anthropic announced a $124 million Series A round, led by Jaan Tallinn with participation from James McClave, Dustin Moskovitz, the Center for Emerging Risk Research, and Eric Schmidt.[23] Dario Amodei, listed as CEO, said in the announcement that "Anthropic's goal is to make the fundamental research advances that will let us build more capable, general, and reliable AI systems."[23] The company described itself in the announcement as an AI safety and research company focused on building reliable, interpretable, and steerable AI systems.[23] Subsequent timelines record incorporation as a Delaware public-benefit corporation in 2021 and a series of much larger funding rounds in 2022 and beyond, eventually including substantial investment from Google and Amazon.[4][5]
At Anthropic, Brown leads the team responsible for the company's compute infrastructure and large-scale training operations.[24] Public bios and contemporaneous coverage describe him as the leader of Core Resources, the team responsible for compute, data center capacity, custom-silicon access, and the training pipeline used to produce successive generations of Claude models.[24] His formal title is Co-Founder and Chief Compute Officer, the title used in industry coverage of his 2024 AWS re:Invent customer keynote and Anthropic's late-2025 networking partnership announcement with Arista.[25][26][27]
Brown's research output at Anthropic, as indexed by his Google Scholar profile, includes co-authorship on several of the company's foundational alignment papers, among them "A General Language Assistant as a Laboratory for Alignment" (2021) with Amanda Askell, Yuntao Bai, and others; "Predictability and Surprise in Large Generative Models" (2022) presented at FAccT with Deep Ganguli and colleagues; "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" (2022) with Yuntao Bai and collaborators, which described Anthropic's early Claude-precursor RLHF pipeline; and "Scaling Laws and Interpretability of Learning from Repeated Data" (2022) with Danny Hernandez.[16][17] These papers are cited tens of thousands of times in aggregate and are central references for RLHF and alignment work in the post-GPT-3 era.[17]
In public appearances Brown has emphasized the infrastructure dimension of frontier AI. In a Y Combinator Lightcone podcast appearance in August 2025 he discussed the engineering tradeoffs of training Claude Code-grade systems and the difficulty of converting compute purchases into usable training capacity, noting that the bottleneck for frontier AI has shifted from algorithmic insight toward data center capacity, power availability, and chip supply.[28] In an Arista partnership video released in late 2025 he is identified as "Anthropic's Chief Compute Officer and Co-founder" and discusses how the partnership combines Anthropic's AI expertise with Arista's networking infrastructure.[25] In December 2024 he appeared at AWS re:Invent as a customer keynote speaker in his Anthropic role, with AWS Trainium highlighted as part of the multi-chip strategy Anthropic uses for training Claude.[26][27]
According to Brown's keynotes and interviews and the coverage around them, Anthropic is one of the few large model developers to use three different families of training accelerators in production: NVIDIA GPUs, Google TPUs, and AWS Trainium (later Trainium 2) chips.[25][26][27] Brown has described this multi-chip approach as a way to diversify supply, hedge against capacity constraints at any single vendor, and gain leverage in negotiating compute purchases at the scale required to train frontier models. The September 2023 Amazon investment of up to $4 billion, and subsequent expansions, designated AWS as Anthropic's primary cloud provider and gave Anthropic privileged access to large allocations of Trainium chips for training Claude models, a relationship Brown has publicly highlighted in conference appearances and partner videos.[26][27]
Brown has also publicly placed AI infrastructure spending in historical context. In multiple interviews he has compared the scale of capital commitments to AI compute in the late 2020s with major historical undertakings such as the Apollo program and the Manhattan Project, arguing that the limiting factors for further model scaling are now power generation, data center construction, and high-bandwidth networking rather than algorithmic breakthroughs.[28] This framing aligns with the empirical scaling-law literature he helped develop earlier in his career at OpenAI.[16][28]
Brown's pre-language-model research focused on robustness of image classifiers under adversarial attacks. "Adversarial Patch" (2017) showed that printable, photographable, image-independent patches could reliably fool image classifiers, and "Unrestricted Adversarial Examples" (2018) generalized the threat model to allow arbitrary attacker-chosen inputs from a target class.[8][13] The associated Google Research blog post co-authored with Catherine Olsson introduced an open challenge intended to drive progress toward classifiers that make no confident errors on unrestricted attacker inputs.[12] Brown's adversarial work fits into a broader research lineage at Google Brain on robustness, including collaborations with Ian Goodfellow and Nicholas Carlini, and it informed the threat-model framing of subsequent AI safety research at OpenAI and Anthropic.[8][12][13]
Brown's central contribution to the field is GPT-3 and the in-context learning framing introduced in "Language Models are Few-Shot Learners."[1][2] The paper formalized three evaluation conditions, zero-shot, one-shot, and few-shot, where the model receives, respectively, only a task description, a task description plus a single demonstration, or a task description plus K demonstrations in the prompt, and is evaluated without any gradient updates.[1] The paper presents detailed evaluation curves showing how performance on tasks such as TriviaQA, LAMBADA, arithmetic, and translation improves with both model size and the number of in-context demonstrations, providing some of the first systematic evidence that the "few-shot learning" capability emerges and strengthens monotonically with scale.[1]
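The three evaluation conditions differ only in how many demonstrations are placed in the prompt, which a short sketch makes concrete. The `build_prompt` helper, the `=>` separator, and the translation examples below are illustrative assumptions rather than the paper's exact evaluation format.

```python
def build_prompt(task_description, demonstrations, query, k):
    """Assemble a prompt containing k in-context demonstrations.

    k = 0 gives the zero-shot condition, k = 1 one-shot, and k > 1 few-shot.
    No model weights are updated; the demonstrations exist only in the prompt text.
    """
    if k < 0 or k > len(demonstrations):
        raise ValueError("k must be between 0 and the number of available demonstrations")
    lines = [task_description]
    for source, target in demonstrations[:k]:
        lines.append(f"{source} => {target}")
    # The final line leaves the answer blank for the model to complete.
    lines.append(f"{query} =>")
    return "\n".join(lines)

# Hypothetical demonstrations for an English-to-French translation task.
demos = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("good morning", "bonjour"),
]

for k in (0, 1, 3):
    print(f"--- k = {k} ---")
    print(build_prompt("Translate English to French:", demos, "cheese", k))
```

The same function serves all three conditions, which reflects the point of the framing: the task is specified entirely through the prompt, and only the number of examples changes.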
By demonstrating that a 175-billion-parameter Transformer could meaningfully execute novel tasks from prompts alone, GPT-3 catalyzed a shift in NLP from fine-tuning toward prompting as a primary interface to large models, and informed the design of subsequent commercial products and open-source language models.[1][2] Many later research subfields, including chain-of-thought prompting, instruction tuning, retrieval-augmented generation, and tool use, build on the in-context learning paradigm formalized by Brown and his co-authors and depend on the existence of GPT-3-scale base models as substrates.[1][2]
Brown's co-authorship on "Deep Reinforcement Learning from Human Preferences" (2017) and the 2019 "Fine-Tuning Language Models from Human Preferences" makes him one of the named contributors to the early lineage of RLHF before the technique was scaled to chat assistants.[14][15] At Anthropic, his role on "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" (2022) extends this line of work into the development of Claude as a deployed assistant.[16][17] These three papers chart a continuous research thread spanning Atari-scale RL agents, mid-size language models fine-tuned for summarization-like tasks, and modern chat assistants, and Brown is one of a small number of researchers credited across the full progression.[14][15][16]
In his Anthropic role, Brown has publicly argued that frontier AI is now dominated by infrastructure constraints rather than algorithmic ones, citing capital outlays for AI compute that he has compared with the historical scale of programs such as the Apollo program in interviews.[28] His operational responsibilities at Anthropic include sourcing and integrating compute across multiple chip vendors, including AWS Trainium, NVIDIA GPUs, and Google TPUs, which Anthropic has used in tandem to train Claude models.[25][26][27] His historical contribution to scaling-law research, both as a co-author on the 2020 Kaplan paper and on subsequent generative-modeling scaling work, gives this operational role direct continuity with his research output: the empirical scaling laws he helped derive at OpenAI set the compute-budget expectations that his compute-organization team at Anthropic is responsible for meeting.[16][28]
The name "Tom Brown" is shared by multiple researchers and engineers. The Tom B. Brown described in this article, with Google Scholar identifier RLvsC94AAAAJ, has a verified anthropic.com email on Google Scholar and lists his affiliation as Anthropic.[17] He should not be confused with other researchers and engineers named Tom Brown working in unrelated fields. The Google Brain "Tom B. Brown" credited on the 2017 "Adversarial Patch" paper, the 2018 "Unrestricted Adversarial Examples" paper, and the September 2018 Google Research blog post is, however, the same individual earlier in his career, as confirmed by his own GitHub profile, which lists Anthropic as his current affiliation and OpenAI and "@brain-research" (the Google Brain research GitHub organization) as prior affiliations.[17][20] His public handles include the X username @nottombrown and the GitHub username nottombrown.[20][28] His DBLP author page (pid/211/7884) is the canonical bibliographic record consolidating his publications under a single identifier across his Google Brain, OpenAI, and Anthropic periods.[16]
The following list draws from Brown's Google Scholar profile and the DBLP author record and includes citations as of mid-2026:
Brown shared in the NeurIPS 2020 Best Paper Award for "Language Models are Few-Shot Learners," alongside his co-authors.[18] His NeurIPS best paper recognition is widely cited as one of the most consequential awards in modern NLP because of the role GPT-3 played in establishing the in-context learning paradigm.[18] He is regularly listed among the named co-founders of Anthropic in independent reference works, business databases, and press coverage of the company.[4][5][22] His Google Scholar profile reports an h-index of 49 and an i10-index of 62 with total citations exceeding 130,000 as of 2026.[17]