Noam Brown
Last reviewed
May 24, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,533 words
Noam Brown is an American computer scientist and research scientist at OpenAI who specializes in artificial intelligence reasoning, multi-agent learning, search algorithms, and self-play. He is best known as the co-creator of three landmark game-playing AI systems: Libratus (2017), the first artificial intelligence to defeat top human professionals at heads-up no-limit Texas hold'em poker; Pluribus (2019), the first to beat elite professionals at six-player no-limit poker; and CICERO (2022), the first to reach human-level play in the strategy game Diplomacy through a combination of natural language negotiation and strategic planning.[1][2][3] After joining OpenAI in 2023, Brown became a central figure behind the company's o-series of reasoning models, including the September 2024 launch of o1 and the subsequent o3.[4][5] His public talks on test-time compute scaling, including a widely cited 2024 TED AI presentation, helped frame the industry shift toward inference-time reasoning as a complement to pretraining scale.[6]
| Field | Value |
|---|---|
| Born | United States |
| Education | BA, Rutgers University (2008); MS Robotics, Carnegie Mellon University (2014); PhD Computer Science, Carnegie Mellon University (2020) |
| Doctoral advisor | Tuomas Sandholm |
| Employers | OpenAI (2023 to present); Facebook AI Research / Meta AI (2018 to 2023); Carnegie Mellon University (2012 to 2018) |
| Known for | Libratus, Pluribus, CICERO, OpenAI o1 |
| Notable awards | Marvin Minsky Medal (2019); MIT Technology Review 35 Innovators Under 35 (2019); NeurIPS Best Paper Award (2017); AAAI/ACM SIGAI Dissertation Award (2020) |
Brown grew up in New Jersey and attended Rutgers University, where he completed a BA in mathematics and computer science between 2005 and 2008.[7] He graduated summa cum laude as a member of the Rutgers College Honors Program, and received the Rutgers Computer Science Department Highest Honors recognition in 2009.[7] During his undergraduate years he served as a recitation instructor for calculus and pre-calculus courses and as a peer mentor, the start of a sustained commitment to teaching that continued throughout his career.[7]
Between 2006 and 2010 Brown worked as an algorithmic trading engineer at MJM Trading Group in New York, and from 2010 to 2012 as a research assistant in the International Financial Markets section of the Federal Reserve Board of Governors in Washington, D.C., where he studied issues related to algorithmic trading.[7] These roles exposed him to large-scale decision-making under uncertainty, a class of problem closely related to the imperfect-information games he would later study academically.
In 2012 Brown enrolled at Carnegie Mellon University, completing a master's degree in robotics in 2014 under the supervision of Tuomas Sandholm before continuing into the PhD program in computer science.[7] He finished his PhD in 2020 with the dissertation "Equilibrium Finding for Large Adversarial Imperfect-Information Games", supervised by Sandholm.[8] The thesis built directly on the algorithms used in Libratus and the related poker bots Tartanian7, Baby Tartanian8, and Claudico. Brown's dissertation received the Carnegie Mellon School of Computer Science Distinguished Dissertation Award, the AAAI/ACM SIGAI Dissertation Award, and the IFAAMAS Victor Lesser Distinguished Dissertation Award, all in 2020.[7]
During his graduate studies Brown also held a 2017 summer research internship at DeepMind in London.[7]
At Carnegie Mellon, Brown worked as a research assistant in Sandholm's group on computational game theory and equilibrium finding for large adversarial games.[7] The group had a long history of poker AI work, including Tartanian7, which Brown co-developed in 2014.[7] In 2015 the Sandholm lab fielded Claudico in the inaugural "Brains vs. Artificial Intelligence" competition against four top heads-up no-limit Texas hold'em professionals at Rivers Casino in Pittsburgh; the humans narrowly won.[9] Brown and Sandholm then developed a successor system, Libratus, named after the Latin word for "balance".[10]
In January 2017, Libratus competed against four top professionals (Jason Les, Dong Kim, Daniel McAulay, and Jimmy Chou) over 20 days at Rivers Casino in a tournament titled "Brains vs. Artificial Intelligence: Upping the Ante."[11] The contest ran from January 11 to January 30 and consisted of 120,000 hands of heads-up no-limit Texas hold'em distributed across the four professionals.[11] A prize pool of 200,000 dollars was allocated to the players, with each human guaranteed 20,000 dollars and the remaining funds split based on performance against Libratus.[11] At the end of the event, Libratus led by a cumulative 1,766,250 chips and had defeated each individual professional, beating the humans collectively by a margin of 147 milli-big-blinds (thousandths of a big blind) per game with 99.98 percent statistical significance.[11] During play, Libratus ran on the Bridges supercomputer at the Pittsburgh Supercomputing Center, using up to 600 nodes for nightly self-improvement against the patterns observed in that day's hands.[14] The result was widely reported as the first time an AI had decisively beaten top human professionals at heads-up no-limit poker, a long-standing milestone for AI research because no-limit poker is a large imperfect-information game in which players hide private cards and use unrestricted bet sizing.[11]
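The two headline figures are consistent with each other. Assuming blinds of 50/100 chips (the blind structure is not stated above, so treat it as an assumption), dividing the chip lead by the number of hands and by the size of the big blind recovers the reported win rate:

```python
# Figures from the text: Libratus's cumulative lead and total hands played.
TOTAL_CHIPS_WON = 1_766_250
HANDS_PLAYED = 120_000
# Assumption: blinds of 50/100 chips, so one big blind = 100 chips.
BIG_BLIND_CHIPS = 100


def milli_big_blinds_per_hand(chips_won, hands, big_blind):
    """Convert a chip total into thousandths of a big blind won per hand."""
    return chips_won / hands / big_blind * 1000


rate = milli_big_blinds_per_hand(TOTAL_CHIPS_WON, HANDS_PLAYED, BIG_BLIND_CHIPS)
print(f"{rate:.1f} mbb per hand")  # prints: 147.2 mbb per hand
```

Equivalently, that is about 14.7 big blinds per 100 hands.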
Brown and Sandholm subsequently published the underlying methods in a December 2017 paper in Science, titled "Superhuman AI for heads-up no-limit poker: Libratus beats top professionals."[1] The paper described a three-component architecture. The first component was a "blueprint" strategy precomputed offline for a coarse abstraction of the game, which grouped similar hands and bet sizes together to reduce the size of the strategy space; as Brown said at the time, "treating those hands as identical reduces the complexity of the game and, thus, makes it computationally easier."[10] The second was nested subgame solving, which recomputed finer-grained strategies in real time during play whenever an opponent took an action that fell outside the abstraction.[10] The third was a self-improvement module that, after each day of play, examined the opponents' actions and added missing branches to the blueprint rather than simply trying to exploit the humans' mistakes.[10] The authors stressed that the techniques were largely domain-independent and could be applied to other large imperfect-information settings such as negotiation, cybersecurity, and strategic auctions.[10] A companion paper, "Safe and Nested Subgame Solving for Imperfect-Information Games" by Brown and Sandholm, received the Best Paper Award at NeurIPS 2017, one of three best paper awards selected from more than 3,240 submissions.[7][12] Brown also received the 2017 Allen Newell Award for Research Excellence at Carnegie Mellon for this body of work.[7]
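The trigger for real-time re-solving can be illustrated with a toy action abstraction. The sketch is hypothetical throughout: the three pot-fraction bet sizes, the tolerance, and the function names are illustrative assumptions rather than Libratus's actual abstraction, which was far larger. It shows only the check for an off-abstraction bet, not the subgame solver itself.

```python
# Hypothetical action abstraction: the only bet sizes the blueprint models,
# expressed as fractions of the pot.
BLUEPRINT_BET_FRACTIONS = (0.5, 1.0, 2.0)


def classify_bet(bet, pot, tolerance=0.05):
    """Return the matching abstract bet size, or None if the bet is off-tree."""
    fraction = bet / pot
    nearest = min(BLUEPRINT_BET_FRACTIONS, key=lambda f: abs(f - fraction))
    return nearest if abs(nearest - fraction) <= tolerance else None


def choose_policy(bet, pot):
    """Use the blueprint for modeled bets; re-solve the subgame otherwise."""
    if classify_bet(bet, pot) is None:
        return "nested subgame solving"
    return "blueprint"


print(choose_policy(100, 100))  # pot-sized bet is in the abstraction
print(choose_policy(75, 100))   # three-quarter-pot bet is off-tree
```

Under this toy scheme, a three-quarter-pot bet falls between the modeled half-pot and pot-sized bets, so it is handled by re-solving rather than by rounding it to a nearby size, the kind of rounding error that nested subgame solving was designed to avoid.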
The Libratus project was selected by Science as one of 12 candidates for its 2017 Breakthrough of the Year award and named one of the top 10 scientific achievements of 2017 by the French magazine La Recherche.[7] Brown and Sandholm later received the Marvin Minsky Medal for Outstanding Achievements in AI from IJCAI in 2019.[7]
In 2018 Brown joined Facebook AI Research (later rebranded as Meta AI) in New York as a research scientist while completing his PhD under Sandholm at Carnegie Mellon.[7] His collaboration with the Sandholm lab continued, producing the next major poker milestone: Pluribus, the first AI to achieve superhuman performance in six-player no-limit Texas hold'em.[2]
Six-player poker had been considered a significantly harder problem than the two-player version because multi-player imperfect-information games generally lack tractable equilibrium-based solutions. In a two-player zero-sum game, playing a Nash equilibrium strategy guarantees that no opponent can do better than break even on average, but no such guarantee exists in games with three or more players.[13] Pluribus instead combined a coarse, action-abstracted blueprint strategy, computed through self-play, with a depth-limited search procedure that constructed strategies in real time during play. Its novel limited-lookahead search did not attempt to solve the full subgame; instead, at each depth limit it assumed that opponents could choose among four possible continuation strategies for the rest of the game.[2]
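The continuation-strategy idea can be sketched in a few lines. Assuming, as published descriptions of Pluribus indicate, that the four choices are the blueprint itself plus versions biased toward folding, calling, and raising, one way to build them is to upweight a single action and renormalize. The bias factor of 5, the dictionary representation, and the function names are assumptions for illustration.

```python
# Hypothetical action set at one opponent decision point.
ACTIONS = ("fold", "call", "raise")
# Assumption: how strongly each biased variant favors its action.
BIAS_FACTOR = 5.0


def biased(blueprint, favored, factor=BIAS_FACTOR):
    """Scale one action's probability by `factor`, then renormalize to sum to 1."""
    weights = {a: p * (factor if a == favored else 1.0) for a, p in blueprint.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def continuation_strategies(blueprint):
    """Return k = 4 choices: the unmodified blueprint plus one variant per action."""
    return [dict(blueprint)] + [biased(blueprint, a) for a in ACTIONS]


blueprint = {"fold": 0.2, "call": 0.5, "raise": 0.3}
for strategy in continuation_strategies(blueprint):
    print({a: round(p, 3) for a, p in strategy.items()})
```

In the search itself these choices act as decisions at the depth limit: each opponent is modeled as selecting among the k strategies for the remainder of the game, so the searcher's strategy has to hold up against all of them rather than against the blueprint alone.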
Pluribus was evaluated in two formats. In the first, the AI played 10,000 hands against five professionals at once. In the second, five copies of Pluribus played 5,000 hands against a single human professional. The professionals included Darren Elias, who holds the record for most World Poker Tour titles, and Chris "Jesus" Ferguson, a six-time World Series of Poker bracelet winner, along with 13 other professionals who had each won more than one million dollars playing poker.[14] Pluribus won by a statistically significant margin in both formats.[2]
The work was published in Science on July 11, 2019, in a paper by Brown and Sandholm titled "Superhuman AI for multiplayer poker," and was featured on the journal's cover.[2] Pluribus was named a runner-up for Science magazine's 2019 Breakthrough of the Year.[7] One technical aspect that drew attention was the system's computational efficiency: Pluribus computed its blueprint strategy in eight days on a server, using about 12,400 core-hours (roughly 150 dollars' worth of cloud computing), then played live on a machine with 28 CPU cores and no GPUs.[14] Brown noted that this contrasted sharply with the very large compute budgets of contemporaneous game AI systems such as AlphaGo Zero and AlphaZero, and argued that the result showed algorithmic improvements in equilibrium finding could yield superhuman play without massive compute.[14] Scientific American's coverage described six-player poker as "poker's final milestone" and highlighted the absence of theoretical guarantees in multi-player imperfect-information settings, which had been considered an open problem in AI and game theory for decades.[13]
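The blueprint figures can be sanity-checked with simple arithmetic: 12,400 core-hours spread over eight days corresponds to roughly 65 cores running around the clock, consistent with a single multicore server. The snippet uses only the two numbers given above.

```python
BLUEPRINT_CORE_HOURS = 12_400  # from the text
BLUEPRINT_DAYS = 8             # from the text
HOURS_PER_DAY = 24

# Cores that would need to be busy continuously to consume the budget.
implied_cores = BLUEPRINT_CORE_HOURS / (BLUEPRINT_DAYS * HOURS_PER_DAY)
print(f"~{implied_cores:.1f} cores busy continuously")  # prints: ~64.6 cores busy continuously
```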
Brown was named to MIT Technology Review's 2019 list of 35 Innovators Under 35 in the Visionary category for this and earlier work.[15]
While at FAIR, Brown also led work on deep learning extensions of equilibrium finding. In 2019, with Adam Lerer, Sam Gross, and Sandholm, he published "Deep Counterfactual Regret Minimization" at ICML, a neural-network-based variant of the CFR algorithm family that had underpinned Libratus.[16] In 2020 he was joint first author with Anton Bakhtin on "Combining Deep Reinforcement Learning and Search for Imperfect-Information Games," which introduced ReBeL, a framework that combined self-play reinforcement learning with online search and achieved superhuman performance in heads-up no-limit poker using substantially less domain knowledge than Libratus.[17] The ReBeL code was open-sourced by Facebook AI Research.[17]
Around 2019, Brown began working on the mixed cooperative-competitive game Diplomacy as a step beyond zero-sum poker.[3][18] Diplomacy is a seven-player board game in which players negotiate in natural language between movement phases, forming and breaking alliances. The game has been a long-standing AI research target because it combines hidden intentions, multi-agent strategic reasoning, and unrestricted natural-language communication.[18] Brown said in podcast interviews that when he first started, he assumed reaching human-level play would take a decade.[3]
In November 2022, the FAIR Diplomacy team published the system known as CICERO. The work appeared in Science on November 22, 2022, under the title "Human-level play in the game of Diplomacy by combining language models with strategic reasoning," with Brown as one of three lead authors alongside Anton Bakhtin and Emily Dinan, and a roster of more than 20 co-authors from FAIR.[3][18] CICERO combined a controllable dialogue model (a BART-style language model with about 2.7 billion parameters fine-tuned on more than 40,000 human Diplomacy games) with a strategic planning algorithm called piKL that conditioned action selection on inferred opponent intentions and human-like move distributions.[18]
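The core of KL-regularized planning, as described in the piKL line of work, is to trade off expected value against staying close to a human-imitation "anchor" policy. For a single decision with fixed action values Q and anchor policy τ, maximizing E_π[Q] − λ·KL(π‖τ) has the closed form π(a) ∝ τ(a)·exp(Q(a)/λ). The sketch below implements only that single-agent step under these assumptions; CICERO's full piKL procedure iterates such responses across all players and is not shown. The numbers and function name are hypothetical.

```python
import math


def kl_regularized_policy(q_values, anchor, lam):
    """Maximize E_pi[Q] - lam * KL(pi || anchor) for one decision.

    Closed form: pi(a) is proportional to anchor(a) * exp(Q(a) / lam).
    Anchor probabilities must be strictly positive.
    Large lam stays near the human-like anchor; small lam approaches argmax Q.
    """
    logits = [math.log(p) + q / lam for q, p in zip(q_values, anchor)]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


anchor = [0.7, 0.2, 0.1]    # hypothetical human-imitation policy
q_values = [0.0, 1.0, 0.5]  # hypothetical action values from planning
for lam in (100.0, 1.0, 0.01):
    print(lam, [round(p, 3) for p in kl_regularized_policy(q_values, anchor, lam)])
```

The single parameter λ is the design lever: it lets the agent remain predictable and human-like, which matters when cooperating with people, while still shifting probability toward actions that planning rates highly.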
CICERO was evaluated in 40 online games on the webDiplomacy.net platform among unsuspecting human players. It finished in the top 10 percent of participants who had played more than one game and achieved more than double the average human score.[18] Meta noted that CICERO's human collaborators frequently preferred working with the agent over other humans because of its consistency, reliability, and persuasive communication.[3] Meta released the code and model weights to support follow-on research.[18]
A related conference paper, "Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning" (Bakhtin, Wu, Lerer, Gray, Jacob, Farina, Miller, and Brown), received a Best Paper Honorable Mention at ICLR 2023 for the variant of Diplomacy without natural-language communication.[7] The CICERO project drew significant attention beyond the AI research community because it demonstrated that an agent playing a game that in principle rewards deception could nonetheless be designed to behave honestly and cooperatively in practice, an approach the researchers stressed in interviews.[3]
Brown announced on X (formerly Twitter) on July 6, 2023, that he was joining OpenAI.[4] He wrote, "I'm thrilled to share that I've joined @OpenAI. For years I've researched AI self-play and reasoning in games like Poker and Diplomacy. I'll now investigate how to make these methods truly general. If successful, we may one day see LLMs that are 1,000x better than GPT-4."[4] He has been based in OpenAI's San Francisco offices since.[7]
Brown was a core contributor to OpenAI's o-series of reasoning models. On September 12, 2024, when OpenAI released o1 (initially as o1-preview and o1-mini), Brown wrote a public thread describing the project and his role.[19][5] He framed o1 as "the fruit of our effort at OpenAI to create AI models capable of truly general reasoning" and connected the design directly to the lessons of his earlier work on poker and Diplomacy: the value of having an AI system "think" at inference time rather than simply respond from its training distribution.[19][20] OpenAI's announcement explained that o1 had been trained with large-scale reinforcement learning to produce a long internal chain of thought before answering, and that performance improved with both more training compute and more inference compute.[5] Brown often used the term "system-two thinking" (borrowing from Daniel Kahneman's distinction between fast and slow cognition) to describe what o1 was doing relative to prior LLMs.[6]
On the American Invitational Mathematics Examination (AIME), which OpenAI described as a qualifying exam for the International Mathematics Olympiad, o1 reportedly solved 83 percent of problems, compared with 13 percent for GPT-4o; on graduate-level science benchmarks, o1 performed at roughly the level of PhD students on physics, chemistry, and biology questions.[5] The OpenAI announcement specifically credited Brown as a "foundational contributor" to the project.[5]
On December 20, 2024, OpenAI previewed o3, skipping the o2 name due to a trademark conflict.[21] Brown said on X afterward that "we have every reason to believe this trajectory will continue," referring to the rapid pace of capability gains from scaling reinforcement learning and inference-time compute.[21] o3 went on to set state-of-the-art results on the ARC-AGI benchmark and on competitive programming benchmarks.[21] OpenAI later released o3-mini, with Brown commenting that the smaller distilled model outperformed full o1 on a number of evaluations and saying "we're shifting the entire cost-intelligence curve."[21]
In September 2024, Brown announced that he was leading a newly formed multi-agent research team at OpenAI together with Kevin Leestone, with the stated view that multi-agent learning is a path to better AI reasoning.[22] The team's framing built directly on Brown's earlier research arc: heads-up poker as a two-agent zero-sum problem, six-player poker as a many-agent zero-sum problem, and Diplomacy as a mixed cooperative-competitive problem with natural-language communication. He has argued in talks and podcasts that the reasoning techniques behind o1 and o3 generalize naturally to multi-agent settings and that progress on multi-agent reasoning will require, but also drive, deeper progress on alignment and human-AI coordination.[26]
Brown's work also shaped OpenAI's broader family of o-series models. After the December 2024 preview of o3, OpenAI continued to release derivative variants such as o3-mini, and Brown remained a public point of reference for the company's communications about reasoning-model progress.[21] His public arguments helped establish the phrase "test-time compute scaling" as a standard description of the new scaling axis represented by o1 and its successors.[6][26]
Brown's research output before joining OpenAI focused on equilibrium finding in large imperfect-information games. The work falls into several connected strands.
Counterfactual regret minimization (CFR) is a family of iterative self-play algorithms for finding approximate Nash equilibria in extensive-form games. Brown and Sandholm published a series of papers improving CFR's practical performance, including "Regret-Based Pruning in Extensive-Form Games" (NeurIPS 2015), "Dynamic Thresholding and Pruning for Regret Minimization" (AAAI 2017), and "Reduced Space and Faster Convergence in Imperfect-Information Games via Pruning" (ICML 2017).[7] In 2019 they published "Solving Imperfect-Information Games via Discounted Regret Minimization" at AAAI, which received an Outstanding Paper Honorable Mention; the paper introduced Discounted CFR (DCFR), which converged significantly faster than prior CFR variants and underpinned Pluribus.[7][23] These algorithmic improvements were a major reason Pluribus could reach superhuman six-player play on modest hardware.
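The core update inside CFR is regret matching: each player plays actions in proportion to how much it regrets not having played them, and the average strategy over iterations converges toward equilibrium. DCFR changes only how past regrets and strategies are weighted. The sketch below runs both in self-play on a hypothetical 2x2 zero-sum matrix game in which each player has a single decision point, so it illustrates the update rules rather than a full extensive-form CFR traversal. The DCFR parameters alpha = 1.5, beta = 0, gamma = 2 are taken as an assumption (they are the commonly cited defaults).

```python
# Row player's payoffs in a hypothetical 2x2 zero-sum game; the column player
# receives the negation. Its unique equilibrium has both players choosing
# action 0 with probability 0.4.
PAYOFF = [[2.0, -1.0],
          [-1.0, 1.0]]


def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def solve(iterations, discounted=False, alpha=1.5, beta=0.0, gamma=2.0):
    """Self-play regret matching; returns each player's average strategy.

    With discounted=True, applies DCFR-style weights after each iteration t:
    positive regrets are scaled by t**alpha / (t**alpha + 1), negative regrets
    by t**beta / (t**beta + 1), and iteration t's strategy is weighted by t**gamma.
    """
    regrets = [[0.0, 0.0], [0.0, 0.0]]       # [player][action]
    strategy_sum = [[0.0, 0.0], [0.0, 0.0]]  # weighted sum of past strategies
    for t in range(1, iterations + 1):
        row = regret_matching(regrets[0])
        col = regret_matching(regrets[1])
        # Value of each pure action against the opponent's current strategy.
        row_values = [sum(PAYOFF[a][b] * col[b] for b in range(2)) for a in range(2)]
        col_values = [-sum(PAYOFF[a][b] * row[a] for a in range(2)) for b in range(2)]
        weight = t ** gamma if discounted else 1.0
        for player, strat, values in ((0, row, row_values), (1, col, col_values)):
            expected = sum(s * v for s, v in zip(strat, values))
            for a in range(2):
                # Instantaneous regret: how much better action a would have done.
                regrets[player][a] += values[a] - expected
                if discounted:
                    exponent = alpha if regrets[player][a] > 0 else beta
                    regrets[player][a] *= t ** exponent / (t ** exponent + 1)
                strategy_sum[player][a] += weight * strat[a]
    return [[s / sum(sums) for s in sums] for sums in strategy_sum]


for discounted in (False, True):
    row, col = solve(20_000, discounted=discounted)
    label = "DCFR" if discounted else "CFR "
    print(label, [round(p, 3) for p in row], [round(p, 3) for p in col])
```

Both runs use the same regret-matching step; the only difference is the weighting, which is the part DCFR modifies.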
"Safe and Nested Subgame Solving for Imperfect-Information Games" (NeurIPS 2017, Best Paper Award) introduced techniques for refining an offline-computed blueprint strategy in real time during play of an imperfect-information game while preserving theoretical safety guarantees.[12] Brown, Sandholm, and Brandon Amos extended this with "Depth-Limited Solving for Imperfect-Information Games" (NeurIPS 2018), which adapted the depth-limited search techniques common in perfect-information game-playing (such as those used by chess and Go engines) to settings with hidden information by considering a small number of opponent continuation strategies at the search horizon.[24] This algorithmic line connects directly to Brown's later public arguments that inference-time search and planning can be a powerful complement to large-model training.
Brown's work at FAIR generalized search-based equilibrium finding to settings with neural function approximation. "Deep Counterfactual Regret Minimization" (ICML 2019, with Adam Lerer, Sam Gross, and Sandholm) was the first practical CFR variant to use neural networks to approximate counterfactual values.[16] "Combining Deep Reinforcement Learning and Search for Imperfect-Information Games" (NeurIPS 2020, with Bakhtin, Lerer, and Qucheng Gong) introduced ReBeL, which extended the AlphaZero-style combination of self-play reinforcement learning and search to imperfect-information games via a "public belief state" formulation.[17] Subsequent work on Diplomacy and on Hanabi (a cooperative imperfect-information card game) further developed the human-regularized variant of these ideas, culminating in CICERO and in conference papers such as "Human-Level Performance in No-Press Diplomacy via Equilibrium Search" (ICLR 2021) and "Modeling Strong and Human-Like Gameplay with KL-Regularized Search" (ICML 2022).[7]
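ReBeL treats the public belief state, which includes a probability distribution over each player's possible private information, as the state on which self-play learning and search operate. Those distributions are updated with Bayes' rule as public actions occur, using the policy the acting player is assumed to follow. The sketch below shows one such update for a single player's hand; the hand labels, probabilities, and function name are hypothetical, and a real public belief state tracks every player's distribution over all possible hands.

```python
def update_belief(belief, policy, action):
    """Bayes' rule: P(hand | action) is proportional to P(hand) * P(action | hand).

    belief: dict mapping each private hand to its current probability.
    policy: dict mapping each hand to {action: probability}, the acting
            player's assumed policy (hypothetical here).
    """
    unnormalized = {h: p * policy[h].get(action, 0.0) for h, p in belief.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observed action has zero probability under the policy")
    return {h: w / total for h, w in unnormalized.items()}


# Hypothetical two-hand example: strong hands bet more often than weak ones.
prior = {"strong": 0.5, "weak": 0.5}
policy = {"strong": {"bet": 0.9, "check": 0.1},
          "weak": {"bet": 0.3, "check": 0.7}}
print(update_belief(prior, policy, "bet"))
```

Observing a bet shifts the belief toward the strong hand because strong hands are assumed to bet three times as often; since the policy itself determines the update, search and belief tracking in ReBeL are tied to the policy being learned.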
A recurring theme in Brown's research before joining OpenAI was the argument that combining neural networks with search at inference time was more powerful than either component alone. He frequently compared his poker AI systems with AlphaGo and AlphaZero, noting that all of them relied on lookahead search at decision time and that this was a key ingredient in their superhuman performance, even though only the poker work tackled hidden-information settings.[14] In interviews and talks Brown described the inference-time component as carrying disproportionate weight in performance, an observation that anticipated the public framing of test-time compute scaling that he and OpenAI later popularized.[6][14]
After joining OpenAI, Brown became one of the most prominent public voices arguing that scaling test-time compute (allowing models to "think" longer at inference) was an underdeveloped axis of AI progress.[6][20] In a widely circulated talk at the TED AI Conference in San Francisco in October 2024, Brown said that letting his poker AI "think" for 20 seconds before each decision produced an improvement equivalent to scaling pretraining data by a factor of 100,000, and argued that an analogous effect held for general-purpose language models.[6] He framed the observation as one of the most important findings from his Libratus work and presented it as motivation for his transition from games-focused research to general AI reasoning.[6] He has repeated and expanded on this argument in podcast appearances, including a Sequoia Capital "Training Data" episode with co-authors Hunter Lightman and Ilge Akkaya following the o1 launch, and a 2025 Latent Space podcast episode on multi-agent civilizations.[25][26]
In a March 2025 interview, Brown told TechCrunch that, with hindsight, the broad recipe behind reasoning models could have been developed two decades earlier, but that the field had pursued the wrong research directions; he argued that the central observation was simply that humans visibly think for a long time before producing answers to hard problems, and that AI systems should be allowed to do the same.[27] He also said that benchmark design itself was a high-impact, low-compute area for academic researchers to contribute to, noting "the state of benchmarks in AI is really bad, and that doesn't require a lot of compute to do."[27]
Brown is an active public speaker on AI reasoning and games. His CV lists invited talks at MIT, Harvard, Stanford, UC Berkeley, Oxford, Tel Aviv University, the Technion, the Hebrew University, the Flatiron Institute, and many other institutions, on topics including imperfect-information games, ReBeL, Pluribus, and Diplomacy.[7] After joining OpenAI he became a frequent conference speaker on test-time compute, including the October 2024 TED AI Conference keynote in San Francisco.[6] He has also appeared on podcasts including Lex Fridman's, Imbue's, the Sequoia "Training Data" series with the o1 team, and Latent Space.[25][26]
Brown maintains an active public presence on X under the handle @polynoamial, where he regularly comments on OpenAI announcements and broader debates about AI scaling.[4][19][21] Beyond research, he has volunteered as a guest lecturer and instructor at the Rutgers Young Scholars Program for gifted high school students, teaching game theory in a week-long summer course continuously from 2009 onward.[7] He also presented in the Federal Reserve Board's FedEd program (2010 to 2012), in Carnegie Mellon's Creative Technologies Nights for middle school girls (2015 to 2018), and in the Rutgers Douglas Project: Women in STEM (2008 to 2009), all efforts to broaden access to mathematics and computer science.[7]