Sébastien Bubeck
Last reviewed
Jun 7, 2026
Sources
13 citations
Review status
Source-backed
Revision
v1 · 1,558 words
Sébastien Bubeck (born April 16, 1985) is a French-American mathematician and computer scientist known for his work in machine learning theory and, more recently, in the development and study of large neural networks. Trained as a probabilist and optimization theorist, he made foundational contributions to the theory of multi-armed bandits and convex optimization before turning his attention to deep learning. At Microsoft Research he led the team behind the Phi family of small language models and was the lead author of the widely discussed 2023 paper "Sparks of Artificial General Intelligence." In October 2024 he left Microsoft to join OpenAI, where he works on advancing AI reasoning toward artificial general intelligence. His research has accumulated more than 25,000 citations. [1][2]
Bubeck was born in France on April 16, 1985, and later took American citizenship. He trained in mathematics and probability at the École Normale Supérieure de Cachan, now part of École Normale Supérieure Paris-Saclay. He completed his PhD in 2010 at the University of Lille 1 (Lille 1 University of Science and Technology), carrying out his doctoral research within INRIA's SequeL group under the supervision of Rémi Munos and Gilles Stoltz. His dissertation, "Bandits Games and Clustering Foundations," was recognized with the Jacques Neveu prize for the best French doctoral thesis in probability and statistics, and was a runner-up for both the Gilles Kahn prize for the best French PhD in computer science and the French Association for Artificial Intelligence thesis award. [1][2]
Bubeck's early work centered on online learning and sequential decision making, and in particular on the multi-armed bandit problem, in which an agent must repeatedly choose among uncertain options while balancing exploration against exploitation. He established minimax regret rates for stochastic and adversarial bandits, studied pure-exploration variants of the problem, and, with Rémi Munos and collaborators, extended bandit theory to continuous and structured action spaces through work such as "X-Armed Bandits." Much of this line of research is surveyed in his monograph "Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems" (2012), published in Foundations and Trends in Machine Learning. He later also produced an optimal algorithm for bandit convex optimization, long an open problem in the field. [1][3]
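The exploration-exploitation trade-off described above can be made concrete with a short simulation. The sketch below uses UCB1, a classical index policy due to Auer, Cesa-Bianchi, and Fischer, chosen here only for illustration; it is not one of Bubeck's own algorithms. The arm means, horizon, and seed are hypothetical.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return (pull counts, total reward)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of times each arm was pulled
    sums = [0.0] * n_arms    # cumulative reward collected from each arm
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest empirical mean plus exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Bernoulli reward drawn from the (hypothetical) true mean of the arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return counts, total_reward


arm_means = [0.2, 0.5, 0.8]  # hypothetical arms; the last one is best
horizon = 5000
counts, total_reward = ucb1(arm_means, horizon)
# Realized shortfall against always playing the best arm in expectation.
shortfall = max(arm_means) * horizon - total_reward
print(counts, shortfall)
```

The bonus term shrinks as an arm is pulled more often, so poorly explored arms are still tried occasionally (exploration) while most pulls go to the arm that currently looks best (exploitation).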
In parallel he worked on the theory of first-order optimization. His 2015 monograph "Convex Optimization: Algorithms and Complexity," also in Foundations and Trends in Machine Learning, became a standard graduate-level reference on gradient methods, acceleration, and the complexity of convex optimization. [4]
Bubeck joined Princeton University in 2011 as an assistant professor in the Department of Operations Research and Financial Engineering, then moved to Microsoft Research in 2014. He continued to publish in learning theory and probability throughout this period, including work on the geometry of Markov chains, sampling, and clustering. A notable later result is the "law of robustness," a 2021 paper with Mark Sellke arguing that smoothly interpolating noisy data necessarily requires substantial overparameterization, which offers a theoretical account of why very large networks tend to be more robust. The paper received an Outstanding Paper Award at NeurIPS 2021. Over his career Bubeck has also received best-paper awards at COLT (2016) and STOC (2023), a further honor at NeurIPS in 2018, and an Alfred P. Sloan Research Fellowship in 2015. [1][5]
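Stated informally, the law of robustness mentioned above can be sketched as a lower bound on the Lipschitz constant of any model that fits noisy data below the noise level. The notation is an assumption introduced here, not taken from this article: $n$ is the number of training points, $d$ the input dimension, $p$ the number of parameters, and $\operatorname{Lip}(f)$ the Lipschitz constant of the fitted function $f$. Constants, logarithmic factors, and the technical conditions on the data distribution and function class are omitted.

```latex
\operatorname{Lip}(f) \;\gtrsim\; \sqrt{\frac{n d}{p}}
```

Read this way, a model with $p \approx n$ parameters can interpolate only with a Lipschitz constant of order $\sqrt{d}$, while a constant-order Lipschitz fit needs roughly $p \gtrsim n d$ parameters.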
At Microsoft Research in Redmond, Bubeck rose to Vice President of Applied Research and Distinguished Scientist, leading the Machine Learning Foundations group. Beginning in 2023 his team pursued a contrarian thesis: that the capability of a language model is driven less by sheer scale than by the quality of its training data. The result was the Phi family of small language models, introduced through a sequence of papers whose titles riffed on the slogan "Textbooks Are All You Need." [6][7]
The first model, phi-1, was a 1.3-billion-parameter transformer trained on a mixture of filtered "textbook quality" web data and synthetic exercises, and it reached strong performance on Python code generation despite its small size. Successive releases broadened the approach from coding to general reasoning and knowledge, while keeping the models compact enough to run cheaply or on-device. The models were released under a permissive MIT license, and some were incorporated into Microsoft products such as Copilot. [6][7][9]
| Model | Release | Parameters | Notes |
|---|---|---|---|
| phi-1 | June 2023 | 1.3B | Trained on "textbook quality" and synthetic data; strong Python code generation |
| phi-1.5 | September 2023 | 1.3B | Extended the approach to common-sense reasoning and general knowledge |
| phi-2 | December 2023 | 2.7B | Matched or exceeded much larger models on several benchmarks |
| phi-3-mini | April 2024 | 3.8B | Released alongside larger phi-3-small, phi-3-medium, and vision variants |
| phi-4 | December 2024 | 14B | Emphasis on reasoning, built on heavily curated synthetic data |
The Phi line continued to expand after Bubeck's departure, with reasoning-oriented and multimodal variants such as phi-4-mini and later phi-4 reasoning models extending the family into 2025 and 2026. The work is frequently cited as evidence that careful data curation can let small models perform well beyond what their parameter count would suggest. [7][9]
In March 2023 Bubeck was the lead author of "Sparks of Artificial General Intelligence: Early experiments with GPT-4," a paper written with thirteen Microsoft colleagues including Eric Horvitz, Peter Lee, Yin Tat Lee, Ece Kamar, and Johannes Gehrke. Running to more than 150 pages, the paper documented qualitative experiments with an early, relatively unrestricted version of GPT-4 across mathematics, coding, vision, medicine, law, and psychology. The authors argued that the model's performance was "strikingly close to human-level" on many tasks and that GPT-4 could reasonably be viewed as an early, though still incomplete, version of an artificial general intelligence system. A frequently cited example was the model's attempt to draw a unicorn using the TikZ graphics language, offered as evidence of an emergent world model. [10]
The paper became one of the most discussed AI documents of the year, and also one of the most contested. Critics noted that it was not peer reviewed, that its authors had a commercial interest in GPT-4 through Microsoft's partnership with OpenAI, that its definition of artificial general intelligence was vague, and that experiments run on a private pre-release model were difficult for outsiders to reproduce. Supporters credited it with seriously cataloguing the surprising breadth of behavior in a frontier large language model. The title "Sparks of AGI" entered common usage as shorthand for the debate over how close such systems were to general intelligence. [10]
In October 2024, after a decade at Microsoft, Bubeck left to join OpenAI, a move widely reported amid intense competition for senior AI talent. Coverage of his departure attributed the change to a desire to work directly on OpenAI's stated goal of artificial general intelligence, drawing on his Microsoft experience with efficient models and reasoning. Microsoft, an OpenAI investor and partner, said it expected to continue its relationship with him through his new work. Reporting at the time indicated he would help lead research efforts at the company, though OpenAI did not publicly attach a specific corporate title, and Bubeck has described himself simply as working on AI there. [11][12]
Much of his subsequent visible work has concerned mathematical and scientific reasoning. In August 2025 Bubeck reported a striking demonstration: given an open sub-problem from a convex optimization paper, the model GPT-5-Pro reasoned for roughly seventeen minutes and produced a correct proof that widened the range of step sizes for which a guarantee about gradient descent was known to hold, from 1/L to 1.5/L for an L-smooth objective, a fifty percent increase. Bubeck, himself a convex optimization theorist, verified the argument and presented it as an early case of a general-purpose reasoning model contributing genuinely new mathematics rather than retrieving a known result. Human researchers subsequently tightened the bound further, underscoring that the model had advanced, but not closed, the question. As of 2026 he remains at OpenAI, where his research continues to focus on reasoning, mathematics, and the path toward more capable and general AI systems. [13][1]
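To make the step sizes 1/L and 1.5/L in the gradient descent result above concrete, the following minimal sketch runs gradient descent on a small L-smooth convex quadratic with both choices. The article does not say which property the bound guarantees; the sketch assumes, as one reading, that it concerns convexity of the sequence of objective values along the iterates, and it also checks plain monotone decrease. The quadratic, starting point, and helper names are hypothetical, and a single numerical example illustrates the statement without proving anything.

```python
def gradient_descent_values(hessian_diag, x0, step, n_steps):
    """Objective values f(x_0), ..., f(x_n) for f(x) = 0.5 * sum(h_i * x_i**2)."""
    x = list(x0)
    values = []
    for _ in range(n_steps + 1):
        values.append(0.5 * sum(h * xi * xi for h, xi in zip(hessian_diag, x)))
        # The gradient of f has coordinates h_i * x_i; take one gradient step.
        x = [xi - step * h * xi for h, xi in zip(hessian_diag, x)]
    return values


def is_decreasing(values):
    """True if the objective never increases from one iterate to the next."""
    return all(b <= a for a, b in zip(values, values[1:]))


def is_convex_sequence(values, tol=1e-12):
    """Discrete convexity: every second difference is non-negative (up to tol)."""
    return all(
        values[k + 2] - 2.0 * values[k + 1] + values[k] >= -tol
        for k in range(len(values) - 2)
    )


hessian_diag = [4.0, 1.0]   # hypothetical curvatures of the quadratic
L = max(hessian_diag)       # smoothness constant of this objective
x0 = [1.0, 1.0]

results = {}
for factor in (1.0, 1.5):
    values = gradient_descent_values(hessian_diag, x0, factor / L, n_steps=20)
    results[factor] = (is_decreasing(values), is_convex_sequence(values))
    print(f"step = {factor}/L:", results[factor])
```

On this example both step sizes give a decreasing and discretely convex sequence of objective values; the mathematical content of the result is that such a guarantee holds for every L-smooth convex objective, not only for a hand-picked quadratic.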