Misha Laskin
Last reviewed
Jun 5, 2026
Sources
21 citations
Review status
Source-backed
Revision
v2 · 2,391 words
Misha Laskin (also published as Michael Laskin) is an AI researcher and entrepreneur best known for his work on reinforcement learning and for co-founding Reflection AI, where he serves as chief executive officer. Trained as a theoretical physicist, he moved into machine learning in the late 2010s and went on to work at Google DeepMind, where he contributed to reinforcement learning and reward modeling for the Gemini models. In 2024 he left DeepMind to start Reflection AI, a company that positions itself as a Western open frontier laboratory and that raised one of the largest startup funding rounds of 2025.[1][2][3]
Laskin's career spans two distinct fields. He earned a doctorate in theoretical physics before pivoting to artificial intelligence research, a transition he has described as unconventional given his lack of formal machine learning credentials at the time.[4][5] As a researcher he is associated with a body of work on representation learning, unsupervised reinforcement learning, and in-context learning, including the widely cited CURL method developed during his postdoctoral years at UC Berkeley.[6][7] At Google DeepMind he worked on reinforcement learning for large models and led reward model training for the Gemini project's reinforcement learning from human feedback (RLHF) pipeline.[2][5] He co-founded Reflection AI with fellow former DeepMind researcher Ioannis Antonoglou in 2024.[1][2]
Laskin has described himself as Russian-Israeli and has said that he moved to the United States as a teenager, when he became particularly interested in two subjects, physics and literature.[15] He pursued both at Yale University, where he completed an undergraduate degree with a double major in physics and literature.[4][15] He then earned a PhD in theoretical many-body quantum physics at the University of Chicago.[4][8][15]
After completing his doctorate, Laskin moved into entrepreneurship rather than continuing directly in physics. He founded a startup focused on inventory prediction systems for retailers; he has said the venture gradually converged on a revenue-generating consulting business without a clear product around it.[15] His personal biography describes the venture as a Y Combinator-backed startup that preceded his return to research.[4] He has framed this stretch as a "random walk" through startup ideas before he found his way into AI research.[15]
The release of AlphaGo was the turning point that redirected his career. Laskin has said that reading the AlphaGo work changed something in him and gave him a glimpse of what it might be like to live alongside highly capable AI systems, after which he set aside his other projects to study the fundamentals of deep learning and reinforcement learning.[5][15] He has credited Berkeley professor Pieter Abbeel with giving him an entry point into the field when he had limited conventional qualifications, and has said that being an outsider gave him a useful "clean slate" perspective on the research.[5]
Laskin completed his undergraduate studies at Yale University, studying physics and literature, and then earned a PhD in theoretical many-body quantum physics at the University of Chicago.[4][8] He has framed the move from physics into AI as a deliberate career change, crediting Berkeley professor Pieter Abbeel with giving him an entry point into the field when he had limited conventional qualifications.[5] After his PhD he founded a Y Combinator-backed startup before returning to research.[4]
Laskin's most prominent academic contribution came during his postdoctoral work at UC Berkeley, where he was advised by Abbeel and focused on unsupervised learning and deep reinforcement learning.[6][9] In 2020 he, Aravind Srinivas, and Abbeel published CURL (Contrastive Unsupervised Representations for Reinforcement Learning), an algorithm that uses contrastive learning to extract high-level features from raw pixels and then performs off-policy control on top of those features. CURL was presented at ICML 2020. On the DeepMind Control Suite it was, at the time, the first image-based method to nearly match the sample efficiency of approaches that rely on state-based features, and it also outperformed prior pixel-based methods on Atari benchmarks.[6][7] By 2026 the paper had accumulated well over one thousand citations, making it one of his most influential works.[16]
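The contrastive objective behind methods of this kind can be illustrated with a minimal sketch of an InfoNCE-style loss. This is not the CURL implementation: the function name `info_nce_loss` and the toy embeddings are hypothetical, a plain dot product stands in for the learned similarity, and the encoders that would produce the embeddings from augmented pixel observations are omitted entirely.

```python
import math


def info_nce_loss(queries, keys):
    """Mean InfoNCE loss over a batch (illustrative sketch only).

    queries[i] and keys[i] are embeddings of two views of the same
    observation (the positive pair); keys[j] with j != i act as negatives.
    Assumption: similarity is a plain dot product.
    """
    losses = []
    for i, q in enumerate(queries):
        # Similarity of query i to every key in the batch.
        logits = [sum(a * b for a, b in zip(q, k)) for k in keys]
        # Numerically stable log-sum-exp over the logits.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching key as the correct class.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: matched pairs should score a lower loss than mismatched ones.
q = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce_loss(q, aligned) < info_nce_loss(q, shuffled))
```

Minimizing a loss of this shape pulls the two views of the same observation together in embedding space and pushes views of different observations apart, which is the sense in which the representation is learned without reward labels.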
His research output reflects a sustained interest in making reinforcement learning more general and more data efficient. During the same period at Berkeley he co-authored RAD (Reinforcement Learning with Augmented Data), presented at NeurIPS 2020, which showed that applying simple data augmentations to input observations, an idea borrowed from computer vision, could improve both the data efficiency and the generalization of reinforcement learning agents without any auxiliary loss.[17] He was also a co-author of the Decision Transformer paper, published at NeurIPS 2021, which reframed reinforcement learning as a sequence modeling problem and influenced a subsequent line of work on treating control as conditional sequence prediction.[18][5] Selected publications include the following.
| Year | Work | Venue | Topic |
|---|---|---|---|
| 2020 | CURL: Contrastive Unsupervised Representations for Reinforcement Learning[7] | ICML 2020 | Pixel-based RL via contrastive representation learning |
| 2020 | Reinforcement Learning with Augmented Data (RAD)[17] | NeurIPS 2020 | Data augmentation for sample-efficient RL |
| 2021 | Decision Transformer (co-author)[18] | NeurIPS 2021 | RL as conditional sequence modeling |
| 2021 | Decoupling Representation Learning from Reinforcement Learning (ATC)[10] | ICML 2021 | Separating representation learning from RL control |
| 2022 | In-context Reinforcement Learning with Algorithm Distillation[11] | ICLR 2023 (preprint 2022) | Distilling RL algorithms into sequence models for in-context learning |
Laskin's research is organized around a recurring theme: making reinforcement learning agents learn more from less, whether by improving the representations they build from raw inputs or by reusing learning experience more efficiently. CURL and RAD, both from 2020, attacked the sample-efficiency problem of pixel-based control from two complementary angles, with CURL adding an auxiliary contrastive objective and RAD relying on data augmentation alone.[7][17] CURL has been cited more than 1,200 times and RAD several hundred times, and together they became standard reference points for self-supervised and augmentation-based methods in deep reinforcement learning.[16][17]
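The augmentation-only approach can be sketched with random cropping, one of the simple image augmentations studied in this line of work. The helper `random_crop` and the 4x4 toy observation below are hypothetical and written in plain Python for illustration; a real agent would apply such a transform to batches of image tensors before each update.

```python
import random


def random_crop(image, out_size, rng=random):
    """Return a random out_size x out_size window from a 2D image (list of rows)."""
    height, width = len(image), len(image[0])
    if out_size > height or out_size > width:
        raise ValueError("crop size must not exceed image size")
    # randint bounds are inclusive, so every valid offset can be drawn.
    top = rng.randint(0, height - out_size)
    left = rng.randint(0, width - out_size)
    return [row[left:left + out_size] for row in image[top:top + out_size]]


# A 4x4 toy "observation"; repeated calls can yield different 3x3 views of it.
obs = [[r * 4 + c for c in range(4)] for r in range(4)]
view = random_crop(obs, 3, random.Random(0))
print(len(view), len(view[0]))
```

Because the crop changes the input without changing the underlying state, the agent is encouraged to produce similar values and actions for slightly shifted views of the same scene, with no extra loss term required.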
His later work shifted toward sequence models and in-context learning. The Decision Transformer (2021) showed that an autoregressive transformer conditioned on desired return could produce competitive policies, recasting offline reinforcement learning as supervised sequence prediction.[18] Building on this direction, the 2022 Algorithm Distillation paper demonstrated that a causal transformer trained on the learning histories of a source reinforcement learning algorithm could improve its own policy entirely in context, without any updates to its network weights, effectively learning to reinforcement learn.[11] This line of research foreshadowed his later interest in combining the breadth of large language models with the depth and capability that reinforcement learning can provide.[5]
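The return conditioning described above for the Decision Transformer can be sketched as a data-preparation step. The functions `returns_to_go` and `to_token_sequence`, the tuple-based token format, and the toy trajectory are all hypothetical; the sketch assumes undiscounted returns and leaves out the transformer itself.

```python
def returns_to_go(rewards):
    """Undiscounted sum of current and future rewards at each timestep."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk the trajectory backwards, accumulating reward.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def to_token_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) triples into one flat sequence."""
    sequence = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        sequence.extend([("rtg", g), ("state", s), ("action", a)])
    return sequence


rewards = [1.0, 0.0, 2.0]
print(returns_to_go(rewards))  # [3.0, 2.0, 2.0]
tokens = to_token_sequence(["s0", "s1", "s2"], ["a0", "a1", "a2"], rewards)
print(tokens[:3])
```

A sequence model trained to predict each action from the preceding tokens can then be prompted at test time with a high target return, which is how the supervised formulation yields a policy.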
After his time at Berkeley, Laskin joined Google DeepMind as a research scientist, where he described his focus as developing generally intelligent agents and studying how reinforcement learning can unlock new capabilities in large language and multimodal models.[12] He joined in 2021 as part of a Toronto-based general agents team led by Vlad Mnih, where much of his early DeepMind work centered on in-context learning.[5] His most cited result from this period is the 2022 paper on Algorithm Distillation, which showed that a causal transformer trained on the learning histories of a source reinforcement learning algorithm could improve its own policy entirely in context, without updating its network weights.[11]
Laskin has said that the release of ChatGPT in late 2022 reshaped his thinking, convincing him that large language models had achieved a breadth of capability that general-agent research had not, and that the open problem was depth rather than generality.[5] He subsequently moved onto the Gemini project, where he worked on reinforcement learning and led reward model training, which he has characterized as a central component of the RLHF process used to align and post-train large language models.[2][5] He has said this experience with post-training and alignment shaped the technical direction of his later company. Laskin worked closely at DeepMind with Ioannis Antonoglou, who had been one of the creators of AlphaGo and who led post-training for Gemini, and the two collaborated on both Gemini 1.0 and Gemini 1.5 before going on to found a company together.[2][5]
Laskin co-founded Reflection AI with Antonoglou in early 2024, with sources placing the company's start in late February or March of that year.[1][2] Antonoglou, who serves as the company's chief technology officer, had joined DeepMind in 2012 as one of its earliest engineers and had worked on foundational systems including DQN, AlphaGo, AlphaZero, and MuZero before leading RLHF for Gemini.[2][19] The pair framed the venture around what they call the "depth problem" in AI agents: while large labs had concentrated on breadth, Laskin argued that building agents able to plan and reliably execute long, complex, multi-step tasks remained an open challenge.[5] Their stated approach drew on lessons from AlphaGo, combining the generality of language models with the learning and search techniques of reinforcement learning to produce more capable and dependable autonomous agents.[5] The company operates across offices in New York, San Francisco, and London.[19]
The company emerged from stealth in March 2025, raising roughly $130 million at a valuation of about $545 million.[3][13] Its first publicly announced product was Asimov, launched in July 2025 and described as a code research or code comprehension agent for large enterprise codebases. Rather than focusing on generating new code, Asimov was built to index a codebase along with surrounding context such as documentation, messages, and engineering discussions, and to answer questions about how complex software works. It was designed as a collection of cooperating agents, with smaller agents retrieving relevant data and a larger reasoning agent synthesizing answers, that could run inside a customer's own cloud environment so that data remained under the customer's control.[13][14] In an internal evaluation reported at launch, developers were said to prefer Asimov's answers for 82 percent of their prompts, compared with 63 percent for Anthropic's Claude Sonnet 4, though the comparison did not include several other coding tools.[14]
Reflection AI subsequently shifted its emphasis from autonomous coding toward building a frontier open base model. Antonoglou explained the pivot by arguing that the Western ecosystem lacked a powerful open base model that could itself serve as a foundation for large-scale reinforcement learning.[13]
In October 2025 Reflection AI raised $2 billion at an $8 billion valuation, a roughly 15-fold increase from its valuation seven months earlier.[2][3] The round drew a broad set of investors, including Nvidia, former Google chief executive Eric Schmidt, Zoom founder Eric Yuan, the private equity firm 1789 Capital, Citi, DST Global, B Capital, GIC, CRV, and existing backers Lightspeed and Sequoia, among others.[2][3][20] Reporting indicated the company employed roughly 60 researchers and engineers at that time and had secured a dedicated compute cluster.[3]
The company explicitly positions itself as both an open-source alternative to closed frontier labs such as OpenAI and Anthropic and a Western counterpart to Chinese AI firms such as DeepSeek.[2][3] Laskin said Reflection AI planned to release a frontier language model, trained on tens of trillions of tokens, and to make model weights available for public use while largely keeping its datasets and full training pipelines proprietary. The company has said it expects revenue to come from enterprises and from governments building sovereign AI systems.[3][20] It has described building a large-scale LLM and reinforcement learning platform capable of training mixture-of-experts models at frontier scale.[2] Coverage in early 2026 noted that, despite this open-weight messaging, the company had not yet released a public model or dataset, with its public model and dataset repositories empty, and reported that it was pursuing a further round of roughly $2 billion at a valuation near or above $20 billion.[13][21]
As of 2026, Laskin is the co-founder and chief executive officer of Reflection AI, leading the company's effort to build an open frontier model and the agentic systems built on top of it.[2][3] He has continued to speak publicly about the company's strategy and his broader views on AI through interviews and podcasts, including appearances on Sequoia Capital's Training Data and the Manifold podcast.[5][8] Industry coverage in early 2026 framed the roughly $20 billion valuation the company was reportedly seeking as contingent on its delivering a publicly released frontier model, a milestone it had not yet reached.[21]