Ioannis Antonoglou


Updated Jun 9, 2026


Ioannis Antonoglou is a Greek artificial intelligence researcher known as a co-creator of several of the landmark reinforcement learning systems built at Google DeepMind, including the Atari-playing Deep Q-Network, AlphaGo, AlphaZero, and MuZero. He spent more than a decade at DeepMind, where he was a founding-era engineer and later a senior researcher, and contributed to the reinforcement learning behind the company's Gemini models. In 2024 he co-founded the frontier AI startup Reflection AI, where he serves as co-founder, president, and chief technology officer. ^[1]^[2]^[3]

Overview

Antonoglou's work sits at the intersection of deep learning and reinforcement learning, the combination that produced the first agents to reach or exceed human performance on classic video games and on the board games of Go, chess, and shogi. He is listed as a co-author on the sequence of DeepMind papers that defined modern deep reinforcement learning, from the 2013 deep Q-network through MuZero in 2020, several of which appeared in Nature and Science. ^[4]^[5]^[6]^[7]^[8]

As of 2026 he leads the technical direction of Reflection AI, a company positioned as an American open-weight frontier lab. Reflection raised 2 billion US dollars in October 2025 at an 8 billion dollar valuation, and aims to build large-scale, reinforcement-learning-driven agent models while keeping model weights openly available. ^[1]^[2]

His combined research record is highly cited. By 2026 his Google Scholar profile listed more than 140,000 total citations, an h-index of 37, and an i10-index of 39, a reflection of how widely the deep reinforcement learning papers he co-authored have been built upon across the field. ^[9]

Education

Antonoglou was born and raised in Thessaloniki, Greece. ^[12] He earned an engineer's degree in electrical and computer engineering from the Aristotle University of Thessaloniki in 2011 and a master's degree in artificial intelligence and machine learning from the University of Edinburgh in 2012. ^[3]^[11] He later pursued doctoral research in artificial intelligence at University College London (UCL), an affiliation that appears alongside DeepMind and Reflection AI on his academic profile. ^[9]^[10] The doctoral work ran from roughly 2017 to 2023, in parallel with his research career, with his DeepMind colleague David Silver acting as mentor and supervisor. ^[12]^[9]

DeepMind career and the reinforcement learning milestones

Antonoglou joined DeepMind in its early days, when the London lab was a small team, and remained for more than eleven years, rising to a senior research role. ^[10]^[2] Multiple accounts describe him as employee number 25 and the sixth member of the research team when he arrived in late 2012, at a time when the company numbered only around 20 to 25 people in total. ^[12]^[13] Over that period he was a recurring co-author on the projects that made the lab's reputation in reinforcement learning, and colleagues such as Volodymyr Mnih, the lead author of the original Atari work, have publicly credited his attention to detail and engineering rigor. ^[12]

His first widely cited contribution was the 2013 preprint "Playing Atari with Deep Reinforcement Learning," which introduced the deep Q-network (DQN) and showed that a single neural network could learn to play many Atari 2600 games directly from pixels. ^[4] The follow-up, "Human-level control through deep reinforcement learning," was published in Nature in 2015 and demonstrated human-level scores across dozens of Atari titles. ^[5] That Nature paper became the most cited work on his record, with more than 42,000 citations by 2026. ^[9]
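
The learning signal behind a Q-network of this kind can be illustrated with the standard one-step Q-learning target. The following is a minimal sketch, not the implementation from the DQN papers: the neural network is replaced by plain lists of Q-values, and the function names, the discount value, and the example numbers are all assumptions chosen for illustration.

```python
from typing import Sequence


def td_target(reward: float, next_q_values: Sequence[float],
              done: bool, gamma: float = 0.99) -> float:
    """One-step Q-learning target: r + gamma * max over a' of Q(s', a').

    At the end of an episode there is no next state, so the target is
    just the reward. `gamma` is an assumed discount factor.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_error(q_value: float, target: float) -> float:
    """Squared difference between the current estimate and the target."""
    return (target - q_value) ** 2


# Illustrative numbers only: three possible actions in the next state.
target = td_target(reward=1.0, next_q_values=[0.5, 2.0, -1.0],
                   done=False, gamma=0.9)
loss = squared_td_error(q_value=2.0, target=target)
```

In a deep Q-network the Q-values would come from a neural network evaluated on screen pixels, and the squared error would be minimized by gradient descent over many sampled transitions.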

He then worked on the AlphaGo line of systems, which combined deep learning with Monte Carlo tree search. He is a co-author of the 2016 Nature paper "Mastering the game of Go with deep neural networks and tree search," describing the program that defeated a human Go champion, and of the 2017 successor on learning Go without human game data. ^[6]^[11] On the AlphaGo project his engineering responsibilities included optimizing the program's neural networks to run efficiently on GPUs and TPUs, work that helped make the system fast enough to play competitively. ^[12] The 2016 Go paper has itself been cited more than 25,000 times. ^[9] The work generalized into AlphaZero, reported in Science in 2018, a single algorithm that reached superhuman play in chess, shogi, and Go purely through self-play, again with Antonoglou among the authors alongside David Silver and others. ^[7] In 2020 he was a co-author of the Nature paper on MuZero, which extended the approach by learning a model of the environment's dynamics, so that the same method could master Atari, Go, chess, and shogi without being told the rules in advance. ^[8]

Antonoglou later contributed to DeepMind's large-model efforts, and he is listed among the authors of the 2023 technical report on the Gemini family of multimodal models; his reinforcement learning experience informed the reward-modeling and learning-from-feedback components of that work. ^[9]^[2] In the period before he left, he led the reinforcement learning from human feedback (RLHF) effort for Gemini, the post-training stage that aligns the model using human preference data. ^[12]^[13] He also co-authored other DeepMind research beyond the projects described above. ^[9]

Selected systems and Antonoglou's role

System	Year	Venue	Antonoglou's role
Deep Q-Network (Atari)	2013 / 2015	arXiv; Nature	Co-author of the DQN preprint and the Nature paper
AlphaGo	2016	Nature	Co-author; co-creator of the Go-playing system
AlphaGo Zero	2017	Nature	Co-author
AlphaZero	2018	Science	Co-author of the general self-play algorithm
MuZero	2020	Nature	Co-author of the learned-model planning method
Gemini	2023	arXiv (technical report)	Contributing author; led RLHF

Citation impact

The deep reinforcement learning papers Antonoglou co-authored are among the most cited in modern machine learning. The figures below are drawn from his Google Scholar profile as of 2026. ^[9]

Paper	Year	Approximate citations
Human-level control through deep reinforcement learning	2015	42,000+
Mastering the game of Go with deep neural networks and tree search	2016	25,000+
Playing Atari with Deep Reinforcement Learning	2013	20,000+
Mastering the game of Go without human knowledge	2017	14,000+
Gemini: a family of highly capable multimodal models	2023	9,000+

Founding Reflection AI

In March 2024 Antonoglou left DeepMind to co-found Reflection AI with Misha Laskin, another former DeepMind researcher who had led reward-modeling work for the Gemini project. ^[1]^[2] Laskin, who earned a doctorate in theoretical physics at the University of Chicago and did postdoctoral research in Pieter Abbeel's laboratory at UC Berkeley, became chief executive, and Antonoglou took the roles of president and chief technology officer. ^[1]^[3]^[16] The company is based in New York and also operates offices in San Francisco and London. ^[2]^[12]

Reflection's initial focus was autonomous coding agents, applying large-scale reinforcement learning to software engineering, a domain the founders described as both demanding and verifiable. ^[1]^[2] The pair argued that autonomous coding is effectively "AGI complete," a problem whose solution would carry over to general autonomous systems. ^[14] The startup later broadened its ambition toward general autonomous agents and frontier-scale language models.

The company emerged from stealth in March 2025, having raised about 130 million US dollars across a 25 million dollar seed round led by Sequoia Capital and CRV and a 105 million dollar Series A co-led by Lightspeed Venture Partners and CRV, at a valuation of roughly 545 million dollars. ^[15]^[17]

Asimov

Reflection AI's first publicly announced product was Asimov, unveiled on July 16, 2025, and described as a research agent for code comprehension rather than code generation. ^[15]^[18] The founders argued that engineers spend more time understanding and designing software than writing it, so Asimov was built to ingest an entire codebase together with surrounding context such as documentation, issue threads, and chat history, and to build a persistent memory of how a system works. ^[15]^[18] Technically it used a multi-agent design in which many small long-context retriever agents pull relevant information out of a large codebase and a single larger short-context reasoning agent combines that information into an answer, and it was designed to run inside a customer's own cloud so that proprietary code stayed under the customer's control. ^[15]^[18] In an internal evaluation reported at launch, developers were said to prefer Asimov's answers for about 82 percent of prompts, compared with about 63 percent for Anthropic's Claude Sonnet 4. ^[18]
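
The retrieve-then-reason split described above can be sketched in a few lines. This is a toy illustration of the general pattern only, not Reflection AI's implementation: the names `Snippet`, `retrieve`, `reason`, and `answer` are hypothetical, and simple substring matching and string concatenation stand in for what would be model calls in a real system.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Snippet:
    source: str  # file or document the text came from
    text: str    # the line judged relevant to the query


def retrieve(source: str, content: str, query: str) -> List[Snippet]:
    """Hypothetical retriever agent: scans one document for query terms.

    Substring matching is a stand-in for a long-context model.
    """
    terms = [t.lower() for t in query.split()]
    hits = []
    for line in content.splitlines():
        if any(t in line.lower() for t in terms):
            hits.append(Snippet(source, line.strip()))
    return hits


def reason(query: str, snippets: List[Snippet]) -> str:
    """Hypothetical reasoning agent: combines retrieved evidence.

    Concatenation is a stand-in for a single larger reasoning model.
    """
    if not snippets:
        return f"No evidence found for: {query}"
    evidence = "; ".join(f"{s.source}: {s.text}" for s in snippets)
    return f"Q: {query} | evidence: {evidence}"


def answer(query: str, corpus: Dict[str, str]) -> str:
    """Fan out one retriever per document, then reason once over the results."""
    snippets: List[Snippet] = []
    for source, content in corpus.items():
        snippets.extend(retrieve(source, content, query))
    return reason(query, snippets)


# Made-up corpus mixing code and documentation.
corpus = {
    "auth.py": "def login(user):\n    return check_token(user)",
    "docs/auth.md": "The login flow validates a token before creating a session.",
}
result = answer("login token", corpus)
```

The design point the sketch preserves is the division of labor: each retriever sees a large amount of raw material but returns only a small amount of evidence, so the reasoning step works over a short, focused context.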

Current role and the company's direction

Antonoglou directs Reflection AI's technical strategy. The company has framed itself as an American open frontier lab, an alternative both to closed labs such as OpenAI and Anthropic and to Chinese open-model groups like DeepSeek. ^[1] Its plan centers on releasing open-weight models, including mixture-of-experts systems at frontier scale, so that enterprises, researchers, and governments can run and adapt the models with control over their own AI stack. ^[1] Antonoglou has explained the company's pivot toward an open base model by arguing that the Western ecosystem lacked a powerful open foundation that could itself serve as a starting point for large-scale reinforcement learning. ^[13]

In October 2025 Reflection announced a 2 billion dollar funding round at an 8 billion dollar valuation, a roughly fifteenfold increase over its earlier valuation, with investors including Nvidia, Sequoia, Lightspeed, DST Global, B Capital, GIC, and CRV, as well as Eric Yuan and Eric Schmidt. ^[1]^[2] At the time the team numbered around 60 people, mostly AI researchers and engineers, and the company said it intended to train a frontier language model on tens of trillions of tokens. ^[1]^[2] As of early 2026 the company had not yet released its frontier open-weight model publicly, and Antonoglou said it planned to release what he hoped would be the most powerful open-weight model in the world later that year. ^[19]

Recognition

Antonoglou has been recognized within the Greek technology and entrepreneurship community, including by Endeavor Greece, which features him among its profiled entrepreneurs and selected him as a Catalyst investee in 2025. ^[12] He is a frequent speaker on open foundation models and reinforcement learning, with appearances at venues such as SXSW London 2025 and on industry podcasts including Sequoia Capital's Training Data. ^[2]^[20] Much of the public recognition of his work is collective, attached to the DeepMind teams behind AlphaGo and AlphaZero, whose achievements were widely reported as milestones in artificial intelligence.

References

1. Maxwell Zeff, "Reflection AI raises $2B to be America's open frontier AI lab, challenging DeepSeek," TechCrunch, October 9, 2025.
2. "ReflectionAI founder Ioannis Antonoglou: From AlphaGo to AGI," Sequoia Capital, Training Data podcast.
3. "Ioannis Antonoglou," Crunchbase person profile.
4. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller, "Playing Atari with Deep Reinforcement Learning," arXiv:1312.5602, 2013.
5. Volodymyr Mnih et al., "Human-level control through deep reinforcement learning," Nature 518, 529-533, 2015.
6. David Silver, Aja Huang, Christopher J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, et al., "Mastering the game of Go with deep neural networks and tree search," Nature 529, 484-489, 2016.
7. David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, et al., "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play," Science 362, 1140-1144, 2018; preprint arXiv:1712.01815.
8. Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver, "Mastering Atari, Go, chess and shogi by planning with a learned model," Nature 588, 604-609, 2020.
9. "Ioannis Antonoglou," Google Scholar profile.
10. "Ioannis Antonoglou: From NTUA to DeepMind and the Next Generation of AI Agents," Endeavor Greece podcast.
11. "Ioannis Antonoglou," Chessprogramming wiki.
12. "Ioannis Antonoglou," Endeavor Greece entrepreneur profile. Retrieved June 6, 2026.
13. "Reflection AI: We Are the Only Ones Who Would Build It," Turing Post. Retrieved June 6, 2026.
14. "Partnering with Reflection: Toward Superintelligence, with Autonomous Coding," Sequoia Capital. Retrieved June 6, 2026.
15. "Reflection AI Launches Asimov Code Comprehension Agent," Sequoia Capital. Retrieved June 6, 2026.
16. "Reflection AI's Misha Laskin on the AlphaGo Moment for LLMs," Sequoia Capital, Training Data podcast. Retrieved June 6, 2026.
17. Mike Wheatley, "Superintelligence startup Reflection AI launches with $130M in funding," SiliconANGLE, March 7, 2025.
18. Mike Wheatley, "Reflection AI's autonomous coding agent Asimov learns from more than just code," SiliconANGLE, July 16, 2025.
19. "Reflection AI Faces Critical March 2026 Test as $20 Billion Valuation Hinges on Public Model Release," AInvest. Retrieved June 6, 2026.
20. "Ioannis Antonoglou," SXSW London 2025 Speakers. Retrieved June 6, 2026.
