Geoffrey Irving
Last reviewed
Jun 8, 2026
Sources
12 citations
Review status
Source-backed
Revision
v1 · 1,690 words
Geoffrey Irving is a computer scientist and artificial intelligence safety researcher who serves as chief scientist of the UK AI Security Institute (AISI), the British government body that evaluates frontier AI systems for national security and public safety risks. He is best known in AI safety as a co-author of the 2018 paper "AI safety via debate," an influential proposal for supervising AI systems on tasks that are too complex for humans to check directly, and for leading technical safety teams at OpenAI and Google DeepMind before moving into government. Earlier in his career he was a computational physicist whose simulation software earned him screen credits on several major animated films. [1][2]
In May 2026 Irving announced that he would leave AISI to return to the United States and start a new nonprofit alignment research organisation, a transition described in more detail below. [6][7]
| Period | Role | Organisation |
|---|---|---|
| 2003 | B.Sc., Mathematics and Computer Science | California Institute of Technology |
| 2007 | Ph.D., Computer Science | Stanford University |
| 2000s | Computational physics and film visual effects | Pixar, Weta Digital |
| Late 2000s to early 2010s | Computational physics; co-founded Eddy Systems | D. E. Shaw Research, Otherlab |
| c. 2014 to 2017 | Research engineer, theorem proving and TensorFlow | Google Brain |
| c. 2017 to 2019 | Lead, Reflection Team | OpenAI |
| 2019 to December 2023 | Lead, Scalable Alignment Team | Google DeepMind |
| April 2024 to present | Research director, then chief scientist | UK AI Security Institute |
Irving studied mathematics and computer science at the California Institute of Technology, graduating with a B.Sc. in 2003. He then moved to Stanford University, where he completed a Ph.D. in computer science in 2007 under the supervision of Ron Fedkiw, a leading researcher in physically based simulation of natural phenomena. [2] Irving's doctoral work focused on the numerical simulation of deformable solids and fluids, and as a graduate student he co-authored "Invertible Finite Elements for Robust Simulation of Large Deformation" (2004), a widely cited method for stably simulating objects that are stretched, crushed, or inverted. [2][10] This grounding in computational physics and geometry, rather than in machine learning, shaped the early part of his career.
Before turning to artificial intelligence, Irving worked for roughly a decade in computational physics and geometry, much of it connected to computer graphics and visual effects. Through Fedkiw's PhysBAM simulation framework, his work on deformable-body and fluid simulation contributed to feature films, and he holds screen credits on the animated films Ratatouille (2007), WALL-E (2008), and Up (2009) at Pixar, and on The Adventures of Tintin (2011) at the New Zealand studio Weta Digital. [2][11]
Irving also worked at D. E. Shaw Research, the New York firm known for special-purpose supercomputers for molecular dynamics, and at Otherlab, a San Francisco research and engineering lab. [2] During this period he co-founded Eddy Systems, a startup that built an automatic code-correction tool for the Java programming language. [2] A recurring thread in his early work was large-scale computation applied to hard combinatorial problems: he is known among games researchers for strongly solving Pentago (that is, computing the outcome under perfect play from every reachable position), a two-player board game that he has described as the largest "divergent" game solved by brute-force computation, by about two orders of magnitude. [2]
Around 2014 Irving joined Google Brain, the deep-learning research group at Google. There he was a co-author of TensorFlow, the open-source machine-learning framework released in 2015, whose foundational papers are among the most cited in modern computing. [2][10] He also co-led the N2Formal team, which applied neural networks to automated theorem proving, contributing to research on using deep learning to guide proof search and select relevant premises in formal mathematics. [2] This work combined his background in large-scale computation with the emerging capabilities of deep learning, and it foreshadowed his later interest in machines that can reason through long, verifiable arguments.
Irving moved to OpenAI around 2017, where he led the Reflection Team, a group focused on the safety of advanced AI and the alignment of language models. [4][5] His most influential contribution from this period is the 2018 paper "AI safety via debate," written with Paul Christiano and Dario Amodei. [4] The paper proposes training agents through self-play in a zero-sum debate game: given a question, two AI systems take turns making short statements, and a human judge decides which side argued more honestly and usefully. The motivation is the problem of scalable oversight: direct human judgement breaks down when a task is too complex for a person to evaluate. By analogy to computational complexity theory, the authors argue that debate with optimal play lets a limited (polynomial-time) judge correctly answer any question in the complexity class PSPACE, whereas direct judging reaches only the smaller class NP. The idea sits within a broader research agenda, sometimes called factored cognition, that aims to decompose hard problems into pieces a human can supervise. [4]
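The intuition that a debate can let a weak judge settle a question too large to check directly can be illustrated with a toy sketch. The example below is not taken from the paper. It assumes a hypothetical setting in which two debaters disagree about the sum of a long list, and the judge is allowed to read only one element. The names `make_debater` and `run_debate` are illustrative, and each debater is assumed to answer consistently with some list it asserts to be the true one.

```python
from typing import Callable, List

# A debater is modelled as a function giving its claimed sum of xs[lo:hi].
Claim = Callable[[int, int], int]


def make_debater(believed: List[int]) -> Claim:
    """Hypothetical debater that answers consistently with the list it asserts."""
    return lambda lo, hi: sum(believed[lo:hi])


def run_debate(xs: List[int], alice: Claim, bob: Claim) -> str:
    """Return the winner's name; the judge reads exactly one element of xs."""
    lo, hi = 0, len(xs)
    if alice(lo, hi) == bob(lo, hi):
        return "no disagreement"
    # Narrow the dispute: the claims differ on [lo, hi), so they must
    # differ on at least one half, because each debater's claims are additive.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if alice(lo, mid) != bob(lo, mid):
            hi = mid
        else:
            lo = mid
    # The judge's only direct check is a single element of the real list.
    truth = xs[lo]
    if alice(lo, hi) == truth:
        return "alice"
    if bob(lo, hi) == truth:
        return "bob"
    return "neither"


xs = list(range(1, 1025))
tampered = xs.copy()
tampered[700] += 5  # the dishonest debater's single false claim

honest = make_debater(xs)
liar = make_debater(tampered)
print(run_debate(xs, honest, liar))
```

In this sketch the dispute over 1,024 elements is reduced to a single element in about ten rounds of halving, so the judge's work stays small while the honest debater can always win. Real debates over natural-language questions have no such clean decomposition, which is part of what the empirical research on debate investigates.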
While at OpenAI, Irving also co-authored "Fine-Tuning Language Models from Human Preferences" (2019), an early demonstration of reinforcement learning from human feedback applied to language models, a technique that later became central to training systems such as ChatGPT. [9] In the same year he co-wrote "AI Safety Needs Social Scientists" with Amanda Askell, arguing that experiments like debate depend on understanding human judgement and therefore require expertise from the social sciences as well as from computer science. [2] He discussed the debate approach publicly, including in a 2019 episode of the Future of Life Institute's AI alignment podcast. [12]
In 2019 Irving joined Google DeepMind in London, where he led the Scalable Alignment Team, the lab's group working on technical AI alignment for advanced, AGI-oriented systems. [4][5] The team pursued methods for letting humans oversee models more capable than themselves, including empirical studies of debate, red-teaming of language models to surface harmful behaviour, and approaches to making model reasoning easier to check. Irving was also among the many contributors credited on DeepMind's Gemini family of multimodal models. [10] He left DeepMind in December 2023 to move into the public sector. [7]
The body Irving joined was created as the UK AI Safety Institute, announced in November 2023 around the AI Safety Summit held at Bletchley Park. It was established as a directorate within the Department for Science, Innovation and Technology to test advanced AI models for dangerous capabilities and to advise the government on catastrophic and security risks. [1][8]
Irving was announced as the institute's research director in February 2024 and started that April. [5] On his appointment he stressed coordination across sectors, saying that "as we build more powerful AI systems, it is essential for us to be able to coordinate between the private sector, government and civil society." [5] As the organisation grew and Yarin Gal joined as director of research, Irving's title became chief scientist, the role he holds as of 2026. [1] In a blog post titled "Why I joined AISI," he explained that he believed progress on safety was too slow relative to the pace of AI advancement, and that the policy-adjacent technical work a government institute could do was even more understaffed than safety work inside companies, so the impact he could have at AISI was larger than in a corporate lab. [3] He has summarised his stance on his own website with the line that "alignment will be solved eventually, but we don't know how to do it yet." [2]
At AISI Irving has concentrated on what he calls safety cases for loss of control: structured arguments that map out, across a range of techniques, the conditions under which a powerful AI system can be considered safe to deploy. He has also advised on the institute's model evaluations and on international coordination and diplomacy. [3] In February 2025 the body was renamed the AI Security Institute, keeping the AISI acronym, as the government reoriented it toward serious security risks such as cyber-attacks, fraud, and other crime; the change was announced by technology secretary Peter Kyle at the Munich Security Conference. [8]
On May 15, 2026, Irving announced on the social platform X that he would leave AISI. He wrote that "for family reasons, I will be leaving AISI soon to move back to the Bay Area," and that "I will be starting a new nonprofit alignment research org (more to come)." [6] In an accompanying thread he reflected on his tenure, calling AISI "my favourite job to date" and saying he was leaving with mixed feelings, while reaffirming the reasons he had joined: that safety progress is slow, that coordination is under-resourced, and that governments have a key role to play in it. [6][7] As of the announcement, the name and structure of the new organisation had not been disclosed, and AISI continued to list Irving as its chief scientist. [1][7] His move would return him to the San Francisco Bay Area, the centre of the AI alignment community in which he began his safety research and which includes organisations such as METR, founded by his former OpenAI colleague Beth Barnes. [7]