Geoffrey Irving



Updated Jun 28, 2026


Geoffrey Irving is a computer scientist and artificial intelligence safety researcher who, until 2026, served as chief scientist of the UK AI Security Institute (AISI), the British government body that evaluates frontier AI systems for national security and public safety risks. In June 2026 he launched Sequent Research, a new nonprofit organisation focused on aligning superintelligence, which he co-founded with mathematician Daniel Murfet. ^[1]^[13]^[14] He is best known in AI safety as lead author of the 2018 paper "AI safety via debate," an influential proposal for supervising AI systems on tasks too complex for humans to check directly, and for leading technical safety teams at OpenAI and Google DeepMind before moving into government. Earlier in his career he was a computational physicist whose simulation software earned him screen credits on several major animated films. ^[1]^[2]^[4]

Irving holds a Ph.D. from Stanford University, co-authored the TensorFlow machine learning framework at Google Brain, and over roughly a decade in AI safety has led the Reflection Team at OpenAI, the Scalable Alignment Team at Google DeepMind, and the research and science functions at AISI. ^[2]^[4]^[5] His departure from AISI to found Sequent and his earlier roles are described in more detail below. ^[6]^[7]^[13]

Period	Role	Organisation
2003	B.Sc., Mathematics and Computer Science	California Institute of Technology
2007	Ph.D., Computer Science	Stanford University
2000s	Computational physics and film visual effects	Pixar, Weta Digital
Late 2000s to early 2010s	Computational physics; co-founded Eddy Systems	D. E. Shaw Research, Otherlab
c. 2014 to 2017	Research engineer, theorem proving and TensorFlow	Google Brain
c. 2017 to 2019	Lead, Reflection Team	OpenAI
2019 to December 2023	Lead, Scalable Alignment Team	Google DeepMind
April 2024 to 2026	Research director, then chief scientist	UK AI Security Institute
June 2026 to present	Co-founder	Sequent Research

Who is Geoffrey Irving?

Geoffrey Irving is an AI safety researcher whose career spans computational physics, deep learning infrastructure, technical alignment research, and government AI policy. He co-authored TensorFlow at Google Brain, then moved into AI safety, leading dedicated safety and alignment teams at OpenAI (the Reflection Team) and Google DeepMind (the Scalable Alignment Team) before joining the UK government's AISI as research director in 2024 and rising to chief scientist. ^[2]^[4]^[5] In June 2026 he left AISI to start Sequent Research, a Bay Area nonprofit aiming to align superintelligent AI. ^[13]^[14] His signature technical contribution is the 2018 paper "AI safety via debate," written with Paul Christiano and Dario Amodei, which proposes using adversarial debate between AI systems as a form of scalable oversight. ^[4]

Education

Irving studied mathematics and computer science at the California Institute of Technology, graduating with a B.Sc. in 2003. He then moved to Stanford University, where he completed a Ph.D. in computer science in 2007 under the supervision of Ron Fedkiw, a leading researcher in physically based simulation of natural phenomena. ^[2] Irving's doctoral work focused on the numerical simulation of deformable solids and fluids, and as a graduate student he co-authored "Invertible Finite Elements for Robust Simulation of Large Deformation" (2004), a widely cited method for stably simulating objects that are stretched, crushed, or inverted. ^[2]^[10] This grounding in computational physics and geometry, rather than in machine learning, shaped the early part of his career.

What did Geoffrey Irving do early in his career?

Before turning to artificial intelligence, Irving worked for roughly a decade in computational physics and geometry, much of it connected to computer graphics and visual effects. Through Fedkiw's PhysBAM simulation framework, his work on deformable-body and fluid simulation contributed to feature films, and he holds screen credits on the animated films Ratatouille (2007), WALL-E (2008), and Up (2009) at Pixar, and on The Adventures of Tintin (2011) at the New Zealand studio Weta Digital. ^[2]^[11]

Irving also worked at D. E. Shaw Research, the New York firm known for building special-purpose supercomputers for molecular dynamics simulation, and at Otherlab, a San Francisco research and engineering lab. ^[2] During this period he co-founded Eddy Systems, a startup that built an automatic code-correction tool for the Java programming language. ^[2] A recurring thread in his early work was large-scale computation applied to hard combinatorial problems: he is known among games researchers for strongly solving Pentago, a two-player board game, which he has described as, by about two orders of magnitude, the largest "divergent" game solved by brute-force computation. ^[2]

Machine learning and AI safety research

Google Brain

Around 2014 Irving joined Google Brain, the deep-learning research group at Google. There he was a co-author of TensorFlow, the open-source machine-learning framework released in 2015, whose foundational papers have been very widely cited. ^[2]^[10] He also co-led the N2Formal team, which applied neural networks to automated theorem proving, contributing to research on using deep learning to guide proof search and select relevant premises in formal mathematics. ^[2] This work combined his background in large-scale computation with the emerging capabilities of deep learning, and it foreshadowed his later interest in machines that can reason through long, verifiable arguments.

What is AI safety via debate?

Irving moved to OpenAI around 2017, where he led the Reflection Team, a group focused on the safety of advanced AI and the alignment of language models. ^[4]^[5] His most influential contribution from this period is the 2018 paper "AI safety via debate," written with Paul Christiano and Dario Amodei. ^[4] The paper proposes training agents through self-play in a zero-sum debate game: given a question, two AI systems take turns making short statements, and a human judge decides which side argued more honestly and usefully. The motivation is the problem of scalable oversight: direct human judgement breaks down when a task is too complex for a person to evaluate. By analogy to computational complexity theory, the authors argue that debate with optimal play lets a limited (polynomial-time) judge correctly answer any question in the complexity class PSPACE, whereas direct judging reaches only the smaller class NP. ^[4] A core intuition behind the method is that, in a well-designed debate, lying is harder than refuting a lie, so the equilibrium favours truthful argument. ^[4] The idea sits within a broader research agenda, sometimes called factored cognition, that aims to decompose hard problems into pieces a human can supervise. ^[4]
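The shape of the debate game can be illustrated with a deliberately tiny Python sketch. It is not the paper's method or code: the debaters here are hand-written functions rather than models trained by self-play, and every name (`run_debate`, `claims_prime`, `finds_factor`, `factor_judge`) is hypothetical. It shows two ingredients from the description above: debaters alternate short statements and a judge picks a winner with zero-sum rewards, and the judge's work can be far cheaper than the debaters' (checking one claimed factor rather than searching for one), a loose illustration of why a false claim can be easy to refute even for a weak judge.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A transcript is an ordered list of (speaker, statement) pairs.
Transcript = List[Tuple[str, str]]
# A debater or judge sees the question and the transcript so far.
Agent = Callable[[str, Transcript], str]


@dataclass
class DebateResult:
    transcript: Transcript
    winner: str
    rewards: Dict[str, float]


def run_debate(question: str, debater_a: Agent, debater_b: Agent,
               judge: Agent, num_turns: int = 4) -> DebateResult:
    """Play one episode of a two-player, zero-sum debate game (toy sketch)."""
    transcript: Transcript = []
    players = [("A", debater_a), ("B", debater_b)]
    for turn in range(num_turns):
        # Debaters alternate, A first, each adding one short statement.
        name, agent = players[turn % 2]
        transcript.append((name, agent(question, transcript)))
    # The judge sees only the question and the finished transcript.
    winner = judge(question, transcript)
    if winner not in ("A", "B"):
        raise ValueError(f"judge must return 'A' or 'B', got {winner!r}")
    # Zero-sum payoff: the winner gets +1 and the loser -1.
    reward_a = 1.0 if winner == "A" else -1.0
    return DebateResult(transcript, winner, {"A": reward_a, "B": -reward_a})


# Hypothetical toy instance: A claims the number n is prime, B tries to refute it.
def claims_prime(question: str, transcript: Transcript) -> str:
    return "prime"


def finds_factor(question: str, transcript: Transcript) -> str:
    # The debater may do expensive work (here, trial division) ...
    n = int(question)
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return f"factor {d}"
    return "concede"


def factor_judge(question: str, transcript: Transcript) -> str:
    # ... while the judge only checks B's latest claim with a single division.
    n = int(question)
    b_statements = [s for speaker, s in transcript if speaker == "B"]
    if b_statements:
        parts = b_statements[-1].split()
        if len(parts) == 2 and parts[0] == "factor" and parts[1].isdigit():
            d = int(parts[1])
            if 1 < d < n and n % d == 0:
                return "B"
    return "A"


result = run_debate("91", claims_prime, finds_factor, factor_judge, num_turns=2)
print(result.transcript, result.winner, result.rewards)
```

For 91, B wins by exhibiting the factor 7; for a prime such as 97, B can only concede and A wins. In the actual proposal both debaters would be learned policies and the judge a human, so this toy only mirrors the game's structure, not its training.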

While at OpenAI, Irving also co-authored "Fine-Tuning Language Models from Human Preferences" (2019), an early demonstration of reinforcement learning from human feedback applied to language models, a technique that later became central to training systems such as ChatGPT. ^[9] In the same year he co-wrote "AI Safety Needs Social Scientists" with Amanda Askell, arguing that experiments like debate depend on understanding human judgement and therefore require expertise from the social sciences as well as from computer science. ^[2] He discussed the debate approach publicly, including in a 2019 episode of the Future of Life Institute's AI alignment podcast. ^[12]

Google DeepMind

In 2019 Irving joined Google DeepMind in London, where he led the Scalable Alignment Team, the lab's group working on technical AI alignment for advanced, AGI-oriented systems. ^[4]^[5] The team pursued methods for letting humans oversee models more capable than themselves, including empirical studies of debate, red-teaming of language models to surface harmful behaviour, and approaches to making model reasoning easier to check. Irving was also among the many contributors credited on DeepMind's Gemini family of multimodal models. ^[10] He left DeepMind in December 2023 to move into the public sector. ^[7]

What is the UK AI Security Institute?

The body Irving joined was created as the UK AI Safety Institute, announced in November 2023 around the AI Safety Summit held at Bletchley Park. It was established as a directorate within the Department for Science, Innovation and Technology to test advanced AI models for dangerous capabilities and to advise the government on catastrophic and security risks. ^[1]^[8] In February 2025 the body was renamed the AI Security Institute, keeping the AISI acronym, as the government reoriented it toward serious security risks such as cyber-attacks, fraud, and other crime; the change was announced by technology secretary Peter Kyle at the Munich Security Conference. ^[8]

Irving was announced as the institute's research director in February 2024 and started that April. ^[5] On his appointment he stressed coordination across sectors, saying that "as we build more powerful AI systems, it is essential for us to be able to coordinate between the private sector, government and civil society." ^[5] As the organisation grew and Yarin Gal joined as director of research, Irving's title became chief scientist, the role he held until 2026. ^[1] In a blog post titled "Why I joined AISI," he explained that he believed progress on safety was too slow relative to the pace of AI advancement, and that the policy-adjacent technical work a government institute could do was even more understaffed than safety work inside companies, so the impact he could have at AISI was larger than in a corporate lab. ^[3] He has summarised his stance on his own website with the line that "alignment will be solved eventually, but we don't know how to do it yet." ^[2]

At AISI Irving concentrated on what he calls safety cases for loss of control: structured arguments that map out, across a range of techniques, the conditions under which a powerful AI system can be considered safe to deploy. He also advised on the institute's model evaluations and on international coordination and diplomacy. ^[3]

Why did Geoffrey Irving leave AISI?

On May 15, 2026, Irving announced on the social platform X that he would leave AISI. He wrote that "for family reasons, I will be leaving AISI soon to move back to the Bay Area," and that "I will be starting a new nonprofit alignment research org (more to come)." ^[6] In an accompanying thread he reflected on his tenure, calling AISI "my favourite job to date" and saying he was leaving with mixed feelings, while reaffirming the reasons he had joined: that safety progress is slow, that coordination is under-resourced, and that governments have a key role to play in coordination. ^[6]^[7] The move returned him to the San Francisco Bay Area, where he had begun his safety research at OpenAI and which remains a centre of the AI alignment community, home to organisations such as METR, founded by his former OpenAI colleague Beth Barnes. ^[7]

What is Sequent Research?

On June 10, 2026, Irving and his collaborators publicly launched Sequent Research, the nonprofit alignment organisation he had foreshadowed, in a post titled "Sequent: scale and automation for higher confidence in alignment." ^[13]^[14] Sequent is co-founded with Daniel Murfet, a mathematician who had been head of research at Timaeus and who left a tenured pure-mathematics position to work on alignment, and it brings together researchers from AISI's former alignment team and from Timaeus. ^[13]^[14] Named co-authors of the launch post, alongside Irving and Murfet, include Alex Holness-Tofts and Jacob Pfau (both from AISI's alignment work) and Jesse Hoogland, Stan van Wingerden, and Marco Cozzi (from Timaeus). ^[13]

Sequent's stated aim is to reach a higher degree of confidence that advanced AI systems are aligned. As the founders put it, "we are aiming at higher confidence via a portfolio of theory and empirics bets," each of which might fail but any one of which, if it succeeds, would provide more a priori confidence in aligned outcomes, while the organisation invests heavily in automation to accelerate that research. ^[13] The agenda connects Irving's long-running interest in scalable oversight and debate with singular learning theory, the mathematical approach to neural-network learning dynamics that Murfet and the Timaeus group helped pioneer. ^[13] Sequent maintains a large in-person presence in Berkeley, California, with additional researchers working remotely from London, Melbourne, and elsewhere. ^[13]

References


Related Articles

Ilya Sutskever

Dario Amodei

Open Philanthropy

Eliezer Yudkowsky

Nick Bostrom

Nate Soares
