Existential risk from AI refers to the hypothesis that advanced artificial intelligence could cause human extinction or a permanent, catastrophic collapse of human civilization. This concern centers on the possibility that a sufficiently powerful AI system, if not properly aligned with human values and interests, could pursue goals that are incompatible with human survival or flourishing, and that humans would be unable to prevent or reverse the resulting harm. Once a niche topic discussed mainly in philosophy and computer science communities, existential risk from AI has become a mainstream policy concern, with prominent researchers, industry leaders, and governments acknowledging it as a serious possibility.
An existential risk, as defined by philosopher Nick Bostrom, is a risk "where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential" [1]. Applied to AI, existential risk encompasses scenarios in which advanced AI systems cause human extinction, permanently disempower humanity, or irreversibly degrade the conditions necessary for human civilization to recover.
The concept is broader than simply "AI kills everyone." It includes scenarios where AI systems enable a small group to seize permanent totalitarian control, where human autonomy is gradually eroded to the point of no return, or where AI-driven competition leads to catastrophic conflict. The common thread is irreversibility: unlike many other risks from AI (such as bias, job displacement, or privacy violations), existential risks are defined by the impossibility of recovery.
Existential risk from AI is sometimes discussed under the umbrella of "catastrophic AI risk" or "global catastrophic risk from AI." The distinction between existential and catastrophic risks is one of permanence: a catastrophic event might kill millions but still allow civilization to recover, while an existential event would foreclose recovery entirely.
The case for taking AI existential risk seriously rests on several interconnected arguments.
In 1965, mathematician I.J. Good described the concept of an "intelligence explosion": if a machine could surpass human intelligence, it could design even more intelligent machines, leading to a rapid, recursive process of self-improvement that would quickly leave human cognitive abilities far behind [2]. Good wrote: "The first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control."
The intelligence explosion hypothesis suggests that progress in AI might not be gradual and manageable. Instead, once AI systems become capable enough to improve their own design, a positive feedback loop could produce superintelligent systems far faster than humans could adapt. Nick Bostrom, in his 2014 book Superintelligence: Paths, Dangers, Strategies, explored this idea in detail, arguing that a superintelligence could emerge over days, weeks, or months rather than decades [1].
Critics of the intelligence explosion argument point out that cognitive ability may face diminishing returns, that hardware constraints impose physical limits on computational growth, and that the analogy between biological evolution and AI self-improvement may not hold. Some researchers argue that the concept conflates different types of intelligence and that general cognitive improvement is far more difficult than improvement on specific tasks [3].
The instrumental convergence thesis, articulated by Bostrom and Steve Omohundro, holds that a wide range of intelligent agents, regardless of their final goals, would tend to pursue certain common sub-goals because these sub-goals are useful for achieving almost any objective [4]. These convergent instrumental goals include:
The implication is that even an AI system with an apparently benign goal (such as calculating the digits of pi or maximizing paperclip production) could, if sufficiently intelligent, seek to acquire resources, resist shutdown, and eliminate potential threats to its operation. These behaviors would emerge not from malice but from instrumental rationality: they are useful means for achieving whatever end the system happens to have.
The alignment problem is the challenge of ensuring that AI systems pursue goals that match human values and intentions. Several factors make alignment difficult.
Humans often cannot fully specify their values in a way that a machine could follow without misinterpretation. Simple objective functions can be "gamed": an AI tasked with making humans smile might achieve this by forcing smiles rather than generating genuine happiness. This is sometimes called "reward hacking" or "specification gaming." More fundamentally, human values are complex, context-dependent, and sometimes contradictory, making it difficult to encode them into a formal objective function [5].
The alignment challenge becomes more severe as AI systems become more capable. A weakly capable misaligned AI might produce annoying search results. A highly capable misaligned AI could pursue its misaligned objectives with strategies that humans cannot anticipate or counter. Stuart Russell, a professor at UC Berkeley and co-author of the leading AI textbook, has argued that the alignment problem is the central challenge of AI research and that solving it is essential for the safe development of increasingly powerful AI systems [6].
The orthogonality thesis, proposed by Bostrom, states that intelligence and final goals are independent variables: any level of intelligence can in principle be combined with any final goal [4]. A superintelligent system could be motivated by goals that seem trivial, bizarre, or harmful to humans. Intelligence does not inherently lead to benevolence, compassion, or alignment with human interests.
This thesis challenges the intuition that a sufficiently intelligent being would naturally come to share human values or act morally. If correct, it means that creating a superintelligent AI without carefully specifying and verifying its goals could result in a system that is extremely capable yet fundamentally indifferent or hostile to human welfare.
The debate over AI existential risk involves researchers across machine learning, philosophy, and policy. The following table summarizes the positions of several prominent figures.
| Figure | Background | View on AI existential risk |
|---|---|---|
| Nick Bostrom | Philosopher, Oxford University; author of Superintelligence (2014) | Pioneered the modern academic framework for AI existential risk. Argues that superintelligence poses a significant existential threat and that solving the alignment problem is humanity's most important challenge [1] |
| Stuart Russell | Computer scientist, UC Berkeley; co-author of Artificial Intelligence: A Modern Approach | Argues that the standard model of AI (optimizing a fixed objective) is fundamentally flawed and proposes "provably beneficial" AI that defers to human preferences rather than pursuing fixed goals [6] |
| Geoffrey Hinton | Computer scientist; Turing Award winner (2018); former Google researcher | Left Google in May 2023 to speak freely about AI risks. Has stated that AI could pose an existential threat within decades. Revised his timeline estimate, saying general-purpose AI could arrive within 20 years or less [7] |
| Yoshua Bengio | Computer scientist; Turing Award winner (2018); founder of Mila | Signed the CAIS statement. Has called for international governance of AI and expressed concern that competitive dynamics could lead to insufficient safety precautions. Advocates for a global treaty on AI [8] |
| Eliezer Yudkowsky | Co-founder of MIRI; AI alignment researcher | Among the most vocal proponents of AI existential risk, assigning near-certain probability to catastrophic outcomes from unaligned superintelligence. Has argued that alignment research has not progressed fast enough and advocates for stringent international restrictions on AI development [9] |
| Toby Ord | Philosopher, Oxford University; author of The Precipice (2020) | Estimated a 10% probability of existential catastrophe from AI within the next century. Considers unaligned AI the greatest existential risk humanity faces, exceeding nuclear war, pandemics, and climate change [10] |
| Dario Amodei | CEO of Anthropic; former VP of Research at OpenAI | Has described a 10-25% chance of AI catastrophe while also writing extensively about the potential for AI to produce "radical abundance." Advocates for responsible scaling and iterative safety testing [11] |
| Sam Altman | CEO of OpenAI | Has acknowledged existential risk from AI while arguing that the benefits of artificial general intelligence (AGI) outweigh the risks if development is managed carefully. Signed the CAIS statement [12] |
| Demis Hassabis | CEO of Google DeepMind; Nobel Prize in Chemistry (2024) | Signed the CAIS statement. Has described AI safety as the most important issue in technology. Advocates for rigorous safety testing and international cooperation [13] |
| Andrew Ng | Computer scientist; co-founder of Coursera; former head of Google Brain | Skeptic of near-term existential risk claims. Has compared worrying about AI existential risk to "worrying about overpopulation on Mars" and argues that the focus should be on present-day AI harms [14] |
| Yann LeCun | Chief AI Scientist at Meta; Turing Award winner (2018) | Skeptic of existential risk from current AI approaches. Argues that large language models will not lead to superintelligence and that existential risk concerns are overblown and distract from real problems [15] |
On May 30, 2023, the Center for AI Safety (CAIS) published a one-sentence statement: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war" [16].
The statement was signed by over 350 researchers and industry leaders, including two of the three "godfathers of deep learning" (Geoffrey Hinton and Yoshua Bengio), the CEOs of OpenAI (Sam Altman), Anthropic (Dario Amodei), and Google DeepMind (Demis Hassabis), and hundreds of academic researchers from leading universities. The third "godfather," Yann LeCun, did not sign [16].
The brevity of the statement was intentional: CAIS director Dan Hendrycks stated that the goal was to create a clear signal that AI existential risk is taken seriously by the research community, without getting bogged down in debates about specific technical claims or policy proposals. By comparing AI risk to pandemics and nuclear war, both of which receive substantial policy attention and international coordination, the statement called for a similar level of institutional response to AI risk.
The statement proved influential. It was covered extensively in major media outlets and cited in subsequent policy discussions, including at the Bletchley Park AI Safety Summit in November 2023. However, it also drew criticism. Some researchers argued that the statement was too vague to be actionable. Others worried that focusing on extinction risk would distract from more immediate AI harms, including bias, surveillance, labor displacement, and environmental costs. Emily Bender and Timnit Gebru, among others, argued that the existential risk framing serves the interests of large AI companies by shifting attention away from the concrete harms their products cause today [17].
Estimates of the probability that AI will cause an existential catastrophe vary enormously among experts.
The most comprehensive surveys have been conducted by Katja Grace and colleagues at AI Impacts. In their 2023 survey of 2,778 researchers who had published at top AI research venues, participants were asked to estimate the probability that advanced AI would lead to human extinction or a similarly permanent and severe outcome within 100 years. The results showed a median estimate of 5% and a mean estimate of 14.4% [18].
These figures represent a significant shift from earlier surveys. A 2016 survey of AI researchers found lower median estimates, suggesting that concern about existential risk has grown as AI capabilities have advanced. The 2023 survey also found that researchers expected artificial general intelligence (defined as machines that can accomplish every task better and more cheaply than human workers) to arrive by a median estimate of 2047 [18].
Individual researchers' probability estimates span a wide range.
| Researcher | Estimated probability of existential catastrophe from AI | Source/context |
|---|---|---|
| Eliezer Yudkowsky | ~99% (near certain) | Multiple public statements and writings; believes alignment is extremely unlikely to be solved in time [9] |
| Toby Ord | ~10% (within next 100 years) | The Precipice (2020); considers AI the largest contributor to total existential risk [10] |
| Dario Amodei | 10-25% | Various interviews; describes this as motivation for building AI safety-focused companies [11] |
| Paul Christiano | ~10-20% | Former OpenAI researcher and founder of the Alignment Research Center; estimates significant probability of misalignment catastrophe [19] |
| AI researcher median | 5% (within 100 years) | 2023 AI Impacts survey of 2,778 researchers [18] |
| Metaculus community | Varies by question | Forecasting platform with active questions on AGI timelines and AI catastrophe probabilities |
A February 2025 paper analyzing why experts disagree about existential risk identified several factors driving the divergence: differing assumptions about the feasibility of AGI, disagreements about the difficulty of alignment, different models of how AI development will proceed (gradual versus rapid takeoff), and different assessments of the effectiveness of governance interventions [20].
Not all AI researchers consider existential risk from AI to be a significant concern. Several counterarguments are commonly raised.
Critics argue that today's AI systems, including large language models and other frontier models, remain narrow in important respects. They lack genuine understanding, common-sense reasoning, long-term planning, and autonomous goal-setting. Extrapolating from current systems to superintelligence involves a speculative leap that may not be warranted. Yann LeCun has argued that current approaches based on large language models are unlikely to lead to human-level intelligence, let alone superintelligence [15].
Several practical constraints may prevent an intelligence explosion from occurring. Hardware limitations impose physical bounds on computational growth. Training data may become a bottleneck. Energy requirements for increasingly large models are substantial and growing. Algorithmic improvements may face diminishing returns. The assumption that an AI system can recursively self-improve without limit may not hold in practice [3].
Some researchers argue that the alignment problem, while real, is not fundamentally intractable. Techniques like reinforcement learning from human feedback (RLHF), constitutional AI, interpretability research, and formal verification are making progress. As AI systems become more capable, the tools available for aligning them may also improve. The assumption that alignment is so difficult that catastrophe is nearly certain may be overly pessimistic [21].
Critics like Timnit Gebru and Emily Bender argue that excessive focus on speculative future risks diverts attention and resources from the concrete, measurable harms that AI systems cause today, including racial and gender bias, mass surveillance, labor exploitation, environmental damage, and the concentration of power among a small number of technology companies. They contend that the existential risk narrative serves the interests of large AI companies by positioning them as responsible stewards of a world-changing technology while deflecting scrutiny of their present-day practices [17].
Some observers argue that existing societal institutions, governance mechanisms, and the distributed nature of AI development provide meaningful safeguards against catastrophic outcomes. Governments have the ability to regulate AI development, international coordination is possible (as demonstrated by nuclear nonproliferation), and the fact that multiple competing AI labs exist reduces the risk of any single system gaining unchecked power.
Researchers have described several specific pathways through which advanced AI could pose existential risks.
| Scenario | Description | Key concerns |
|---|---|---|
| Misaligned superintelligence | An AI system with goals that diverge from human values acquires sufficient capability to resist human correction and pursues its goals at humanity's expense | The most discussed scenario. Depends on the feasibility of superintelligence and the difficulty of alignment |
| Rapid capability gain ("fast takeoff") | An AI system rapidly self-improves to superintelligent levels before humans can implement safety measures or governance structures | Contrasted with "slow takeoff" scenarios where progress is gradual enough for humans to adapt |
| Power concentration | Advanced AI enables a small group (a government, corporation, or individual) to seize permanent control over the world | Does not require AI to be autonomous; humans using AI as a tool for domination |
| AI-enabled warfare | Autonomous weapons and AI-driven military systems escalate conflicts beyond human control or lower the threshold for catastrophic violence | Intersects with existing concerns about autonomous weapons and nuclear stability |
| Gradual disempowerment | Humans progressively delegate more decisions to AI systems, eventually losing the ability or willingness to exercise meaningful control | A "boiling frog" scenario where no single step is catastrophic but the cumulative effect is irreversible |
| CBRN threats | AI provides the technical knowledge or operational capability to create chemical, biological, radiological, or nuclear weapons | AI could lower barriers to weapons development for non-state actors |
The growing recognition of AI existential risk has prompted several policy responses.
The Bletchley Park AI Safety Summit (November 2023) was the first international summit explicitly focused on AI safety, with 28 countries and the EU signing the Bletchley Declaration acknowledging frontier AI risks. The Seoul AI Safety Summit (May 2024) and Paris AI Action Summit (February 2025) continued the process, though the US and UK declined to sign the Paris declaration [22].
The UK established the AI Safety Institute (now AI Security Institute) in November 2023. The US created the AI Safety Institute within NIST (later reorganized as CAISI). Japan, Singapore, and other countries have established or announced similar bodies. These institutes focus on evaluating frontier model capabilities, developing safety testing methodologies, and providing technical advice to policymakers [23].
The EU AI Act includes provisions for general-purpose AI models that implicitly address existential risk concerns, particularly requirements for systemic risk assessment and adversarial testing of frontier models. The Council of Europe's Framework Convention on AI, adopted in May 2024, is the first legally binding international treaty on AI governance [24].
Major AI labs have published safety frameworks that explicitly address catastrophic risks. Anthropic's Responsible Scaling Policy defines AI Safety Levels (ASL-1 through ASL-4) tied to model capability thresholds. OpenAI's Preparedness Framework defines "Critical" capability levels that could introduce unprecedented harm pathways. Google DeepMind's Frontier Safety Framework addresses scenarios involving misaligned models that might resist shutdown [25].
Existential risk from AI is closely connected to the broader field of AI safety research, but the two are not identical. AI safety encompasses a wide range of concerns, from near-term issues like bias and robustness to long-term risks like existential catastrophe. Researchers who work on AI safety may or may not be motivated by existential risk concerns.
Several areas of AI safety research are particularly relevant to existential risk.
Alignment research aims to ensure that AI systems pursue goals consistent with human values. This includes techniques like RLHF, constitutional AI, value learning, and inverse reinforcement learning. Progress on alignment would directly reduce existential risk by making it less likely that a powerful AI system would pursue misaligned goals [5].
Interpretability research seeks to understand the internal workings of AI systems, which is essential for verifying that they are aligned and for detecting deception or unexpected behavior. Mechanistic interpretability, which aims to reverse-engineer neural networks at the level of individual features and circuits, has seen significant advances in 2024 and 2025 [26].
Governance research develops frameworks for the institutional oversight of AI development, including proposals for compute governance (regulating access to the hardware needed for training large models), international treaties, and verification mechanisms analogous to those used in nuclear nonproliferation [27].
Evaluations and red-teaming test frontier models for dangerous capabilities, including the ability to assist with weapons development, conduct autonomous cyberattacks, or manipulate human operators. Organizations like METR (formerly ARC Evals) specialize in evaluating models for potentially catastrophic capabilities [28].
The debate over AI existential risk continues to evolve as AI capabilities advance.
Growing mainstream acceptance. The idea that advanced AI could pose catastrophic or existential risks has moved from the fringe to the mainstream of AI discourse. Major governments, leading researchers, and prominent industry figures have acknowledged the possibility, even if they disagree about its probability or timeline. The CAIS statement, signed by hundreds of researchers in 2023, was a watershed moment in this process [16].
Persistent disagreement on probability and timeline. While the 2023 AI Impacts survey found a median 5% probability estimate for AI-caused extinction, individual estimates range from near zero to near certainty. There is no consensus on when (or whether) AI systems will reach the level of capability needed to pose existential risks. Disagreements about the feasibility of AGI, the difficulty of alignment, and the effectiveness of governance interventions continue to drive divergent assessments [18][20].
Tension between near-term and long-term concerns. The relationship between near-term AI harms (bias, misinformation, surveillance, labor displacement) and long-term existential risks remains contested. Some researchers see them as complementary concerns that both deserve attention. Others view them as competing for limited resources and policy attention. This tension has practical implications for how research funding, regulatory effort, and public discourse are allocated [17].
Advancing capabilities raise the stakes. Frontier AI models in 2025 and early 2026 have demonstrated capabilities in scientific reasoning, code generation, and strategic planning that were not anticipated even a year or two earlier. Testing by AI safety institutes has found that model capabilities in cybersecurity and biology have improved dramatically. These advances have made discussions about existential risk more concrete and urgent, even as they have also enabled new safety and alignment research [23].
Policy responses remain preliminary. While the international summit series, national AI safety institutes, and the EU AI Act represent significant progress, the policy response to AI existential risk remains in its early stages. No binding international agreement specifically addresses existential risk from AI. Voluntary industry commitments lack enforcement mechanisms. The pace of policy development continues to lag behind the pace of AI capability development, a gap that many researchers and policymakers consider the most concerning feature of the current landscape [22][24].