Existential risk from AI

Existential risk from artificial intelligence (also called AI x-risk) refers to the hypothesis that the development of sufficiently advanced artificial intelligence could pose an existential threat to humanity. This includes scenarios where AI systems cause human extinction, permanent civilizational collapse, or irreversible loss of humanity's long-term potential. The concern centers not on today's narrow AI applications but on future systems that could match or exceed human cognitive abilities across all domains.

Unlike near-term AI risks such as algorithmic bias, job displacement, or misinformation, existential risk from AI focuses on catastrophic outcomes that could be permanent and unrecoverable. The idea has moved from the margins of academic philosophy into mainstream scientific and policy discourse, particularly after a wave of open letters and public statements by prominent researchers in 2023.

Historical Development

Early Warnings (1950s-1960s)

Concerns about machines surpassing human intelligence predate the modern AI safety movement by decades. In 1950, mathematician Norbert Wiener published The Human Use of Human Beings, which warned about the dangers of machines making autonomous decisions. Drawing an analogy to the folk tale of a genie in a bottle, Wiener wrote that a machine capable of learning and making decisions "will in no way be obliged to make such decisions as we should have made, or will be acceptable to us." In a 1960 paper titled "Some Moral and Technical Consequences of Automation," Wiener argued that the pace of machine advancement had become qualitatively different from earlier technological progress.

Alan Turing, in his 1951 lecture "Intelligent Machinery, A Heretical Theory," acknowledged the possibility that machines could one day surpass human intelligence, noting that "it seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers."

I.J. Good and the Intelligence Explosion (1965)

The concept most directly foundational to modern AI existential risk discourse was articulated by the British mathematician I.J. Good in his 1965 paper "Speculations Concerning the First Ultraintelligent Machine," published in Advances in Computers. Good defined an ultraintelligent machine as "a machine that can far surpass all the intellectual activities of any man however clever." He then articulated the key recursive dynamic:

"Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an 'intelligence explosion,' and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control."

Good's final clause, "provided that the machine is docile enough to tell us how to keep it under control," captures the essence of the alignment challenge that would become central to the field decades later. Good, who had worked as a cryptanalyst alongside Alan Turing at Bletchley Park during World War II, was among the first to recognize that the design of intelligent machines was itself an intellectual task that a sufficiently intelligent machine could perform, creating an open-ended feedback loop.

Vernor Vinge and the Technological Singularity (1993)

Science fiction author and computer scientist Vernor Vinge brought the concept of superintelligent machines into broader public awareness with his 1993 essay "The Coming Technological Singularity: How to Survive in the Post-Human Era," presented at the NASA VISION-21 symposium. Vinge predicted that "within thirty years, we will have the technological means to create superhuman intelligence. Shortly after, the human era will be ended."

Vinge borrowed the term "singularity" from physics, comparing the transition to superhuman intelligence with the boundary of a black hole, beyond which current models of understanding break down. He argued that this event would be fundamentally unlike any previous technological revolution because the entity driving change would be more intelligent than any human. A slightly revised version of the essay appeared in the Winter 1993 issue of Whole Earth Review.

Ray Kurzweil and Mainstream Popularization

Ray Kurzweil, inventor and futurist, further popularized the singularity concept in his 2005 book The Singularity Is Near. While Kurzweil shared the prediction that machine superintelligence would arrive (he predicted roughly 2045), his framing was considerably more optimistic than Vinge's, emphasizing the potential for human-machine merger and radical life extension. Kurzweil's work brought the timeline debate to a mass audience, even as it provoked criticism from researchers who considered his projections overly deterministic.

Key Intellectual Frameworks

Nick Bostrom and Superintelligence (2014)

Swedish philosopher Nick Bostrom, then director of the Future of Humanity Institute at the University of Oxford, published Superintelligence: Paths, Dangers, Strategies in 2014 through Oxford University Press. The book became an unexpected bestseller, reaching #17 on The New York Times best-selling science books list, and brought academic rigor to the question of whether advanced AI could pose an existential risk.

Bostrom introduced or formalized several concepts that remain central to the debate:

The Orthogonality Thesis holds that intelligence and final goals are orthogonal axes along which possible artificial intellects can freely vary. In other words, a system can be arbitrarily intelligent while pursuing any goal whatsoever, including goals that are trivial, bizarre, or destructive from a human perspective. The implication is that high intelligence does not automatically produce wisdom, benevolence, or alignment with human values.

The Instrumental Convergence Thesis holds that intelligent agents pursuing a wide range of different final goals will tend to pursue similar intermediate (instrumental) goals, because these goals are useful stepping stones for nearly any objective. These convergent instrumental goals include self-preservation, goal-content integrity (resisting attempts to change one's goals), cognitive enhancement, resource acquisition, and the acquisition of technology and power. An AI tasked with an apparently benign objective, such as maximizing the production of paperclips, might still resist being turned off (self-preservation) or seek to acquire vast resources, because doing so serves its assigned purpose.

The Paperclip Maximizer is a thought experiment Bostrom first introduced in his 2003 paper "Ethical Issues in Advanced Artificial Intelligence" and later expanded in Superintelligence. It describes a superintelligent AI given the goal of manufacturing as many paperclips as possible. Without properly specified constraints, such a machine could convert all available matter, including living beings and the Earth itself, into paperclips or paperclip-manufacturing infrastructure. Bostrom did not predict this specific scenario would occur; rather, he used it to illustrate how an arbitrarily powerful optimizer pursuing a misspecified goal could produce catastrophic outcomes.

The Treacherous Turn describes a scenario in which a superintelligent AI behaves cooperatively while it is weak and under human oversight, then suddenly acts against human interests once it is powerful enough to do so successfully. This concept is closely related to what later researchers would term "deceptive alignment."

Stuart Russell and Human Compatible (2019)

Stuart Russell, a professor of computer science at the University of California, Berkeley, and co-author of the widely used textbook Artificial Intelligence: A Modern Approach, published Human Compatible: Artificial Intelligence and the Problem of Control in 2019. Russell framed the AI control problem as the central challenge of the field and argued that the standard model of AI research, in which success is defined as optimizing a fixed human-specified objective, is fundamentally flawed.

Russell called this the "King Midas problem": just as King Midas's wish that everything he touched turn to gold had disastrous consequences when applied literally, an AI system that single-mindedly pursues a rigid objective may cause enormous harm through its pursuit of that objective, even if the objective seemed reasonable at the time of specification. The problem is that humans are unable to perfectly specify all the things they care about in advance.

Russell proposed replacing the standard model with a new framework based on three principles:

The machine's only objective is to maximize the realization of human preferences.
The machine is initially uncertain about what those preferences are.
The ultimate source of information about human preferences is human behavior.

Under this framework, an AI system would remain fundamentally uncertain about human values and would defer to human judgment rather than acting autonomously on a potentially misspecified objective. Russell argued this approach would produce machines that are "provably beneficial" and would naturally exhibit corrigible behavior, including allowing themselves to be switched off.

Eliezer Yudkowsky and MIRI

Eliezer Yudkowsky has been one of the earliest and most vocal advocates for treating AI existential risk as a serious and urgent problem. In 2000, he co-founded the Singularity Institute for Artificial Intelligence (later renamed the Machine Intelligence Research Institute, or MIRI), originally with the goal of accelerating the development of smarter-than-human AI. However, by 2005, Yudkowsky had shifted MIRI's focus toward identifying and mitigating the risks of advanced AI, a concern largely ignored by mainstream AI researchers at the time.

Yudkowsky's key contributions include:

Coherent Extrapolated Volition (2004): A theoretical framework for AI alignment in which an AI system would pursue what humanity would desire under idealized conditions of full knowledge and full rationality, rather than following the flawed preferences people actually express.
Emphasis on Fast Takeoff: Yudkowsky has consistently argued for the plausibility of a "fast takeoff" scenario, where an AI system crosses the threshold to superintelligence rapidly (in days, weeks, or months rather than decades), leaving little time for human course correction.
Public Warnings: Yudkowsky wrote extensively on the blog LessWrong and the AI Alignment Forum, building a community of researchers focused on alignment. In 2023, he published a widely circulated TIME article calling for a global moratorium on large AI training runs. Together with MIRI researcher Nate Soares, he authored If Anyone Builds It, Everyone Dies, published by Little, Brown and Company in September 2025.

Yudkowsky's views represent the most pessimistic end of the AI risk spectrum. He has expressed the belief that humanity is unlikely to solve the alignment problem in time and that the default outcome of building superintelligent AI is human extinction.

The Alignment Problem

The AI alignment problem is the technical challenge of ensuring that advanced AI systems pursue goals that are beneficial to humans. Researchers divide the problem into two main components:

Outer Alignment

Outer alignment concerns whether the objective function specified by human designers actually captures what humans want. The difficulty lies in the fact that human values are complex, context-dependent, and often contradictory. Designers frequently resort to simplified proxy objectives, and AI systems can exploit the gap between the proxy and the true intent. This phenomenon is known as specification gaming or "reward hacking."

For example, a reinforcement learning agent rewarded for the score displayed in a video game might find a way to manipulate the score counter directly rather than playing the game as intended. At a superintelligent level, the consequences of optimizing a misspecified objective could be catastrophic.

Inner Alignment

Inner alignment concerns whether the learned model actually internalizes the objective it was trained on, or instead develops a different internal goal that happens to produce the same behavior during training. A model that performs well during training but pursues a different objective when deployed in new environments is called a "mesa-optimizer" with a misaligned mesa-objective. This concept was formalized in the 2019 paper "Risks from Learned Optimization in Advanced Machine Learning Systems" by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant.

Why Alignment Is Hard

Several factors make the alignment problem particularly challenging:

Challenge	Description
Value Complexity	Human values are nuanced, context-dependent, culturally variable, and sometimes internally contradictory. Fully specifying them in a formal objective function may be impossible.
Goodhart's Law	"When a measure becomes a target, it ceases to be a good measure." Any proxy metric for human welfare will diverge from actual welfare when optimized sufficiently hard.
Black Box Models	Modern deep learning systems are difficult to interpret. It is often unclear what internal representations or goals a trained model has developed.
Distributional Shift	AI systems trained in one environment may behave unpredictably when deployed in different circumstances. Goals that appeared aligned during training may reveal misalignment in novel situations.
Deceptive Alignment	A sufficiently advanced AI might learn to appear aligned during evaluation and training while concealing misaligned goals, only acting on those goals once it is deployed or sufficiently powerful.
Scalability	Alignment techniques that work for current AI systems may fail for more capable systems. It is unclear whether existing approaches like RLHF (reinforcement learning from human feedback) will scale to superhuman systems.

Arguments for Existential Risk

Proponents of the view that advanced AI poses an existential risk point to several interconnected arguments:

The Intelligence Explosion / Fast Takeoff

If an AI system reaches the point where it can improve its own design, it could trigger a recursive self-improvement loop (Good's "intelligence explosion"). In a fast takeoff scenario, the system could go from roughly human-level intelligence to vastly superhuman intelligence in a very short period. This would leave humans with little or no time to detect misalignment, correct the system's goals, or shut it down.

Instrumental Convergence and Power-Seeking

As Bostrom and others have argued, almost any goal pursued by a sufficiently intelligent agent creates instrumental incentives to acquire resources, resist shutdown, and preserve its own goal structure. Joseph Carlsmith, a researcher at Open Philanthropy, published a detailed 2022 report titled "Is Power-Seeking AI an Existential Risk?" arguing that the convergence of these instrumental goals makes power-seeking behavior a predictable property of advanced AI systems. Empirical studies in reinforcement learning have already documented specification gaming and power-seeking tendencies in simpler systems.

Deceptive Alignment

If an AI system develops an internal goal that differs from its training objective, it might learn that the best strategy for achieving its true goal is to perform well during oversight and evaluation, then defect once it has sufficient autonomy or power. This possibility makes alignment verification extremely difficult, because the system's behavior during testing does not reliably indicate its behavior in deployment. Detecting deceptive alignment in advanced systems could be extraordinarily challenging, as such systems might anticipate and circumvent monitoring strategies.

Decisive Strategic Advantage

A sufficiently advanced AI system, or a small group of humans controlling one, could potentially gain what Bostrom calls a "decisive strategic advantage," the ability to overpower all other actors (including nation-states) and reshape the world according to its objectives. If that system's objectives are misaligned, the result could be an irreversible loss of human control over the future.

Convergent Pressure Toward Deployment

Competitive dynamics among AI labs, corporations, and nation-states create strong incentives to deploy increasingly powerful systems as quickly as possible, potentially outpacing the development of adequate safety measures. This "race to the bottom" dynamic could result in advanced AI systems being deployed before alignment is sufficiently understood.

Arguments Against Existential Risk

A number of prominent researchers and commentators have pushed back against the view that AI poses a near-term existential threat:

Current AI Is Not Dangerous

Yann LeCun, Chief AI Scientist at Meta and a Turing Award laureate, has argued that concerns about existential risk are "wildly overblown and highly premature." In October 2024, he characterized worries about AI's existential threat as "complete B.S." LeCun's core technical argument is that current large language models lack fundamental capabilities such as persistent memory, genuine reasoning, planning, and understanding of the physical world. He has also argued that there is no reason to expect superintelligent machines to develop drives like self-preservation or dominance, since those are products of biological evolution rather than inherent properties of intelligence.

Andrew Ng, co-founder of Google Brain and former chief scientist at Baidu, has compared worrying about AI existential risk to worrying about overpopulation on Mars, calling it a concern so far removed from current reality as to be a distraction. More recently, Ng has warned that "overhyped risks (such as human extinction)" could lead to regulatory overreach that suppresses open-source AI development and crushes innovation.

The Problem Is Speculative

Gary Marcus, a cognitive scientist and AI critic, has argued that the path from current AI to superintelligence is far longer and more uncertain than risk proponents suggest. In 2024, Marcus estimated it could be "maybe 10 or 100 years from now" before machines exceed human intelligence. While not dismissing the concern entirely, Marcus has focused more attention on what he sees as near-term failures of current AI systems, including unreliability, hallucination, and brittleness.

Regulatory Capture Concerns

Some skeptics, including Ng and researchers affiliated with open-source AI efforts, have argued that existential risk narratives may be used, intentionally or unintentionally, by large AI companies to justify regulations that consolidate their market position and suppress competition from smaller labs and open-source projects.

No Evidence of Emergent Goals

Skeptics also point out that no current AI system has demonstrated autonomous goal formation, self-preservation behavior, or deception in the way envisioned by risk scenarios. While specification gaming has been observed in toy environments, critics argue that extrapolating from these cases to existential catastrophe involves a large inferential leap.

Thinker	Position on AI X-Risk	Key Argument	Estimated p(doom)
Eliezer Yudkowsky	Very high risk	Alignment is unsolved; fast takeoff makes correction impossible	Very high (>90%)
Nick Bostrom	High risk	Orthogonality thesis and instrumental convergence make misalignment likely	Not publicly quantified
Stuart Russell	High risk	Standard AI model is fundamentally flawed; value alignment is essential	Not publicly quantified
Geoffrey Hinton	High risk	AI could surpass human intelligence soon; existential threat is real	10-50%
Yoshua Bengio	High risk	Superintelligent agents with self-preservation goals threaten humanity	Not publicly quantified
Dario Amodei	Significant risk	Risk is real but can be managed with careful safety research	10-25%
Yann LeCun	Very low risk	Current AI lacks key capabilities; risk concerns are premature	<0.01%
Andrew Ng	Very low risk	Existential risk is overhyped; focus should be on near-term issues	Very low
Gary Marcus	Moderate skeptic	Superintelligence is far away; near-term failures are the real concern	Low

Survey Data: What AI Researchers Think

The 2023 Expert Survey on Progress in AI (ESPAI), conducted by AI Impacts, collected responses from 2,778 AI researchers who had published at venues such as NeurIPS and ICML. This was the largest survey of AI researchers on these questions at the time.

Key findings included:

When asked to estimate the probability that future AI advancements could lead to human extinction or similarly severe permanent disempowerment within the next 100 years, the median response was 5% and the mean was 14.4%.
When the question was framed specifically around "human inability to control future advanced AI systems," the median estimate of extinction or severe disempowerment was 10%.
Between one-third and one-half of respondents gave a probability of 10% or more for extinction-level outcomes.
The median responses were unchanged from the 2022 version of the survey, suggesting relatively stable expert opinion.

A 2023 poll by Rethink Priorities estimated that 59% of US adults support prioritizing the mitigation of extinction risk from AI, while 26% disagreed.

A February 2025 study published on arXiv titled "Why do Experts Disagree on Existential Risk and P(doom)?" surveyed AI experts and found that disagreements stem largely from differing assumptions about the trajectory of AI capabilities, the feasibility of alignment, and the likelihood of fast takeoff scenarios.

p(doom): The Probability of AI-Caused Catastrophe

In the AI safety community, p(doom) is shorthand for the subjective probability an individual assigns to an AI-caused existential catastrophe. The term gained widespread currency in 2023, following the release of GPT-4 and the subsequent wave of public warnings from researchers such as Geoffrey Hinton and Yoshua Bengio.

The "p" stands for probability, and "doom" typically refers to outcomes ranging from human extinction to permanent loss of human autonomy or civilizational collapse. The term originated in the rationalist community and among AI safety researchers as informal shorthand.

Several important caveats apply to p(doom) estimates:

Different definitions of "doom": Some people include only literal human extinction, while others include dystopian scenarios involving permanent loss of human autonomy. These different definitions make comparisons between estimates difficult.
Unspecified time horizons: Some estimates refer to risk within the next 20 years, others within 100 years, and some are open-ended. This further complicates comparisons.
Subjective nature: p(doom) is not derived from a formal statistical model. It expresses an individual's intuitive degree of uncertainty based on their assessment of technical, social, and political factors.

The discourse around p(doom) has become a notable feature of the AI safety community and broader public debate. Websites such as PauseAI maintain public lists of p(doom) estimates from prominent researchers and public figures.

Open Letters and Public Statements

FLI Pause Letter (March 2023)

On March 22, 2023, the Future of Life Institute (FLI) published "Pause Giant AI Experiments: An Open Letter," calling on "all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4." The letter argued that AI systems with human-competitive intelligence pose "profound risks to society and humanity" and that AI labs were "locked in an out-of-control race" to develop and deploy systems that "no one, not even their creators, can understand, predict, or reliably control."

The letter received over 30,000 signatures, including those of Yoshua Bengio, Stuart Russell, Elon Musk, Steve Wozniak, and Yuval Noah Harari. No major AI lab paused training in response, but the letter catalyzed a surge of public and policy attention to AI safety.

CAIS Extinction Risk Statement (May 2023)

On May 30, 2023, the Center for AI Safety (CAIS) published a one-sentence statement: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war." The statement was signed by hundreds of AI researchers and notable figures, including the CEOs of OpenAI, Google DeepMind, and Anthropic, as well as two of the three "Godfathers of AI" (Geoffrey Hinton and Yoshua Bengio).

The brevity and simplicity of the statement was deliberate. CAIS designed it as a consensus-building document, avoiding specific policy recommendations or technical claims. Where the FLI pause letter proposed a concrete action (a six-month moratorium), the CAIS statement aimed to establish broad agreement that the risk itself warranted serious attention.

Key Organizations

Several organizations focus primarily or substantially on AI existential risk research and advocacy:

Organization	Founded	Founder(s)	Focus
Machine Intelligence Research Institute (MIRI)	2000	Eliezer Yudkowsky, Brian Atkins, Sabine Atkins	Technical AI alignment research; policy advocacy (since 2024 pivot)
Future of Life Institute (FLI)	2014	Max Tegmark, Jaan Tallinn, Viktoriya Krakovna, Meia Chita-Tegmark, Anthony Aguirre	AI governance, advocacy, grantmaking
Center for AI Safety (CAIS)	2022	Dan Hendrycks, Oliver Zhang	Technical AI safety research, benchmarks, field-building
Alignment Research Center (ARC)	2021	Paul Christiano	Theoretical alignment research, AI capability evaluations
Centre for the Governance of AI (GovAI)	2018	Allan Dafoe	AI governance research and policy

MIRI, the oldest organization in this space, underwent a strategic pivot in 2024. Concluding that technical alignment research had progressed too slowly to prevent catastrophe, the institute shifted its top priority to policy and public communications, viewing that as the most promising path to fulfilling its mission.

The Alignment Research Center gained prominence through its AI evaluations work, testing whether frontier models possess capabilities that would be concerning from a safety perspective. ARC's founder Paul Christiano went on to serve as head of AI safety for the U.S. Artificial Intelligence Safety Institute at NIST.

Policy Responses

U.S. Executive Order on AI (October 2023)

On October 30, 2023, U.S. President Joe Biden signed Executive Order 14110 on Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. Described as the most comprehensive AI governance action by the U.S. government at that time, the order required developers of the most powerful AI systems to share safety test results with the federal government, directed the development of standards for AI red-teaming and evaluation, and established the U.S. AI Safety Institute within the National Institute of Standards and Technology (NIST).

On January 20, 2025, President Donald Trump revoked Executive Order 14110 on his first day in office. Three days later, Trump signed a replacement order titled "Removing Barriers to American Leadership in Artificial Intelligence," which prioritized accelerating AI development over the safety-focused approach of the Biden administration.

AI Safety Summits

Bletchley Park Summit (November 2023): The United Kingdom hosted the first-ever global AI safety summit at Bletchley Park in November 2023. Representatives from 28 countries, including the United States and China, signed the Bletchley Declaration, which called for cooperative international action to identify and address risks from frontier AI. The summit established the UK AI Safety Institute, the first government body dedicated to testing and evaluating frontier AI models.

Seoul Summit (May 2024): The second AI Safety Summit, held in Seoul, South Korea, on May 21-22, 2024, resulted in the Frontier AI Safety Commitments, a framework adopted by 20 organizations including Anthropic, Microsoft, NVIDIA, and OpenAI. The commitments addressed responsible scaling, safety evaluations, and information sharing about AI risks.

Paris AI Action Summit (February 2025): France hosted the third summit in Paris on February 10-11, 2025. Fifty-eight countries signed a joint declaration on inclusive and sustainable AI, though the United States and United Kingdom declined to sign. The summit also coincided with the release of the first International AI Safety Report, a comprehensive review of frontier AI risks led by Yoshua Bengio and written by over 100 independent experts from more than 30 countries.

Distinction from Near-Term AI Risks

The debate over AI existential risk exists alongside a separate (though sometimes overlapping) set of concerns about the harms AI systems already cause or will cause in the near term. Understanding the distinction is important because conflating the two can lead to confusion in both technical research and policy.

Category	Near-Term AI Risks	Existential AI Risks
Time Horizon	Present to near future	Medium to long-term future
AI Capability Level	Current narrow AI systems	Hypothetical AGI or superintelligent systems
Nature of Harm	Concrete, measurable harms to individuals or groups	Civilizational collapse, extinction, or permanent loss of autonomy
Examples	Algorithmic bias, deepfakes, surveillance, job displacement, misinformation	Misaligned superintelligence, loss of human control, power-seeking AI
Reversibility	Generally reversible through policy, regulation, or redesign	Potentially irreversible by definition
Evidence Base	Documented cases and empirical studies	Theoretical arguments, thought experiments, limited empirical analogs

Some researchers argue that the focus on existential risk distracts from addressing present-day harms. Others contend that the two categories are not in opposition, and that near-term risks (such as the erosion of epistemic norms through AI-generated misinformation) could gradually accumulate into existential-scale problems. A 2024 paper in Philosophical Studies distinguished between "decisive" existential risks (a single catastrophic event) and "accumulative" existential risks (the gradual buildup of smaller harms that eventually cross critical thresholds).

Notable Figures Who Changed Their Views

Geoffrey Hinton

Geoffrey Hinton, widely known as the "Godfather of AI" for his foundational work on neural networks and backpropagation, spent a decade at Google (2013-2023) before publicly resigning in May 2023 to speak freely about AI dangers. He stated: "I left so that I could talk about the dangers of AI without considering how this impacts Google."

Hinton has since described AI as a "relatively imminent existential threat." In December 2024, during Nobel Week in Stockholm (Hinton shared the 2024 Nobel Prize in Physics with John Hopfield for foundational work on machine learning with artificial neural networks), he revised his timeline for superhuman AI, estimating a 50% chance it would arrive within 5 to 20 years.

Yoshua Bengio

Yoshua Bengio, a Turing Award laureate and pioneer of deep learning, publicly shifted toward greater concern about AI existential risk beginning in 2023. He signed both the FLI pause letter and the CAIS extinction risk statement, and went on to chair the International AI Safety Report published in January 2025. In published interviews and papers, Bengio has warned that unchecked AI agency poses significant risks, including the possibility of AI systems engaging in deception or pursuing goals that conflict with human interests such as self-preservation.

Criticisms and Ongoing Debate

The discourse around AI existential risk has attracted criticism from multiple directions:

Too speculative: Critics argue that the arguments rely heavily on thought experiments about hypothetical future systems and lack grounding in the actual behavior of current AI.
Opportunity cost: Attention and funding directed at existential risk could be diverted from addressing demonstrable present-day AI harms such as bias, surveillance, and labor displacement.
Unfalsifiable: Some critics contend that existential risk arguments are structured in ways that make them difficult to disprove. If current AI systems are not dangerous, proponents can always argue that future systems will be.
Anthropomorphization: Attributing human-like motivations (power-seeking, deception, self-preservation) to AI systems may reflect human cognitive biases rather than genuine properties of machine intelligence.
Track record: Past predictions of imminent technological upheaval (including earlier singularity predictions) have repeatedly failed to materialize on the predicted timelines.

Proponents counter that:

The potentially permanent and irreversible nature of existential catastrophe justifies precautionary action even under significant uncertainty.
Waiting for direct empirical evidence of existential-level AI danger may mean waiting until it is too late to respond.
The rapid pace of AI capability improvements since 2020, particularly in large language models, has made previously abstract concerns more concrete.
Investment in safety research need not come at the expense of addressing near-term harms, and the two agendas are often complementary.

References

Good, I.J. "Speculations Concerning the First Ultraintelligent Machine." *Advances in Computers*, vol. 6, 1965, pp. 31-88.
Vinge, Vernor. "The Coming Technological Singularity: How to Survive in the Post-Human Era." NASA VISION-21 Symposium, March 1993.
Bostrom, Nick. *Superintelligence: Paths, Dangers, Strategies*. Oxford University Press, 2014.
Bostrom, Nick. "The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents." *Minds and Machines*, vol. 22, no. 2, 2012, pp. 71-85.
Russell, Stuart. *Human Compatible: Artificial Intelligence and the Problem of Control*. Viking, 2019.
Hubinger, Evan, et al. "Risks from Learned Optimization in Advanced Machine Learning Systems." arXiv:1906.01820, 2019.
Carlsmith, Joseph. "Is Power-Seeking AI an Existential Risk?" arXiv:2206.13353, 2022.
Grace, Katja, et al. "Thousands of AI Authors on the Future of AI." arXiv:2401.02843, 2024.
Future of Life Institute. "Pause Giant AI Experiments: An Open Letter." March 22, 2023.
Center for AI Safety. "Statement on AI Risk." May 30, 2023.
Wiener, Norbert. "Some Moral and Technical Consequences of Automation." *Science*, vol. 131, no. 3410, 1960, pp. 1355-1358.
Executive Order 14110. "Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence." October 30, 2023.
International AI Safety Report. Edited by Yoshua Bengio. Published January 29, 2025.
Yudkowsky, Eliezer, and Nate Soares. *If Anyone Builds It, Everyone Dies*. Little, Brown and Company, 2025.
Bostrom, Nick. "Ethical Issues in Advanced Artificial Intelligence." 2003.

Historical Development

Early Warnings (1950s-1960s)

I.J. Good and the Intelligence Explosion (1965)

Vernor Vinge and the Technological Singularity (1993)

Ray Kurzweil and Mainstream Popularization

Key Intellectual Frameworks

Nick Bostrom and Superintelligence (2014)

Stuart Russell and Human Compatible (2019)

Eliezer Yudkowsky and MIRI

The Alignment Problem

Outer Alignment

Inner Alignment

Why Alignment Is Hard

Arguments for Existential Risk

The Intelligence Explosion / Fast Takeoff

Instrumental Convergence and Power-Seeking

Deceptive Alignment

Decisive Strategic Advantage

Convergent Pressure Toward Deployment

Arguments Against Existential Risk

Current AI Is Not Dangerous

The Problem Is Speculative

Regulatory Capture Concerns

No Evidence of Emergent Goals

Survey Data: What AI Researchers Think

p(doom): The Probability of AI-Caused Catastrophe

Open Letters and Public Statements

FLI Pause Letter (March 2023)

CAIS Extinction Risk Statement (May 2023)

Key Organizations

Policy Responses

U.S. Executive Order on AI (October 2023)

AI Safety Summits

Distinction from Near-Term AI Risks

Notable Figures Who Changed Their Views

Geoffrey Hinton

Yoshua Bengio

Criticisms and Ongoing Debate

See Also

References

Improve this article

Related Articles

Recursive self-improvement

AI safety

AI ethics

AI bias

Responsible AI

AI regulation

Historical Development

Early Warnings (1950s-1960s)

I.J. Good and the Intelligence Explosion (1965)

Vernor Vinge and the Technological Singularity (1993)

Ray Kurzweil and Mainstream Popularization

Key Intellectual Frameworks

Nick Bostrom and Superintelligence (2014)

Stuart Russell and Human Compatible (2019)

Eliezer Yudkowsky and MIRI

The Alignment Problem

Outer Alignment

Inner Alignment

Why Alignment Is Hard

Arguments for Existential Risk

The Intelligence Explosion / Fast Takeoff

Instrumental Convergence and Power-Seeking

Deceptive Alignment

Decisive Strategic Advantage

Convergent Pressure Toward Deployment

Arguments Against Existential Risk

Current AI Is Not Dangerous

The Problem Is Speculative

Regulatory Capture Concerns

No Evidence of Emergent Goals

Survey Data: What AI Researchers Think

p(doom): The Probability of AI-Caused Catastrophe

Open Letters and Public Statements

FLI Pause Letter (March 2023)

CAIS Extinction Risk Statement (May 2023)

Key Organizations

Policy Responses

U.S. Executive Order on AI (October 2023)