Preparedness Framework (OpenAI)

Updated Jun 23, 2026

The Preparedness Framework is the risk-management policy maintained by OpenAI for tracking, evaluating, forecasting, and mitigating catastrophic risks from frontier artificial-intelligence models. First published in beta form on 18 December 2023 and substantially revised on 15 April 2025, it codifies the conditions under which OpenAI commits to deploy, halt deployment of, or halt development of a frontier model on the basis of capability evaluations. It is OpenAI's counterpart to Anthropic's Responsible Scaling Policy and a leading example of an AI safety governance commitment among frontier labs.^[1]^[2]

Under the current Version 2 framework, OpenAI tracks three frontier-capability categories (Biological and Chemical, Cybersecurity, and AI Self-Improvement) and classifies a model at one of two thresholds, High or Critical. A model that reaches High capability "must have safeguards that sufficiently minimize the associated risk of severe harm before it is deployed," and a model that reaches Critical capability additionally requires safeguards during development. OpenAI defines "severe harm" in the framework as "the death or grave injury of thousands of people or hundreds of billions of dollars of economic damage," operationalized with examples such as more than 1,000 deaths or more than USD 100 billion in damages.^[2]^[11]

Like Anthropic's Responsible Scaling Policy (RSP) and Google DeepMind's Frontier Safety Framework (FSF), the framework has the structure of a "risk-informed development policy": a defined set of dangerous capability categories, a set of capability or risk thresholds, evaluations ("scorecards" or "capability reports") that locate a model on those thresholds, and commitments about safeguards or non-deployment that are triggered when a threshold is crossed.^[3]^[4]

The framework is maintained by OpenAI's internal Preparedness team, which was announced in October 2023 and originally led by computer scientist Aleksander Mądry. Outputs of the framework, including capability evaluations and safety reasoning, are reviewed by an internal Safety Advisory Group (SAG) that issues recommendations to OpenAI leadership and the Safety and Security Committee of the OpenAI Board of Directors; final deployment decisions rest with the CEO or a person designated by them.^[2]^[5]

Key facts


Published	18 December 2023 (Beta, v1)^[1]
Latest version	Version 2, 15 April 2025^[2]
Maintainer	OpenAI Preparedness team / Safety Advisory Group^[2]
Tracked categories (v2)	Biological and Chemical, Cybersecurity, AI Self-Improvement^[2]
Capability thresholds (v2)	High, Critical^[2]
Subject	Tracking, evaluating, and mitigating catastrophic risks from frontier AI models^[1]

What is the Preparedness Framework?

OpenAI announced the formation of a dedicated Preparedness team on 26 October 2023 as part of a broader expansion of safety work that also included its (later disbanded) Superalignment team.^[5]^[6] The team's stated mission was to "tightly connect capability assessment, evaluations, and internal red teaming for frontier models, from the models we develop in the near future to those with AGI-level capabilities," and to develop and maintain a "Risk-Informed Development Policy" that would govern decisions about whether and how to deploy or further train frontier models.^[5]

The team was led by Aleksander Mądry, a tenured professor of computing at MIT and director of MIT's Center for Deployable Machine Learning, who joined OpenAI on leave from his academic post.^[5]^[7] In conjunction with the launch, OpenAI offered a "Preparedness Challenge": ten USD 25,000 API-credit prizes for the best public submissions identifying plausible and underexplored frontier-AI catastrophic risk scenarios.^[6]

The Preparedness team was conceived as one leg of a three-part safety apparatus: the Safety Systems team handled product-level abuse risk in deployed systems such as GPT-4o, the Superalignment team studied alignment of future "superintelligent" systems, and the Preparedness team was responsible for catastrophic-risk capability evaluation of the near-term frontier.^[5]^[8] The first written output of the Preparedness team was the beta Preparedness Framework, published roughly two months after the team's announcement.^[1]

What did Version 1 (Beta, December 2023) cover?

Version 1, formally titled "Preparedness Framework (Beta)," was published as a 27-page PDF on 18 December 2023.^[1]^[9] OpenAI characterized it as "a beta document" and "a living document" that would be revised in response to feedback and operating experience.^[1]

Tracked risk categories

The Beta framework committed OpenAI to evaluating frontier models in four "tracked risk categories":^[1]^[9]

Cybersecurity: uplift to offensive cyber-operations capability;
Chemical, Biological, Radiological, and Nuclear (CBRN) threats: uplift to creation or acquisition of weapons of mass destruction;
Persuasion: capability to generate persuasive content that could change beliefs at scale; and
Model autonomy: the capability of a model to act, self-exfiltrate, or accumulate resources autonomously without human direction.

OpenAI also acknowledged "unknown unknown" risk categories that might emerge over time and committed to revising the framework as new categories were identified.^[1]

Risk levels

For each tracked category, the framework defined four discrete risk levels: Low, Medium, High, and Critical.^[1]^[9] Each level was tied to an illustrative threshold describing the kinds of real-world uplift the model would have to provide to qualify.^[1]

Two operational commitments anchored the framework. Only models with a post-mitigation score of "medium" or below could be deployed, and only models with a post-mitigation score of "high" or below could be developed further.^[1]^[9] A "Critical" determination in any category would therefore halt further development of that model.^[1]
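The two commitments can be read as a pair of threshold checks on the post-mitigation scorecard. The following is a minimal sketch of that reading, not OpenAI's tooling: the names `RiskLevel` and `v1_gates` are hypothetical, the example scores are invented, and the choice to gate on the highest score across categories is an assumption drawn from the "in any category" wording above.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    """The four v1 risk levels, ordered so that comparisons work."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


def v1_gates(post_mitigation: dict[str, RiskLevel]) -> dict[str, bool]:
    """Apply the two v1 commitments to a post-mitigation scorecard.

    Assumption: both gates act on the highest score across categories.
    """
    worst = max(post_mitigation.values())
    return {
        # Deployment requires Medium or below.
        "may_deploy": worst <= RiskLevel.MEDIUM,
        # Further development requires High or below.
        "may_develop_further": worst <= RiskLevel.HIGH,
    }


# Hypothetical scorecard, not taken from any real system card.
example = {
    "cybersecurity": RiskLevel.LOW,
    "cbrn": RiskLevel.HIGH,
    "persuasion": RiskLevel.MEDIUM,
    "model_autonomy": RiskLevel.LOW,
}
print(v1_gates(example))  # {'may_deploy': False, 'may_develop_further': True}
```

In this illustration a single High score blocks deployment while still permitting further development, which is the gap between the two triggers that the framework describes.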

Scorecards and pre/post-mitigation evaluation

The framework's central methodological artifact was the Preparedness Scorecard: a table assigning a risk level to a model in each tracked category both pre-mitigation (the bare model's elicited capability after the team's best efforts to elicit it) and post-mitigation (the deployed product including refusals, classifiers, and other safeguards).^[1]^[10] OpenAI committed to performing scorecard evaluations throughout model training and development, including a final sweep before launch, and to re-evaluating models at "every 2x increase in effective compute," a more stringent cadence than the 4x cadence Anthropic adopted for its RSP at the time.^[1]^[3]
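To make the cadence comparison concrete, the short sketch below counts the re-evaluation points a compute-multiple trigger would produce over a given growth in effective compute. It is an illustrative calculation only: the function name `reevaluation_points` is hypothetical, and the assumption that triggers fall at successive powers of the cadence, measured from a single baseline, is a simplification rather than something either policy specifies.

```python
def reevaluation_points(growth: float, cadence: float) -> list[float]:
    """Return the compute multiples at which a re-evaluation would fire.

    growth:  total increase in effective compute relative to the baseline.
    cadence: multiple of effective compute that triggers a re-evaluation.
    Assumption: triggers fall at cadence, cadence**2, cadence**3, and so on.
    """
    if cadence <= 1:
        raise ValueError("cadence must be greater than 1")
    points = []
    multiple = cadence
    while multiple <= growth:
        points.append(multiple)
        multiple *= cadence
    return points


# A 16x growth in effective compute under the two cadences.
print(reevaluation_points(16, 2))  # [2, 4, 8, 16]
print(reevaluation_points(16, 4))  # [4, 16]
```

Under these assumptions, a 2x trigger yields twice as many evaluation points as a 4x trigger over the same 16x growth.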

Governance

Version 1 created an internal Safety Advisory Group (SAG), a cross-functional team of OpenAI safety leaders responsible for reviewing scorecards and making recommendations to OpenAI leadership.^[1] Final deployment decisions rested with the CEO (Sam Altman), with the SAG's recommendations subject to oversight by the Safety and Security Committee of the OpenAI Board of Directors.^[1]^[11] The Preparedness team was required to send monthly status reports to the SAG.^[11]

Reception of v1

Initial reactions in December 2023 and January 2024 were mixed. Coverage in VentureBeat, TechCrunch, InfoQ, and elsewhere noted that OpenAI was, with the Beta document, the first frontier lab to publicly commit to a halt-development trigger ("Critical") in addition to a halt-deployment trigger ("High"), and to a 2x-compute re-evaluation cadence.^[10]^[12]^[13] Critics including Zvi Mowshowitz argued that the thresholds were vague, that evaluations were merely "illustrative," and that a model could plausibly cause catastrophic harm without ever reaching "Critical" in any single category.^[14] Researchers at SaferAI's "AI Lab Watch" project concluded that the Beta document, while still underspecified, was more concrete on several axes (the halt-development commitment and the 2x cadence) than the peer frameworks then in existence.^[4]

What changed in Version 2 (April 2025)?

OpenAI published Version 2 of the Preparedness Framework on 15 April 2025.^[2]^[15] The update was the first major revision since the 2023 Beta and dropped the "Beta" label. It made structural changes to the categories, the risk-level taxonomy, and the deployment-decision process.^[2]

Restructured tracked categories

Version 2 reduced the tracked-category list from four to three:^[2]^[16]

Biological and Chemical capabilities (a renaming and narrowing of "CBRN"; nuclear and radiological were moved into a Research Category);
Cybersecurity capabilities; and
AI Self-Improvement capabilities, broadly capturing the prior "Model autonomy" category but reframed around the prospect of a model accelerating AI R&D.^[2]^[16]

Persuasion was removed entirely from both the Tracked and Research category lists. OpenAI stated that persuasion risks would instead be addressed outside the Preparedness Framework, through Model Spec-style usage policies, election-integrity investments, and product-level restrictions on political-campaign tool use.^[2]^[16]^[17] The change drew immediate criticism on the grounds that highly persuasive AI could undermine its own safeguards by convincing users and overseers not to apply them.^[17]^[14]

Version 2 also introduced Research Categories, capability areas considered plausibly severe but not yet meeting the criteria for tracked status. These included Long-Range Autonomy, Autonomous Replication, Sandbagging / Deceptive Alignment, Undermining Safeguards, and Nuclear & Radiological threats.^[2]^[16]

Two-level capability taxonomy

The four-level Low/Medium/High/Critical taxonomy of v1 was replaced with a two-level High and Critical taxonomy:^[2]^[16]

High capability: capability that could "meaningfully amplify existing pathways to severe harm" (for example, providing meaningful uplift to a novice attempting a biological or chemical attack). Models judged High must have safeguards that "sufficiently minimize" associated risk before deployment.
Critical capability: capability that could "introduce unprecedented new pathways to severe harm" (for example, full autonomous execution of an attack, or generational AI R&D acceleration). Models judged Critical require safeguards sufficient to minimize risk during development, not only deployment.

The framework reserves tracked status for risks that are plausible, measurable, severe, net new, and instantaneous or irremediable. It defines "severe harm" as "the death or grave injury of thousands of people or hundreds of billions of dollars of economic damage," operationalized in the document with examples such as more than 1,000 deaths or more than USD 100 billion in damages.^[2]^[16]
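The two-level taxonomy and the severe-harm examples can be summarized as a small lookup. This is a hedged sketch of the logic as described in this section, not an implementation of OpenAI's process: the names `Capability`, `required_safeguards`, and `is_severe_harm` are hypothetical, `BELOW_HIGH` is a label invented here for models under both thresholds, and the numeric cut-offs are the illustrative examples quoted above rather than a formal test.

```python
from enum import Enum


class Capability(Enum):
    """v2 capability determinations; BELOW_HIGH is a label invented here."""
    BELOW_HIGH = "below high"
    HIGH = "high"
    CRITICAL = "critical"


def required_safeguards(capability: Capability) -> dict[str, bool]:
    """Map a capability determination to when safeguards must be in place."""
    return {
        # High and Critical both require safeguards before deployment.
        "before_deployment": capability in (Capability.HIGH, Capability.CRITICAL),
        # Only Critical additionally requires safeguards during development.
        "during_development": capability is Capability.CRITICAL,
    }


def is_severe_harm(deaths: int, damages_usd: float) -> bool:
    """Check an outcome against the illustrative severe-harm examples."""
    return deaths > 1_000 or damages_usd > 100e9


print(required_safeguards(Capability.HIGH))
print(is_severe_harm(deaths=0, damages_usd=250e9))
```

The sketch leaves out the judgment calls the framework assigns to the Safety Advisory Group and leadership, such as whether safeguards "sufficiently minimize" risk.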

Capabilities Reports and Safeguards Reports

Version 2 replaced the single v1 "Preparedness Scorecard" with two distinct documents:^[2]

A Capabilities Report, which assesses whether the model has crossed a High or Critical threshold; and
A Safeguards Report, which sets out how mitigations are designed, verified, and operated for a model judged High or Critical.

Both reports are reviewed by the SAG, which then issues a recommendation to OpenAI Leadership, defined as "the CEO or a person designated by them." The SAG's role is advisory; leadership retains final go/no-go authority and "accepts any residual risks."^[2]^[11]

Competitive-adjustment clause

Version 2 introduced a controversial provision allowing OpenAI to adjust its safeguard requirements if another frontier developer releases a comparably capable model without comparable safeguards. The framework states that any such adjustment must be (i) preceded by rigorous confirmation that the risk landscape has changed, (ii) publicly acknowledged, (iii) judged not to "meaningfully increase the overall risk of severe harm," and (iv) maintained at a level "still more protective" than competitors'.^[2]^[15] Critics including TechCrunch and the AI-policy researcher Zvi Mowshowitz characterized the clause as institutionalizing a competitive "race to the bottom" on safety.^[15]^[16]
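The four conditions are conjunctive: an adjustment is permitted only if all of them hold. The sketch below expresses that as a checklist, purely for illustration. The class `AdjustmentReview`, its field names, and the helper functions are hypothetical paraphrases of conditions (i) to (iv), and each boolean stands in for a judgment that the framework leaves to OpenAI.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AdjustmentReview:
    """One boolean per condition (i)-(iv); field names are paraphrases."""
    risk_landscape_change_confirmed: bool    # (i)
    publicly_acknowledged: bool              # (ii)
    no_meaningful_risk_increase: bool        # (iii)
    still_more_protective_than_rivals: bool  # (iv)


def unmet_conditions(review: AdjustmentReview) -> list[str]:
    """List the names of the conditions that are not satisfied."""
    return [f.name for f in fields(review) if not getattr(review, f.name)]


def adjustment_permitted(review: AdjustmentReview) -> bool:
    """An adjustment is permitted only when every condition is satisfied."""
    return not unmet_conditions(review)


# Hypothetical review in which the public acknowledgement is missing.
review = AdjustmentReview(True, False, True, True)
print(adjustment_permitted(review))  # False
print(unmet_conditions(review))      # ['publicly_acknowledged']
```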

Leadership at the time of v2

Aleksander Mądry was reassigned in July 2024 to a research role focused on AI reasoning, after which the Preparedness team was led jointly by Joaquin Quiñonero Candela and Lilian Weng.^[7]^[18] Weng departed OpenAI in November 2024, and Quiñonero Candela transitioned to a different internal role in early 2025; researcher Tejal Patwardhan managed much of the team's day-to-day work during the v2 drafting period.^[18]^[19] At the time of v2's publication the Safety Advisory Group had been operating under the leadership of policy researcher Sandhini Agarwal for approximately two months, according to an OpenAI spokesperson cited by Fortune.^[17]^[19]

How has the framework been applied to OpenAI models?

GPT-4o (August 2024)

The Preparedness Framework was first applied at the public-system-card level to GPT-4o, whose system card was published on 8 August 2024.^[20] The Preparedness Scorecard for GPT-4o reported three of the four v1 categories at Low and one (Persuasion) at borderline Medium, driven specifically by textual persuasion of political opinions; the voice modality was assessed as not more persuasive than a human. The overall risk classification for GPT-4o was Medium, below the High threshold that would have barred deployment.^[20]^[21]

o1 (September 2024)

o1 was the first OpenAI model to be classified as Medium risk in CBRN in addition to Persuasion, as reported in the o1 System Card dated 12 September 2024 (and an updated December 2024 version).^[22]^[23] The o1 evaluations also involved external red-teaming by Apollo Research and METR, with Apollo reporting that o1 displayed in-context scheming and strategic-deception behavior at higher rates than prior models. The CBRN classification was driven in particular by uplift on long-form biothreat questions among graduate-level participants.^[22]^[23]

o3-mini, deep research, and Operator (January-February 2025)

The o3-mini system card (31 January 2025), the Deep Research system card (February 2025) and the Operator system card were the last sets of model-launch documents produced under the v1 framework.^[24]^[25] All three were evaluated against the four v1 categories, with persuasion typically reported as the closest-to-threshold category.^[24]

o3 and o4-mini (April 2025)

o3 and o4-mini, whose joint system card was published on 16 April 2025, were the first models evaluated end-to-end under the v2 framework. The SAG concluded that neither model reached the High threshold in any of the three v2 tracked categories (Biological & Chemical, Cybersecurity, AI Self-Improvement), allowing deployment without the additional safeguards reserved for High-capability models.^[26]^[27]

ChatGPT Agent (July 2025)

The system card for ChatGPT Agent, published on 17 July 2025, marked the first launch of an OpenAI product treated as High capability in the Biological & Chemical domain under v2. OpenAI stated that, while it lacked definitive evidence that the model could meaningfully help a novice create severe biological harm (the v2 definition of the High threshold), it had chosen to "take a precautionary approach" and to activate the full set of associated safeguards, including dual-use refusal training, always-on classifiers and reasoning monitors, and enforcement pipelines.^[28]

GPT-5 family (August 2025 onward)

The GPT-5 System Card, published on 13 August 2025, accompanied the launch of the GPT-5 model family. OpenAI classified the reasoning-trained variant gpt-5-thinking as High capability in the Biological & Chemical domain under the v2 framework, activating the Bio/Chem safeguards stack. The system card also introduced new safety-evaluation categories (including deception-monitoring and sandbagging-detection) and reported the results of multi-stakeholder red-teaming including government partners.^[29] Subsequent GPT-5.x releases (among them GPT-5-Codex in September 2025 and GPT-5.2 in December 2025) were evaluated under the same v2 process, with addenda to the system card capturing variant-specific results.^[30]

What are the main criticisms of the Preparedness Framework?

The Preparedness Framework has been broadly received as one of the more concrete pre-deployment safety policies among frontier labs, while drawing sustained criticism on several axes.

Specificity of thresholds. Researchers including Zvi Mowshowitz and the authors of the SaferAI "AI Lab Watch" project have argued that the v1 and v2 thresholds remain "illustrative" rather than operationalized, that the move from a four-level taxonomy to a two-level High/Critical taxonomy in v2 made low-level capability shifts harder to track, and that the v2 definition of "severe harm" (>1,000 deaths or >USD 100B in damages) sets the bar so high that "Medium-capability" systems could nonetheless enable significant harm.^[14]^[4]^[17]^[31]

Removal of Persuasion. The v2 decision to drop Persuasion from both Tracked and Research categories was criticized by Fortune and others, who noted that mass-manipulation risk has been repeatedly cited by U.S. and EU policymakers as a near-term harm and that persuasive AI is precisely the category most likely to corrode the human-oversight assumptions on which the rest of the framework depends.^[17]^[14]

Self-evaluation and SAG transparency. Both v1 and v2 rely on internal evaluations reviewed by an internal SAG whose membership has not been publicly disclosed. Reporting by The Information in mid-2024 and by The Register in late 2025 has emphasized that final go/no-go authority rests with the CEO and that the SAG's role is purely advisory, raising questions about the binding force of the framework's commitments.^[32]^[33]

Competitive-adjustment clause. The v2 provision permitting OpenAI to lower its safeguard requirements if a competitor releases a comparably capable system without comparable safeguards was widely criticized in April 2025 as a "race-to-the-bottom" clause by TechCrunch, Fortune, and the Midas Project's "Watchtower" project.^[15]^[17]^[34]

Lawmaker references. During the 2024 debate over California's SB 1047 ("Safe and Secure Innovation for Frontier Artificial Intelligence Models Act"), proponents of the bill cited the Preparedness Framework as evidence that frontier labs themselves recognize that catastrophic-risk evaluations are warranted. OpenAI publicly opposed the bill (in a letter signed by chief strategy officer Jason Kwon) on the grounds that frontier-AI regulation should be federal rather than state-level, and the bill was ultimately vetoed by Governor Gavin Newsom in September 2024.^[35]^[36] Mądry, as Head of Preparedness, also submitted written testimony to the U.S. Senate's Schumer "AI Insight Forum" in 2024 in which he described the Preparedness Framework as the operational basis for OpenAI's catastrophic-risk work.^[37]

Academic critique. A 2025 working paper by Robin et al., titled "The 2025 OpenAI Preparedness Framework does not guarantee any AI risk mitigation practices: a proof-of-concept for affordance analyses of AI safety policies," argued that the framework's language gives OpenAI a series of discretionary "affordances" (points at which the framework permits but does not require risk-mitigation actions) and concluded that it should not be relied upon as a binding safety commitment in the absence of external enforcement.^[31]

Leadership churn. Coverage in TechCrunch, Engadget, and The Register in December 2025 noted that the Head of Preparedness role had been functionally vacant for much of 2025 and that OpenAI was actively recruiting a new senior preparedness lead with a reported compensation range exceeding USD 500,000, framing the recruiting drive as a sign of both the framework's continued centrality and the difficulty of staffing it.^[33]^[38]

How does the Preparedness Framework compare to peer frameworks?

The Preparedness Framework is one of three principal pre-deployment safety policies maintained by frontier-AI developers, alongside Anthropic's Responsible Scaling Policy (RSP, first published September 2023) and Google DeepMind's Frontier Safety Framework (FSF, first published May 2024 and updated in 2025).^[3]^[4]

The three documents share a common structural template: dangerous-capability categories, capability or risk thresholds, capability evaluations, and threshold-triggered safeguard commitments. They differ in several respects:^[3]^[4]

Halt-development trigger. OpenAI's framework is the only one of the three that explicitly commits, in writing, to halt further development of a model that reaches the top capability level ("Critical" in v1; v2 reframes this around safeguards during development rather than a development halt). Anthropic's RSP frames the analogous trigger around ASL ratings and the readiness of safeguards. DeepMind's FSF historically did not pre-commit to halting development at a threshold, only to pausing deployment or development if mitigations were not in place.
Evaluation cadence. The v1 Preparedness Framework committed OpenAI to re-evaluation at every 2x increase in effective compute, a more frequent cadence than the 4x cadence then used by Anthropic.
Category taxonomy. All three frameworks cover cyber and bio/chem capabilities. OpenAI's framework, after v2, no longer includes Persuasion as a tracked category, whereas DeepMind's 2025 FSF update introduced a "manipulation" capability area; Anthropic's RSP focuses primarily on CBRN, autonomy, and cyber capability uplift.
Governance. All three frameworks centralize final authority in a senior internal body or the CEO, with an internal expert-review group (SAG at OpenAI; the Responsible Scaling Officer at Anthropic; the Frontier Safety Council at DeepMind) producing advisory inputs.

Comparative analyses by SaferAI, the Machine Intelligence Research Institute (MIRI), and the Federation of American Scientists have argued that on several dimensions (most notably the explicit halt-development trigger and the 2x evaluation cadence) the OpenAI framework is more concrete than its peers, while on others (most notably the v2 reframing away from "halt training" toward "deploy with safeguards" and the competitive-adjustment clause) it is now less concrete than the post-2024 Anthropic RSP.^[4]^[39]^[40]

References

[1] OpenAI. "Preparedness Framework (Beta)." 18 December 2023. https://cdn.openai.com/openai-preparedness-framework-beta.pdf
[2] OpenAI. "Preparedness Framework Version 2." 15 April 2025. https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf
[3] Anthropic. "Responsible Scaling Policy." 19 September 2023 and subsequent revisions. https://www.anthropic.com/news/anthropics-responsible-scaling-policy
[4] SaferAI. "Is OpenAI's Preparedness Framework better than its competitors' Responsible Scaling Policies? A Comparative Analysis." https://www.safer-ai.org/is-openais-preparedness-framework-better-than-its-competitors-responsible-scaling-policies-a-comparative-analysis
[5] OpenAI. "Frontier risk and preparedness." 26 October 2023. https://openai.com/index/frontier-risk-and-preparedness/
[6] TechCrunch. "OpenAI forms team to study 'catastrophic' AI risks, including nuclear threats." 26 October 2023. https://techcrunch.com/2023/10/26/openai-forms-team-to-study-catastrophic-risks-including-nuclear-threats/
[7] CNBC. "OpenAI reassigns top AI safety executive Aleksandr Madry to role focused on AI reasoning." 23 July 2024. https://www.cnbc.com/2024/07/23/openai-removes-ai-safety-executive-aleksander-madry-from-role.html
[8] OpenAI. "Our approach to frontier risk." https://openai.com/global-affairs/our-approach-to-frontier-risk/
[9] InfoQ. "OpenAI Adopts Preparedness Framework for AI Safety." January 2024. https://www.infoq.com/news/2024/01/openai-safety-framework/
[10] VentureBeat. "OpenAI announces 'Preparedness Framework' to track and mitigate AI risks." 18 December 2023. https://venturebeat.com/ai/openai-announces-preparedness-framework-to-track-and-mitigate-ai-risks/
[11] OpenAI. "Updating our Preparedness Framework." Blog post. 15 April 2025. https://openai.com/index/updating-our-preparedness-framework/
[12] TechCrunch. Coverage of December 2023 Preparedness Framework launch. https://techcrunch.com/
[13] Technology Magazine. "OpenAI release preparedness framework to improve AI safety." December 2023. https://technologymagazine.com/ai-and-machine-learning/openai-release-preparedness-framework-to-improve-ai-safety
[14] Zvi Mowshowitz. "On OpenAI's Preparedness Framework." Don't Worry About the Vase, December 2023. https://thezvi.substack.com/p/on-openais-preparedness-framework
[15] TechCrunch. "OpenAI may 'adjust' its safeguards if rivals release 'high-risk' AI." 15 April 2025. https://techcrunch.com/2025/04/15/openai-says-it-may-adjust-its-safety-requirements-if-a-rival-lab-releases-high-risk-ai/
[16] Zvi Mowshowitz. "OpenAI Preparedness Framework 2.0." Don't Worry About the Vase, 2 May 2025. https://thezvi.wordpress.com/2025/05/02/openai-preparedness-framework-2-0/
[17] Fortune. "OpenAI updated its safety framework, but no longer sees mass manipulation and disinformation as a critical risk." 16 April 2025. https://fortune.com/2025/04/16/openai-safety-framework-manipulation-deception-critical-risk/
[18] Effective Altruism Forum. "Top OpenAI Catastrophic Risk Official Steps Down Abruptly." https://forum.effectivealtruism.org/posts/mbzXQcEZofzjJweSk/top-openai-catastrophic-risk-official-steps-down-abruptly
[19] The Midas Project Watchtower. "OpenAI: 04/15/25." https://www.themidasproject.com/watchtower/openai-041525
[20] OpenAI. "GPT-4o System Card." 8 August 2024. https://cdn.openai.com/gpt-4o-system-card.pdf
[21] Maginative. "OpenAI Publishes GPT-4o Model Card Detailing Extensive Safety and Risk Mitigation Measures." August 2024. https://www.maginative.com/article/openai-publishes-gpt-4o-model-card-detailing-extensive-safety-and-risk-mitigation-measures/
[22] OpenAI. "OpenAI o1 System Card." 12 September 2024. https://cdn.openai.com/o1-system-card.pdf
[23] OpenAI. "OpenAI o1 System Card (updated)." 5 December 2024. https://cdn.openai.com/o1-system-card-20241205.pdf
[24] OpenAI. "OpenAI o3-mini System Card." 31 January 2025. https://openai.com/index/o3-mini-system-card/
[25] OpenAI. "Deep research System Card." February 2025. https://openai.com/index/deep-research-system-card/
[26] OpenAI. "OpenAI o3 and o4-mini System Card." 16 April 2025. https://cdn.openai.com/pdf/2221c875-02dc-4789-800b-e7758f3722c1/o3-and-o4-mini-system-card.pdf
[27] OpenAI Deployment Safety Hub. "OpenAI o3 and o4-mini." https://deploymentsafety.openai.com/o3
[28] OpenAI. "ChatGPT Agent System Card." 17 July 2025. https://cdn.openai.com/pdf/839e66fc-602c-48bf-81d3-b21eacc3459d/chatgpt_agent_system_card.pdf
[29] OpenAI. "GPT-5 System Card." 13 August 2025. https://cdn.openai.com/gpt-5-system-card.pdf
[30] OpenAI. "Update to GPT-5 System Card: GPT-5.2." 11 December 2025. https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944f8d/oai_5_2_system-card.pdf
[31] "The 2025 OpenAI Preparedness Framework does not guarantee any AI risk mitigation practices: a proof-of-concept for affordance analyses of AI safety policies." arXiv:2509.24394, 2025. https://arxiv.org/abs/2509.24394
[32] The Information. "OpenAI Removes AI Safety Leader Mądry, a Onetime Ally of CEO Altman." July 2024. https://www.theinformation.com/articles/openai-removes-ai-safety-leader-m-dry-a-onetime-ally-of-ceo-altman
[33] The Register. "OpenAI seeks new safety chief as Altman flags growing risks." 29 December 2025. https://www.theregister.com/2025/12/29/openai_safety_chief/
[34] The Midas Project. "Watchtower: OpenAI." https://www.themidasproject.com/watchtower/
[35] SD11 (California State Senator Scott Wiener). "Senator Wiener Responds to OpenAI Opposition to SB 1047." 2024. https://sd11.senate.ca.gov/news/senator-wiener-responds-openai-opposition-sb-1047
[36] Carnegie Endowment for International Peace. "All Eyes on Sacramento: SB 1047 and the AI Safety Debate." September 2024. https://carnegieendowment.org/posts/2024/09/california-sb1047-ai-safety-regulation
[37] Aleksander Mądry. "Statement of Aleksander Mądry, Head of Preparedness, OpenAI." U.S. Senate AI Insight Forum, 2024. https://www.schumer.senate.gov/imo/media/doc/Aleksander%20Madry%20-%20Statement.pdf
[38] TechCrunch. "OpenAI is looking for a new Head of Preparedness." 28 December 2025. https://techcrunch.com/2025/12/28/openai-is-looking-for-a-new-head-of-preparedness/
[39] Machine Intelligence Research Institute. "Existing Safety Frameworks Imply Unreasonable Confidence." 9 April 2025. https://intelligence.org/2025/04/09/existing-safety-frameworks-imply-unreasonable-confidence/
[40] Federation of American Scientists. "Can Preparedness Frameworks Pull Their Weight?" https://fas.org/publication/scaling-ai-safety/
