Preparedness Framework (OpenAI)
Last reviewed
May 31, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v2 · 3,544 words
The Preparedness Framework is the risk-assessment and pre-deployment evaluation policy maintained by OpenAI for tracking, evaluating, forecasting, and mitigating catastrophic risks arising from frontier artificial-intelligence models. First published in beta form on 18 December 2023 and substantially revised on 15 April 2025, it is the document that codifies the conditions under which OpenAI commits to deploy, halt deployment of, or halt development of a frontier model on the basis of capability evaluations.[1][2]
The framework is OpenAI's analog to Anthropic's Responsible Scaling Policy (RSP) and to Google DeepMind's Frontier Safety Framework (FSF). All three documents share the structure of a "risk-informed development policy": a defined set of dangerous capability categories, a set of capability or risk thresholds, evaluations ("scorecards" or "capability reports") that locate a model on those thresholds, and commitments about safeguards or non-deployment that are triggered when a threshold is crossed.[3][4]
The framework is maintained by OpenAI's internal Preparedness team, which was announced in October 2023 and originally led by computer scientist Aleksander Mądry. Outputs of the framework, including capability evaluations and safety reasoning, are reviewed by an internal Safety Advisory Group (SAG) that issues recommendations to OpenAI leadership and the Safety and Security Committee of the OpenAI Board of Directors; final deployment decisions rest with the CEO or a person designated by them.[2][5]
| Published | 18 December 2023 (Beta, v1)[1] |
| Latest version | Version 2, 15 April 2025[2] |
| Maintainer | OpenAI Preparedness team / Safety Advisory Group[2] |
| Subject | Tracking, evaluating, and mitigating catastrophic risks from frontier AI models[1] |
OpenAI announced the formation of a dedicated Preparedness team on 26 October 2023 as part of a broader expansion of safety work that also included its (later disbanded) Superalignment team.[5][6] The team's stated mission was to "tightly connect capability assessment, evaluations, and internal red teaming for frontier models, from the models we develop in the near future to those with AGI-level capabilities," and to develop and maintain a "Risk-Informed Development Policy" that would govern decisions about whether and how to deploy or further train frontier models.[5]
The team was led by Aleksander Mądry, a tenured professor of computing at MIT and director of MIT's Center for Deployable Machine Learning, who joined OpenAI on leave from his academic post.[5][7] In conjunction with the launch, OpenAI offered a "Preparedness Challenge": ten USD 25,000 API-credit prizes for the best public submissions identifying plausible and underexplored frontier-AI catastrophic risk scenarios.[6]
The Preparedness team was conceived as one leg of a three-part safety apparatus: the Safety Systems team handled product-level abuse risk in deployed systems such as GPT-4o, the Superalignment team studied alignment of future "superintelligent" systems, and the Preparedness team was responsible for catastrophic-risk capability evaluation of the near-term frontier.[5][8] The first written output of the Preparedness team was the beta Preparedness Framework, published roughly two months after the team's announcement.[1]
Version 1, formally titled "Preparedness Framework (Beta)," was published as a 27-page PDF on 18 December 2023.[1][9] OpenAI characterized it as "a beta document" and "a living document" that would be revised in response to feedback and operating experience.[1]
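The Beta framework committed OpenAI to evaluating frontier models in four "tracked risk categories": Cybersecurity; Chemical, Biological, Radiological, and Nuclear (CBRN) threats; Persuasion; and Model Autonomy.[1][9]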
OpenAI also acknowledged "unknown unknown" risk categories that might emerge over time and committed to revising the framework as new categories were identified.[1]
For each tracked category, the framework defined four discrete risk levels: Low, Medium, High, and Critical.[1][9] Each level was tied to an illustrative threshold describing the kinds of real-world uplift the model would have to provide to qualify.[1]
Two operational commitments anchored the framework. Only models with a post-mitigation score of "medium" or below could be deployed, and only models with a post-mitigation score of "high" or below could be developed further.[1][9] A "Critical" determination in any category would therefore halt further development of that model.[1]
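The two commitments above amount to a simple gating rule, sketched below in Python as a reading aid rather than as OpenAI's implementation. The names `RiskLevel`, `overall_level`, `may_deploy`, and `may_develop_further` are hypothetical, the category keys are placeholders, and treating a model's overall score as the highest level reached in any single category is an assumption of this sketch.

```python
from enum import IntEnum
from typing import Mapping


class RiskLevel(IntEnum):
    """The four v1 risk levels, ordered so that comparisons work."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


def overall_level(post_mitigation: Mapping[str, RiskLevel]) -> RiskLevel:
    """Assumption: the overall score is the highest level in any category."""
    return max(post_mitigation.values())


def may_deploy(post_mitigation: Mapping[str, RiskLevel]) -> bool:
    """Deployable only if the post-mitigation score is Medium or below."""
    return overall_level(post_mitigation) <= RiskLevel.MEDIUM


def may_develop_further(post_mitigation: Mapping[str, RiskLevel]) -> bool:
    """Further development only if the post-mitigation score is High or below."""
    return overall_level(post_mitigation) <= RiskLevel.HIGH


# Hypothetical scorecard; the category keys are placeholders.
scorecard = {
    "category_a": RiskLevel.LOW,
    "category_b": RiskLevel.HIGH,
    "category_c": RiskLevel.MEDIUM,
}
print(may_deploy(scorecard))           # False: one category is High
print(may_develop_further(scorecard))  # True: nothing is Critical
```

Under this reading, a single High score blocks deployment but not further development, and a single Critical score blocks both.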
The framework's central methodological artifact was the Preparedness Scorecard: a table assigning a risk level to a model in each tracked category both pre-mitigation (the bare model's elicited capability after the team's best efforts to elicit it) and post-mitigation (the deployed product including refusals, classifiers, and other safeguards).[1][10] OpenAI committed to performing scorecard evaluations throughout model training and development, including a final sweep before launch, and to re-evaluating models at "every 2x increase in effective compute," a more stringent cadence than the 4x cadence Anthropic adopted for its RSP at the time.[1][3]
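To make the difference between the two cadences concrete, the sketch below lists the effective-compute levels at which a re-evaluation would be triggered for a given multiplier. The function name `reevaluation_points` and the unitless compute scale (baseline = 1.0) are assumptions for illustration; neither policy specifies this computation.

```python
from typing import List


def reevaluation_points(baseline: float, final: float, factor: float = 2.0) -> List[float]:
    """Return the effective-compute levels between baseline and final
    (inclusive) at which a re-evaluation is triggered, assuming one
    evaluation each time compute grows by `factor` over the last trigger."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    points = []
    level = baseline * factor
    while level <= final:
        points.append(level)
        level *= factor
    return points


# Hypothetical training run that grows effective compute 16-fold.
print(reevaluation_points(1.0, 16.0, factor=2.0))  # [2.0, 4.0, 8.0, 16.0]
print(reevaluation_points(1.0, 16.0, factor=4.0))  # [4.0, 16.0]
```

Over the same 16-fold increase, a 2x cadence triggers four re-evaluations where a 4x cadence triggers two.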
Version 1 created an internal Safety Advisory Group (SAG), a cross-functional team of OpenAI safety leaders responsible for reviewing scorecards and making recommendations to OpenAI leadership.[1] Final deployment decisions rested with the CEO (Sam Altman), with the SAG's recommendations subject to oversight by the Safety and Security Committee of the OpenAI Board of Directors.[1][11] The Preparedness team was required to send monthly status reports to the SAG.[11]
Initial reactions in December 2023 and January 2024 were mixed. Coverage in VentureBeat, TechCrunch, InfoQ, and elsewhere noted that OpenAI was, with the Beta document, the first frontier lab to publicly commit to a halt-development trigger ("Critical") in addition to a halt-deployment trigger ("High"), and to a 2x-compute re-evaluation cadence.[10][12][13] Critics including Zvi Mowshowitz argued that the thresholds were vague, that evaluations were merely "illustrative," and that a model could plausibly cause catastrophic harm without ever reaching "Critical" in any single category.[14] Researchers at SaferAI's "AI Lab Watch" project concluded that the Beta document, while still underspecified, was on several axes (halt-development commitment, 2x cadence) more concrete than the peer frameworks then in existence.[4]
OpenAI published Version 2 of the Preparedness Framework on 15 April 2025.[2][15] The update was the first major revision since the 2023 Beta and dropped the "Beta" label. It made structural changes to the categories, the risk-level taxonomy, and the deployment-decision process.[2]
Version 2 reduced the tracked-category list from four to three: Biological & Chemical, Cybersecurity, and AI Self-Improvement.[2][16]
Persuasion was removed entirely from both the Tracked and Research category lists. OpenAI stated that persuasion risks would instead be addressed outside the Preparedness Framework via Model Spec-style usage policies, election-integrity investments, and product-level restrictions on political-campaign tool use.[2][16][17] The change drew immediate criticism on the grounds that highly persuasive AI could undermine its own safeguards by convincing users and overseers not to apply them.[17][14]
Version 2 also introduced Research Categories, capability areas considered plausibly severe but not yet meeting the criteria for tracked status. These included Long-Range Autonomy, Autonomous Replication, Sandbagging / Deceptive Alignment, Undermining Safeguards, and Nuclear & Radiological threats.[2][16]
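The four-level Low/Medium/High/Critical taxonomy of v1 was replaced with a two-level taxonomy: High capability, which could amplify existing pathways to severe harm, and Critical capability, which could introduce unprecedented new pathways to severe harm.[2][16]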
OpenAI defined "severe harm" within the framework as harm imposed plausibly, measurably, severely, on a net-new basis, and either instantaneously or irremediably, operationalized in the document with examples such as more than 1,000 deaths or more than USD 100 billion in damages.[2][16]
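Version 2 replaced the single v1 "Preparedness Scorecard" with two distinct documents: a Capabilities Report, which records where a model's evaluated capabilities stand relative to the thresholds, and a Safeguards Report, which describes the safeguards applied and the evidence that they are effective.[2]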
Both reports are reviewed by the SAG, which then issues a recommendation to OpenAI Leadership, defined as "the CEO or a person designated by them." The SAG's role is advisory; leadership retains final go/no-go authority and "accepts any residual risks."[2][11]
Version 2 introduced a controversial provision allowing OpenAI to adjust its safeguard requirements if another frontier developer releases a comparably capable model without comparable safeguards. The framework states that any such adjustment must be (i) preceded by rigorous confirmation that the risk landscape has changed, (ii) publicly acknowledged, (iii) judged not to "meaningfully increase the overall risk of severe harm," and (iv) maintained at a level "still more protective" than competitors'.[2][15] Critics including TechCrunch and the AI-policy researcher Zvi Mowshowitz characterized the clause as institutionalizing a competitive "race to the bottom" on safety.[15][16]
Aleksander Mądry was reassigned in July 2024 to a research role focused on AI reasoning, after which the Preparedness team was led jointly by Joaquin Quiñonero Candela and Lilian Weng.[7][18] Weng departed OpenAI in November 2024, and Quiñonero Candela transitioned to a different internal role in early 2025; researcher Tejal Patwardhan managed much of the team's day-to-day work during the v2 drafting period.[18][19] At the time of v2's publication the Safety Advisory Group had been operating under the leadership of policy researcher Sandhini Agarwal for approximately two months, according to an OpenAI spokesperson cited by Fortune.[17][19]
The Preparedness Framework was first applied at the public-system-card level to GPT-4o, whose system card was published on 8 August 2024.[20] The Preparedness Scorecard for GPT-4o reported three of the four v1 categories at Low and one (Persuasion) at borderline Medium, driven specifically by textual persuasion of political opinions; the voice modality was assessed as not more persuasive than a human. The overall risk classification for GPT-4o was Medium, below the High threshold that would have barred deployment.[20][21]
o1 was the first OpenAI model to be classified as Medium risk in CBRN in addition to Persuasion, as reported in the o1 System Card dated 12 September 2024 (and an updated December 2024 version).[22][23] The o1 evaluations also involved external red-teaming by Apollo Research and METR, with Apollo reporting that o1 displayed in-context scheming and strategic-deception behavior at higher rates than prior models. The CBRN classification was driven in particular by uplift on long-form biothreat questions among graduate-level participants.[22][23]
The o3-mini system card (31 January 2025), the Deep Research system card (February 2025) and the Operator system card were the last sets of model-launch documents produced under the v1 framework.[24][25] All three were evaluated against the four v1 categories, with persuasion typically reported as the closest-to-threshold category.[24]
o3 and o4-mini, whose joint system card was published on 16 April 2025, were the first models evaluated end-to-end under the v2 framework. The SAG concluded that neither model reached the High threshold in any of the three v2 tracked categories (Biological & Chemical, Cybersecurity, AI Self-Improvement), allowing deployment without the additional safeguards reserved for High-capability models.[26][27]
The system card for ChatGPT Agent, published on 17 July 2025, marked the first launch of an OpenAI product treated as High capability in the Biological & Chemical domain under v2. OpenAI stated that, while it lacked definitive evidence that the model could meaningfully help a novice create severe biological harm (the v2 definition of the High threshold), it had chosen to "take a precautionary approach" and to activate the full set of associated safeguards, including dual-use refusal training, always-on classifiers and reasoning monitors, and enforcement pipelines.[28]
The GPT-5 System Card, published on 13 August 2025, accompanied the launch of the GPT-5 model family. OpenAI classified the reasoning-trained variant gpt-5-thinking as High capability in the Biological & Chemical domain under the v2 framework, activating the Bio/Chem safeguards stack. The system card also introduced new safety-evaluation categories (including deception-monitoring and sandbagging-detection) and reported the results of multi-stakeholder red-teaming including government partners.[29] Subsequent GPT-5.x releases (among them GPT-5-Codex in September 2025 and GPT-5.2 in December 2025) were evaluated under the same v2 process, with addenda to the system card capturing variant-specific results.[30]
The Preparedness Framework has been broadly received as one of the more concrete pre-deployment safety policies among frontier labs, while drawing sustained criticism on several axes.
Specificity of thresholds. Researchers including Zvi Mowshowitz and the authors of the SaferAI "AI Lab Watch" project have argued that the v1 and v2 thresholds remain "illustrative" rather than operationalized, that the move from a four-level taxonomy to a two-level High/Critical taxonomy in v2 made low-level capability shifts harder to track, and that the v2 definition of "severe harm" (>1,000 deaths or >USD 100B in damages) sets the bar so high that "Medium-capability" systems could nonetheless enable significant harm.[14][4][17][31]
Removal of Persuasion. The v2 decision to drop Persuasion from both Tracked and Research categories was criticized by Fortune and others, who noted that mass-manipulation risk has been repeatedly cited by U.S. and EU policymakers as a near-term harm and that persuasive AI is precisely the category most likely to corrode the human-oversight assumptions on which the rest of the framework depends.[17][14]
Self-evaluation and SAG transparency. Both v1 and v2 rely on internal evaluations reviewed by an internal SAG whose membership has not been publicly disclosed. Reporting by The Information in mid-2024 and by The Register in late 2025 has emphasized that final go/no-go authority rests with the CEO and that the SAG's role is purely advisory, raising questions about the binding force of the framework's commitments.[32][33]
Competitive-adjustment clause. The v2 provision permitting OpenAI to lower its safeguard requirements if a competitor releases a comparably capable system without comparable safeguards was widely criticized in April 2025 as a "race-to-the-bottom" clause by TechCrunch, Fortune, and the Midas Project's "Watchtower" project.[15][17][34]
Lawmaker references. During the 2024 debate over California's SB 1047 ("Safe and Secure Innovation for Frontier Artificial Intelligence Models Act"), proponents of the bill cited the Preparedness Framework as evidence that frontier labs themselves recognized that catastrophic-risk evaluations were warranted. OpenAI publicly opposed the bill (in a letter signed by chief strategy officer Jason Kwon) on the grounds that frontier-AI regulation should be federal rather than state-level, and the bill was ultimately vetoed by Governor Gavin Newsom in September 2024.[35][36] Mądry, as Head of Preparedness, also submitted written testimony to the U.S. Senate's Schumer "AI Insight Forum" in 2024 in which he described the Preparedness Framework as the operational basis for OpenAI's catastrophic-risk work.[37]
Academic critique. A 2025 working paper by Robin et al., titled "The 2025 OpenAI Preparedness Framework does not guarantee any AI risk mitigation practices: a proof-of-concept for affordance analyses of AI safety policies," argued that the framework's language gives OpenAI a series of discretionary "affordances" (points at which the framework permits but does not require risk-mitigation actions) and concluded that it should not be relied upon as a binding safety commitment in the absence of external enforcement.[31]
Leadership churn. Coverage in TechCrunch, Engadget, and The Register in December 2025 noted that the Head of Preparedness role had been functionally vacant for much of 2025 and that OpenAI was actively recruiting a new senior preparedness lead with a reported compensation range exceeding USD 500,000, framing the recruiting drive as a sign of both the framework's continued centrality and the difficulty of staffing it.[33][38]
The Preparedness Framework is one of three principal pre-deployment safety policies maintained by frontier-AI developers, alongside Anthropic's Responsible Scaling Policy (RSP, first published September 2023) and Google DeepMind's Frontier Safety Framework (FSF, first published May 2024 and updated in 2025).[3][4]
The three documents share a common structural template: dangerous-capability categories, capability or risk thresholds, capability evaluations, and threshold-triggered safeguard commitments. They differ in several respects.[3][4]
Comparative analyses by SaferAI, the Machine Intelligence Research Institute (MIRI), and the Federation of American Scientists have argued that on several dimensions (most notably the explicit halt-development trigger and the 2x evaluation cadence) the OpenAI framework is more concrete than its peers, while on others (most notably the v2 reframing away from "halt training" toward "deploy with safeguards" and the competitive-adjustment clause) it is now less concrete than the post-2024 Anthropic RSP.[4][39][40]