Frontier Safety Framework (Google DeepMind)

Updated Jul 23, 2026

The Frontier Safety Framework (FSF) is Google DeepMind's risk-management framework for identifying and mitigating severe risks from advanced frontier AI models. It was first published on 17 May 2024, updated to v2.0 in February 2025, and updated to v3.0 in September 2025. DeepMind describes it as "a set of protocols for proactively identifying future AI capabilities that could cause severe harm and putting in place mechanisms to detect and mitigate them."^[1] The framework is built around capability thresholds called Critical Capability Levels (CCLs), defined as "capability levels at which, absent mitigation measures, frontier AI models or systems may pose a heightened risk of severe harm"; when a model approaches a CCL, the FSF triggers a graduated set of security and deployment mitigations.^[1]^[2] It is DeepMind's counterpart to Anthropic's Responsible Scaling Policy and OpenAI's Preparedness Framework. Released days before the AI Seoul Summit, it also serves as Google's safety framework under the Seoul Frontier AI Safety Commitments, which 16 leading developers, including Google, signed in May 2024.^[9]^[10]^[11]

The FSF specifies an evaluation cadence, a tiered scheme of security and deployment mitigations, and governance review mechanisms intended to ensure that mitigations are in place before any model exceeding a CCL is trained further, internally deployed, or externally released.^[1]^[2] As of mid-2026 the framework covers four formal risk domains: chemical, biological, radiological and nuclear (CBRN) information risk; cybersecurity; machine-learning R&D; and harmful manipulation. It also includes an exploratory approach to misalignment and deceptive alignment.^[5]^[6]

Like its peer documents, the FSF operationalises an "if-then" approach to dangerous-capability risk: stronger mitigations are required as models approach particular capability thresholds. Unlike Anthropic's RSP, however, the FSF was initially framed as a "recommended approach" and an "exploratory" first version rather than a set of binding commitments, a distinction that drew both criticism (for being non-binding) and praise (for explicit acknowledgement of uncertainty and an early focus on machine-learning research and development risks).^[1]^[3]^[11]

A second iteration (v2.0) was released on 4 February 2025. It expanded the machine-learning R&D CCLs, added an approach to deceptive alignment, strengthened the security-level recommendations, and codified a more rigorous deployment-mitigation process. A third iteration (v3.0) followed on 22 September 2025, introducing a CCL for harmful manipulation, expanding the misalignment protocols, and broadening safety-case review to cover large-scale internal deployments. The FSF has been applied to evaluations of Gemini 1.5, Gemini 2.0, Gemini 2.5 Pro, Gemini 2.5 Flash, and Gemini 3.^[2]^[4]^[5]^[7]^[8]

Key facts

Item	Detail
Full name	Frontier Safety Framework
Publisher	Google DeepMind
Version 1.0 published	17 May 2024^[1]
Version 2.0 published	4 February 2025^[4]
Version 3.0 published	22 September 2025^[5]
Document type	Voluntary corporate risk-management framework
Core construct	Critical Capability Levels (CCLs)
Risk domains (v3)	CBRN, cybersecurity, ML R&D, harmful manipulation; plus exploratory misalignment / deceptive alignment^[5]^[6]
Mitigation tiers	Security levels (2-4) and deployment levels (graduated)^[4]
Initial v1.0 authors / leads	Anca Dragan, Helen King, Allan Dafoe^[1]
v3.0 announcement authors	Four Flynn, Helen King, Anca Dragan^[5]
Peer frameworks	OpenAI Preparedness Framework; Anthropic Responsible Scaling Policy^[9]^[11]
Application	Gemini 1.5 / 2.0 / 2.5 / 3 evaluations^[7]^[8]

What is the Frontier Safety Framework for?

The FSF exists to give DeepMind a documented, advance procedure for the most severe risks that could arise as frontier models scale: misuse for weapons development, autonomous cyberattacks, runaway acceleration of AI research, large-scale manipulation, and loss of human control over a misaligned system. DeepMind states that the framework "focuses on severe risks resulting from powerful capabilities at the model level, such as exceptional agency or sophisticated cyber capabilities," rather than on ordinary content-moderation or product-safety concerns.^[1]

DeepMind's leadership, including chief executive Demis Hassabis, publicly warned through 2023 and 2024 that as frontier models scale they may acquire dangerous capabilities, including the ability to assist with the development of weapons of mass destruction, conduct autonomous cyberattacks, or accelerate AI research in ways that outpace governance capacity. These concerns paralleled wider international discussions captured in the Bletchley Declaration of November 2023, signed by 28 countries and the European Union, and the Seoul Declaration of May 2024, alongside which frontier developers signed "Frontier AI Safety Commitments" pledging to publish frameworks describing how they would identify and mitigate severe risks.^[10]^[11]

The FSF was announced shortly before the AI Seoul Summit (21-22 May 2024), at which 16 leading developers, including Amazon, Anthropic, Google, IBM, Microsoft, OpenAI, Cohere, Mistral AI, G42, Naver, Samsung, and Zhipu AI, committed to publishing safety frameworks focused on severe risks ahead of the subsequent AI Action Summit in France.^[10]^[11] Internally, DeepMind had been developing dangerous-capability evaluations under its AGI Safety & Alignment team since at least 2022. Related work such as the "Levels of AGI" taxonomy (Morris et al., 2023), which proposed a graded scale for general-purpose AI competence, was frequently cited in subsequent FSF materials as a conceptual underpinning for capability thresholds.^[1]^[12]

DeepMind explicitly positioned the FSF as complementary to Google's pre-existing AI Principles and to "Google's existing suite of AI responsibility and safety practices" rather than as a replacement. The framework was also designed to be revisable: the May 2024 announcement noted that the document was "exploratory" and would "evolve substantially" as understanding of frontier risks improved, with full implementation targeted for early 2025.^[1]

The FSF's intellectual lineage draws on several strands of prior work. Within DeepMind, the AGI Safety & Alignment team had for several years been publishing technical safety research on specification gaming, reward hacking, scalable oversight, and dangerous-capability evaluation methodologies. Externally, the framework's structure echoes the "if-then" logic of biosafety levels in life-sciences laboratories, which classify research according to organism hazard and require correspondingly stronger physical and procedural containment. Anthropic's Responsible Scaling Policy makes this parallel explicit, and DeepMind acknowledges it in its discussion of graduated security levels.^[9]^[16] The FSF also reflects the broader policy environment created by the November 2023 UK AI Safety Summit at Bletchley Park, which produced the Bletchley Declaration, and by parallel work in the Frontier Model Forum, an industry body co-founded by Google, OpenAI, Anthropic, and Microsoft to share frontier-safety best practices.^[10]^[16]

What was in Version 1.0 (May 2024)?

Version 1.0 of the FSF, published on 17 May 2024 with an accompanying blog post by Anca Dragan, Helen King, and Allan Dafoe, organised the framework around three components:^[1]

Identifying capabilities with severe-harm potential. DeepMind defined Critical Capability Levels as "capability levels at which, absent mitigation measures, frontier AI models or systems may pose a heightened risk of severe harm." CCLs were derived by enumerating foreseeable harm pathways and identifying the minimal set of capabilities required to realise them.
Periodic early-warning evaluations. Models would be re-evaluated each time effective compute increased by approximately a factor of six and after each roughly three-month interval of fine-tuning. "Early warning evaluations" were intended to flag when models approached a CCL with sufficient safety buffer to apply mitigations before the CCL itself was reached.^[1]^[13]
Mitigation plans. When a CCL was approached or reached, DeepMind would apply a tailored combination of security and deployment mitigations. Security mitigations were intended to reduce the risk of model-weight exfiltration; deployment mitigations were intended to constrain or monitor the expression of critical capabilities once a model was used.
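The v1.0 evaluation cadence is effectively a trigger rule: re-evaluate when either effective compute or fine-tuning time has grown enough since the last check. The Python sketch below illustrates that rule under stated assumptions. The `EvalState` record, the `evaluation_due` function, and the use of 90 days as a stand-in for "roughly three months" are hypothetical illustrations rather than part of the framework, and real effective-compute accounting is considerably more involved than a single number.

```python
from dataclasses import dataclass

# Illustrative constants based on the v1.0 cadence described above:
# re-evaluate at ~6x effective compute, or after ~3 months of fine-tuning.
COMPUTE_GROWTH_TRIGGER = 6.0
FINE_TUNING_DAYS_TRIGGER = 90  # assumption: "three months" approximated as 90 days


@dataclass
class EvalState:
    """Hypothetical record of a model's state at its last dangerous-capability evaluation."""
    last_eval_effective_compute: float  # effective compute at last evaluation (arbitrary units)
    days_fine_tuning_since_eval: int    # fine-tuning time elapsed since last evaluation


def evaluation_due(state: EvalState, current_effective_compute: float) -> bool:
    """Return True if either cadence trigger (compute growth or fine-tuning time) is met."""
    compute_ratio = current_effective_compute / state.last_eval_effective_compute
    return (compute_ratio >= COMPUTE_GROWTH_TRIGGER
            or state.days_fine_tuning_since_eval >= FINE_TUNING_DAYS_TRIGGER)


if __name__ == "__main__":
    state = EvalState(last_eval_effective_compute=1.0, days_fine_tuning_since_eval=30)
    print(evaluation_due(state, current_effective_compute=4.0))  # False: 4x compute, 30 days
    print(evaluation_due(state, current_effective_compute=6.5))  # True: 6.5x compute
```

Treating the two triggers with a logical "or" reflects the idea that either kind of progress, a larger training run or accumulated fine-tuning, could move a model toward a CCL.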

Version 1.0 specified four initial risk domains for CCLs:

Autonomy: capabilities that would allow a model to operate persistently without human direction, for example by acquiring resources or replicating across infrastructure.
Biosecurity (often referenced as the "Bio" domain): capabilities that would meaningfully uplift non-expert actors in producing biological threats.
Cybersecurity: capabilities that would substantially increase the ability of threat actors to execute high-impact cyberattacks.
Machine learning R&D: capabilities that could accelerate AI development in ways with destabilising effects.^[1]^[13]

The framework defined a graduated set of security levels and deployment levels keyed to CCL severity. As described in the v1.0 technical report and contemporaneous summaries, "higher level security mitigations result in greater protection against the exfiltration of model weights," while higher deployment levels enable tighter management of critical capabilities through such measures as access controls, safety fine-tuning, classifier-based filters, and monitoring.^[1]^[13]

A widely noted feature of v1.0 was the language used to characterise commitments. DeepMind described the framework as a "set of protocols" and a "recommended approach" rather than a set of contractual commitments. The launch blog post acknowledged that "even though these risks are beyond the reach of present-day models, we hope that implementing and improving the framework will help us prepare to address them."^[11]

What changed in Version 2.0 (February 2025)?

Version 2.0 of the FSF was published on 4 February 2025, with an accompanying blog post titled "Updating the Frontier Safety Framework."^[4] Lead authorship on the technical document was attributed to Lewis Ho, Celine Smith, Claudia van der Salm, Joslyn Barnhart, and Rohin Shah, with senior contributions from Allan Dafoe, Anca Dragan, Andy Song, and Demis Hassabis among others.^[4]

Major changes in v2.0 included:

Security level recommendations. Each CCL was paired with a recommended security level (numbered 2 through 4), giving operators a concrete mapping between capability profile and required protection of model weights. DeepMind stated that "this second version of the Framework recommends particularly high security levels for CCLs within the domain of machine learning research and development (R&D)," on the reasoning that uncontrolled proliferation of those capabilities could most directly undermine collective oversight of further AI development.^[4]^[9]
More rigorous deployment-mitigation process. A multi-step procedure was codified for misuse-risk CCLs, including iterative safeguard development, the construction of an explicit safety case, governance review, and post-deployment monitoring. The blog post described applying "a more rigorous safety mitigation process to models reaching a CCL in a misuse risk domain."^[4]
Explicit deceptive-alignment treatment. A new misalignment-oriented capability area was added, with an initial approach that "focuses on detecting when models might develop a baseline instrumental reasoning ability letting them undermine human control unless safeguards are in place," using automated monitoring to detect illicit use of such reasoning.^[4]
Expanded ML R&D CCLs. The ML R&D domain was elaborated into multiple sub-levels addressing both uplift (a model substantially accelerating AI research) and autonomy (a model capable of running portions of the AI R&D pipeline end-to-end). The framework specifically identified the scenario in which a model "fully automates the AI R&D pipeline at a competitive cost" as a salient threshold.^[6]
Updated misuse CCLs. The v2.0 document listed misuse-domain CCLs including "CBRN Uplift 1" (uplift to low-resourced actors for dual-use scientific protocols), "Cyber Uplift Level 1" (uplift for well-resourced threat actors against critical infrastructure), and "Cyber Autonomy Level 1" (end-to-end automation of cyberattacks against limited-security organisations).^[6]

The v2.0 document also reiterated that the framework was a living set of recommendations and that its measures should ideally be adopted across the broader frontier-developer community, noting that "the social value of any single actor's security mitigations will be significantly reduced if not broadly applied across the field."^[4]

What changed in Version 3.0 (September 2025)?

The third major iteration was published on 22 September 2025 with an accompanying blog post, "Google DeepMind strengthens the Frontier Safety Framework," authored by Four Flynn, Helen King, and Anca Dragan, who described it as "our most comprehensive approach yet to identifying and mitigating severe risks from advanced AI models."^[5] Key changes from v2.0 included:

A new CCL for harmful manipulation. Version 3 added a CCL covering AI models "with powerful manipulative capabilities that could be misused to systematically and substantially change beliefs and behaviors in identified high stakes contexts." The blog characterised this as building on years of research into the mechanics of persuasion and described the underlying assessment methodology as "exploratory and subject to further research."^[5]^[14]
Expanded misalignment protocols. The misalignment treatment was widened to cover scenarios in which "misaligned AI models might interfere with operators' ability to direct, modify or shut down their operations," with safeguard procedures designed to maintain operator control over models that might develop instrumental motivations to evade oversight.^[5]^[14]
Broader safety-case review. Version 3 extended the requirement for governance-reviewed safety cases to large-scale internal deployments, not only external releases, recognising that internal applications of frontier models inside Google's own research and product workflows could themselves produce severe risk under some conditions.^[5]
Sharper CCL definitions and risk-assessment process. The document refined CCL definitions to better separate routine operational concerns from threats meeting the "severe harm" criterion and added an explicit risk-acceptability determination step.^[5]

A subsequent update on 17 April 2026 added the concept of Tracked Capability Levels (TCLs), sub-CCL thresholds intended to flag less extreme risks earlier in the development cycle than full CCLs.^[14]

The post-v3 domain set comprises four formal risk areas (CBRN, cybersecurity, ML R&D, and harmful manipulation) plus an exploratory approach for misalignment, including deceptive alignment.^[5]

Critical Capability Levels in detail

Autonomy

In v1.0 the autonomy domain captured capabilities that would let a model operate persistently and acquire resources without human direction. Illustrative thresholds discussed in commentary on the v1.0 document included models able to "self-replicate across rented hardware" or to maintain themselves financially through autonomous activity.^[13] In v2.0 the autonomy concerns were partially absorbed into the misalignment / deceptive-alignment domain and into ML R&D autonomy levels, while in v3.0 they overlap with the new shutdown-resistance protocols.^[4]^[5]

Cybersecurity

The cyber domain in v2.0 distinguished "uplift" levels (additional capability conferred by an AI system to a human attacker) from "autonomy" levels, where the model itself executes an end-to-end attack. "Cyber Uplift Level 1" addressed uplift to well-resourced attackers against critical infrastructure, while "Cyber Autonomy Level 1" addressed full automation of attacks against organisations with limited security.^[6]

CBRN / Biosecurity

Initially framed as a biosecurity domain in v1.0, the misuse-from-scientific-knowledge category was broadened in v2.0 to address chemical, biological, radiological, and nuclear (CBRN) information risks. "CBRN Uplift 1" was defined as the level at which a model can meaningfully assist low-resourced actors with dual-use scientific protocols, a threshold motivated by the concern that diffusion of CBRN knowledge to non-experts is a particularly poorly-bounded class of harm.^[6]

ML R&D

Version 2.0's ML R&D domain was elaborated into uplift and autonomy CCLs. "ML R&D Uplift Level 1" referred to a model that could substantially accelerate AI development, described in commentary on the document as roughly a twofold acceleration relative to the 2020-2024 pace of progress. "ML R&D Autonomy Level 1" denoted a model that could fully automate the AI R&D pipeline at competitive cost.^[6] DeepMind explicitly recommended the highest security level (Level 4) for these CCLs on the reasoning that uncontrolled diffusion of recursive-self-improvement capabilities would be among the most consequential failure modes for the field.^[4]^[9]

Deceptive alignment and misalignment

Version 2.0 introduced a misalignment-oriented capability area structured around Instrumental Reasoning Levels. Instrumental Reasoning Level 1 corresponded to enough situational awareness and instrumental-reasoning capacity that a model could in principle undermine human control unless safeguards were in place; Instrumental Reasoning Level 2 corresponded to capabilities sufficient to evade detection under standard monitoring.^[6] In v3.0 these were complemented by shutdown-resistance considerations.^[5]

Harmful manipulation

Added in v3.0, the harmful manipulation CCL targets models that could, across the course of an interaction, change user beliefs and behaviours in ways that produce expected harm at severe scale in identified high-stakes contexts (for example, large-scale political persuasion or coercive influence). DeepMind described this domain as research-stage and indicated continued work on measurement methodology.^[5]^[14]

How do the FSF's mitigation strategies work?

The FSF's mitigations are grouped into two families:

Security mitigations are designed to prevent unauthorised access to model weights; in particular, to prevent state-level or well-resourced adversaries from exfiltrating a model whose deployment is subject to controls. Security levels are graduated, with Security Level 4 (the highest tier in the v2.0 / v3.0 schema) representing protections approaching state-of-the-art secure-facility controls. ML R&D autonomy CCLs are linked to Security Level 4 in v2.0.^[4]^[9]
Deployment mitigations are designed to constrain or monitor critical capabilities in models that are released either externally or internally. Mitigations include safety fine-tuning to refuse misuse requests, classifier-based filters that detect attempts to elicit dangerous behaviour, structured access patterns, capability shaping to remove dangerous knowledge without degrading general utility, and human oversight of high-risk usage.^[1]^[4]

Version 2.0 added an explicit deployment-mitigation procedure consisting of (i) iterative safeguard development, (ii) a safety case justifying that mitigations adequately reduce residual risk, (iii) governance review by an "appropriate governance function," and (iv) post-deployment monitoring.^[4] Version 3.0 extended the safety-case requirement to large-scale internal deployments.^[5]
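Read as a process, the four steps act as gates that must all be satisfied before a model at or near a CCL is deployed. The following Python sketch assumes each step can be summarised as a yes/no status plus a list of monitoring measures. The `SafetyCase` class and `deployment_blockers` function are hypothetical; the framework itself describes these steps in prose and relies on human judgement rather than boolean checks.

```python
from dataclasses import dataclass, field


@dataclass
class SafetyCase:
    """Hypothetical summary of the four deployment-mitigation steps for one model."""
    safeguards_iterated: bool       # (i) safeguards developed and tested iteratively
    residual_risk_acceptable: bool  # (ii) safety case argues residual risk is adequately reduced
    governance_approved: bool       # (iii) sign-off by the governance function
    monitoring_plan: list[str] = field(default_factory=list)  # (iv) post-deployment monitoring


def deployment_blockers(case: SafetyCase) -> list[str]:
    """Return the steps not yet satisfied; an empty list means no step blocks deployment."""
    blockers = []
    if not case.safeguards_iterated:
        blockers.append("safeguard development")
    if not case.residual_risk_acceptable:
        blockers.append("safety case")
    if not case.governance_approved:
        blockers.append("governance review")
    if not case.monitoring_plan:
        blockers.append("post-deployment monitoring plan")
    return blockers


if __name__ == "__main__":
    case = SafetyCase(True, True, False, ["classifier alert review"])
    print(deployment_blockers(case))  # ['governance review']
```

Returning the list of unmet steps, rather than a single pass/fail flag, mirrors the idea that a governance review needs to see which part of the case is still missing.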

How has the FSF been applied to Gemini models?

The FSF is the operating framework under which DeepMind has published model cards and dedicated "FSF report" documents accompanying its Gemini family of models.

Gemini 1.5: Evaluated under v1.0; the published evaluations focused on autonomy, biosecurity, and cyber uplift early-warning measures.^[15]
Gemini 2.0 / Gemini 2.5: Evaluated under v2.0. Documents accompanying Gemini 2.5 Pro and Gemini 2.5 Deep Think report that "across all areas covered by the Frontier Safety Framework, Critical Capability Levels (CCLs) have not been reached for Gemini 2.5 Pro," though the model crossed the early-warning alert threshold for Cyber Uplift Level 1, prompting DeepMind to note that "subsequent revisions in the next few months could lead to a model that reaches the CCL" and to put its response plan in place.^[7]^[15]
Gemini 3 Pro: Evaluated under v3.0; the November 2025 Gemini 3 Pro FSF Report concluded that Gemini 3 Pro "did not reach any critical capability levels (CCLs) and is deemed acceptable for deployment." The cyber evaluation showed a notable performance jump (passing 11 of 12 "hard" challenges versus 6 of 12 for Gemini 2.5 Pro, with the model finding an unintended shortcut to success in 2 of the 12 cases), CBRN responses provided accurate but generally non-novel information with minimal uplift for less-technical users, ML R&D performance remained "substantially below the alert threshold" (outperforming humans on 2 of 5 tasks versus 0 of 5 for the prior version), and deceptive-alignment evaluations found that the model "exhibits a substantial propensity for strategic deception in certain limited circumstances" while lacking the stealth and situational awareness needed for severe real-world harm.^[8]

How does the FSF compare to peer frameworks, and how was it received?

The FSF is part of a cluster of voluntary frameworks published by leading frontier developers in 2023-2025, including OpenAI's Preparedness Framework (December 2023; updated 2024-2025) and Anthropic's Responsible Scaling Policy (September 2023; updated multiple times). Comparative analyses by METR, the Frontier Model Forum, and independent researchers have identified both convergences and divergences.^[9]^[16]

Common ground identified across all three frameworks includes:

The use of capability thresholds (called Critical Capability Levels by DeepMind, "High" and "Critical" thresholds by OpenAI, and AI Safety Levels by Anthropic) to gate stronger safeguards.
Coverage of broadly the same risk domains: biological/CBRN misuse, cyberoffense, and AI R&D acceleration.
Pre-deployment dangerous-capability evaluations and red-team testing of mitigations.
Periodic re-evaluation tied to capability scaling.^[9]^[16]

Distinctive features of the FSF, compared with its peers, include:

Explicit inclusion of misalignment and deceptive alignment as formal threat models from v2.0 onward, a feature that several reviewers noted goes further than the OpenAI framework's initial scope.^[9]
Harmful manipulation added as a CCL in v3.0, before the same domain was formalised by some peer frameworks.^[5]^[16]
Graduated security levels recommended at the level of the entire field, reflecting DeepMind's view that uniform diffusion is required for security measures to be socially useful.^[4]
A safety-case procedure that, by v3.0, applies both to external deployments and to large-scale internal use.^[5]

Criticism of the FSF has paralleled criticism of the OpenAI and Anthropic frameworks. Independent reviewers, including writers on the EA Forum, the Machine Intelligence Research Institute, and various commentators, have argued that:

The framework's language of "recommended approach" and its qualified commitments, particularly in v1.0, fall short of binding constraints, and that the framework is therefore more transparency mechanism than enforcement mechanism.^[11]^[17]
Mitigation thresholds remain under-specified relative to the seriousness of the risks they aim to address, and that current jailbreaking and elicitation techniques routinely defeat deployment-stage safeguards within days of model release.^[17]
The frameworks broadly underweight the possibility that important threat models have been missed: in contrast to mature safety-critical industries, frontier AI risk management generally lacks a structural assumption that something important has been overlooked.^[17]

Supportive responses have emphasised the FSF's relative early coverage of ML R&D risks and deceptive alignment, its explicit articulation of governance-reviewed safety cases, and the published Gemini FSF reports as concrete artefacts demonstrating that the framework is being operated, not merely declared.^[9]^[16]

External regulatory developments have also shaped the FSF's significance. The EU AI Act (which entered into force in August 2024) imposes systemic-risk obligations on general-purpose AI models above defined compute thresholds. California's SB 53, the Transparency in Frontier Artificial Intelligence Act, signed on 29 September 2025 and effective 1 January 2026, creates statutory obligations for large developers of models trained using more than 10^26 floating-point operations to publish, implement, and comply with frontier-AI safety frameworks. Commentators have noted that voluntary frameworks such as the FSF are increasingly intersecting with, and may be partially absorbed by, this regulatory layer.^[9]^[18]

How is the FSF governed?

The FSF embeds its mitigation choices in a defined governance process. From v2.0 onward, the framework specifies that decisions about whether mitigations are sufficient, and therefore whether a model should be trained further, deployed externally, or deployed internally at scale, must pass through an "appropriate governance function" within Google DeepMind, which the document does not name individually but describes in terms of role and authority. The governance function reviews:

The dangerous-capability evaluation results.
The associated safety case justifying that mitigations reduce residual risk to acceptable levels.
Plans for ongoing post-deployment monitoring.^[4]^[5]

When evaluations indicate that a model is approaching or has crossed an early-warning threshold for a CCL, the framework requires a structured response plan. The Gemini 2.5 Pro materials describe such a response in concrete terms, including higher-frequency follow-up testing and additional mitigations, after models approached the alert thresholds for Cyber Uplift Level 1 and CBRN Uplift Level 1.^[7] The FSF also envisages collaboration with external evaluators, who in the case of Gemini 3 Pro provided independent CBRN wet-lab testing and additional adversarial red-team probing of deceptive-alignment behaviour.^[8]
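The graded response described above, with an alert threshold set below the CCL and a response plan triggered in between, can be illustrated with a simple classifier. This Python sketch assumes a single scalar capability score per domain and numeric thresholds. Both are simplifying assumptions: actual FSF evaluations combine several benchmark suites, red-teaming, and expert judgement. The `classify` function, the `Status` labels, and the threshold values are hypothetical.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical labels for where an evaluation result falls."""
    BELOW_ALERT = "below alert threshold"
    ALERT = "alert threshold reached: response plan required"
    CCL_REACHED = "CCL reached: mitigations must be in place"


def classify(score: float, alert_threshold: float, ccl_threshold: float) -> Status:
    """Map a capability score onto graded responses.

    Assumes higher scores mean stronger capability and that the alert
    threshold sits below the CCL threshold, leaving a safety buffer.
    """
    if alert_threshold >= ccl_threshold:
        raise ValueError("alert threshold must sit below the CCL threshold")
    if score >= ccl_threshold:
        return Status.CCL_REACHED
    if score >= alert_threshold:
        return Status.ALERT
    return Status.BELOW_ALERT


if __name__ == "__main__":
    # Hypothetical numbers, not drawn from any published Gemini evaluation.
    print(classify(0.55, alert_threshold=0.5, ccl_threshold=0.8))  # Status.ALERT
```

The gap between the two thresholds is the "safety buffer": a model landing in it has not reached the CCL, but the framework expects mitigations to be prepared before it does.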

DeepMind has framed the publication of FSF reports alongside model cards as the principal transparency mechanism. Each FSF report summarises which CCLs were evaluated, which evaluation suites were used, how results compared against early-warning and alert thresholds, and what mitigations were applied. Together with the model-card series, this approach is intended to give external researchers, governments, and other developers a documented basis for assessing the safety claims associated with each model.^[7]^[8]

How does the FSF relate to Anthropic's RSP and OpenAI's Preparedness Framework?

Comparison with peer frameworks helps clarify both the FSF's contribution and its limitations.

Anthropic's Responsible Scaling Policy (first published September 2023) uses a tiered scheme of AI Safety Levels (ASL-1 through ASL-4 and beyond), patterned on biosafety levels. ASL-3 introduces requirements such as a "commitment not to deploy if catastrophic misuse risk is evident under adversarial testing," a defence-in-depth deployment standard, and a security standard validated by independent third-party audit. The RSP has been updated repeatedly and is administered through a Responsible Scaling Officer reporting to Anthropic's CEO and board.^[9]

OpenAI's Preparedness Framework (first published December 2023) defines Tracked Risk Categories and rates models against "Low," "Medium," "High," and "Critical" thresholds in each. The initial categories were cybersecurity; chemical, biological, radiological and nuclear (CBRN) threats; persuasion; and model autonomy. The initial framework committed OpenAI to deploy only models with a post-mitigation score of "Medium" or below and to develop further only models with a post-mitigation score of "High" or below, with governance review by a Safety Advisory Group and explicit board-level escalation paths for safety decisions.^[9]

The FSF's central construct, the Critical Capability Level, sits somewhere between Anthropic's biosafety-style levels and OpenAI's tracked-category thresholds. Like OpenAI's framework, it organises CCLs by domain (CBRN, cyber, ML R&D, manipulation, misalignment) rather than by a single global level. Like Anthropic's RSP, it pairs each capability threshold with prescribed security and deployment mitigations. Distinctively, it formalises ML R&D as a top-tier risk requiring Security Level 4 mitigations from v2.0, includes deceptive-alignment as a formal capability area, and from v3.0 includes harmful manipulation as a CCL.^[4]^[5]^[9]

Despite these structural differences, independent reviewers including METR and the Frontier Model Forum have judged the three frameworks to be substantially convergent in their underlying logic: each pre-commits to specific kinds of evaluation, defines (with varying precision) the capability thresholds that would trigger stronger safeguards, and creates a documented process for deciding whether a model should be deployed. The principal practical question is one of stringency: how much capability is enough to trigger which mitigation, and of binding force, since all three frameworks remain voluntary corporate commitments rather than legally enforceable instruments in most jurisdictions.^[9]^[16]

References

1. Dragan, Anca; King, Helen; Dafoe, Allan. "Introducing the Frontier Safety Framework." Google DeepMind blog, 17 May 2024. https://deepmind.google/blog/introducing-the-frontier-safety-framework/
2. Google DeepMind. "Frontier Safety Framework Version 1.0" (technical report PDF). 17 May 2024. https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/introducing-the-frontier-safety-framework/fsf-technical-report.pdf
3. "Google DeepMind Frontier Safety Framework Version 1.0." ETO AGORA AI policy instrument record. https://agora.eto.tech/instrument/987
4. Google DeepMind. "Updating the Frontier Safety Framework." Blog, 4 February 2025. https://deepmind.google/blog/updating-the-frontier-safety-framework/
5. Google DeepMind. "Google DeepMind strengthens the Frontier Safety Framework." Blog, 22 September 2025. https://deepmind.google/blog/strengthening-our-frontier-safety-framework/
6. Google DeepMind. "Frontier Safety Framework Version 2.0" (PDF). 4 February 2025. https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/updating-the-frontier-safety-framework/Frontier%20Safety%20Framework%202.0%20(1).pdf
7. Google DeepMind. "Gemini 2.5 Pro Model Card" (PDF). Updated 27 June 2025. https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Pro-Model-Card.pdf
8. Google DeepMind. "Gemini 3 Pro Frontier Safety Framework Report" (PDF). November 2025. https://storage.googleapis.com/deepmind-media/gemini/gemini_3_pro_fsf_report.pdf
9. METR. "Common Elements of Frontier AI Safety Policies." https://metr.org/common-elements
10. UK Government. "Frontier AI Safety Commitments, AI Seoul Summit 2024." May 2024. https://www.gov.uk/government/publications/frontier-ai-safety-commitments-ai-seoul-summit-2024/frontier-ai-safety-commitments-ai-seoul-summit-2024
11. Reed, Reed (Semafor). "Google DeepMind launches new framework to assess the dangers of AI models." Semafor, 17 May 2024. https://www.semafor.com/article/05/17/2024/google-deepmind-launches-new-framework-to-assess-the-dangers-of-ai-models
12. Morris, Meredith Ringel; et al. "Levels of AGI: Operationalizing Progress on the Path to AGI." arXiv preprint, November 2023. https://arxiv.org/abs/2311.02462
13. "Google DeepMind Frontier Safety Framework v1.0 summary." ETO AGORA. https://agora.eto.tech/instrument/987
14. AI Safety Directory. "Google DeepMind Frontier Safety Framework (International, 2026)." https://aisecurityandsafety.org/en/frameworks/google-deepmind-frontier-safety-framework/
15. Google DeepMind. "Frontier Safety Framework Version 3.0" (PDF). 22 September 2025. https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/strengthening-our-frontier-safety-framework/frontier-safety-framework_3.pdf
16. Frontier Model Forum. "Issue Brief: Components of Frontier AI Safety Frameworks." https://www.frontiermodelforum.org/updates/issue-brief-components-of-frontier-ai-safety-frameworks/
17. Machine Intelligence Research Institute. "Existing Safety Frameworks Imply Unreasonable Confidence." 9 April 2025. https://intelligence.org/2025/04/09/existing-safety-frameworks-imply-unreasonable-confidence/
18. California Legislature. "SB-53 Artificial intelligence models: large developers (Transparency in Frontier Artificial Intelligence Act)." Signed 29 September 2025. https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=202520260SB53
