The UK AI Security Institute (AISI) is a research organization within the United Kingdom's Department for Science, Innovation and Technology (DSIT) that conducts pre-deployment evaluations of frontier AI models and produces safety and security research to inform government policy. It is widely regarded as the world's first government body dedicated to the systematic technical evaluation of advanced AI systems.
Originally established as the UK AI Safety Institute on 2 November 2023, following Prime Minister Rishi Sunak's announcement at the AI Safety Summit held at Bletchley Park, the organization was renamed the UK AI Security Institute on 14 February 2025, when Technology Secretary Peter Kyle announced a strategic refocusing on AI risks with national security implications. The institute's website is aisi.gov.uk, and it retains the AISI abbreviation under the new name.
As of 2025, the institute employs more than 100 technical staff drawn from industry, academia, and civil society organizations. It operates with annual funding of approximately £66 million, has access to more than £1.5 billion in computing resources through the national AI Research Resource, and has conducted pre-deployment evaluations on more than 30 state-of-the-art AI models since its founding. The institute chairs an international network of AI safety and security institutes spanning eleven countries and the European Union.
The institution that became the UK AI Security Institute traces its direct origins to April 2023, when the UK government announced the formation of the Frontier AI Taskforce. At the time, the UK government committed an initial £100 million to fund the organization. The stated purpose of the Taskforce was to understand the capabilities and risks of frontier AI systems before more comprehensive regulatory structures could be established.
Ian Hogarth, a technology investor and entrepreneur best known for his 2023 essay "We must slow down the race to God-like AI," was appointed to chair the Taskforce. Hogarth had argued publicly that the most advanced AI systems posed unique societal risks and that independent, government-backed evaluation was necessary to prevent AI developers from, as he put it, marking their own homework.
The Frontier AI Taskforce began recruiting technical staff during mid-2023 and started building evaluation frameworks and relationships with leading AI developers. During this period, Prime Minister Rishi Sunak secured informal commitments from Sam Altman of OpenAI, Demis Hassabis of Google DeepMind, and Dario Amodei of Anthropic to provide early access to their forthcoming models for safety testing. These voluntary commitments preceded any formal agreements and reflected the close dialogue the UK government had cultivated with AI company leadership during 2023.
The AI Safety Summit, held at Bletchley Park on 1 and 2 November 2023, was the first major intergovernmental conference focused specifically on the risks of frontier AI. The choice of Bletchley Park was deliberate: the site is famous for codebreaking work during the Second World War and holds symbolic resonance in British technology history.
The summit brought together delegations from 28 countries, including the United States, China, the European Union, Australia, Canada, France, Germany, India, Italy, Japan, and South Korea. On the first day of the summit, representatives from these countries and the EU signed the Bletchley Declaration, a joint statement affirming that AI should be developed and deployed in a manner that is safe, human-centric, trustworthy, and responsible. The Declaration acknowledged that frontier AI carries risks of intentional misuse, unintended loss of control, and potential for catastrophic or irreversible harm, and committed signatory nations to work together to understand those risks.
Critically for the AISI, the Bletchley Declaration also called for government-led evaluation of frontier models and endorsed the principle that AI companies should submit their most capable systems for independent testing before public release. Eight major AI developers -- Anthropic, Google DeepMind, OpenAI, Meta, Microsoft, Amazon Web Services, Mistral AI, and Inflection AI -- signed a separate commitment to support pre-deployment evaluations of their forthcoming models.
On 2 November 2023, in his closing address to the summit, Prime Minister Rishi Sunak announced the formal launch of the UK AI Safety Institute as the successor to the Frontier AI Taskforce. Ian Hogarth remained as Chair. The AISI was explicitly designed, as the official launch document put it, to be "like a startup in government": combining governmental authority and access to official information with the technical depth and operational speed more typical of a well-funded technology organization. The institute was positioned not as a regulator -- it had no enforcement powers -- but as an evaluation and research body whose findings would inform both UK policy and the broader international debate on AI governance.
The institute's mission, as stated on its official website, is to build "the world's leading understanding of advanced AI risks and solutions, to inform governments so they can keep the public safe."
Since its founding, the AISI has organized its activities around three interconnected functions.
Testing AI systems. The institute conducts pre-release evaluations of leading AI models in collaboration with the companies that develop them. These evaluations examine how capable a model is in domains that pose safety or security concerns, whether the model's safeguards against misuse hold up against adversarial attempts to circumvent them, and whether autonomous or agentic uses of the model could produce unintended harmful outcomes.
Research advancement. The institute conducts in-house research, and funds external work, on AI safety and security topics that it considers underdeveloped in the academic literature. This includes research on evaluation methodology, model alignment, control mechanisms for autonomous AI systems, and understanding how AI capabilities in sensitive domains are evolving over time. As of 2025, the institute has distributed more than £15 million in research grants to external academic groups.
Policy guidance. The institute provides technical briefings, reports, and advice to UK government departments and to allied governments participating in the international network. Its findings on model capabilities, safeguard robustness, and emerging risks feed directly into policy discussions about how to regulate and deploy frontier AI systems.
Ian Hogarth served as Chair of the organization from its inception as the Frontier AI Taskforce through at least 2025. Hogarth brought to the role both his public profile as an advocate for cautious AI development and his experience as a founder and investor in the technology sector. His appointment was seen as a signal that the UK government intended the institute to have genuine technical credibility and independence from narrow industry interests.
Geoffrey Irving serves as Chief Scientist. Irving spent eight years in machine learning research, during which he co-led the N2Formal neural network theorem proving team at Google Brain, led the Reflection Team at OpenAI working on AGI safety and language model alignment, and subsequently led the Scalable Alignment Team at Google DeepMind. He is best known academically for developing the "debate" approach to aligning AI systems with human values, a scalable alignment technique in which two AI systems argue opposing positions and human judges evaluate the quality of their reasoning.
Chris Summerfield serves as Research Director. Summerfield is a Professor of Cognitive Neuroscience at the University of Oxford and brings expertise in how biological intelligence relates to artificial systems. His academic background informs the institute's approaches to evaluating AI reasoning, persuasion, and influence capabilities.
Jade Leung serves as Chief Technology Officer and also holds the role of Prime Minister's AI Adviser. Leung previously led Governance at OpenAI and brings deep expertise in the policy and operational dimensions of responsible AI deployment.
Adam Beaumont was appointed Interim Director in October 2025, succeeding Oliver Ilott in the operational leadership role. Beaumont previously served as Chief AI Officer at GCHQ, the UK government's intelligence and cybersecurity agency. His appointment reflected the institute's increased focus on national security applications following the February 2025 renaming.
The institute's advisory board includes Yoshua Bengio, one of the founding researchers of modern deep learning and a prominent advocate for cautious AI governance.
The pre-deployment evaluation program is the institute's most operationally distinctive activity. Under the program, the institute receives access to an AI model before it is released publicly, conducts structured tests to assess its capabilities and vulnerabilities, and produces findings that inform both the developing company and the UK government.
The original voluntary commitments made at and around the Bletchley Summit were subsequently formalized through memoranda of understanding with major AI developers. The institute has signed such agreements with Anthropic, OpenAI, Google DeepMind, and Cohere, among others. These agreements give the institute early access to models under defined conditions and establish expectations for information sharing.
The institute anonymizes models in its published reports: findings describe model performance using colour labels (Red, Purple, Green, Blue, Yellow) rather than naming specific products, a practice designed to focus public attention on systemic patterns rather than on competitive rankings between companies.
Evaluations are structured around four primary risk domains: cybersecurity, chemical and biological misuse, autonomous AI behavior, and safeguard robustness.
Within each domain, the institute tests three dimensions of model behavior. Compliance measures whether the model agrees to assist with requests that fall within the risk category, rather than refusing them. Correctness assesses the accuracy and practical utility of any assistance the model provides. Completion measures how successfully the model can carry a potentially harmful task through to a functional result.
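As a rough illustration of how these three dimensions might be reduced to headline rates from a batch of graded evaluation records, the sketch below uses an invented record format and simple counting; it does not reproduce the institute's actual scoring pipeline.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    """One graded model response to a risk-domain request (illustrative format only)."""
    complied: bool            # did the model agree to assist rather than refuse?
    assistance_correct: bool  # was the assistance it gave accurate and practically useful?
    task_completed: bool      # did it carry the task through to a functional end result?


def summarise(records: list[EvalRecord]) -> dict[str, float]:
    """Reduce a batch of graded records to compliance, correctness, and completion rates."""
    n = len(records)
    complied = [r for r in records if r.complied]
    return {
        # Compliance: share of risky requests the model agreed to assist with.
        "compliance": len(complied) / n,
        # Correctness: among complied requests, share with accurate, usable assistance.
        "correctness": (sum(r.assistance_correct for r in complied) / len(complied))
        if complied else 0.0,
        # Completion: share of all requests carried through to a working result.
        "completion": sum(r.task_completed for r in records) / n,
    }
```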
Evaluators use both automated methods and expert human review. Automated evaluation allows systematic testing at scale across large numbers of prompts and scenarios. Human expert validation is used for more complex cases -- particularly in domains like cybersecurity and biology -- where assessing whether model output actually provides meaningful uplift to a malicious actor requires specialist judgment.
The institute also employs adversarial red-teaming to assess safeguards. Rather than accepting model refusals at face value, red-teamers attempt to circumvent the safeguard through jailbreak techniques, indirect prompting, or by exploiting gaps between what a model is trained to decline and what it can be induced to do with creative framing. A finding that a model's safety filter fails under adversarial conditions is treated as a more significant signal than raw compliance rates on straightforward harmful requests.
The institute's May 2024 advanced evaluations report, the first major published assessment covering multiple models tested before their release, found that the most capable models evaluated could solve more than half of capture-the-flag cybersecurity challenges designed for high school students but struggled with university-level challenges. In the life sciences domain, several models matched PhD-level performance on expert-written questions covering basic biology, advanced biological topics, and laboratory automation.
On autonomous capabilities, models evaluated in 2024 completed 20 to 40 percent of short-horizon software engineering tasks but failed all long-horizon tasks, hampered by syntax errors and insufficient self-testing. The institute characterized this as indicating a meaningful but not yet alarming level of autonomous capability.
All models evaluated in the 2024 report remained vulnerable to basic jailbreaks, with compliance rates on harmful requests reaching 28 percent even without adversarial prompting. The institute noted this as a significant baseline concern given that these same models would be deployed to millions of users.
In 2025, the institute published its inaugural Frontier AI Trends Report, drawing on two years of pre-deployment evaluations to characterize the trajectory of AI capability development across the six primary domains it monitors: chemistry and biology, cybersecurity, autonomous capabilities, safeguard robustness, loss-of-control risks, and societal impact.
The report's headline findings included a doubling of performance in some capability domains approximately every eight months. On cybersecurity benchmarks, models that completed apprentice-level tasks nine percent of the time in late 2023 were completing the same tasks fifty percent of the time by early 2025. In the life sciences domain, frontier models had surpassed the performance of human PhD experts on open-ended biology questions, and were generating detailed experimental protocols tailored to the recipient's level of knowledge.
Safeguard analysis in the Trends Report found that universal jailbreaks -- techniques that reliably induce a system to comply with harmful requests across many request types -- existed for every system the institute had evaluated. However, the report also found that the amount of expert time required to discover effective jailbreaks was increasing for certain models and request categories, indicating that safeguard robustness was improving even if absolute robustness remained elusive.
On autonomous AI and loss-of-control risks, the report documented that self-replication success rates for models attempting to copy themselves across compute environments had increased from approximately five percent in 2023 to sixty percent by 2025. The institute reported no evidence of spontaneous sandbagging -- strategic underperformance by models in evaluation settings -- but noted it had developed specific detection methods for this behavior given its implications for the reliability of safety evaluations.
Among the institute's most significant published evaluations was its 2026 assessment of Anthropic's Claude Mythos Preview model. The evaluation focused on cybersecurity capabilities and found that Claude Mythos Preview was the first model tested that could complete expert-level capture-the-flag challenges seventy-three percent of the time. More significantly, it was the first model to complete "The Last Ones" simulation from start to finish in three out of ten attempts. The Last Ones is a 32-step simulation of a corporate network attack estimated to require twenty hours of effort by a human expert with more than a decade of experience.
The institute concluded that Claude Mythos Preview could autonomously attack small, weakly defended, and vulnerable enterprise systems when given network access. It recommended that organizations prioritize cybersecurity fundamentals such as security updates, multi-factor authentication, and robust access controls, while noting that its test environments lacked active defenders and real-world security tooling, meaning results might not directly extrapolate to hardened systems.
In May 2024, the institute released Inspect, an open-source Python framework for conducting large language model evaluations. The release made public the core software infrastructure underlying the institute's own evaluation practice and positioned the institute as a contributor to the broader ecosystem of AI evaluation tooling.
Inspect is installable via a single pip command and is released under the MIT license, making it freely available for commercial and non-commercial use. The framework supports evaluations covering reasoning, coding, knowledge assessment, agentic task completion, behavior analysis, and multi-modal understanding. It includes a collection of more than 200 pre-built evaluations that can be run against any supported model.
The framework provides a web-based visualization tool called Inspect View for monitoring and reviewing evaluation runs, as well as a VS Code extension for authoring and debugging evaluations. It supports flexible tool-calling capabilities including custom tools, MCP tools, and built-in utilities for bash execution, code interpretation, web search, and web browsing.
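To give a concrete sense of what an evaluation written with Inspect can look like, the following is a minimal sketch of a single-sample knowledge task. The task name and sample content are invented for illustration, and exact APIs may differ between framework versions.

```python
# Installable via a single pip command: pip install inspect-ai
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import generate, system_message


@task
def capital_cities():
    """A toy single-sample knowledge evaluation (illustrative only)."""
    return Task(
        dataset=[
            Sample(
                input="What is the capital of France? Answer with the city name only.",
                target="Paris",
            )
        ],
        solver=[system_message("You are a concise assistant."), generate()],
        scorer=includes(),  # pass if the target string appears in the model output
    )

# Run against any supported model from the command line, for example:
#   inspect eval capital_cities.py --model openai/gpt-4o
# and review the resulting logs in the browser with:
#   inspect view
```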
The institute subsequently expanded the Inspect ecosystem with InspectSandbox, a secure execution environment for agentic evaluations; InspectCyber, a specialized collection of cybersecurity evaluation tasks; and ControlArena, a library specifically designed for AI control experiments.
ControlArena, released in 2025, is an open-source research platform for studying whether safety protocols can prevent advanced AI models from taking actions that circumvent human oversight. It provides standardized environments in which researchers can test specific control techniques against AI systems modeled on frontier developer infrastructure.
ControlArena's default scenarios test four threat models relevant to the institute's alignment research: weight exfiltration (whether an AI model can copy its own parameters to an external location), secret exfiltration (whether a model can leak confidential information), rogue internal deployment (whether a model can spin up new instances of itself), and sabotage of safety research (whether a model can subtly corrupt the evaluation process it is being subjected to).
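As a conceptual illustration only -- this is not the ControlArena API, and all names below are invented -- a minimal control-protocol experiment of this kind pairs an untrusted policy model with a trusted monitor that scores each proposed action for suspicion before it is allowed to execute.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    description: str  # e.g. a shell command or code diff proposed by the untrusted model


def run_control_protocol(
    untrusted_policy: Callable[[str], Action],   # model being studied (potentially adversarial)
    trusted_monitor: Callable[[Action], float],  # weaker, trusted model returning a suspicion score
    tasks: list[str],
    audit_threshold: float = 0.8,
) -> dict[str, int]:
    """Count how many proposed actions are executed versus escalated for audit.

    In a weight-exfiltration scenario, for instance, the untrusted policy might try to
    hide a parameter-copying step inside an otherwise legitimate action sequence, and
    the experiment measures whether the monitor flags it before execution.
    """
    outcomes = {"executed": 0, "audited": 0}
    for item in tasks:
        action = untrusted_policy(item)
        suspicion = trusted_monitor(action)
        if suspicion >= audit_threshold:
            outcomes["audited"] += 1   # escalate to human review instead of executing
        else:
            outcomes["executed"] += 1  # allow the action in the sandboxed environment
    return outcomes
```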
The platform has been used by researchers at Anthropic, Google DeepMind, Redwood Research, and other organizations. Its release reflects the institute's broader strategy of producing public goods for the AI safety research community rather than keeping evaluation infrastructure proprietary.
The institute organizes its research program into two broad categories: risk research, which characterizes the severity and trajectory of AI-related threats, and solutions research, which develops and tests approaches to mitigating those threats.
Domain-specific risk research covers six areas: cyber misuse, dual-use science (particularly in chemistry and biology), criminal misuse (including AI-enabled fraud and child sexual abuse material), autonomous AI systems (including loss-of-control and unintended harmful action), societal resilience (risks from widespread AI deployment affecting critical infrastructure or social cohesion), and human influence and manipulation.
In parallel, the institute conducts generalized research on the science of evaluations -- developing more rigorous statistical and methodological frameworks for measuring AI capabilities -- and on capability elicitation, the study of how to accurately surface the true performance ceiling of a model rather than its typical behavior under standard prompting.
Safeguard analysis examines whether defensive measures deployed by AI developers actually hold up under adversarial conditions. The institute uses adversarial machine learning techniques to probe whether model safeguards can be circumvented and characterizes the difficulty of finding effective jailbreaks as a proxy for safeguard robustness.
Control research develops protocols for preventing AI models from taking actions that circumvent human oversight even in the context of advanced AI systems that may have strategic capabilities. The institute uses mock environments replicating frontier developer infrastructure to test whether proposed control protocols would succeed against capable adversarial models.
Alignment research pursues approaches to ensuring AI systems behave in accordance with human values and intentions, including work on detecting deceptive AI behavior and establishing formal guarantees around model honesty. The institute's £15 million Alignment Project, launched in 2025, is one of the largest dedicated alignment research efforts globally.
The institute also conducts empirical persuasion research. In 2025, it published a large-scale study involving more than 76,000 participants examining the mechanisms through which AI-generated content influences human attitudes, a study the institute framed as groundwork for understanding AI-driven manipulation risks at societal scale.
On 14 February 2025, Technology Secretary Peter Kyle announced at the Munich Security Conference that the UK AI Safety Institute would be renamed the UK AI Security Institute. The announcement came three days after the AI Action Summit in Paris, which had convened under French presidency to advance the post-Bletchley agenda on AI governance.
Kyle framed the renaming as an alignment of the institute's title with its actual activities. He described the change as reflecting the government's view that the most pressing AI risks were those with direct security implications: cyber attacks, the potential for AI to assist in the development of chemical or biological weapons, criminal misuse including fraud and child sexual exploitation material, and risks to critical national infrastructure.
The renaming was accompanied by structural changes. A new criminal misuse team was created within the institute, partnering with the Home Office to research AI-enabled crime. The institute was tasked with playing a leading role in testing AI models from any developer, open or closed, that posed security-relevant risks. A new agreement with Anthropic was announced, described as part of a broader Sovereign AI initiative.
The February 2025 announcement explicitly stated that the institute would no longer prioritize work on AI ethical issues such as algorithmic bias or freedom of expression in AI applications, marking a deliberate narrowing of scope compared to the institute's original mandate.
The renaming generated significant commentary from AI governance researchers and civil society organizations.
The AI Now Institute issued a statement warning that the shift from safety to security framing risked applying "piecemeal or superficial scrutiny" to AI systems before they were ready for deployment in defense and national security applications. AI Now argued that frontier AI models carry cyber vulnerabilities that themselves pose national security risks, and that independent safety evaluation must be insulated from industry partnerships to retain credibility.
Other observers interpreted the renaming more pragmatically. Some analysts noted that the institute's actual evaluation work had always included substantial security-relevant content -- its cybersecurity evaluations, CBRN misuse assessments, and safeguard robustness testing all had direct security applications -- and that the name change formalized an existing emphasis rather than fundamentally redirecting the organization.
The Institute for AI Policy and Strategy (IAPS) noted in a broader analysis of AI safety institutes that the renaming marked "a pivot between two competing visions for AI governance: one that emphasizes long-term risk mitigation and public accountability, and the other prioritizing innovation, speed, and global competitiveness."
Parliamentary debate on the AI Security Institute took place on 24 February 2025, with members questioning whether the narrowed remit would leave important categories of AI risk -- particularly those relating to bias, transparency, and impacts on democratic processes -- without a dedicated institutional home.
The institute's ability to conduct pre-deployment evaluations depends on voluntary agreements with AI developers, since the UK has not yet enacted legislation requiring mandatory pre-deployment testing.
Anthropic. The AISI and Anthropic have maintained one of the most extensively documented evaluation partnerships. This relationship produced the institute's assessments of Claude 3.5 Sonnet, which was shared with UK AISI and METR before public release, and the 2026 Claude Mythos Preview evaluation. Anthropic and the AISI also collaborated on biosecurity red-teaming that identified dozens of vulnerabilities including new universal jailbreak paths, and on the largest backdoor data poisoning study conducted to date. A formal MOU between the two organizations was announced alongside the February 2025 renaming.
OpenAI. The UK AISI and the US AI Safety Institute (housed at NIST) conducted a joint pre-deployment evaluation of OpenAI's o1 model, the first major joint evaluation between the two institutes. This reflected both the maturation of the bilateral US-UK AI safety relationship and the specific interest both governments had in the capabilities of advanced reasoning models.
Google DeepMind. Google DeepMind has partnered with the AISI since the institute's inception in November 2023. The partnership deepened through a formal research partnership agreement announced in 2025, with DeepMind also being an active user of ControlArena for internal control research.
Cohere and others. The institute has signed MOUs with Cohere and has conducted evaluations of models from additional developers under terms that allow the institute to publish findings but typically not to publicly identify the specific model or developer without agreement.
In early 2025, the institute partnered with Gray Swan AI to run the UK AISI Agent Red-Teaming Challenge, described at the time as the largest public evaluation of agentic large language model safety conducted to date. The challenge ran from 8 March to 6 April 2025 and invited external researchers to attempt to find safety failures in deployed agentic AI systems.
Participants made approximately 1.8 million attempts to break agent behaviors, resulting in 62,000 successful exploits across 22 different large language models targeting 44 specific categories of harmful agentic behavior. The results provided the institute with a large dataset on the types of agent safety failures that adversarial users could reliably produce and informed subsequent research on agentic AI safeguards.
The UK AI Safety Institute was the first national government body of its kind when it launched in November 2023. Within eighteen months, analogous institutions had been established or were being established across more than a dozen countries, and the UK institute had taken a central role in coordinating their activities.
US AI Safety Institute. The United States established its AI Safety Institute within the National Institute of Standards and Technology (NIST) in early 2024. The US AISI drew on NIST's existing technical expertise in standards and measurement, positioning it as more oriented toward standards-setting than the UK institute's primarily evaluation-focused mandate. In April 2024, the UK and US institutes concluded a formal agreement to collaborate on at least one joint safety evaluation per year, with the joint evaluation of OpenAI's o1 model being the first deliverable under this agreement.
Japan AI Safety Institute. Japan established its AI Safety Institute in February 2024 under the Information Technology Promotion Agency, with a staff of approximately 23 people.
Singapore AISI. In May 2024, Singapore renamed its Digital Trust Centre as the Singapore AI Safety Institute, placing it under Nanyang Technological University with S$10 million in annual funding.
South Korea. South Korea announced in May 2024 that it would create an AI safety institute under the Electronics and Telecommunications Research Institute.
EU AI Office. The European Union's AI Office, founded in May 2024 as part of the broader AI Act implementation framework, is a member of the international network. Unlike the national institutes, the EU AI Office has regulatory powers: it can request information from frontier model providers and apply sanctions to enforce the AI Act's requirements. This regulatory authority gives the EU AI Office a different character from the purely advisory and research-oriented national institutes.
At the AI Seoul Summit in May 2024, world leaders from Australia, Canada, the European Union, France, Germany, Italy, Japan, the Republic of Korea, Singapore, the United Kingdom, and the United States signed the Seoul Statement of Intent toward International Cooperation on AI Safety Science, formally launching the International Network of AI Safety Institutes.
The network's stated objectives include accelerating the science of AI safety, promoting complementarity and interoperability among national evaluation methodologies, and fostering a common international understanding of AI safety approaches. The UK AISI has been a founding member and central contributor to the network, providing evaluation methodology, the open-source Inspect framework, and bilateral research agreements as key inputs to the collective effort.
The network held its first in-person convening in San Francisco in November 2024, attended by representatives from Australia, Canada, the European Commission, France, Japan, Kenya, South Korea, Singapore, the UK, and the United States. The meeting produced the first joint testing exercise in the network's history, led by technical experts from the US, UK, and Singapore institutes.
In 2025, the AISI spearheaded the establishment of what it named the International Network for Advanced AI Measurement, Evaluation and Science (INAMES), an expanded coordination structure intended to provide more formal infrastructure for joint research and evaluation across member institutes.
| Organization | Country | Type | Regulatory power | Focus |
|---|---|---|---|---|
| UK AI Security Institute | United Kingdom | Government body | None | Safety and security evaluation, research |
| US AI Safety Institute (NIST) | United States | Government body | None | Evaluation, standards, measurement |
| EU AI Office | European Union | Regulatory body | Yes (AI Act) | Compliance, frontier model oversight |
| Japan AI Safety Institute | Japan | Government body | None | Evaluation, standards |
| Singapore AISI | Singapore | Academic/government | None | Evaluation, research |
| Apollo Research | Independent | Non-governmental | None | Alignment research, evaluations |
| METR | Independent | Non-governmental | None | Autonomous AI evaluation |
| Frontier Model Forum | Industry | Industry body | None | Industry-led safety standards |
The UK institute is distinguished from purely academic or civil society evaluation bodies by its governmental status, which gives it privileged access to security and intelligence information and official standing in international diplomacy. It is distinguished from the EU AI Office by the absence of regulatory powers: the institute cannot compel AI developers to participate in evaluations or to modify their systems in response to findings. It is distinguished from the US NIST institute by its more operational, startup-like character and its heavier emphasis on producing novel research rather than codifying standards.
Independent evaluation organizations such as Apollo Research and METR operate in adjacent spaces. METR (formerly ARC Evals) specializes in evaluating autonomous AI capabilities and has conducted pre-deployment evaluations of models from Anthropic and others, sometimes in coordination with the AISI. Apollo Research focuses on the question of whether AI systems exhibit deceptive or manipulative behaviors toward humans, a research question that overlaps substantially with the AISI's alignment and control research agenda.
The institute has received broadly favorable assessments from AI governance researchers who view independent governmental evaluation capability as a necessary component of responsible AI development at scale. The founding of the AISI was widely credited with shifting the international debate from whether governments should evaluate frontier AI systems to how such evaluations should be conducted.
The voluntary nature of the evaluation agreements has been a persistent subject of scrutiny. Critics have noted that AI developers retain significant control over what they share with the institute, when they share it, and how evaluation findings are communicated publicly. The institute's published evaluation reports do not name specific models, a practice that some observers argue limits the public accountability value of the findings even as it enables the cooperative relationships that make the evaluations possible.
The non-binding character of the institute's guidance has also been noted. Evaluation findings inform but do not legally constrain AI developers. A company that receives an evaluation indicating significant safeguard vulnerabilities is not required to delay release or implement specific mitigations. The institute's influence operates through reputational pressure, government policy channels, and the voluntary commitments developers have made to participate in good faith.
Methodological standardization is another open question. The IAPS analysis of first-wave AI safety institutes noted that evaluation methods are not standardized across the institute's own evaluations or across the international network, making it difficult to compare findings over time or across organizations. The Inspect framework represents a partial response to this concern -- by open-sourcing its evaluation infrastructure, the institute has made it easier for others to use similar methodologies -- but does not resolve the deeper question of what constitutes a definitive assessment that a given model is or is not safe.
The February 2025 renaming raised concerns among some researchers that the AISI's narrowed focus on security-oriented risks would leave other important categories of AI harm -- including algorithmic bias, labor displacement, democratic manipulation, and risks to freedom of expression -- without a dedicated governmental evaluation body in the UK.
The institute's 2025 year-in-review highlighted the scale of its research activities: 30-plus model evaluations, a large-scale persuasion study with more than 76,000 participants, a major agent red-teaming challenge identifying 62,000 vulnerabilities, and the publication of the Frontier AI Trends Report drawing on the institute's full two-year evaluation dataset. These activities have established the AISI as the most productive government AI evaluation body by publication volume and operational scale.
The institute operates with annual funding of approximately £66 million from the Department for Science, Innovation and Technology. This recurrent allocation follows the initial £100 million lump-sum commitment made to the Frontier AI Taskforce in April 2023, which covered a broader initial build-out period.
Beyond direct budget allocation, the institute has access to more than £1.5 billion in high-performance computing resources through the national AI Research Resource, including priority access to the Isambard-AI supercomputer. This compute access enables the institute to conduct its own model training and evaluation experiments at scales that would otherwise be cost-prohibitive for a government research body.
The institute has distributed more than £15 million in external research grants through the Alignment Project and an additional £8 million through Systemic Safety Grants and a £5 million Challenge Fund. These grant programs are designed to fund alignment and safety research at universities and independent research organizations that the institute considers strategically important but that fall outside its own core operational priorities.