Frontier models are the most advanced artificial intelligence models at any given time, representing the cutting edge of AI capabilities. These highly capable foundation models are typically characterized by massive scale, multimodal capabilities, and the ability to perform a wide variety of complex tasks across different domains.[1][2] The term encompasses foundation models and general-purpose AI systems that exceed the capabilities of existing models and require significant computational resources to develop and deploy, typically exceeding 10^25 floating-point operations (FLOPs) during training.[3][4]
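The 10^25 FLOP threshold can be made concrete with the widely used ~6·N·D approximation (roughly six FLOPs per parameter per training token, following Kaplan et al. 2020). The model size and token count below are illustrative assumptions, not figures from any disclosed training run:

```python
def training_flops(params: float, tokens: float) -> float:
    """Estimate training compute with the standard ~6*N*D rule of thumb:
    about 6 FLOPs per parameter per training token."""
    return 6 * params * tokens

FRONTIER_THRESHOLD = 1e25  # FLOPs, as used in the EU AI Act presumption

# Hypothetical model: 400 billion parameters trained on 15 trillion tokens
flops = training_flops(400e9, 15e12)
print(f"{flops:.1e} FLOPs, above threshold: {flops > FRONTIER_THRESHOLD}")
# prints "3.6e+25 FLOPs, above threshold: True"
```

A dense model of this size would thus comfortably exceed the frontier compute threshold.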
The concept of frontier models was formally introduced in a 2023 paper by OpenAI titled "Frontier AI Regulation: Managing Emerging Risks to Public Safety," where they are defined as "highly capable foundation models that could possess dangerous capabilities sufficient to pose severe risks to public safety."[5] This definition highlights three key challenges: dangerous capabilities can emerge unexpectedly during development or post-deployment; it is difficult to prevent misuse of deployed models; and model capabilities can proliferate rapidly, especially through open-source releases.
The United Kingdom government, a key proponent of the term, defines frontier AI as "highly capable general-purpose AI models that can perform a wide variety of tasks and match or exceed the capabilities present in today's most advanced models."[6] The UK's analysis emphasizes that frontier models are increasingly multimodal and may be augmented into autonomous AI agents with tool use (for example web browsing, code execution) that expand their real-world impact.[7]
The Frontier Model Forum, an industry body established by leading AI companies, defines frontier models as "large-scale machine-learning models that exceed the capabilities currently present in the most advanced existing models, and can perform a wide variety of tasks."[8]
Frontier models typically exhibit several defining characteristics:
| Characteristic | Description | Implication |
|---|---|---|
| Massive Scale and Compute | Trained using vast computational resources, often exceeding 10^25 FLOPs, with parameter counts in the hundreds of billions or trillions, and training costs ranging from tens to hundreds of millions of dollars | High development costs create barriers to entry, concentrating power in well-funded labs. Scale is a primary driver of advanced capabilities |
| Multimodal Capabilities | Ability to process and generate multiple types of data including text, images, audio, and video | Enables broad applicability across domains but complicates safety evaluation and control |
| Emergent Properties | Display abilities that were not explicitly programmed, such as complex reasoning, chain-of-thought reasoning, code generation, or creative writing | Makes pre-deployment risk assessment challenging. The full capabilities may not be understood until widely deployed[9] |
| General-Purpose Functionality | Designed to be adaptable across a wide range of domains and tasks with minimal fine-tuning | Enormous utility but difficult to predict all possible uses, including malicious applications |
| Extended Context Windows | Modern frontier models can process extensive amounts of information, with some models supporting context windows of over 2 million tokens | Enables complex, long-form tasks but increases potential for sophisticated manipulation[10] |
| Autonomy Potential | Advanced models show increasing ability to act autonomously to achieve goals, using tools, accessing external information, and self-correcting | Raises long-term safety concerns about control and alignment with human values |
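The autonomy row above describes a tool-use loop: the model repeatedly either requests a tool call or returns a final answer. A minimal sketch follows; the "model" is a hard-coded stub and every name is hypothetical, not a real agent API:

```python
from typing import Callable

# Tool registry; real agents sandbox code execution rather than calling eval
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: f"results for {q!r}",
    "python": lambda code: str(eval(code)),
}

def stub_model(history: list[str]) -> dict:
    """Stand-in for a frontier model: request one tool call, then answer."""
    if not any(h.startswith("tool:") for h in history):
        return {"tool": "python", "input": "2 + 2"}
    return {"answer": history[-1]}

def agent_loop(task: str, max_steps: int = 5) -> str:
    """Alternate between model actions and tool results until an answer."""
    history = [f"task: {task}"]
    for _ in range(max_steps):
        action = stub_model(history)
        if "answer" in action:
            return action["answer"]
        result = TOOLS[action["tool"]](action["input"])
        history.append(f"tool: {result}")
    return "step limit reached"

print(agent_loop("compute 2 + 2"))  # prints "tool: 4"
```

The step limit is one of the simple controls (alongside tool whitelisting and sandboxing) used to bound what such loops can do.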
The history of frontier models is intertwined with the evolution of foundation models, which emerged in the late 2010s. The term "foundation model" was coined in 2021 by researchers at the Stanford Institute for Human-Centered Artificial Intelligence (HAI) to describe large-scale models trained on broad data using self-supervision.[11]
The first models trained on over 10^23 FLOPs were AlphaGo Master and AlphaGo Zero, developed by DeepMind and published in 2017.[12] The progression to current frontier models included:
| Date | Milestone | Significance |
|---|---|---|
| 2017 | AlphaGo Master/Zero | First models exceeding 10^23 FLOPs |
| 2018 | BERT (Google) | Demonstrated power of transformer-based language models |
| 2019 | GPT-2 (OpenAI) | Raised concerns about misuse, leading to staged release |
| 2020 | GPT-3 (OpenAI) | 175 billion parameters, sparked widespread interest in large language models |
| March 2023 | GPT-4 (OpenAI) | First model exceeding 10^25 FLOPs, demonstrated significant capability leap[13] |
| May 2023 | Joint AI Safety Statement | CEOs of major AI labs declare extinction risk from AI a global priority[14] |
| July 2023 | Frontier Model Forum launched | Industry coordination body established by Anthropic, Google, Microsoft, and OpenAI[15] |
| October 2023 | U.S. Executive Order 14110 | Established compute-based reporting requirements for frontier models[16] |
| November 2023 | Bletchley Declaration | 28 countries agree on frontier AI risks and cooperation[17] |
| 2024 | EU AI Act finalized | Establishes regulations for "general-purpose AI models with systemic risk"[18] |
| 2024-2025 | Proliferation of frontier models | Over 30 models exceed 10^25 FLOP threshold[4] |
As of 2025, several models are considered to be at the frontier of AI capabilities:
| Model Name | Developer | Release Date | Key Capabilities | Training Compute (Estimated) | Parameter Count |
|---|---|---|---|---|---|
| GPT-4 | OpenAI | March 2023 | Multimodal processing, advanced reasoning, code generation | ~2×10^25 FLOPs | Undisclosed (est. 1.76 trillion) |
| GPT-4o | OpenAI | May 2024 | Optimized multimodal, real-time voice and vision capabilities | Undisclosed (est. >10^25 FLOPs) | Undisclosed |
| Claude 3 Opus | Anthropic | March 2024 | Extended context (200K tokens), strong reasoning, constitutional AI approach | >10^25 FLOPs | Undisclosed |
| Claude 3.5 Sonnet | Anthropic | June 2024 | Complex reasoning, analysis, creative tasks, computer use capabilities | >10^25 FLOPs | Undisclosed |
| Gemini Ultra | Google DeepMind | December 2023 | Natively multimodal, strong benchmark performance | >10^25 FLOPs | Undisclosed |
| Gemini 1.5 Pro | Google DeepMind | February 2024 | Massive context window (up to 2 million tokens in preview) | >10^25 FLOPs | Undisclosed |
| Llama 3.1 405B | Meta | July 2024 | Open-source, multilingual, long-context reasoning | >10^25 FLOPs | 405 billion |
| Grok-3 | xAI | 2025 | Real-time knowledge integration, truth-seeking emphasis | Undisclosed | Undisclosed |
| DeepSeek-V2 | DeepSeek | 2024 | Cost-efficient mixture-of-experts, strong in coding and math | Undisclosed | 236 billion (21 billion active) |
Leading frontier models achieve near or above human-level performance on various standardized benchmarks:
| Benchmark | Description | Frontier Model Performance | Human Performance |
|---|---|---|---|
| MMLU | Multitask Language Understanding | 86-90% | ~89% |
| GSM8K | Grade School Math | 90-95% | ~90% |
| HumanEval | Code Generation | 85-90% pass rate | Variable (expert: ~95%) |
| GPQA | Graduate-level Science Questions | 50-60% | PhD experts: ~65% |
| BigBench-Hard | Challenging reasoning tasks | 80-85% | ~85% |
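HumanEval pass rates such as those in the table are conventionally reported with the unbiased pass@k estimator introduced in the benchmark's original paper (Chen et al. 2021); a minimal implementation:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples passes,
    given that c of n generated samples passed, computed as
    1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer failures than draws: some draw must pass
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# e.g. 85 of 100 samples pass, so pass@1 equals the raw pass rate
print(round(pass_at_k(100, 85, 1), 2))  # prints 0.85
```

For k = 1 the estimator reduces to the raw pass rate; larger k rewards models that solve a problem in at least one of several attempts.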
Frontier models are applied across a wide range of domains:
| Domain | Capabilities | Example Applications | Key Models |
|---|---|---|---|
| Natural Language Processing | Text understanding, generation, translation, summarization | Chatbots, content creation, document analysis, code generation | All frontier models |
| Computer Vision | Image/video analysis, generation, manipulation, understanding | Medical imaging, autonomous vehicles, creative tools, quality inspection | GPT-4V, Gemini, Claude 3 |
| Scientific Research | Data analysis, hypothesis generation, pattern recognition, simulation | Drug discovery, climate modeling, genomics research, materials science | GPT-4, Claude 3 Opus |
| Code Generation | Writing, debugging, explaining, and optimizing complex code | Software development, automation, technical documentation, DevOps | GPT-4, Claude 3.5 Sonnet |
| Multimodal Processing | Integration of text, image, audio, video inputs/outputs | Virtual assistants, content moderation, accessibility tools | GPT-4o, Gemini 1.5 |
| Autonomous Agents | Tool use, web browsing, API calls, multi-step reasoning | Research assistants, customer service, workflow automation | Claude 3.5 (computer use), GPT-4 (plugins) |
The European Union AI Act establishes specific regulations for what it terms "general-purpose AI models with systemic risk," which closely aligns with the concept of frontier models.[18]
| Aspect | Requirement | Details |
|---|---|---|
| Compute Threshold | 10^25 FLOPs | Models exceeding this threshold are presumed to have systemic risk[19] |
| Model Evaluations | Mandatory testing | Standardized protocols and adversarial testing required |
| Risk Assessment | Systemic risk evaluation | Must assess and mitigate potential societal-scale risks |
| Incident Reporting | Report to AI Office | Serious incidents must be reported to EU AI Office |
| Cybersecurity | Adequate protections | Ensure model and weight protection against misuse |
| Documentation | Technical documentation | Comprehensive documentation for downstream providers[20] |
| Compliance Deadline | August 2025 | Full implementation required for covered models |
The U.S. approach centers on Executive Order 14110 on Safe, Secure, and Trustworthy AI:
| Component | Threshold | Requirements |
|---|---|---|
| Dual-use Foundation Models | >10^26 FLOPs | Report to government; share safety test results |
| Biological Sequence Models | >10^23 FLOPs (if primarily biological data) | Enhanced scrutiny for biosecurity risks |
| Red Team Testing | All covered models | Required before deployment |
| NIST AI RMF | Voluntary framework | Risk management guidance for AI lifecycle[21] |
| AI Safety Institute | Established 2023 | Develops standards and evaluation frameworks[22] |
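The compute-based thresholds in the table reduce to a simple check; the sketch below is illustrative, with the function name and interface invented for this example:

```python
def eo14110_reporting_required(training_flops: float,
                               primarily_biological: bool) -> bool:
    """Sketch of the Executive Order 14110 reporting thresholds above:
    10^26 FLOPs for dual-use foundation models generally, and a lower
    10^23 FLOP bar for models trained primarily on biological sequence
    data. Illustrative only, not legal guidance."""
    threshold = 1e23 if primarily_biological else 1e26
    return training_flops > threshold

print(eo14110_reporting_required(2e25, False))  # prints False
print(eo14110_reporting_required(2e25, True))   # prints True
```

The same 2×10^25 FLOP run falls below the general threshold but far above the biological-sequence one, reflecting the order's heightened biosecurity scrutiny.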
The UK focuses on capability-based assessment rather than rigid compute thresholds.
California has taken a leading role in U.S. state-level frontier AI governance.
The OECD updated its AI Principles in 2024 to address frontier models.
Established in July 2023 by Anthropic, Google, Microsoft, and OpenAI, the Frontier Model Forum serves as the primary industry coordination body.[8]
| Aspect | Details |
|---|---|
| Current Members (2025) | Amazon, Anthropic, Google, Meta, Microsoft, OpenAI[27] |
| Key Objectives | AI safety research, best practices, policy collaboration, societal applications |
| AI Safety Fund | $10 million for independent safety research[28] |
| Executive Director | Chris Meserole (appointed October 2023) |
| Focus Areas | CBRN risks, cyber capabilities, societal impacts, evaluation standards |
The International Network of AI Safety Institutes, launched at the AI Seoul Summit in May 2024, creates global collaboration among national institutes:[29]
| Member Country | Institute Status | Key Focus |
|---|---|---|
| United States (Chair) | NIST USAISI | Standards and evaluation frameworks |
| United Kingdom | AI Security Institute | Pre-deployment testing and evaluation |
| European Union | AI Office | Regulatory compliance and enforcement |
| Japan | AI Safety Institute | International standards alignment |
| Singapore | AI Verify Foundation | Testing and certification |
| Others | Various stages | Canada, France, Kenya, Australia, South Korea |
The Bletchley Declaration, signed by 28 countries in November 2023, established international consensus on the risks posed by frontier AI and the need for cooperation in addressing them.
| Risk Category | Description | Examples | Mitigation Approaches |
|---|---|---|---|
| Emergent Capabilities | Unexpected abilities appearing at scale | In-context learning, tool use, deception | Comprehensive evaluation, capability discovery research[9] |
| Hallucination | Generation of false/misleading information | Fabricated citations, incorrect facts | Improved training, retrieval augmentation, uncertainty quantification |
| Jailbreaking | Bypassing safety constraints | Harmful content generation, misuse instructions | Adversarial training, constitutional AI, robust safety layers |
| Loss of Control | Models acting beyond intended parameters | Reward hacking, mesa-optimization | Alignment research, interpretability, kill switches |
| Risk | Impact | Affected Groups |
|---|---|---|
| Labor Displacement | Automation of cognitive work | Knowledge workers, creative professionals |
| Economic Concentration | Market dominance by few companies | Smaller firms, developing nations |
| Bias Amplification | Perpetuation of historical prejudices | Marginalized communities |
| Democratic Erosion | Manipulation of public discourse | Citizens, democratic institutions |
| Environmental Impact | Massive energy consumption | Global climate, local communities[31] |
| Aspect | Current Scale (2025) | Projected (2027) |
|---|---|---|
| Training Compute | 10^25 - 10^26 FLOPs | 10^27 - 10^28 FLOPs |
| Training Cost | $100M - $500M | $1B - $5B |
| Training Duration | 3-6 months | 6-12 months |
| GPU Requirements | 10,000 - 50,000 GPUs | 100,000+ GPUs |
| Energy Consumption | 50-200 GWh | 500+ GWh |
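The table's energy figures can be sanity-checked with back-of-envelope arithmetic. The 700 W per-GPU draw is an assumed H100-class board power, and cooling and datacenter overhead are ignored:

```python
# Back-of-envelope check of the current-scale row: 50,000 GPUs for ~4 months
gpus = 50_000
watts_per_gpu = 700   # assumed H100-class board power; excludes overhead
hours = 4 * 30 * 24   # ~4 months of continuous training

gwh = gpus * watts_per_gpu * hours / 1e9  # W·h -> GWh
print(f"{gwh:.0f} GWh")  # prints "101 GWh"
```

The result lands within the table's 50-200 GWh range for current frontier training runs.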
These scaling trends and their associated risks are analyzed in the International Scientific Report on the Safety of Advanced AI.[33]