Foundation models

Foundation models (FMs) are large machine learning models trained on broad data, generally via self-supervised learning at scale, that can then be adapted (for example via fine-tuning) to a wide range of downstream tasks across modalities.[1][2] The term emphasizes that such models serve as a common foundation upon which many downstream tasks and applications can be built, rather than bespoke systems trained separately for each task. Many prominent large language models (LLMs) and multimodal systems are commonly described as foundation models.[1]

Building foundation models is often highly resource-intensive, with the most advanced models costing hundreds of millions of dollars to cover the expenses of acquiring, curating, and processing massive datasets, as well as the compute power required for training.[3] In contrast, adapting existing foundation models for specific tasks or using them directly is far less costly, as it leverages pre-trained capabilities and typically requires only fine-tuning on smaller, task-specific datasets.[3]

Definition and scope

The phrase foundation model was popularized by researchers associated with Stanford's Center for Research on Foundation Models (CRFM) in August 2021 to describe models trained on broad data at scale using mostly self-supervision and adaptable to many tasks.[1] The researchers chose "foundation" over "foundational" because "foundational" implies that these models provide fundamental principles in a way that "foundation" does not, an implication they wished to avoid.[3] They also noted that preexisting terms were inadequate: "'(large) language model' was too narrow given [the] focus is not only language; 'self-supervised model' was too specific to the training objective; and 'pretrained model' suggested that the noteworthy action all happened after 'pretraining.'"[1]

Foundation models are broader in scope than large language models (LLMs): LLMs are a subset focused specifically on interpreting, generating, and manipulating human language, whereas foundation models encompass a wider range of modalities, including text, images, video, and other data types.[4] All LLMs can be considered foundation models, but not all foundation models are LLMs.[5]

Legal definitions

Subsequent policy and standards documents have adopted related terminology. In the United States, Executive Order 14110 defines a subcategory of "dual-use foundation model" as an artificial intelligence model trained on broad data, generally using self-supervision, containing at least tens of billions of parameters, applicable across many contexts, and exhibiting, or easily modifiable to exhibit, high levels of performance at tasks that pose serious risks to security, national economic security, or public health or safety.[6]

NIST documents and glossaries similarly characterize foundation models as broadly trained, self-supervised systems adaptable to varied tasks.[2][7]

In the European Union's AI Act, the closely related term general-purpose AI model (GPAI) is used for models intended for integration into many downstream systems, with specific obligations (including additional duties for models posing systemic risk).[8]

History

Technologically, foundation models are built using established machine learning techniques such as deep neural networks, transfer learning, and self-supervised learning.[3] The concept of pre-training a large model on a general dataset and then adapting it to specific tasks has roots in earlier transfer-learning work, such as the pretrained word embeddings Word2vec and GloVe. The paradigm shifted significantly, however, with the introduction of the Transformer architecture in 2017.[9]

Subsequent models like Google's BERT (Bidirectional Encoder Representations from Transformers) in 2018 and OpenAI's GPT series demonstrated the power of large-scale, pre-trained language models.[10] As these models grew in size and capability, their potential applications expanded far beyond their initial scope.

The 2022 releases of Stable Diffusion and ChatGPT (initially powered by the GPT-3.5 model) led to foundation models and generative AI entering widespread public discourse.[3] Further releases of LLaMA, Llama 2, and Mistral in 2023 contributed to a greater emphasis placed on how foundation models are released, with open foundation models garnering significant support and scrutiny.[3]

Characteristics

Foundation models are distinguished by several defining characteristics:

  • Broad pretraining data: FMs are trained on diverse, large-scale corpora (text, images, audio, code, etc.), typically using self-supervised learning objectives such as next-token prediction or masked modeling.[1][11]
  • Adaptability: After pretraining, they can be adapted via fine-tuning, instruction tuning, or reinforcement learning from human feedback (RLHF) to perform specific tasks or follow instructions more reliably.[12][13]
  • Scale: Foundation models operate at massive scale along three dimensions: data (billions to trillions of tokens), model size (tens of billions to trillions of parameters), and compute (thousands of GPUs for training).[14]
  • Emergence: Quantitative increases in scale lead to new qualitative capabilities that were not present in smaller versions, such as zero-shot learning and few-shot learning (illustrated in the sketch after this list).[15]
  • Homogenization: A wide array of applications are built on a small number of foundation models, creating both efficiency gains and systemic risks.[1]
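
In-context (few-shot) learning can be made concrete with a small example: the model is shown a handful of worked demonstrations in its prompt and asked to complete a new case, with no weight updates. A minimal, illustrative Python sketch (the task and demonstrations are hypothetical placeholders, and the call to the model itself is left abstract):

```python
# Minimal sketch of few-shot prompting: the model is conditioned on a handful of
# worked examples in its context window rather than being fine-tuned.
# The task and examples here are hypothetical placeholders.

examples = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I want my two hours back.", "negative"),
]

def build_few_shot_prompt(examples, query):
    """Interleave (input, label) demonstrations, then append the unlabeled query."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # the model is expected to complete this line
    return "\n".join(lines)

prompt = build_few_shot_prompt(examples, "The plot dragged, but the acting was superb.")
print(prompt)  # this string would be sent to a foundation model for completion
```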

Architecture and Training

Most modern foundation models are based on the Transformer architecture, though diffusion models are widely used for image, audio, and video generation.[16] Contrastive learning across image–text pairs (for example CLIP) is a common multimodal pretraining strategy.[17]
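
As a rough illustration of the contrastive pretraining strategy used by CLIP-style models, the sketch below computes a symmetric cross-entropy loss over a batch of paired image and text embeddings (PyTorch; the encoders that produce the embeddings and the data pipeline are omitted, and the temperature value is only an example):

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of paired image/text embeddings.

    image_emb, text_emb: (batch, dim) tensors produced by separate encoders;
    matching pairs share the same row index.
    """
    # L2-normalize so the dot product is a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are the true pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions (image->text and text->image), averaged.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Example with random embeddings standing in for encoder outputs.
loss = clip_style_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```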

Pretraining

Pretraining typically uses self-supervised learning on broad corpora to learn general representations. For text, common objectives include next-token autoregressive modeling and masked language modeling; for images and audio, objectives include masked/denoising prediction and diffusion-based reconstruction.[1][9][16]
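
As an illustration of the autoregressive objective, the sketch below computes a next-token cross-entropy loss from model logits, assuming some model has already mapped a batch of token IDs to per-position vocabulary scores (PyTorch; the shapes and vocabulary size are arbitrary examples):

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits, token_ids):
    """Autoregressive language-modeling loss.

    logits:    (batch, seq_len, vocab_size) scores from the model.
    token_ids: (batch, seq_len) input token IDs; each position's target is the
               following token, so the last position has no target.
    """
    # Predict token t+1 from positions up to t: drop the last logit and the first token.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_targets = token_ids[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_targets.view(-1),
    )

# Example with random logits standing in for a model's output.
vocab, batch, seq = 50_000, 2, 16
loss = next_token_loss(torch.randn(batch, seq, vocab), torch.randint(0, vocab, (batch, seq)))
```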

Adaptation methods

After pretraining, a foundation model can be adapted to downstream uses through several methods: fine-tuning on smaller, task-specific datasets,[3] instruction tuning,[12] reinforcement learning from human feedback (RLHF),[13] prompting and in-context (few-shot) learning,[18] and retrieval augmentation that supplies external documents at inference time.[19] Because these methods reuse the pretrained weights, adaptation is typically far cheaper than pretraining, and parameter-efficient techniques that update only a small fraction of the model's weights reduce the cost further.
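
A minimal sketch of one common adaptation pattern, freezing a pretrained backbone and training only a small task-specific head on labeled data (PyTorch; the backbone, label count, and data here are hypothetical placeholders standing in for a real pretrained model and downstream dataset):

```python
import torch
import torch.nn as nn

# Hypothetical placeholders: any pretrained module producing a (batch, hidden) feature
# vector could serve as the backbone; real adaptation would load pretrained weights.
backbone = nn.Sequential(nn.Embedding(50_000, 256), nn.Flatten(), nn.Linear(256 * 32, 512))
head = nn.Linear(512, 3)  # e.g., a 3-class downstream classification task

# Freeze the pretrained weights; only the task head is updated.
for p in backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(token_ids, labels):
    with torch.no_grad():                # backbone runs without gradient tracking
        features = backbone(token_ids)
    logits = head(features)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example step with random data standing in for a labeled downstream dataset.
train_step(torch.randint(0, 50_000, (4, 32)), torch.randint(0, 3, (4,)))
```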

Notable Examples

Foundation Models by Modality and Developer
Model | Modality | Parameters | Year | Developer | Key Features | Source
BERT | Text (LLM) | 340M | 2018 | Google | Bidirectional Transformer encoder for language understanding | [10]
GPT-3 | Text (LLM) | 175B | 2020 | OpenAI | Demonstrated strong few-shot learning | [18]
GPT-4 | Multimodal | >1T (estimated) | 2023 | OpenAI | Advanced multimodal capabilities | [20]
Claude series | Text (LLM) | Various | 2023-2024 | Anthropic | Advanced reasoning and safety features | [21]
Gemini | Multimodal | Various | 2023-2024 | Google | State-of-the-art multimodal model | [22]
Llama 2 | Text (LLM) | 7B-70B | 2023 | Meta AI | Open-weight foundation model | [23]
BLOOM | Text (LLM) | 176B | 2022 | BigScience | Multilingual, supports 46 languages | [24]
CLIP | Vision-Text | 400M | 2021 | OpenAI | Contrastive text-image pretraining | [17]
DALL-E 3 | Text-to-Image | Unknown | 2023 | OpenAI | High-quality text-to-image generation | [25]
Stable Diffusion | Text-to-Image | 890M | 2022 | Stability AI | Open-source image generation | [26]
Flamingo | Multimodal | 80B | 2022 | DeepMind | Visual language model for few-shot learning | [27]
AlphaFold 2 | Protein structure | 21M | 2021 | DeepMind | Protein structure prediction | [28]

Applications

Foundation models enable a broad range of applications, often after modest adaptation:

Natural Language Processing

  • Question answering, summarization, translation, code generation, and dialogue assistants based on large language models (a summarization sketch follows this list)[18]
  • Sentiment analysis and text classification
  • Content generation for articles, marketing copy, emails, and creative writing
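
As an illustration of the summarization use case, a pretrained model can be applied with a few lines of code using the Hugging Face transformers pipeline API (a sketch; the checkpoint name is only an example, and the library must be installed with a suitable model available for download):

```python
from transformers import pipeline

# Load an example summarization checkpoint; any compatible model name could be used.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "Foundation models are large machine learning models trained on broad data, "
    "generally via self-supervised learning at scale, that can then be adapted "
    "to a wide range of downstream tasks across modalities."
)

result = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```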

Computer Vision

  • Zero-shot image classification and image–text retrieval with vision-language models such as CLIP[17]
  • Text-to-image generation with models such as Stable Diffusion[26] and DALL-E 3[25]
  • Image captioning and visual question answering with multimodal models such as Flamingo[27]

Scientific Research

  • Drug discovery and molecular design[28]
  • Climate modeling and prediction
  • Materials science and chemistry applications
  • Genomics and radiology applications[29]

Code and Software Development

  • Code completion and generation (GitHub Copilot, Amazon CodeWhisperer)
  • Bug detection and fixing
  • Code explanation and documentation
  • Language translation between programming languages

Business and Industry

  • Customer service chatbots and virtual assistants
  • Supply chain optimization and predictive maintenance
  • Fraud detection and risk assessment
  • Automated investment research and financial analysis[30]

Robotics

  • Visual navigation and task planning
  • Human-robot interaction
  • Manipulation tasks
  • Environment simulation using world models[31]

Governance, Safety, and Regulation

Policy makers and standards bodies have proposed governance approaches tailored to foundation models:

  • United States: The 2023 Executive Order introduces reporting and safety requirements tied to high-risk "dual-use foundation models," defining them by training scale and potential risk profile.[6][32]
  • NIST: Guidance on managing misuse risk for dual-use FMs provides terminology and recommended practices for developers and deployers.[7]
  • European Union: The AI Act establishes obligations for general-purpose AI models (GPAI), including heightened duties for models with systemic risk and a Code of Practice pathway.[8]

Frontier Models

Certain highly advanced foundation models are termed "frontier models," which have the potential to "possess dangerous capabilities sufficient to pose severe risks to public safety."[3] These capabilities may include:

  • Designing and synthesizing new biological or chemical weapons
  • Producing and propagating convincing, tailored disinformation
  • Harnessing unprecedented offensive cyber capabilities
  • Evading human control through deceptive means

Transparency and Documentation

Research initiatives evaluate the transparency of FM developers across data, compute, model characteristics, and downstream impact:

  • Foundation Model Transparency Index (FMTI): An index specifying 100 indicators to measure transparency, with reports in 2023 (v1.0) and 2024 (v1.1). The 2023 paper documented low average transparency; the 2024 update reported improvements across 14 developers while noting persistent gaps.[33][34]

Challenges and Limitations

Computational Requirements

  • Training costs can reach hundreds of millions of dollars[3]
  • Require thousands of GPUs for training
  • Environmental impact from energy consumption
  • Training compute for the largest models has been estimated to double every 3.4 months since 2012 (see the calculation sketch below)[14]
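
Taken at face value, a 3.4-month doubling time implies roughly an order-of-magnitude increase in compute per year; a quick back-of-the-envelope calculation:

```python
# Implied growth from the 3.4-month doubling time cited above.
doubling_months = 3.4
annual_factor = 2 ** (12 / doubling_months)
print(f"~{annual_factor:.1f}x more compute per year")    # roughly 11.5x
print(f"~{annual_factor ** 5:,.0f}x over five years")    # roughly 205,000x
```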

Bias and Fairness

  • Models can perpetuate and amplify biases present in training data[1]
  • Social and demographic biases
  • Geographic and cultural biases
  • Language biases favoring well-represented languages
  • Risk of homogenization spreading biases across all downstream applications

Data Quality and Privacy

  • Web-scraped data often contains toxic, biased, or copyrighted material[3]
  • Privacy concerns when training on user data
  • Challenges in data curation at scale
  • Risk of memorizing and reproducing training data

Evaluation Challenges

  • Difficulty in comprehensively evaluating general-purpose models
  • Emergent capabilities that are hard to predict
  • Need for new benchmarks and evaluation frameworks

Supply Chain

The supply chain for foundation models involves upstream resources (data from providers such as Scale AI and Surge AI; compute from AWS, Google Cloud, and Microsoft Azure) and downstream adaptations.[3] Costs are concentrated in compute, which accounted for roughly 80% of AI capital expenditure in 2023, contributing to market consolidation among a small number of companies.[3]

Release Strategies

Release strategies for foundation models include:

  • API access: Users query the model through an interface (for example OpenAI's GPT-4)
  • Open weights: Model weights available for download (for example Meta's Llama); see the loading sketch after this list
  • Closed/Limited access: Restricted to specific users or organizations
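
As an illustration of the open-weights route, published weights can be downloaded and run locally, for example with the Hugging Face transformers library (a sketch; the model identifier is only an example, access may require accepting the provider's license, and a 7B-parameter model needs substantial memory to run):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example open-weight checkpoint; access may require accepting a license on the model hub.
model_name = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate a short continuation locally, with no external API involved.
inputs = tokenizer("Foundation models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

API access, by contrast, keeps the weights on the provider's servers and exposes only an inference interface to users.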

References

  1. Stanford Center for Research on Foundation Models (CRFM), On the Opportunities and Risks of Foundation Models (2021). https://crfm.stanford.edu/report.html ; arXiv: https://arxiv.org/abs/2108.07258
  2. NIST CSRC Glossary, foundation model. https://csrc.nist.gov/glossary/term/foundation_model
  3. Foundation model - Wikipedia. https://en.wikipedia.org/wiki/Foundation_model
  4. Foundation Model vs LLM: Key Differences Explained - Openxcell. https://www.openxcell.com/blog/foundation-model-vs-llm/
  5. Foundation Models Vs LLM (Large Language Models) - LinkedIn. https://www.linkedin.com/pulse/foundation-models-vs-llmlarge-language-aman-walia-aslsc
  6. Executive Order 14110 (Oct. 30, 2023), Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. Federal Register: https://www.federalregister.gov/documents/2023/11/01/2023-24283/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence ; White House archive: https://bidenwhitehouse.archives.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/
  7. NIST, Managing Misuse Risk for Dual-Use Foundation Models (AI 800-1 initial/draft publications, 2024–2025). https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.800-1.ipd.pdf ; https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.800-1.ipd2.pdf
  8. European Commission, General-Purpose AI Models in the AI Act – Questions & Answers (2025). https://digital-strategy.ec.europa.eu/en/faqs/general-purpose-ai-models-ai-act-questions-answers
  9. Vaswani et al., Attention Is All You Need (2017). arXiv: https://arxiv.org/abs/1706.03762 ; NeurIPS PDF: https://papers.neurips.cc/paper/7181-attention-is-all-you-need.pdf
  10. Devlin et al., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018). https://arxiv.org/abs/1810.04805
  11. IBM, What is self-supervised learning?. https://www.ibm.com/think/topics/self-supervised-learning
  12. Jason Wei et al., Finetuned Language Models Are Zero-Shot Learners (arXiv 2021; ICLR 2022). https://arxiv.org/abs/2109.01652 ; https://openreview.net/pdf?id=gEZrGCozdqR
  13. Long Ouyang et al., Training language models to follow instructions with human feedback (NeurIPS 2022). https://arxiv.org/abs/2203.02155 ; https://proceedings.neurips.cc/paper_files/paper/2022/file/b1efde53be364a73914f58805a001731-Paper-Conference.pdf
  14. What are Foundation Models? - Foundation Models in Generative AI Explained - AWS. https://aws.amazon.com/what-is/foundation-models/
  15. Wei et al., Emergent Abilities of Large Language Models (2022). https://arxiv.org/abs/2206.07682
  16. Ho, Jain, Abbeel, Denoising Diffusion Probabilistic Models (2020). arXiv: https://arxiv.org/abs/2006.11239 ; NeurIPS PDF: https://proceedings.neurips.cc/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf
  17. Radford et al., Learning Transferable Visual Models From Natural Language Supervision (2021). arXiv: https://arxiv.org/abs/2103.00020 ; ICML PDF: https://proceedings.mlr.press/v139/radford21a/radford21a.pdf
  18. Tom B. Brown et al., Language Models are Few-Shot Learners (GPT-3; 2020). arXiv: https://arxiv.org/abs/2005.14165 ; NeurIPS PDF: https://papers.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
  19. Patrick Lewis et al., Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (2020). arXiv: https://arxiv.org/abs/2005.11401 ; NeurIPS PDF: https://proceedings.neurips.cc/paper/2020/file/6b493230205f780e1bc26945df7481e5-Paper.pdf
  20. OpenAI, GPT-4 Technical Report (2023). https://arxiv.org/abs/2303.08774
  21. Anthropic, Claude 3 Model Card (2024). https://www.anthropic.com/claude
  22. Google, Gemini: A Family of Highly Capable Multimodal Models (2023). https://arxiv.org/abs/2312.11805
  23. Hugo Touvron et al., Llama 2: Open Foundation and Fine-Tuned Chat Models (2023). arXiv: https://arxiv.org/abs/2307.09288 ; Meta page: https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/
  24. BigScience Workshop, BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (2022). https://arxiv.org/abs/2211.05100
  25. OpenAI, DALL-E 3 System Card (2023). https://cdn.openai.com/papers/DALL_E_3_System_Card.pdf
  26. Rombach et al., High-Resolution Image Synthesis with Latent Diffusion Models (2022). https://arxiv.org/abs/2112.10752
  27. Alayrac et al., Flamingo: a Visual Language Model for Few-Shot Learning (2022). https://arxiv.org/abs/2204.14198
  28. Jumper et al., Highly accurate protein structure prediction with AlphaFold (Nature, 2021). https://doi.org/10.1038/s41586-021-03819-2
  29. Healthcare applications in Stanford FM report. https://crfm.stanford.edu/report.html#healthcare
  30. Foundation Models And LLMs: 19 Real-World, Practical Use Cases - Forbes (2025). https://www.forbes.com/councils/forbestechcouncil/2025/02/05/foundation-models-and-llms-19-real-world-practical-use-cases/
  31. NVIDIA Cosmos World Foundation Models (2025). https://blogs.nvidia.com/blog/world-foundation-models/
  32. U.S. Congressional Research Service, The AI Executive Order and Its Potential Implications for DOD (Dec. 12, 2023). https://www.congress.gov/crs-product/IN12286
  33. Rishi Bommasani et al., The Foundation Model Transparency Index (Oct. 2023). https://arxiv.org/abs/2310.12941 ; Stanford HAI explainer: https://hai.stanford.edu/news/introducing-foundation-model-transparency-index
  34. Rishi Bommasani et al., The Foundation Model Transparency Index v1.1: May 2024 (paper & CRFM update). https://crfm.stanford.edu/fmti/paper.pdf ; https://crfm.stanford.edu/2024/05/21/fmti-may-2024.html
