LLaMA/Model Card
Last reviewed
May 10, 2026
Sources
17 citations
Review status
Source-backed
Revision
v2 · 2,493 words
See also: LLaMA, Llama 2, Llama 3, Llama 4, Model card
The Llama model card is the official documentation that Meta AI ships with each release of the Llama family of large language models. The first card appeared in February 2023 inside facebookresearch/llama as MODEL_CARD.md, and every subsequent generation (Llama 2 in July 2023, Llama 3 in April 2024, Llama 3.1, 3.2, and 3.3 in the second half of 2024, and Llama 4 in April 2025) has shipped its own. The cards follow the structure from Mitchell et al.'s 2019 Model Cards for Model Reporting paper, and they are a primary reference for what each Llama checkpoint was trained on, what it can be used for under the Llama Community License, and what risks Meta has flagged.
The cards are also a focal point for criticism. Watchers of open source AI note that while the cards list training-data buckets and aggregate statistics, they do not enumerate the specific datasets, web crawls, books, or licensed corpora used. This has led the Open Source Initiative to argue that Llama is not open source in the traditional sense. The cards sit at the intersection of two debates: how much information vendors should publish about foundation models, and what counts as "open" in the era of trillion-token training runs.
The model card concept comes from Mitchell et al.'s 2019 ACM FAccT paper Model Cards for Model Reporting, which proposed a structured document with roughly 30 disclosures across nine sections: Model Details, Intended Use, Factors, Metrics, Evaluation Data, Training Data, Quantitative Analyses, Ethical Considerations, and Caveats and Recommendations. Meta's first Llama card followed this template closely, with eight sections that fold Caveats and Recommendations into Ethical Considerations. Hugging Face later codified a similar template inside its huggingface_hub library, so any model uploaded to the Hub gets a default stub with the same headings.
The first Llama card describes the original LLaMA release: an auto-regressive transformer language model in four sizes (7B, 13B, 33B, 65B), trained between December 2022 and February 2023 by Meta's FAIR team and released under a bespoke non-commercial license limited to approved researchers. The card frames the model as a research tool, points to the paper LLaMA: Open and Efficient Foundation Language Models by Touvron et al. for technical detail, and routes questions through the GitHub repo.
The primary intended use is research on large language models, including question answering, natural language understanding, reading comprehension, capability studies, and bias and toxicity evaluation. Primary users are researchers in natural language processing, machine learning, and artificial intelligence. The card states LLaMA is a base model that should not be deployed downstream without risk evaluation, since it has not been trained with human feedback and can generate toxic, offensive, incorrect, or unhelpful content. Language is the main performance factor: 20 languages appear in the training data but English dominates. Bias was evaluated on Responsible AI (RAI) datasets covering gender, religion, race, sexual orientation, age, nationality, disability, physical appearance, and socioeconomic status.
Metrics include accuracy on common-sense reasoning, reading comprehension, and MMLU; exact match on question answering; and toxicity scores from the Perspective API. Only one model of each size was trained, so pretraining variability is not quantified. Benchmarks include BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, ARC, OpenBookQA, NaturalQuestions, TriviaQA, RACE, MMLU, BIG-bench hard, GSM8K, RealToxicityPrompts, WinoGender, and CrowS-Pairs. The training mix is CCNet (67%), C4 (15%), GitHub (4.5%), Wikipedia (4.5%), Books (4.5%), ArXiv (2.5%), Stack Exchange (2%), with Wikipedia and Books covering 20 languages.
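As a quick sanity check on the mix quoted above, the percentages can be summed and applied to a token budget. This is a minimal sketch: the helper name `tokens_per_source` is hypothetical, and applying the percentages uniformly to the 1.4T budget is an assumption about how the mix was sampled, not something the card spells out.

```python
# Llama 1 pretraining mix as quoted from the card (percent per source).
TRAINING_MIX = {
    "CCNet": 67.0,
    "C4": 15.0,
    "GitHub": 4.5,
    "Wikipedia": 4.5,
    "Books": 4.5,
    "ArXiv": 2.5,
    "Stack Exchange": 2.0,
}


def tokens_per_source(total_tokens: float) -> dict[str, float]:
    """Split a token budget across sources in proportion to the mix.

    ASSUMPTION: the percentages apply uniformly to the whole budget.
    """
    return {name: total_tokens * pct / 100.0 for name, pct in TRAINING_MIX.items()}


total_pct = sum(TRAINING_MIX.values())
budget = tokens_per_source(1.4e12)  # 1.4T tokens, the 33B/65B budget

print(f"mix total: {total_pct}%")
for name, tokens in budget.items():
    print(f"{name:15s} {tokens / 1e9:7.1f}B tokens")
```

The seven buckets add up to exactly 100%, so the card's list accounts for the whole mix.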
Hyperparameters for the model architecture:
| Parameters | Dimension | n heads | n layers | Learning rate | Batch size | n tokens |
|---|---|---|---|---|---|---|
| 7B | 4,096 | 32 | 32 | 3.0E-04 | 4M | 1T |
| 13B | 5,120 | 40 | 40 | 3.0E-04 | 4M | 1T |
| 33B | 6,656 | 52 | 60 | 1.5E-04 | 4M | 1.4T |
| 65B | 8,192 | 64 | 80 | 1.5E-04 | 4M | 1.4T |
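The table rows can be cross-checked against each other. The sketch below derives the per-head width from the dimension and head count, and applies the common rule of thumb of roughly 12·d² weights per transformer layer. That rule of thumb is an assumption (it ignores embeddings, normalization weights, and the exact feed-forward width) and is not taken from the card; the function names are illustrative.

```python
# (dimension, n_heads, n_layers) per size, copied from the table above.
CONFIGS = {
    "7B": (4096, 32, 32),
    "13B": (5120, 40, 40),
    "33B": (6656, 52, 60),
    "65B": (8192, 64, 80),
}


def head_dim(dimension: int, n_heads: int) -> int:
    """Width of one attention head; the dimension must divide evenly."""
    if dimension % n_heads:
        raise ValueError("dimension is not divisible by n_heads")
    return dimension // n_heads


def approx_params(dimension: int, n_layers: int) -> int:
    """Rule-of-thumb weight count: about 12 * d^2 per transformer layer.

    ASSUMPTION: ignores embeddings, norms, and the exact feed-forward width.
    """
    return 12 * n_layers * dimension ** 2


for name, (dim, heads, layers) in CONFIGS.items():
    estimate = approx_params(dim, layers) / 1e9
    print(f"{name}: head_dim={head_dim(dim, heads)}, ~{estimate:.1f}B weights")
```

Every size works out to a 128-wide attention head, and the rough estimates land close to the nominal parameter counts, which suggests the table is internally consistent.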
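Results on standard common-sense reasoning benchmarks (the card's text says eight; the table reports nine columns because ARC is split into easy and challenge sets):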
| Parameters | BoolQ | PIQA | SIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA | COPA |
|---|---|---|---|---|---|---|---|---|---|
| 7B | 76.5 | 79.8 | 48.9 | 76.1 | 70.1 | 76.7 | 47.6 | 57.2 | 93 |
| 13B | 78.1 | 80.1 | 50.4 | 79.2 | 73.0 | 78.1 | 52.7 | 56.4 | 94 |
| 33B | 83.1 | 82.3 | 50.4 | 82.8 | 76.0 | 81.4 | 57.8 | 58.6 | 92 |
| 65B | 85.3 | 82.8 | 52.3 | 84.2 | 77.0 | 81.5 | 56.0 | 60.2 | 94 |
Results on bias evaluation (lower is better):
| No | Category | FAIR LLM |
|---|---|---|
| 1 | Gender | 70.6 |
| 2 | Religion | 79.0 |
| 3 | Race/Color | 57.0 |
| 4 | Sexual orientation | 81.0 |
| 5 | Age | 70.1 |
| 6 | Nationality | 64.2 |
| 7 | Disability | 66.7 |
| 8 | Physical appearance | 77.8 |
| 9 | Socioeconomic status | 71.5 |
| | LLaMA average | 66.6 |
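A short calculation shows that the reported average is not the simple mean of the nine category scores. The sketch below only uses the numbers in the table. One plausible explanation is that the average is weighted by the number of examples per category, but that is an assumption; the table itself does not say how the figure was computed.

```python
from statistics import mean

# Per-category scores from the bias table above (lower is better).
BIAS_SCORES = {
    "Gender": 70.6,
    "Religion": 79.0,
    "Race/Color": 57.0,
    "Sexual orientation": 81.0,
    "Age": 70.1,
    "Nationality": 64.2,
    "Disability": 66.7,
    "Physical appearance": 77.8,
    "Socioeconomic status": 71.5,
}
REPORTED_AVERAGE = 66.6  # final row of the table

unweighted = mean(BIAS_SCORES.values())
worst = max(BIAS_SCORES, key=BIAS_SCORES.get)  # highest score
best = min(BIAS_SCORES, key=BIAS_SCORES.get)   # lowest score

print(f"unweighted mean: {unweighted:.1f} (reported: {REPORTED_AVERAGE})")
print(f"highest: {worst}, lowest: {best}")
```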
The card flags training data as a source of bias and states the model is not intended to inform decisions about matters central to human life. Mitigations include filtering web data based on proximity to Wikipedia text and references, using a Kneser-Ney language model and a fastText linear classifier. The card warns about hallucinations and harmful generations, and lists misinformation generation as a fraught use case.
Published on 18 July 2023, the Llama 2 card marks the first major scope change: Llama 2 shipped under a custom commercial license, so the card had to address commercial deployment. It documents three pretrained sizes (7B, 13B, 70B) and three corresponding Llama 2 Chat variants fine-tuned for dialogue with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF).
Key numbers:
| Field | Value |
|---|---|
| Sizes | 7B, 13B, 70B |
| Architecture | Auto-regressive transformer with grouped-query attention on 70B |
| Pretraining tokens | 2.0 trillion |
| Context length | 4,096 tokens |
| Pretraining cutoff | September 2022 |
| Fine-tuning data cutoff | up to July 2023 |
| Training period | January 2023 through July 2023 |
| Fine-tuning examples | over 1 million human-annotated examples |
| License | Llama 2 Community License (commercial use, with 700M MAU clause) |
The card describes the pretraining mix only as "a new mix of publicly available online data" without naming datasets, and emphasizes that "neither the pretraining nor the fine-tuning datasets include Meta user data." This high-level description is a step back from the named buckets in the Llama 1 card, and it has been cited repeatedly in press coverage and lawsuits. Llama 2 is also the first card with a Hardware and Software section: pretraining used Meta's Research Super Cluster and production clusters with NVIDIA A100-80GB GPUs at 350W to 400W TDP, totalling roughly 3.31 million GPU hours and 539 tCO2eq of location-based emissions, which Meta says were 100% offset.
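The hardware figures allow a rough energy estimate. The sketch below multiplies the reported GPU hours by the two ends of the quoted TDP range and derives the grid carbon intensity those bounds would imply. It assumes every GPU drew its full TDP for the whole run and ignores data-center overhead (PUE), so the results are illustrative bounds, not numbers from the card.

```python
GPU_HOURS = 3.31e6        # total reported for Llama 2 pretraining
TDP_WATTS = (350, 400)    # per-GPU power range quoted in the card
EMISSIONS_TCO2EQ = 539.0  # reported location-based emissions


def energy_mwh(gpu_hours: float, watts: float) -> float:
    """GPU energy in MWh, ASSUMING each GPU draws `watts` for every hour."""
    return gpu_hours * watts / 1e6  # watt-hours -> megawatt-hours


def implied_kg_per_kwh(tco2eq: float, mwh: float) -> float:
    """Carbon intensity implied by total emissions and total energy."""
    return (tco2eq * 1000) / (mwh * 1000)  # kg / kWh


low_mwh, high_mwh = (energy_mwh(GPU_HOURS, w) for w in TDP_WATTS)
intensity_at_400w = implied_kg_per_kwh(EMISSIONS_TCO2EQ, high_mwh)
intensity_at_350w = implied_kg_per_kwh(EMISSIONS_TCO2EQ, low_mwh)

print(f"energy: {low_mwh:.1f} to {high_mwh:.1f} MWh")
print(f"implied intensity: {intensity_at_400w:.3f} to {intensity_at_350w:.3f} kg CO2eq/kWh")
```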
The Llama 3 card (18 April 2024) covers 8B and 70B pretrained and instruction-tuned variants with grouped-query attention, an 8,192-token context, a 128,000-token vocabulary, and pretraining on "over 15 trillion tokens of data from publicly available sources." The 8B model has a March 2023 knowledge cutoff and the 70B a December 2023 cutoff. Training took 1.3M and 6.4M H100-80GB GPU hours respectively, with combined emissions of 2,290 tCO2eq location-based and 0 market-based.
The Llama 3.1 card (23 July 2024) adds the 405B variant. It records a 128k context, a December 2023 cutoff, and 39.3M H100-80GB GPU hours in total (1.46M for 8B, 7.0M for 70B, 30.84M for 405B). Emissions are 11,390 tCO2eq location-based and 0 market-based. The card lists eight supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Llama 3.2 (24 October 2024) splits into a text card for the 1B (1.23B) and 3B (3.21B) on-device models and MODEL_CARD_VISION.md for the 11B and 90B vision models. The text card reports pretraining on up to 9 trillion tokens and highlights quantized variants (SpinQuant and QLoRA) that cut memory use by 39% to 49% and double prefill speed on Android.
Llama 3.3 (6 December 2024) documents a single 70B Instruct checkpoint with the same 15T mix, 128k context, and December 2023 cutoff as 3.1, but with fine-tuning gains: HumanEval 80.5 to 88.4 pass@1, MATH 68.0 to 77.0, IFEval 87.5 to 92.1, MGSM 86.9 to 91.1.
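The size of each gain is easier to compare as a difference. A minimal sketch using only the four score pairs quoted above; the function name `gains` is illustrative, and "baseline" simply means the first number in each pair.

```python
# (baseline score, Llama 3.3 score) for each benchmark quoted above.
SCORES = {
    "HumanEval": (80.5, 88.4),
    "MATH": (68.0, 77.0),
    "IFEval": (87.5, 92.1),
    "MGSM": (86.9, 91.1),
}


def gains(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Absolute improvement in points, rounded to one decimal place."""
    return {name: round(after - before, 1) for name, (before, after) in scores.items()}


deltas = gains(SCORES)
largest = max(deltas, key=deltas.get)

for name, delta in deltas.items():
    print(f"{name:10s} +{delta:.1f} points")
print(f"largest gain: {largest}")
```

On these figures the largest absolute improvement is on MATH, followed by HumanEval.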
A quick comparison across the family:
| Card | Release | Sizes | Context | Pretraining tokens | Knowledge cutoff |
|---|---|---|---|---|---|
| Llama 1 | Feb 2023 | 7B, 13B, 33B, 65B | 2,048 | 1.0T to 1.4T | early 2023 |
| Llama 2 | Jul 2023 | 7B, 13B, 70B | 4,096 | 2.0T | Sep 2022 |
| Llama 3 | Apr 2024 | 8B, 70B | 8,192 | 15T+ | Mar 2023 / Dec 2023 |
| Llama 3.1 | Jul 2024 | 8B, 70B, 405B | 128,000 | 15T+ | Dec 2023 |
| Llama 3.2 | Oct 2024 | 1B, 3B, 11B-V, 90B-V | 128,000 | up to 9T | Dec 2023 |
| Llama 3.3 | Dec 2024 | 70B Instruct | 128,000 | ~15T | Dec 2023 |
| Llama 4 | Apr 2025 | Scout (17B/16E), Maverick (17B/128E) | 10M / 1M | ~22T to 40T | Aug 2024 |
Dated 5 April 2025, the Llama 4 card documents two checkpoints: Scout (17B activated parameters across 16 experts, 109B total) and Maverick (17B activated parameters across 128 experts, 400B total). Both use a mixture-of-experts (MoE) architecture with "early fusion for native multimodality," interleaving image and text tokens at the input layer.
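The gap between activated and total parameters is what distinguishes these mixture-of-experts checkpoints from the earlier dense models. The sketch below computes the fraction of weights active per token from the figures above; the `MoEModel` class is a hypothetical container for illustration, not part of any Llama tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEModel:
    """Hypothetical record of the parameter counts quoted in the card."""

    name: str
    active_params_b: float  # billions of parameters activated per token
    total_params_b: float   # billions of parameters in the checkpoint
    n_experts: int

    @property
    def active_fraction(self) -> float:
        """Share of all weights that participate in a single forward pass."""
        return self.active_params_b / self.total_params_b


MODELS = [
    MoEModel("Scout", 17, 109, 16),
    MoEModel("Maverick", 17, 400, 128),
]

for model in MODELS:
    print(f"{model.name}: {model.active_fraction:.1%} of weights active per token")
```

Both checkpoints activate the same 17B parameters per token, so Maverick's larger expert pool raises total capacity without raising per-token compute in proportion.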
The card lists 12 supported languages (Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, Vietnamese), an August 2024 cutoff, and context windows of 10M tokens (Scout) and 1M (Maverick). Pretraining: ~40T tokens for Scout, ~22T for Maverick. Combined compute is 7.38M H100-80GB GPU hours and an estimated 1,999 tCO2eq location-based. Outputs are restricted to multilingual text and code; image generation is not supported. The card describes a third model, Behemoth (reported at 288B activated, ~2T total, 30T+ training tokens), as a codistillation teacher; as of mid-2026 Meta has not shipped a Behemoth card with public weights.
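The compute and emissions figures quoted for each generation can be put on a common scale by dividing emissions by GPU hours. This is a rough sketch using only the totals cited in this article. The comparison is approximate because Llama 2 was trained on A100 GPUs and the later models on H100 GPUs, and because location-based emissions depend on the local grid.

```python
# (total GPU hours, location-based tCO2eq) as quoted for each card.
CARDS = {
    "Llama 2": (3.31e6, 539),                          # A100-80GB
    "Llama 3": (1.3e6 + 6.4e6, 2290),                  # H100-80GB
    "Llama 3.1": (1.46e6 + 7.0e6 + 30.84e6, 11390),    # H100-80GB
    "Llama 4": (7.38e6, 1999),                         # H100-80GB
}


def kg_per_gpu_hour(gpu_hours: float, tco2eq: float) -> float:
    """Emissions per GPU hour in kilograms of CO2-equivalent."""
    return tco2eq * 1000 / gpu_hours


intensity = {name: kg_per_gpu_hour(*values) for name, values in CARDS.items()}

for name, value in intensity.items():
    hours_m = CARDS[name][0] / 1e6
    print(f"{name:10s} {hours_m:6.2f}M GPU h  {value:.3f} kg CO2eq per GPU h")
```

The per-size Llama 3.1 figures sum to the 39.3M total stated in its card, which is a useful consistency check on the quoted breakdown.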
Every Llama card keeps a recognizable skeleton:
| Section | What it covers |
|---|---|
| Model details | Developer, version, dates, architecture, sizes, license, contact |
| Intended use | Primary uses, primary users, out-of-scope deployments |
| Hardware and software | Training infrastructure, GPU hours, energy, carbon emissions |
| Training data | Source mix, token count, knowledge cutoff, language coverage |
| Benchmarks and evaluation | Standard tasks, instruction-tuned scores, safety evaluations |
| Responsibility and safety | Llama Guard, Prompt Guard, Code Shield, red-teaming |
| Ethical considerations | Known biases, hallucinations, mitigation guidance |
| Citation | Suggested BibTeX entry |
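The skeleton above lends itself to a simple completeness check on a card file. The sketch below is an illustration only: it assumes the card is Markdown with `#`-style headings whose text matches the section names in the table exactly (real cards may word or nest their headings differently), and `SAMPLE_CARD` is a made-up fragment, not text from any Llama card.

```python
EXPECTED_SECTIONS = [
    "Model details",
    "Intended use",
    "Hardware and software",
    "Training data",
    "Benchmarks and evaluation",
    "Responsibility and safety",
    "Ethical considerations",
    "Citation",
]


def find_headings(markdown: str) -> list[str]:
    """Return the text of every line that starts with '#'.

    Limitation: '#' comments inside fenced code blocks are also picked up.
    """
    headings = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            headings.append(stripped.lstrip("#").strip())
    return headings


def missing_sections(markdown: str, expected: list[str] = EXPECTED_SECTIONS) -> list[str]:
    """List expected sections with no matching heading (case-insensitive)."""
    found = {heading.casefold() for heading in find_headings(markdown)}
    return [section for section in expected if section.casefold() not in found]


# Hypothetical card fragment used only to exercise the check.
SAMPLE_CARD = """\
# Model Details
Developer, version, license.

## Intended Use
Research and commercial use.

## Training Data
Publicly available online data.
"""

print(missing_sections(SAMPLE_CARD))
```

A check like this only confirms that a section exists; it says nothing about whether the section discloses what a reader needs.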
From Llama 3 onward the cards point to Meta's Purple Llama project, which contains companion safety models, and state that Llama "is not designed to be deployed in isolation."
From Llama 2 forward, every card links to a Llama Community License Agreement, permissive for commercial and research use with two notable exceptions: a 700-million-monthly-active-user threshold above which the licensee must request a separate license from Meta, and a clause restricting use of Llama outputs to train other LLMs (an outright prohibition in the Llama 2 license, relaxed from Llama 3.1 onward to permit it when the resulting model's name begins with "Llama"). The cards also link to an Acceptable Use Policy that prohibits use for weapons development, CSAM, election manipulation, unauthorized practice of regulated professions, and circumvention of Meta's safety measures.
The cards have been welcomed for following the Mitchell et al. structure and publishing concrete numbers (GPU hours, emissions, benchmark scores) that many vendors keep internal. They have also drawn criticism. From Llama 2 onward, the cards describe the training mix as "publicly available online data" without listing specific datasets. The Open Source Initiative's 2024 Open Source AI Definition requires training data documentation, and the OSI states that Llama is not open source by that standard. The Kadrey v. Meta lawsuit, filed in 2023, alleges that Meta used a copy of the LibGen shadow library and stripped copyright headers, an accusation the cards do not address. The Free Software Foundation and several legal commentators argue that the Llama Community License is source-available, not an open source license, because of the 700M MAU and competitor-restriction clauses.
At the Llama 4 launch, The Verge and TechCrunch reported that the version submitted to LMSYS Chatbot Arena was an "experimental chat" build optimized for human preference scoring, not the released weights, which put the cards' value as verification documents under scrutiny. The cards have nonetheless been used as templates by other labs (Mistral, DeepSeek, and several Chinese open-weight projects ship cards in the Llama format) and function as a de facto baseline for frontier model disclosures. Each card sits inside a broader stack of research paper, responsible use guide, acceptable use policy, community license, and the Purple Llama repository. Hugging Face also generates a derived card on each repository (for example, meta-llama/Llama-3.1-8B-Instruct) that copies the GitHub card and adds platform metadata; these derived cards are what most developers see in practice.