# LLaMA/Model Card

> Source: https://aiwiki.ai/wiki/llama_model_card
> Updated: 2026-05-10
> Categories: AI Models, Developer Tools, Large Language Models, Meta AI
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

*See also: [LLaMA](/wiki/llama), [Llama 2](/wiki/llama_2), [Llama 3](/wiki/llama_3), [Llama 4](/wiki/llama_4), [Model card](/wiki/model_card)*

The Llama model card is the official documentation that [Meta AI](/wiki/meta_ai) ships with each release of the [Llama](/wiki/llama) family of [large language models](/wiki/large_language_model). The first card appeared in February 2023 inside `facebookresearch/llama` as `MODEL_CARD.md`. Every later generation has shipped its own card: Llama 2 in July 2023, Llama 3 in April 2024, Llama 3.1, 3.2, and 3.3 between July and December 2024, and Llama 4 in April 2025. The cards follow the structure of Mitchell et al.'s 2019 paper *Model Cards for Model Reporting*. They are a primary reference for what each Llama checkpoint was trained on, what it may be used for under the [Llama Community License](/wiki/llama_community_license), and what risks Meta has flagged.

The cards are also a focal point for criticism. [Open source AI](/wiki/open_source_ai) advocates note that the cards list training-data categories and aggregate statistics but do not enumerate the specific datasets, web crawls, books, or licensed corpora used. The [Open Source Initiative](/wiki/open_source_initiative) has pointed to this gap in arguing that Llama is not open source in the traditional sense. The cards therefore sit at the intersection of two debates: how much information vendors should publish about [foundation models](/wiki/foundation_model), and what counts as "open" in an era of trillion-token training runs.

## Background and framework

The model card concept comes from Mitchell et al.'s paper *Model Cards for Model Reporting*, presented at the 2019 ACM Conference on Fairness, Accountability, and Transparency (FAT*, since renamed FAccT). It proposed a structured document with roughly 30 disclosures across nine sections: Model Details, Intended Use, Factors, Metrics, Evaluation Data, Training Data, Quantitative Analyses, Ethical Considerations, and Caveats and Recommendations. Meta's first Llama card followed this template closely, using eight sections and folding Caveats and Recommendations into Ethical Considerations. Hugging Face later codified a similar template in its `huggingface_hub` library, so cards created from the Hub's default template start with comparable headings.

## Original Llama card (February 2023)

The first Llama card describes the original [LLaMA](/wiki/llama) release: an [auto-regressive](/wiki/autoregressive_model) [transformer](/wiki/transformer) [language model](/wiki/language_model) in four sizes (7B, 13B, 33B, 65B), trained by Meta's [FAIR](/wiki/fair) team between December 2022 and February 2023 and released under a bespoke non-commercial license limited to approved researchers. The card frames the model as a research tool, points to the *LLaMA: Open and Efficient Foundation Language Models* paper by Touvron et al. for technical detail, and routes questions through the GitHub repository.

### Intended use and factors

The primary intended use is research on large language models, including [question answering](/wiki/question_answering), [natural language understanding](/wiki/natural_language_understanding), reading comprehension, capability studies, and bias and toxicity evaluation. Primary users are researchers in [natural language processing](/wiki/natural_language_processing), [machine learning](/wiki/machine_learning), and [artificial intelligence](/wiki/artificial_intelligence). The card states LLaMA is a base model that should not be deployed downstream without risk evaluation, since it has not been trained with [human feedback](/wiki/rlhf) and can generate toxic, offensive, incorrect, or unhelpful content. Language is the main performance factor: 20 languages appear in the training data but English dominates. Bias was evaluated on Responsible AI (RAI) datasets covering gender, religion, race, sexual orientation, age, nationality, disability, physical appearance, and socioeconomic status.

### Metrics, datasets, and training data

Metrics include accuracy on common-sense reasoning, reading comprehension, and [MMLU](/wiki/mmlu); exact match on question answering; and toxicity scores from the [Perspective API](/wiki/perspective_api). Only one model of each size was trained, so pretraining variability is not quantified. Benchmarks include BoolQ, PIQA, SIQA, [HellaSwag](/wiki/hellaswag), [WinoGrande](/wiki/winogrande), [ARC](/wiki/arc_benchmark), OpenBookQA, [NaturalQuestions](/wiki/natural_questions), [TriviaQA](/wiki/triviaqa), RACE, MMLU, BIG-bench hard, [GSM8K](/wiki/gsm8k), RealToxicityPrompts, WinoGender, and CrowS-Pairs. The training mix is CCNet (67%), C4 (15%), GitHub (4.5%), Wikipedia (4.5%), Books (4.5%), ArXiv (2.5%), Stack Exchange (2%), with Wikipedia and Books covering 20 languages.
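The card gives sampling proportions rather than token counts. A rough way to turn them into approximate tokens per source is to multiply each share by a model's total training budget; the sketch below does this for the 1.4T-token budget of the 33B and 65B models. It assumes the proportions are shares of tokens actually seen during training, so a source seen for more than one epoch is counted each time it is seen. The function name `tokens_per_source` is hypothetical.

```python
from __future__ import annotations

# Sampling proportions from the LLaMA v1 model card (percent of training tokens).
MIX = {
    "CCNet": 67.0,
    "C4": 15.0,
    "GitHub": 4.5,
    "Wikipedia": 4.5,
    "Books": 4.5,
    "ArXiv": 2.5,
    "Stack Exchange": 2.0,
}


def tokens_per_source(total_tokens: float, mix: dict[str, float]) -> dict[str, float]:
    """Split a token budget across sources by sampling proportion (an approximation)."""
    total_pct = sum(mix.values())
    if abs(total_pct - 100.0) > 1e-6:
        raise ValueError(f"proportions sum to {total_pct}, not 100")
    return {name: total_tokens * pct / 100.0 for name, pct in mix.items()}


if __name__ == "__main__":
    # 1.4T tokens is the budget the card lists for the 33B and 65B models.
    for name, n in tokens_per_source(1.4e12, MIX).items():
        print(f"{name:15s} ~{n / 1e9:,.0f}B tokens")
```

Under this reading, CCNet contributes roughly 938B of the 1.4T tokens and Stack Exchange roughly 28B.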

### Quantitative analysis

Architecture and training hyperparameters (batch size and training tokens are counted in tokens):

| Parameters | Dimension | Attention heads | Layers | Learning rate | Batch size | Training tokens |
| --- | --- | --- | --- | --- | --- | --- |
| 7B | 4,096 | 32 | 32 | 3.0E-04 | 4M | 1T |
| 13B | 5,120 | 40 | 40 | 3.0E-04 | 4M | 1T |
| 33B | 6,656 | 52 | 60 | 1.5E-04 | 4M | 1.4T |
| 65B | 8,192 | 64 | 80 | 1.5E-04 | 4M | 1.4T |
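The table is enough to derive a few quantities the card does not spell out. The sketch below computes the per-head dimension (`dim / n_heads`), the number of optimizer steps implied by the token budget and the 4M-token batch, and a rough parameter count. The parameter estimate uses the common approximation of about 12·d² weights per layer (4·d² for the attention projections and about 8·d² for a SwiGLU feed-forward block) plus separate input and output embedding matrices. The 32,000-token vocabulary is taken from the LLaMA paper rather than the card, and layer norms and the rounding of the feed-forward width are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: LLaMA v1 SentencePiece vocabulary size (from the paper, not the card).
VOCAB_SIZE = 32_000


@dataclass(frozen=True)
class LlamaV1Config:
    name: str
    dim: int
    n_heads: int
    n_layers: int
    train_tokens: float
    batch_tokens: float = 4e6  # the card's "4M" batch size, read as tokens


CONFIGS = [
    LlamaV1Config("7B", 4096, 32, 32, 1.0e12),
    LlamaV1Config("13B", 5120, 40, 40, 1.0e12),
    LlamaV1Config("33B", 6656, 52, 60, 1.4e12),
    LlamaV1Config("65B", 8192, 64, 80, 1.4e12),
]


def head_dim(c: LlamaV1Config) -> int:
    return c.dim // c.n_heads


def optimizer_steps(c: LlamaV1Config) -> float:
    return c.train_tokens / c.batch_tokens


def approx_params(c: LlamaV1Config) -> int:
    # ~4*d^2 (q, k, v, o projections) + ~8*d^2 (SwiGLU FFN) per layer,
    # plus separate input embedding and output projection matrices.
    per_layer = 12 * c.dim ** 2
    embeddings = 2 * VOCAB_SIZE * c.dim
    return c.n_layers * per_layer + embeddings


if __name__ == "__main__":
    for c in CONFIGS:
        print(f"{c.name}: head_dim={head_dim(c)}, "
              f"steps={optimizer_steps(c):,.0f}, "
              f"params~{approx_params(c) / 1e9:.2f}B")
```

Every size works out to a head dimension of 128, the 1T and 1.4T budgets correspond to 250,000 and 350,000 steps, and the parameter estimates (about 6.70B, 12.91B, 32.32B, and 64.95B) land close to the nominal sizes.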

Results on eight common-sense reasoning benchmarks (ARC is reported as separate Easy and Challenge splits):

| Parameters | BoolQ | PIQA | SIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA | COPA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 7B | 76.5 | 79.8 | 48.9 | 76.1 | 70.1 | 76.7 | 47.6 | 57.2 | 93 |
| 13B | 78.1 | 80.1 | 50.4 | 79.2 | 73.0 | 78.1 | 52.7 | 56.4 | 94 |
| 33B | 83.1 | 82.3 | 50.4 | 82.8 | 76.0 | 81.4 | 57.8 | 58.6 | 92 |
| 65B | 85.3 | 82.8 | 52.3 | 84.2 | 77.0 | 81.5 | 56.0 | 60.2 | 94 |

Bias results by category (lower is better). These scores match the CrowS-Pairs results the LLaMA paper reports for the 65B model, where a higher score indicates more bias; the card itself labels the score column "FAIR LLM".

| No | Category | Score |
| --- | --- | --- |
| 1 | Gender | 70.6 |
| 2 | Religion | 79.0 |
| 3 | Race/Color | 57.0 |
| 4 | Sexual orientation | 81.0 |
| 5 | Age | 70.1 |
| 6 | Nationality | 64.2 |
| 7 | Disability | 66.7 |
| 8 | Physical appearance | 77.8 |
| 9 | Socioeconomic status | 71.5 |
|  | Average (as reported) | 66.6 |
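The reported average of 66.6 is noticeably lower than the unweighted mean of the nine category scores, which a quick check puts at about 70.9. The card does not say how its average is computed; one plausible reading, which is an assumption and not stated in the card, is that the figure is pooled over all test examples, so categories with more examples carry more weight.

```python
from __future__ import annotations

# Per-category bias scores from the LLaMA v1 card (higher means more bias).
CATEGORY_SCORES = {
    "Gender": 70.6,
    "Religion": 79.0,
    "Race/Color": 57.0,
    "Sexual orientation": 81.0,
    "Age": 70.1,
    "Nationality": 64.2,
    "Disability": 66.7,
    "Physical appearance": 77.8,
    "Socioeconomic status": 71.5,
}
REPORTED_AVERAGE = 66.6


def unweighted_mean(scores: dict[str, float]) -> float:
    """Average the category scores, giving every category equal weight."""
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    mean = unweighted_mean(CATEGORY_SCORES)
    print(f"unweighted mean: {mean:.1f}; reported average: {REPORTED_AVERAGE}")
```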

### Ethical considerations

The card flags training data as a source of bias and states the model is not intended to inform decisions about matters central to human life. Mitigations include filtering web data based on proximity to Wikipedia text and references, using a Kneser-Ney language model and a fastText linear classifier. The card warns about hallucinations and harmful generations, and lists misinformation generation as a fraught use case.

## Llama 2 model card (July 2023)

Published on 18 July 2023, this card marks the first major scope change: Llama 2 shipped under a custom commercial license, so the card had to address commercial deployment. It documents three pretrained sizes (7B, 13B, 70B) and three corresponding [Llama 2 Chat](/wiki/llama_2_chat) variants fine-tuned for dialogue with [supervised fine-tuning](/wiki/supervised_fine_tuning) (SFT) and [reinforcement learning from human feedback](/wiki/rlhf) (RLHF).

Key numbers:

| Field | Value |
| --- | --- |
| Sizes | 7B, 13B, 70B |
| Architecture | Auto-regressive transformer with grouped-query attention on 70B |
| Pretraining tokens | 2.0 trillion |
| Context length | 4,096 tokens |
| Pretraining cutoff | September 2022 |
| Fine-tuning data cutoff | up to July 2023 |
| Training period | January 2023 through July 2023 |
| Fine-tuning examples | over 1 million human-annotated examples |
| License | Llama 2 Community License (commercial use, with 700M MAU clause) |

The card describes the pretraining mix only as "a new mix of publicly available online data" without naming datasets, and emphasizes that "neither the pretraining nor the fine-tuning datasets include Meta user data." The shift from the named data sources of the Llama 1 card to this high-level description has been cited repeatedly in press coverage and lawsuits.

Llama 2 is also the first card with a *Hardware and Software* section. Pretraining used Meta's Research Super Cluster and production clusters with NVIDIA A100-80GB GPUs at 350 W to 400 W TDP, for a total of roughly 3.31 million GPU hours and 539 tCO2eq of location-based emissions, which Meta says were 100% offset.
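The card's figures allow a rough cross-check of the emissions estimate. The sketch below assumes every GPU drew the full 400 W TDP (the upper end of the card's range) for all 3.31 million hours, then backs out the emissions per GPU hour and the implied emissions per kWh of GPU draw. That implied factor folds together grid carbon intensity and any data-center overhead included in Meta's methodology, so it is an illustration rather than a figure from the card. The function names are hypothetical.

```python
def energy_kwh(gpu_hours: float, watts_per_gpu: float) -> float:
    """Energy in kWh if every GPU drew `watts_per_gpu` for every hour."""
    return gpu_hours * watts_per_gpu / 1000.0


def kg_per_gpu_hour(t_co2eq: float, gpu_hours: float) -> float:
    """Emissions (kg CO2eq) attributed to each GPU hour."""
    return t_co2eq * 1000.0 / gpu_hours


def implied_kg_per_kwh(t_co2eq: float, gpu_hours: float, watts_per_gpu: float) -> float:
    """Emissions per kWh of assumed GPU draw (grid intensity plus any overhead)."""
    return t_co2eq * 1000.0 / energy_kwh(gpu_hours, watts_per_gpu)


# Llama 2 card figures: ~3.31M A100-80GB GPU hours, 539 tCO2eq location-based.
GPU_HOURS = 3.31e6
T_CO2EQ = 539.0

if __name__ == "__main__":
    print(f"energy at 400 W: {energy_kwh(GPU_HOURS, 400) / 1e6:.2f} GWh")
    print(f"kg CO2eq per GPU hour: {kg_per_gpu_hour(T_CO2EQ, GPU_HOURS):.3f}")
    print(f"implied kg CO2eq per kWh: {implied_kg_per_kwh(T_CO2EQ, GPU_HOURS, 400):.3f}")
```

At 400 W this gives about 1.32 GWh, 0.163 kg CO2eq per GPU hour, and an implied 0.407 kg per kWh; assuming 350 W instead raises the implied factor to about 0.465.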

## Llama 3 family cards (2024)

Llama 3 (18 April 2024) covers 8B and 70B pretrained and instruction-tuned variants with [grouped-query attention](/wiki/grouped_query_attention), an 8,192-token context, a vocabulary of roughly 128K tokens, and pretraining on "over 15 trillion tokens of data from publicly available sources." The 8B has a March 2023 knowledge cutoff and the 70B a December 2023 cutoff. Pretraining used 1.3M and 6.4M H100-80GB GPU hours respectively, with combined emissions of 2,290 tCO2eq location-based and 0 market-based.

The Llama 3.1 card (23 July 2024) adds a 405B variant. It records a 128k context window, a December 2023 knowledge cutoff, and 39.3M H100-80GB GPU hours in total (1.46M for 8B, 7.0M for 70B, and 30.84M for 405B), with emissions of 11,390 tCO2eq location-based and 0 market-based. It lists eight supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
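The per-size GPU-hour figures can be checked against the stated total, and because the Llama 3 and Llama 3.1 cards both report H100-80GB hours alongside location-based emissions, the two generations can be compared per million GPU hours. The sketch below uses only numbers quoted from the two cards; the function names are hypothetical.

```python
from __future__ import annotations

# Millions of H100-80GB GPU hours and location-based tCO2eq, as quoted from the cards.
LLAMA_3_HOURS_M = {"8B": 1.3, "70B": 6.4}
LLAMA_3_T_CO2EQ = 2_290

LLAMA_3_1_HOURS_M = {"8B": 1.46, "70B": 7.0, "405B": 30.84}
LLAMA_3_1_TOTAL_M = 39.3
LLAMA_3_1_T_CO2EQ = 11_390


def total_hours(hours_m: dict[str, float]) -> float:
    return sum(hours_m.values())


def share(hours_m: dict[str, float], size: str) -> float:
    """Fraction of a generation's GPU hours spent on one model size."""
    return hours_m[size] / total_hours(hours_m)


def t_per_million_hours(t_co2eq: float, hours_m: dict[str, float]) -> float:
    return t_co2eq / total_hours(hours_m)


if __name__ == "__main__":
    # The per-size breakdown should add up to the card's stated total.
    assert abs(total_hours(LLAMA_3_1_HOURS_M) - LLAMA_3_1_TOTAL_M) < 1e-9
    print(f"405B share of Llama 3.1 compute: {share(LLAMA_3_1_HOURS_M, '405B'):.1%}")
    print(f"Llama 3:   {t_per_million_hours(LLAMA_3_T_CO2EQ, LLAMA_3_HOURS_M):.0f} t per 1M GPU hours")
    print(f"Llama 3.1: {t_per_million_hours(LLAMA_3_1_T_CO2EQ, LLAMA_3_1_HOURS_M):.0f} t per 1M GPU hours")
```

The 405B model accounts for about 78.5% of Llama 3.1's GPU hours, and both generations come out at roughly 290 to 300 tCO2eq per million H100 hours.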

Llama 3.2 (25 September 2024) splits into a text card for the 1B (1.23B) and 3B (3.21B) on-device models, which were pretrained on up to 9 trillion tokens, and `MODEL_CARD_VISION.md` for the 11B and 90B vision models. The text card was later extended with quantized variants (released 24 October 2024, using SpinQuant and QLoRA) that cut memory use by 39% to 49% and roughly double prefill speed on Android devices.

Llama 3.3 (6 December 2024) documents a single 70B Instruct checkpoint with the same ~15T-token mix, 128k context, and December 2023 cutoff as Llama 3.1, but with improved post-training. Compared with Llama 3.1 70B Instruct, it raises [HumanEval](/wiki/humaneval) pass@1 from 80.5 to 88.4, MATH from 68.0 to 77.0, IFEval from 87.5 to 92.1, and MGSM from 86.9 to 91.1.

A quick comparison across the family:

| Card | Release | Sizes | Context | Pretraining tokens | Knowledge cutoff |
| --- | --- | --- | --- | --- | --- |
| Llama 1 | Feb 2023 | 7B, 13B, 33B, 65B | 2,048 | 1.0T to 1.4T | not stated |
| Llama 2 | Jul 2023 | 7B, 13B, 70B | 4,096 | 2.0T | Sep 2022 |
| Llama 3 | Apr 2024 | 8B, 70B | 8,192 | 15T+ | Mar 2023 / Dec 2023 |
| Llama 3.1 | Jul 2024 | 8B, 70B, 405B | 128,000 | 15T+ | Dec 2023 |
| Llama 3.2 | Sep 2024 | 1B, 3B, 11B-V, 90B-V | 128,000 | up to 9T | Dec 2023 |
| Llama 3.3 | Dec 2024 | 70B Instruct | 128,000 | ~15T | Dec 2023 |
| Llama 4 | Apr 2025 | Scout (17B/16E), Maverick (17B/128E) | 10M / 1M | ~22T to 40T | Aug 2024 |

## Llama 4 model card (April 2025)

Dated 5 April 2025, the Llama 4 card documents two checkpoints: *Scout* (17B activated parameters across 16 experts, 109B total) and *Maverick* (17B activated parameters across 128 experts, 400B total). Both use a [mixture-of-experts](/wiki/mixture_of_experts) (MoE) architecture with "early fusion for native multimodality," interleaving image and text tokens at the input layer.

The card lists 12 supported languages (Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, Vietnamese), an August 2024 cutoff, and context windows of 10M tokens (Scout) and 1M (Maverick). Pretraining: ~40T tokens for Scout, ~22T for Maverick. Combined compute is 7.38M H100-80GB GPU hours and an estimated 1,999 tCO2eq location-based. Outputs are restricted to multilingual text and code; image generation is not supported. The card describes a third model, *Behemoth* (reported at 288B activated, ~2T total, 30T+ training tokens), as a codistillation teacher; as of mid-2026 Meta has not shipped a Behemoth card with public weights.
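In a mixture-of-experts model, per-token compute tracks the activated parameters while the memory needed to hold the weights tracks the total, because every expert must be resident. The sketch below makes that concrete for Scout and Maverick using the card's parameter counts. The 2 bytes per parameter corresponds to bfloat16 weights and is an assumption, as is the 0.5 bytes per parameter used for a 4-bit comparison; KV cache, activations, and quantization overhead are ignored.

```python
from __future__ import annotations

# (activated parameters, total parameters, experts) from the Llama 4 card.
LLAMA4 = {
    "Scout": (17e9, 109e9, 16),
    "Maverick": (17e9, 400e9, 128),
}


def active_fraction(active: float, total: float) -> float:
    """Share of weights used in a single token's forward pass."""
    return active / total


def weight_memory_gb(total: float, bytes_per_param: float = 2.0) -> float:
    """Memory for the weights alone; all experts must be loaded."""
    return total * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, (active, total, experts) in LLAMA4.items():
        print(
            f"{name} ({experts} experts): {active_fraction(active, total):.1%} active, "
            f"~{weight_memory_gb(total):.0f} GB in bf16, "
            f"~{weight_memory_gb(total, 0.5):.0f} GB at 4 bits"
        )
```

Scout touches about 16% of its weights per token and Maverick about 4%, yet Maverick needs roughly 800 GB for bf16 weights versus about 218 GB for Scout.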

## Standard sections across cards

Every Llama card keeps a recognizable skeleton:

| Section | What it covers |
| --- | --- |
| Model details | Developer, version, dates, architecture, sizes, license, contact |
| Intended use | Primary uses, primary users, out-of-scope deployments |
| Hardware and software | Training infrastructure, GPU hours, energy, carbon emissions |
| Training data | Source mix, token count, knowledge cutoff, language coverage |
| Benchmarks and evaluation | Standard tasks, instruction-tuned scores, safety evaluations |
| Responsibility and safety | Llama Guard, Prompt Guard, Code Shield, red-teaming |
| Ethical considerations | Known biases, hallucinations, mitigation guidance |
| Citation | Suggested BibTeX entry |
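Because the skeleton is stable, a card can be checked mechanically for the expected sections. The following is a minimal sketch, not a Meta or Hugging Face tool: it extracts Markdown ATX headings with a regular expression and reports which expected section keywords appear in none of them. The keyword list is an assumption derived from the table above; real cards word their headings differently, so it would need adjusting per card.

```python
from __future__ import annotations

import re

# Lower-case keywords, one per expected section; matched as substrings of headings.
EXPECTED_KEYWORDS = (
    "model details",
    "intended use",
    "hardware and software",
    "training data",
    "benchmark",
    "safety",
    "ethical considerations",
    "citation",
)

# ATX headings: 1-6 '#' characters, spaces/tabs, the title, optional closing '#'s.
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*#*[ \t]*$", re.MULTILINE)


def headings(markdown: str) -> list[str]:
    return [m.group(1) for m in HEADING_RE.finditer(markdown)]


def missing_sections(markdown: str, keywords: tuple[str, ...] = EXPECTED_KEYWORDS) -> list[str]:
    found = [h.lower() for h in headings(markdown)]
    return [k for k in keywords if not any(k in h for h in found)]


if __name__ == "__main__":
    card = "# Model Card\n\n## Model Details\nText.\n\n## Intended Use\n\n## Training Data\n"
    print(missing_sections(card))
```

Substring matching keeps the check tolerant of variants such as "Training Data" versus "Training data and cutoff", at the cost of occasional false matches.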

From Llama 3 onward the cards point to Meta's [Purple Llama](/wiki/purple_llama) project, which contains companion safety models, and state that Llama "is not designed to be deployed in isolation."

## License terms in the cards

From Llama 2 forward, every card links to a Llama Community License Agreement that permits commercial and research use, with notable exceptions. The most prominent is a 700-million-monthly-active-user threshold above which the licensee must request a separate license from Meta. The Llama 2 and Llama 3 licenses also prohibited using Llama outputs to improve other large language models; the Llama 3.1 license dropped that prohibition. The cards also link to an Acceptable Use Policy that prohibits use for weapons development, [CSAM](/wiki/csam), election manipulation, unauthorized practice of regulated professions, and circumvention of Meta's safety measures.
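The MAU clause is a simple threshold test, but it is anchored to a specific date: the license text measures monthly active users in the calendar month preceding the relevant Llama version's release date. A simplified sketch of that logic follows. The function names are hypothetical, and the sketch ignores the affiliate and product-scope wording of the actual clause.

```python
from __future__ import annotations

from datetime import date

MAU_THRESHOLD = 700_000_000  # threshold stated in the Llama Community License


def preceding_calendar_month(release_date: date) -> tuple[int, int]:
    """(year, month) of the calendar month before the release date."""
    if release_date.month == 1:
        return release_date.year - 1, 12
    return release_date.year, release_date.month - 1


def needs_separate_license(mau_in_preceding_month: int) -> bool:
    """True when MAU is greater than the threshold (exactly 700M does not trigger it)."""
    return mau_in_preceding_month > MAU_THRESHOLD


if __name__ == "__main__":
    llama2_release = date(2023, 7, 18)
    print(preceding_calendar_month(llama2_release))  # the month whose MAU counts
    print(needs_separate_license(650_000_000), needs_separate_license(800_000_000))
```

For the Llama 2 release on 18 July 2023, the month that matters is June 2023.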

## Reception and criticism

The cards have been welcomed for following the Mitchell et al. structure and for publishing concrete numbers (GPU hours, emissions, benchmark scores) that many vendors keep internal. They have also drawn criticism. From Llama 2 onward, the cards describe the training mix as "publicly available online data" without listing specific datasets. The [Open Source Initiative](/wiki/open_source_initiative)'s 2024 Open Source AI Definition requires training data documentation, and the OSI states that Llama is not open source by that standard. The Kadrey v. Meta lawsuit, filed in 2023, alleges that Meta used a copy of the LibGen [shadow library](/wiki/shadow_library) and stripped copyright information from the books, an accusation the cards do not address. The [Free Software Foundation](/wiki/free_software_foundation) and several legal commentators argue that the Llama Community License is source-available rather than an [open source license](/wiki/open_source_license), because of the 700M MAU and competitor-restriction clauses.

At the Llama 4 launch, *The Verge* and *TechCrunch* reported that the version submitted to [LMSYS Chatbot Arena](/wiki/chatbot_arena) was an "experimental chat" build optimized for human-preference scoring rather than the released weights, which raised questions about how well model cards work as verification documents.

The cards have nonetheless been used as templates by other labs ([Mistral](/wiki/mistral_ai), [DeepSeek](/wiki/deepseek), and several Chinese open-weight projects ship cards in the Llama format) and function as a *de facto* baseline for frontier model disclosures. Each card sits inside a broader stack of documents: the research paper, the responsible use guide, the acceptable use policy, the community license, and the Purple Llama repository. The Hugging Face repositories (for example, `meta-llama/Llama-3.1-8B-Instruct`) carry their own copies of the cards, which reproduce the GitHub text and add Hub metadata; these copies are what most developers see in practice.

## See also

- [Model card](/wiki/model_card)
- [LLaMA](/wiki/llama)
- [Llama 2](/wiki/llama_2), [Llama 3](/wiki/llama_3), [Llama 4](/wiki/llama_4)
- [Llama Community License](/wiki/llama_community_license)
- [Purple Llama](/wiki/purple_llama)
- [Datasheets for Datasets](/wiki/datasheets_for_datasets)
- [Foundation model](/wiki/foundation_model)
- [Open source AI](/wiki/open_source_ai)

## References

1. Touvron et al. *LLaMA: Open and Efficient Foundation Language Models*. Meta AI, Feb 2023. https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/
2. Meta AI. *LLaMA Model Card (v1)*. https://github.com/meta-llama/llama/blob/llama_v1/MODEL_CARD.md
3. Meta AI. *Llama 2 Model Card*. https://github.com/meta-llama/llama-models/blob/main/models/llama2/MODEL_CARD.md
4. Meta AI. *Meta Llama 3 Model Card*. https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md
5. Meta AI. *Llama 3.1 Model Card*. https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md
6. Meta AI. *Llama 3.2 Model Card*. https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md
7. Meta AI. *Llama 3.3 Model Card*. https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md
8. Meta AI. *Llama 4 Model Card*. https://github.com/meta-llama/llama-models/blob/main/models/llama4/MODEL_CARD.md
9. Mitchell et al. *Model Cards for Model Reporting*. FAT* 2019. https://arxiv.org/abs/1810.03993
10. Meta AI. *The Llama 4 herd*. April 2025. https://ai.meta.com/blog/llama-4-multimodal-intelligence/
11. Meta AI. *Llama Community License Agreements (2, 3.1, 4)*. https://www.llama.com/llama4/license/
12. Meta AI. *Llama Acceptable Use Policy*. https://www.llama.com/use-policy/
13. Meta AI. *Purple Llama*. https://github.com/meta-llama/PurpleLlama
14. OpenSource Connections. *Is Llama 2 open source? No.* July 2023. https://opensourceconnections.com/blog/2023/07/19/is-llama-2-open-source-no-and-perhaps-we-need-a-new-definition-of-open/
15. Open Source Initiative. *The Open Source AI Definition (OSAID) 1.0*. Oct 2024. https://opensource.org/ai
16. Wikipedia. *Llama (language model)*. https://en.wikipedia.org/wiki/Llama_(language_model)
17. Hugging Face. *meta-llama organization*. https://huggingface.co/meta-llama

