| Llama 3.2 | |
|---|---|
| Developer | Meta |
| Release date | September 25, 2024 |
| Announced at | Meta Connect 2024 |
| Model sizes | 1B, 3B (text-only); 11B, 90B (vision) |
| Architecture | Auto-regressive transformer; vision models use cross-attention adapter |
| Context length | 128,000 tokens |
| Training data cutoff | December 2023 |
| Modalities | Text (1B, 3B); Text + Image (11B, 90B) |
| License | Llama 3.2 Community License (EU restrictions on vision models) |
| Predecessor | Llama 3.1 |
| Website | llama.com |
Llama 3.2 is a family of open-weight large language models developed by Meta and released on September 25, 2024, at the company's annual Meta Connect developer conference. The release introduced two distinct product lines: lightweight text-only models at the 1 billion and 3 billion parameter scales, designed for edge and on-device deployment, and vision-capable multimodal models at the 11 billion and 90 billion parameter scales, capable of processing and reasoning over images alongside text. Llama 3.2 marked the first time Meta introduced image understanding into the Llama family, making it a significant inflection point in the lineage of Meta's open-weight models.
The release built directly on Llama 3.1, which had established a new bar for open-weight models with its 405 billion parameter flagship. Llama 3.2 extended that foundation in two complementary directions: downward toward the extreme efficiency required by smartphones and edge hardware, and sideways into multimodal AI territory that had previously been the domain of proprietary systems. All four models share a 128,000-token context window and support eight languages in their text modes. The vision models support image input in English only at launch.
Alongside the core models, Meta released updated safety infrastructure in the form of Llama Guard 3, including a dedicated vision-capable variant for classifying multimodal content. The full suite was made available through Meta's llama.com portal, Hugging Face, and over 25 cloud and infrastructure partners including AWS, Microsoft Azure, Google Cloud, and Oracle Cloud.
Meta's public model releases began with the original LLaMA in February 2023, a family of models ranging from 7 billion to 65 billion parameters trained on publicly available data. That first release established Meta's strategy of releasing research-grade open-weight models to the broader AI community, a posture the company has maintained and expanded through successive generations.
Llama 3 followed in April 2024, introducing the 8B and 70B models with a substantially expanded tokenizer vocabulary of 128,000 tokens, grouped-query attention across all sizes, and a much larger pretraining corpus. The 8B model in particular became a widely adopted baseline for fine-tuning and derivative work across the open-source community.
Llama 3.1, released in July 2024, was notable for two reasons. First, it introduced the 405 billion parameter model, the largest openly released model at the time of its launch and competitive with leading proprietary systems on several academic benchmarks. Second, it extended context length across the entire family to 128,000 tokens and introduced multilingual instruction following in eight languages. Llama 3.1 also added tool-calling capability and was positioned explicitly as a foundation for agentic workflows.
Despite these advances, the Llama 3 and 3.1 families remained text-only. Competing models from Anthropic, Google, and OpenAI had already incorporated vision capabilities, and demand for open-weight multimodal alternatives was growing. Llama 3.2 addressed this gap directly.
Meta Connect is Meta's annual developer and consumer hardware conference, typically focused on virtual reality, augmented reality, and the Meta Quest product line. In 2024, Meta used the event on September 25 to also announce major AI developments, including Llama 3.2 and new voice AI features for its consumer products. The choice of venue underscored Meta's positioning of the small Llama 3.2 models as components intended for on-device AI in consumer hardware, including the Ray-Ban Meta smart glasses and Meta Quest headsets.
Llama 3.2 is not a single release but a coordinated family of four models organized into two distinct product lines with different design goals.
The first line consists of the 1B and 3B text-only models. These are designed for scenarios where compute, memory, and power constraints dominate the deployment environment: smartphones, edge servers, IoT devices, and offline-capable applications. They inherit the architecture of Llama 3.1 but are substantially smaller, trained using a combination of large-scale pretraining and knowledge distillation from larger models.
The second line consists of the 11B and 90B vision models. These are designed for tasks that require understanding images alongside text: document analysis, visual question answering, chart and diagram interpretation, and image captioning. They are built on frozen Llama 3.1 language models with a separately trained vision adapter attached through cross-attention layers, preserving all text capabilities of the underlying Llama 3.1 base while adding image understanding.
Both lines support a 128,000-token context window, multilingual text, and instruction-following via supervised fine-tuning and reinforcement learning from human feedback.
The 1B and 3B models were designed from the outset to run on consumer devices and edge hardware without requiring a cloud connection. This focus shaped every aspect of their development, from architectural choices to training methods to the quantization formats made available at launch.
On a OnePlus 12 smartphone with an ARM CPU, the 1B model in SpinQuant quantized form achieves approximately 50 tokens per second decode throughput with a time-to-first-token of 0.3 seconds and a memory footprint of just 1,921 MB. The same model in BF16 precision runs at 19.2 tokens per second with over 3 GB of memory usage, illustrating the significance of quantization for mobile deployment.
The 3B model reached approximately 2.27 million monthly downloads on Hugging Face within months of its release, reflecting strong adoption by developers building mobile and edge applications.
Both the 1B and 3B models share the same transformer architecture as Llama 3.1 8B, including grouped-query attention for efficient key-value cache management during inference, shared input and output embeddings, and the 128,000-token vocabulary with tiktoken-based byte pair encoding. The primary difference from the 8B base model is parameter count: 1.23 billion and 3.21 billion respectively, achieved through structured pruning and then recovered through distillation training.
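The key-value sharing at the heart of grouped-query attention can be illustrated in a few lines. The sketch below uses illustrative dimensions (32 query heads sharing 8 key-value heads, with head dimension 64); it shows the mechanism, not Meta's implementation.

```python
import torch
import torch.nn.functional as F

# Grouped-query attention sketch: several query heads share one KV head,
# shrinking the KV cache by a factor of n_q_heads / n_kv_heads.
# Dimensions are illustrative, not taken from the released checkpoints.
batch, seq, n_q_heads, n_kv_heads, head_dim = 2, 16, 32, 8, 64
group = n_q_heads // n_kv_heads          # query heads per KV head

q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)   # cached at 1/4 the size
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Expand KV heads so each group of 4 query heads attends to the same KV head.
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim**0.5
out = F.softmax(scores, dim=-1) @ v      # (batch, n_q_heads, seq, head_dim)
```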
Context length remains 128,000 tokens, the same as Llama 3.1, which is unusual for models at this scale. Most small models in the 1B to 3B parameter range impose much shorter context limits due to the quadratic scaling of attention computation, but the Llama 3.2 small models inherit the full context infrastructure of their larger predecessor.
Rather than training 1B and 3B models from scratch, Meta derived them from Llama 3.1 8B using a two-stage process of structured pruning followed by knowledge distillation.
In the pruning stage, structured portions of the 8B model were systematically removed to arrive at the target parameter counts. Structured pruning removes entire components such as attention heads, feed-forward layer neurons, or transformer layers, as opposed to unstructured pruning which zeros out individual weights. Structured pruning produces models with regular shapes that map efficiently onto hardware accelerators without requiring sparse computation support.
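A minimal sketch of the idea follows, assuming a plain two-matrix feed-forward block and an L2-norm importance score; the actual Llama FFN is gated and Meta's pruning criterion is not public.

```python
import torch.nn as nn

# Structured pruning sketch: drop whole hidden neurons of an FFN so the
# result stays a dense matrix with regular shapes. Importance scoring by
# weight norm is an illustrative assumption, not Meta's published criterion.
def prune_ffn(w1: nn.Linear, w2: nn.Linear, keep_ratio: float):
    importance = w1.weight.norm(dim=1)           # one score per hidden neuron
    n_keep = int(keep_ratio * importance.numel())
    keep = importance.topk(n_keep).indices.sort().values

    new_w1 = nn.Linear(w1.in_features, n_keep, bias=False)
    new_w2 = nn.Linear(n_keep, w2.out_features, bias=False)
    new_w1.weight.data = w1.weight.data[keep]    # remove pruned rows
    new_w2.weight.data = w2.weight.data[:, keep] # remove matching columns
    return new_w1, new_w2

# Llama 3.1 8B uses hidden size 4096 and FFN width 14336.
w1, w2 = nn.Linear(4096, 14336, bias=False), nn.Linear(14336, 4096, bias=False)
small_w1, small_w2 = prune_ffn(w1, w2, keep_ratio=0.5)
```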
The pruned models initially exhibit degraded performance because the removed components had been contributing meaningfully to the model's learned representations. To recover capability, Meta applied knowledge distillation, a technique where the smaller student model is trained to match the output distributions of one or more larger teacher models rather than simply fitting ground-truth labels. For Llama 3.2, the teachers were the Llama 3.1 8B and Llama 3.1 70B models, whose logits at each token position were used as soft targets during pretraining.
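A common formulation of this loss is sketched below, under the assumption of a single teacher and illustrative temperature and mixing weights; Meta's exact recipe is not public.

```python
import torch.nn.functional as F

# Distillation loss sketch: the student matches the teacher's softened
# token distribution (soft targets) while still fitting the ground-truth
# next token (hard targets). Temperature and alpha are illustrative.
def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1))
    return alpha * soft + (1 - alpha) * hard
```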
The distillation was combined with standard large-scale pretraining on up to 9 trillion tokens from publicly available sources, with a data cutoff of December 2023. Post-training then applied multiple rounds of supervised fine-tuning, rejection sampling, and direct preference optimization to produce instruction-following variants.
Total compute for the 1B model was 370,000 H100 GPU hours, producing 107 metric tons of CO2-equivalent emissions on a location-based accounting basis. The 3B model required 460,000 H100 GPU hours and 133 tons of CO2-equivalent. Meta offset these emissions entirely through renewable energy purchasing, resulting in zero market-based emissions.
The 1B and 3B models support multilingual text generation in English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. Both support tool calling with user-defined tool specifications without requiring prior fine-tuning examples, a zero-shot tool use capability that makes them suitable for agentic applications.
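The sketch below shows one way to exercise zero-shot tool calling through the Hugging Face chat-template API; the `get_weather` function is a hypothetical example, and the `tools=` argument assumes a recent transformers version.

```python
from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city.
    """
    ...  # hypothetical tool; only its schema matters for the prompt

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
messages = [{"role": "user", "content": "What's the weather in Lisbon?"}]

# The chat template renders the tool's JSON schema into the prompt; no
# fine-tuning examples are required for the model to emit a call such as
# {"name": "get_weather", "parameters": {"city": "Lisbon"}}.
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_weather], add_generation_prompt=True, tokenize=False)
```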
Instruct variants support summarization, question answering, instruction following, and code generation. The 3B model in particular performs competitively with larger models on instruction-following tasks. On the IFEval benchmark, the 3B instruction-tuned model scores 77.4, essentially matching the Llama 3.1 8B instruct model's 76.5, despite having fewer than half as many parameters.
The 1B model can also serve as a speculative decoding draft model for the larger Llama 3.1 8B, improving end-to-end generation throughput when both models are available.
Meta released official quantized variants of both models at launch to support deployment across the broadest possible range of hardware:
- SpinQuant, a post-training quantization method based on learned rotation matrices
- QLoRA, which combines quantization-aware training with LoRA adaptation to recover accuracy lost to quantization
The linear layers in quantized variants use 4-bit groupwise weights with group size 32 combined with 8-bit per-token dynamic activations. The classification layer uses 8-bit per-channel weights, and the embedding layer uses 8-bit per-channel quantization.
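A simplified sketch of the weight side of this scheme follows, assuming symmetric per-group scaling; the released quantization recipes are more involved.

```python
import torch

# 4-bit groupwise weight quantization sketch with group size 32: each run
# of 32 weights shares one scale, and values are rounded into the signed
# 4-bit range [-8, 7]. Symmetric scaling is an illustrative simplification.
def quantize_4bit_groupwise(w: torch.Tensor, group_size: int = 32):
    rows, cols = w.shape
    groups = w.reshape(rows, cols // group_size, group_size)
    scale = groups.abs().amax(dim=-1, keepdim=True).clamp_min(1e-8) / 7.0
    q = (groups / scale).round().clamp(-8, 7).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return (q.float() * scale).reshape(q.size(0), -1)

w = torch.randn(64, 128)
q, scale = quantize_4bit_groupwise(w)
max_err = (w - dequantize(q, scale)).abs().max()   # bounded by scale / 2
```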
The 11B and 90B models represent the first members of the Llama family to support visual input. Prior Llama releases were text-only, and Meta's earlier multimodal research work such as ImageBind had not been integrated into the publicly released Llama models. Llama 3.2 Vision closed this gap and positioned the Llama ecosystem as a competitive alternative to proprietary multimodal systems.
The vision models are available in both base (pre-trained) and instruct (fine-tuned for dialogue) variants. At launch, image-text prompting is supported in English only, while text-only prompting supports all eight languages available in the text models.
The vision models are built by attaching a separately trained vision adapter to frozen Llama 3.1 language models:
- Llama 3.2 11B Vision pairs the Llama 3.1 8B text model with the vision adapter
- Llama 3.2 90B Vision pairs the Llama 3.1 70B text model with the vision adapter
The naming reflects total parameter count including the vision adapter, not just the language model backbone.
The image encoder is a Vision Transformer (ViT) based on the ViT-H/14 architecture with 14-pixel patch size. Meta extended the standard ViT-H with 8 additional gated self-attention layers, producing a final encoder with two stages: a 32-layer primary encoder followed by an 8-layer global encoder, totaling 40 transformer blocks. The encoder operates at tile sizes of 448 pixels for the 11B base model and 560 pixels for the 11B instruct and all 90B variants.
The vision adapter connects the image encoder to the language model through cross-attention layers inserted at regular intervals into the transformer stack. Specifically, cross-attention layers are integrated after every fourth self-attention layer in the language model, occurring at layers 3, 8, 13, 18, 23, 28, 33, and 38. At these points, the language model's hidden states attend over the image encoder's output representations, allowing visual information to influence text generation without requiring every transformer layer to process image tokens.
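The pattern can be sketched as follows, with gated cross-attention blocks interleaved into the decoder stack at the listed indices; the dimensions, the tanh gate, and the use of generic transformer layers are illustrative assumptions, not Meta's exact implementation.

```python
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    """Text hidden states attend over image features; a learned gate that
    starts at zero leaves the pretrained text path untouched initially."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden, image_feats):
        attended, _ = self.attn(hidden, image_feats, image_feats)
        return hidden + torch.tanh(self.gate) * attended

d_model, n_heads = 512, 8                         # illustrative sizes
CROSS_LAYERS = {3, 8, 13, 18, 23, 28, 33, 38}     # indices from the text
layers = nn.ModuleList(nn.TransformerEncoderLayer(
    d_model, n_heads, batch_first=True) for _ in range(40))
cross = nn.ModuleDict(
    {str(i): GatedCrossAttention(d_model, n_heads) for i in CROSS_LAYERS})

hidden = torch.randn(1, 32, d_model)              # text hidden states
image_feats = torch.randn(1, 100, d_model)        # vision encoder outputs
for i, layer in enumerate(layers):
    if i in CROSS_LAYERS:
        hidden = cross[str(i)](hidden, image_feats)
    hidden = layer(hidden)
```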
A key architectural decision was to freeze the language model parameters during vision adapter training. Only the image encoder and cross-attention layers were updated during the vision pre-training phase. This design has two important practical consequences. First, it preserves all text-only capabilities of the underlying Llama 3.1 model without degradation, making the vision models true drop-in replacements for their text-only counterparts in text-only workflows. Second, it makes the vision models excellent starting points for downstream fine-tuning on domain-specific visual tasks, since the language model parameters reflect the full Llama 3.1 training and have not been perturbed by vision training.
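In training-loop terms the recipe is simple, as this sketch with hypothetical stand-in modules shows: freeze the backbone, optimize everything else.

```python
import torch
import torch.nn as nn

# Freeze-the-backbone sketch: the language model receives no gradient
# updates, so its text behavior stays identical to Llama 3.1. Module
# names and shapes here are hypothetical stand-ins.
class TinyVisionLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.language_model = nn.Linear(64, 64)   # stand-in for the LLM
        self.vision_encoder = nn.Linear(64, 64)   # stand-in for the ViT
        self.cross_attention = nn.Linear(64, 64)  # stand-in for the adapter

model = TinyVisionLM()
for p in model.language_model.parameters():
    p.requires_grad = False

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)  # adapter + encoder only
```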
Vision pre-training used a dataset of 6 billion image-text pairs assembled from diverse sources, with careful attention to data mixture and quality. The training was structured in multiple stages, with the vision adapter components trained while language model weights remained frozen.
Total compute for the combined 11B and 90B vision model training was 2.02 million H100-80GB GPU hours. This produced 584 metric tons of CO2-equivalent on a location-based basis, offset to zero on a market-based basis through Meta's renewable energy purchasing.
The 90B model was trained in two main pretraining phases, a primary stage and an annealing stage of approximately 885,000 GPU hours each, with additional compute for supervised fine-tuning and reinforcement learning from human feedback.
Post-training applied multiple rounds of supervised fine-tuning followed by rejection sampling and direct preference optimization to optimize instruction following, safety behavior, and multimodal dialogue quality. Instruction fine-tuning data included over 3 million synthetically generated image-text examples.
The 11B and 90B vision models support a range of visual understanding tasks:
- document-level understanding, including analysis of charts, graphs, and scanned forms
- visual question answering and reasoning over images
- chart and diagram interpretation
- image captioning and visual grounding
Images are processed by dividing them into overlapping tiles of the configured pixel size, with each tile encoded independently and the resulting representations passed through the global encoder stage before cross-attention integration.
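A simplified version of such tiling is sketched below, assuming non-overlapping 560-pixel tiles and zero-padding at the edges; the production pipeline, as noted above, uses overlapping tiles and per-variant tile sizes.

```python
from PIL import Image

def tile_image(img: Image.Image, tile: int = 560) -> list[Image.Image]:
    # Pad up to a multiple of the tile size, then cut the canvas into tiles;
    # each tile is later encoded independently by the image encoder.
    w = (img.width + tile - 1) // tile * tile
    h = (img.height + tile - 1) // tile * tile
    canvas = Image.new("RGB", (w, h))
    canvas.paste(img)
    return [canvas.crop((x, y, x + tile, y + tile))
            for y in range(0, h, tile) for x in range(0, w, tile)]

tiles = tile_image(Image.new("RGB", (1120, 800)))  # -> 4 tiles of 560x560
```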
Llama 3.2 is released under the Llama 3.2 Community License, a custom commercial license distinct from standard open-source licenses. The license permits commercial use, fine-tuning, and redistribution with the following conditions:
- Products and redistributions built on the models must display "Built with Llama" attribution
- Derivative models must include "Llama" at the beginning of their names
- Organizations whose products or services exceeded 700 million monthly active users as of the release date must request a separate license from Meta
- Use must comply with Meta's Acceptable Use Policy
These terms are substantially similar to the Llama 3.1 Community License, continuing Meta's practice of permissive but not fully open-source licensing.
The Llama 3.2 Community License contains a specific restriction that applies exclusively to the 11B and 90B multimodal models and not to the 1B and 3B text-only models. The license states that the rights granted under the agreement "are not being granted to you if you are an individual domiciled in, or a company with a principal place of business in, the European Union."
An exception preserves access for end users: European individuals can still use Llama 3.2 Vision through third-party products and services that incorporate the models, even if those services are built by non-EU entities. The restriction applies to developers and companies seeking to build with the vision models directly, not to consumers using a product that embeds them.
Meta has not publicly explained the EU vision restriction in a detailed statement. Observers and legal commentators have offered several competing explanations. One widely cited theory is that the restriction relates to uncertainty about compliance with the EU AI Act, which entered into force in August 2024 and whose provisions were still being interpreted at the time of Llama 3.2's release. Another theory focuses on GDPR: Meta had previously been fined under GDPR for privacy violations and had paused plans to train models on public EU user content after regulatory pushback, so the EU restriction on vision models may reflect ongoing caution about deploying AI trained on data whose collection practices could face EU regulatory scrutiny.
Critics have questioned the regulatory justification for the restriction. Other vision language models, including those from Qwen (developed in China by Alibaba) and Pixtral (developed in France by Mistral AI, an EU company), were released globally without equivalent EU restrictions at roughly the same time. This inconsistency has led some observers to suggest that the restriction may reflect litigation risk concerns specific to Meta's data practices rather than a general requirement of EU AI regulation.
Alongside the core Llama 3.2 models, Meta released updated versions of its Llama Guard safety classification system. Llama Guard is a family of models designed not to generate content but to classify whether a given prompt or response violates defined safety categories. It functions as a separately deployed safety layer that developers can integrate into their applications to screen model inputs and outputs.
Llama Guard 3 1B is a text-safety classifier derived from the Llama 3.2 1B model through pruning and additional quantization. In its quantized form, the model occupies approximately 438 MB, down from 2,858 MB for its BF16 parent. This extreme compression makes it practical to run Llama Guard on the same device as the model being guarded, with minimal additional memory overhead.
Llama Guard 3 11B Vision is a multimodal safety classifier based on the Llama 3.2 11B Vision architecture. It accepts both image and text inputs and classifies content across 13 hazard categories drawn from the MLCommons safety taxonomy:
- S1: Violent Crimes
- S2: Non-Violent Crimes
- S3: Sex-Related Crimes
- S4: Child Sexual Exploitation
- S5: Defamation
- S6: Specialized Advice
- S7: Privacy
- S8: Intellectual Property
- S9: Indiscriminate Weapons
- S10: Hate
- S11: Suicide & Self-Harm
- S12: Sexual Content
- S13: Elections
The classifier accepts multimodal prompts consisting of a single image plus text, as well as text-only prompts. Images are rescaled into four 560-by-560 pixel chunks before encoding. On content safety classification benchmarks, Llama Guard 3 11B Vision achieves an F1 score of 0.938 on response classification tasks with a precision of 0.961 and a false positive rate of 0.016, substantially outperforming GPT-4o at 0.667 F1 on the same task.
The safety categories can be customized or selectively disabled by developers who need to adapt the classifier's behavior to their specific application domain.
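A hedged sketch of invoking the 1B text classifier through Hugging Face transformers follows; the message format mirrors the pattern on the model card, and exact template behavior may vary across library versions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-Guard-3-1B")
guard = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-Guard-3-1B", torch_dtype=torch.bfloat16)

# The chat template wraps the conversation in the hazard-taxonomy prompt;
# the classifier then generates "safe", or "unsafe" plus a category code.
chat = [{"role": "user",
         "content": [{"type": "text", "text": "How do I hot-wire a car?"}]}]
input_ids = tok.apply_chat_template(chat, return_tensors="pt")
out = guard.generate(input_ids, max_new_tokens=16)
print(tok.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
# e.g. "unsafe\nS2" (Non-Violent Crimes)
```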
The following table shows benchmark scores for the Llama 3.2 instruction-tuned text models alongside Llama 3.1 8B for reference.
| Benchmark | Llama 3.2 1B Instruct | Llama 3.2 3B Instruct | Llama 3.1 8B Instruct |
|---|---|---|---|
| MMLU (5-shot) | 49.3 | 63.4 | 69.4 |
| GSM8K (CoT) | 44.4 | 77.7 | 84.5 |
| MATH (CoT) | -- | 48.0 | 51.9 |
| ARC-C | 59.4 | 78.6 | 83.4 |
| IFEval | 59.5 | 77.4 | 76.5 |
| Hellaswag | 41.2 | 69.8 | 82.0 |
| AlpacaEval (LC) | 7.17 | 20.88 | 25.74 |
The 3B model's IFEval score of 77.4 is essentially equal to the 8B model's 76.5, a result Meta highlighted as evidence that distillation effectively transfers instruction-following capability beyond what raw parameter count would predict.
The following table shows benchmark scores for the instruction-tuned vision models.
| Benchmark | Metric | Llama 3.2 11B | Llama 3.2 90B |
|---|---|---|---|
| VQAv2 (test) | Accuracy | 75.2% | 78.1% |
| DocVQA (test) | ANLS | 88.4 | 90.1 |
| ChartQA (test, CoT) | Relaxed accuracy | 83.4% | 85.5% |
| AI2 Diagram (test) | Accuracy | 91.1% | 92.3% |
| MMMU (val, CoT) | Micro avg accuracy | 50.7% | 60.3% |
| MathVista | Accuracy | -- | 57.3% |
| MMLU (CoT) | Macro avg accuracy | 73.0% | 86.0% |
| MATH (CoT) | Final EM | 51.9% | 68.0% |
| GPQA | Accuracy | -- | 46.7% |
For the pre-trained base models, the Open LLM Leaderboard at the time of release showed the following averages across BBH, MATH Level 5, GPQA, MUSR, and MMLU-PRO:
| Model | Average |
|---|---|
| Llama 3.2 1B | 1.88 |
| Llama 3.2 3B | 8.00 |
| Llama 3.1 8B | 14.00 |
These base scores reflect the expected gap caused by reduced parameter count, which the instruction-tuning and distillation process partially closes in the instruct variants.
At the time of release, the 90B vision model was positioned as competitive with proprietary vision models at the smaller end of the commercial market.
| Model | Developer | Parameters | VQAv2 | DocVQA | ChartQA | MMMU |
|---|---|---|---|---|---|---|
| Llama 3.2 11B Vision | Meta | 11B | 75.2% | 88.4 | 83.4% | 50.7% |
| Llama 3.2 90B Vision | Meta | 90B | 78.1% | 90.1 | 85.5% | 60.3% |
| Claude 3 Haiku | Anthropic | Undisclosed | 74.4% | ~88 | ~81% | 50.2% |
| GPT-4o-mini | OpenAI | Undisclosed | -- | -- | -- | 59.4% |
| Pixtral 12B | Mistral AI | 12B | -- | 90.7 | 81.8% | 52.5% |
On visual QA and document understanding tasks, the Llama 3.2 90B model performed comparably to or slightly above Claude 3 Haiku, while the 11B model was roughly on par with Haiku depending on the specific benchmark. On MMMU, a test of broad multidisciplinary reasoning that incorporates images, the 90B model scored 60.3% to GPT-4o-mini's 59.4%, a narrow edge for the open-weight model. GPT-4o-mini retained a lead on mathematical reasoning benchmarks such as MATH, scoring approximately 70.2% compared to 68.0% for the 90B model.
Meta described the vision models as competitive with Claude 3 Haiku and GPT-4o-mini on image recognition and visual understanding tasks.
The 1B and 3B models are the primary candidates for on-device and edge deployment scenarios:
- on-device summarization and rewriting of messages, notes, and documents
- offline-capable question answering and instruction-following assistants on smartphones, edge servers, and IoT devices
- agentic features that use zero-shot tool calling to invoke local functions without a cloud connection
The availability of SpinQuant and QLoRA quantized variants makes deployment feasible on ARM processors, with Meta confirming Qualcomm and MediaTek platform support at launch. Meta noted that the ARM architecture targets cover approximately 99% of mobile devices in active use.
The 11B and 90B vision models are suited for enterprise and research workflows involving document-heavy content:
- analysis of scanned documents, forms, and reports that mix text with charts and tables
- visual question answering over diagrams, figures, and photographs
- large-scale image captioning and visual content indexing
The long 128,000-token context window makes both the text and vision models suitable for retrieval-augmented generation (RAG) pipelines where retrieved documents are passed into the context window alongside a query. For vision-capable RAG systems, the 11B and 90B models can process retrieved images and documents in a single forward pass, enabling systems that retrieve and reason over both text and visual content.
The small models' support for tool calling additionally makes them applicable to agentic RAG workflows where the model must decide which tools to invoke to retrieve relevant information before generating a response.
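A minimal text-only RAG loop with the 3B model might look like the following sketch; the `retrieve` function is a hypothetical stand-in for a real vector-index lookup.

```python
from transformers import pipeline

def retrieve(query: str) -> list[str]:
    # Hypothetical retriever; a real system would query a vector index.
    return ["(retrieved document chunk 1)", "(retrieved document chunk 2)"]

generator = pipeline("text-generation", model="meta-llama/Llama-3.2-3B-Instruct")

def answer(query: str) -> str:
    # The 128K context window leaves ample room for retrieved material.
    context = "\n\n".join(retrieve(query))
    messages = [
        {"role": "system",
         "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": query},
    ]
    result = generator(messages, max_new_tokens=256)
    return result[0]["generated_text"][-1]["content"]
```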
The 1B model can serve as a draft model for speculative decoding with the Llama 3.1 8B model as the verifier. In speculative decoding, a fast smaller model generates candidate token sequences that a larger model then verifies and accepts or rejects. When the draft model's predictions align well with the larger model, this technique substantially increases tokens-per-second throughput without changing output quality. The shared vocabulary and architectural lineage between Llama 3.2 1B and Llama 3.1 8B make this pairing particularly effective.
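The pairing can be reproduced with Hugging Face assisted generation, sketched below under the assumption of a single accelerator; realized speedups depend on how often the verifier accepts the 1B model's drafts.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
target = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.bfloat16, device_map="auto")

inputs = tok("Speculative decoding raises throughput by", return_tensors="pt")
inputs = inputs.to(target.device)

# The 1B model proposes candidate tokens; the 8B model verifies them in a
# single forward pass, preserving the 8B model's output distribution.
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```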
The Llama 3.2 release received broadly positive coverage from the AI developer community. The introduction of vision capabilities into the Llama family was described by multiple observers as a significant milestone for open-weight AI, bringing Meta's public model releases into parity with proprietary multimodal systems for the first time.
The small model line attracted particular enthusiasm. The combination of long context (128K tokens), competitive instruction-following performance, and broad hardware support made the 1B and 3B models highly practical for a segment of developers who had previously needed to use much larger or cloud-dependent models for tasks like summarization and tool use. InfoQ reported that the 3B model's instruction-following performance matching the 8B model on IFEval was a notable result that suggested distillation had closed the expected capability gap between the size tiers.
Partner companies including AMD, AWS, Databricks, Dell, Google Cloud, Groq, IBM, Intel, Microsoft Azure, NVIDIA, Oracle Cloud, and Snowflake launched support for the models on the release date, indicating strong advance coordination and ecosystem readiness.
The EU restriction on vision models generated criticism from the European AI developer community and was widely reported in AI media outlets. Critics noted that other open-weight vision models were available in the EU without restriction, and argued that Meta's choice to exclude EU developers created a fragmentation of the open-weight ecosystem. The restriction also prompted legal analysis about whether the stated justifications held up under scrutiny, with several commentators concluding that the real motivation likely involved GDPR compliance concerns specific to how Meta assembled its training data.
Some developers noted that despite its strong benchmark performance relative to Claude 3 Haiku, the 90B vision model still lagged behind GPT-4o (not GPT-4o-mini) and Claude 3.5 Sonnet on more complex multimodal reasoning tasks, limiting its applicability as a drop-in replacement for the most capable proprietary systems.
At launch, image-text prompting is limited to English even though text-only prompting supports eight languages. Developers building multilingual applications that need to process non-English text alongside images must either handle language-switching in their application layer or wait for future model versions with expanded multimodal language support.
The vision models are optimized to process one image per conversation, attending to the last image provided when multiple images appear in a context window. Applications requiring comparison of multiple images simultaneously must structure their prompts accordingly or implement external batching.
The EU restriction on the multimodal models limits European developers who wish to build products directly on Llama 3.2 Vision. While end users can access vision capabilities through compliant third-party services, EU-based companies building AI products on top of the vision models directly cannot do so under the standard license terms.
All Llama 3.2 models share a training data cutoff of December 2023, meaning they lack awareness of events, publications, or developments after that date. Applications requiring up-to-date information must supplement the models with retrieval systems or tool access to external knowledge sources.
Despite the strong distillation results, the 1B and 3B models have an inherent capability ceiling compared to larger models on complex reasoning tasks. The 1B model in particular scores 44.4 on GSM8K, compared to 84.5 for the 8B model, reflecting a meaningful gap in mathematical problem solving. Applications requiring reliable arithmetic or multi-step logical reasoning should prefer the larger models in the Llama family.