Falcon 3
Last reviewed
May 16, 2026
Sources
15 citations
Review status
Source-backed
Revision
v1 · 3,281 words
Falcon 3 is a family of open-weight large language models released on December 17, 2024 by the Technology Innovation Institute (TII), an applied research center based in Abu Dhabi, United Arab Emirates. The release comprises five base models spanning 1B to 10B parameters, including a state space model variant built on Mamba, along with instruction-tuned and quantized derivatives. TII positioned the family as a set of small but capable language models that can run on consumer hardware while remaining competitive with larger systems such as Llama 3.1 8B and Qwen 2.5 7B on standard reasoning, math, and code benchmarks.
The release marked a significant shift in the Falcon series, which had previously emphasized very large pretrained models like the 180-billion-parameter Falcon 180B. With Falcon 3, TII moved toward the small language model segment that became commercially important during 2024, when efficient sub-13B models from Meta, Alibaba, Mistral, Google, and Microsoft demonstrated that careful training could outperform much larger systems on common tasks. According to TII, the Falcon 3 family included roughly 30 model checkpoints at launch, with base and instruct variants distributed in GGUF, GPTQ-Int4, GPTQ-Int8, AWQ, and 1.58-bit quantized formats for low-resource deployment.
Falcon 3 models are distributed under the TII Falcon-LLM License 2.0, a permissive license derived from Apache 2.0 with an acceptable use policy and additional terms for hosted commercial services. The family was trained on up to 14 trillion tokens using 1,024 NVIDIA H100 GPUs and supports four languages in its main transformer variants: English, French, Spanish, and Portuguese. A successor family, Falcon-H1, which uses a hybrid attention plus state space architecture, was released in May 2025.
The Technology Innovation Institute is part of the Advanced Technology Research Council, a research entity established by the Abu Dhabi government in 2020. TII operates a dedicated AI and Digital Science Research Center that develops the Falcon series. The lab announced the first Falcon models in 2023, including Falcon 7B, Falcon 40B, and the 180-billion-parameter Falcon 180B, which was at the time one of the largest openly released LLMs. Those early models used a permissive license derived from Apache 2.0 with an acceptable use policy.
By late 2024, the open LLM landscape had shifted. Meta released the Llama 3 series, including 8B, 70B, and 405B parameter models, while Alibaba's Qwen 2.5 family covered checkpoints from 0.5B up to 72B. Smaller, well-trained models had begun to match or exceed the performance of larger predecessors on reasoning benchmarks, and quantization research showed that 7B to 13B models could run usefully on laptops, single GPUs, and even mobile hardware. TII's prior 180B model required server-grade infrastructure to run at acceptable speeds, which limited its real-world adoption. The Falcon 3 release explicitly targeted this gap, with TII describing the family as designed to run on "light infrastructure, including laptops."
TII had also released Falcon Mamba 7B in mid-2024, an attention-free state space model based on the original Mamba architecture. That model demonstrated that SSM-based LLMs could remain competitive with transformer designs at the 7B scale. Falcon3-Mamba-7B-Base, released in December 2024, continues that line, with additional pretraining on 1.5 trillion tokens layered on top of the original Falcon Mamba weights.
The initial release on December 17, 2024 included five base models, each with a corresponding instruction-tuned variant. The transformer-based models share a Llama-compatible architecture, which means they can be loaded by any inference engine that supports Llama-style decoder-only models, including llama.cpp, vLLM, and Ollama. The Mamba variant uses a different attention-free architecture and requires Mamba-aware runtimes.
| Model | Parameters | Architecture | Layers | Context | Vocab | Training method |
|---|---|---|---|---|---|---|
| Falcon3-1B-Base | 1B | Transformer (decoder) | 18 | 8K | 131K | Pruning + distillation from 7B |
| Falcon3-3B-Base | 3B | Transformer (decoder) | 22 | 32K | 131K | Pruning + distillation from 7B |
| Falcon3-7B-Base | 7B | Transformer (decoder) | 28 | 32K | 131K | Pretraining from scratch, 14T tokens |
| Falcon3-10B-Base | 10B | Transformer (decoder) | 40 | 32K | 131K | Depth up-scaled from 7B + 2T tokens |
| Falcon3-Mamba-7B-Base | 7B | Mamba SSM | 64 | 32K | 65K | Continued pretraining of Falcon Mamba 7B + 1.5T tokens |
The smallest model, Falcon3-1B-Base, has an 8K token context window, while the 3B, 7B, and 10B transformer models and the Mamba variant all support 32K tokens. The transformer models share a vocabulary size of 131,072 tokens, while the Mamba model uses a smaller 65K vocabulary inherited from Falcon Mamba 7B. All models were released in BF16 precision, alongside GGUF, GPTQ-Int4, GPTQ-Int8, and AWQ quantizations and a 1.58-bit format aimed at very low-memory environments.
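To make the effect of these formats concrete, the following is a rough sketch of weight-only memory at each precision. It assumes the nominal parameter counts implied by the model names and ignores the scale and zero-point overhead that real quantized files carry, so the figures are estimates rather than measured file sizes. The function and dictionary names are illustrative.

```python
# Rough weight-only memory estimate at different precisions.
# Assumptions: nominal parameter counts taken from the model names, and no
# overhead for quantization scales, zero-points, or file metadata.

BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "Int8": 8.0,
    "Int4": 4.0,
    "1.58-bit": 1.58,
}

NOMINAL_PARAMS = {
    "Falcon3-1B": 1e9,
    "Falcon3-3B": 3e9,
    "Falcon3-7B": 7e9,
    "Falcon3-10B": 10e9,
}


def weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Return approximate weight storage in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


for name, params in NOMINAL_PARAMS.items():
    row = ", ".join(
        f"{fmt}: {weight_gib(params, bits):.1f} GiB"
        for fmt, bits in BITS_PER_WEIGHT.items()
    )
    print(f"{name} -> {row}")
```

Under these assumptions the 7B model needs about 13 GiB for BF16 weights, about 3.3 GiB at 4 bits, and about 1.3 GiB at 1.58 bits, which is the difference between needing a workstation GPU and fitting in laptop memory. Activations and any KV cache come on top of these numbers.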
The 10B model is notable for how it was built. Rather than train a new 10B network from scratch, TII used a technique called depth up-scaling: starting from the trained Falcon3-7B-Base, duplicating a subset of its decoder layers to expand the network from 28 to 40 blocks, then continuing pretraining on an additional 2 trillion tokens of curated data. This approach reuses the compute that went into the 7B model and avoids the cold start cost of a fresh training run.
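The layer arithmetic behind depth up-scaling can be sketched with a toy layer list. TII's description says a subset of decoder layers was duplicated to go from 28 to 40 blocks; which layers were copied and where the copies were placed is not stated here, so the contiguous slice and the `start` index below are assumptions chosen only for illustration. Strings stand in for real decoder blocks, whose weights would be deep-copied in practice.

```python
from typing import List


def depth_upscale(layers: List[str], target_depth: int, start: int) -> List[str]:
    """Expand a layer stack by repeating a contiguous slice of it.

    `start` is a hypothetical parameter: the index of the first layer to copy.
    """
    extra = target_depth - len(layers)
    if extra < 0 or start < 0 or start + extra > len(layers):
        raise ValueError("slice to duplicate must fit inside the original stack")
    copied = [f"{name}_copy" for name in layers[start:start + extra]]
    # Place the copies directly after the slice they were taken from.
    return layers[:start + extra] + copied + layers[start + extra:]


base = [f"block_{i}" for i in range(28)]          # 28 blocks, as in the 7B model
upscaled = depth_upscale(base, target_depth=40, start=8)

print(len(upscaled))        # number of blocks after up-scaling
print(upscaled[18:22])      # boundary between originals and copies
```

Because the copied blocks start from trained weights, continued pretraining only has to teach the deeper stack to use its extra capacity, which is the compute saving the approach relies on.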
The 1B and 3B models took the opposite path. TII produced them by pruning the 7B base model and applying knowledge distillation, using less than 100 billion tokens of curated training data on top of the pruned initialization. The pruned-and-distilled approach gave the small models stronger starting representations than a from-scratch pretrain on the same token budget would have produced.
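The distillation step can be illustrated with the generic temperature-softened KL objective commonly used for knowledge distillation. This is a minimal sketch of that standard technique, not TII's published training recipe: the temperature value, the function names, and the toy logits are all assumptions.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    temperature: float = 2.0,  # assumed value, for illustration only
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5, -2.0]      # toy next-token logits from the 7B teacher
student = [2.5, 1.5, 0.0, -1.0]      # toy logits from the pruned student
print(distillation_loss(student, teacher))
print(distillation_loss(teacher, teacher))
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, so the pruned model is pulled toward the teacher's full output distribution rather than only the single correct token.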
The Falcon 3 transformer variants share a common decoder-only design with several specific choices that distinguish them from earlier Falcon models. Each transformer block uses grouped query attention (GQA) with 12 query heads and 4 key-value heads, giving a 3-to-1 query-to-KV ratio that reduces inference-time memory bandwidth without significantly hurting model quality. The attention head dimension is fixed at 256 across all transformer sizes, a choice TII says was made to enable efficient FlashAttention-3 kernels.
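The memory effect of the 3-to-1 ratio can be estimated with simple arithmetic. The sketch below uses figures quoted in this article (28 layers for the 7B model, 4 key-value heads, head dimension 256, a 32K context) and assumes BF16 cache entries at 2 bytes each and a batch size of 1. The multi-head baseline with 12 key-value heads is a hypothetical comparison point, not a released configuration.

```python
def kv_cache_bytes(
    layers: int,
    kv_heads: int,
    head_dim: int,
    seq_len: int,
    bytes_per_value: int = 2,  # BF16, an assumption about the cache dtype
) -> int:
    """Size of the key and value caches for one sequence."""
    # Leading factor of 2: one tensor for keys and one for values.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


LAYERS = 28           # Falcon3-7B-Base
HEAD_DIM = 256
SEQ_LEN = 32 * 1024   # full 32K context

gqa = kv_cache_bytes(LAYERS, kv_heads=4, head_dim=HEAD_DIM, seq_len=SEQ_LEN)
mha = kv_cache_bytes(LAYERS, kv_heads=12, head_dim=HEAD_DIM, seq_len=SEQ_LEN)

print(f"GQA (4 KV heads):  {gqa / 2**30:.1f} GiB")
print(f"MHA (12 KV heads): {mha / 2**30:.1f} GiB")
```

With these inputs the grouped-query cache comes to 3.5 GiB at full context against 10.5 GiB for the hypothetical multi-head layout, a threefold saving that matches the query-to-KV ratio.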
The models use SwiGLU activation in the feed-forward layers, rotary position embeddings (RoPE) with a high theta value of 1,000,042 to extend usable context to 32K tokens, and a tied input and output embedding scheme for the smaller variants. The 131K-token tokenizer was retrained from scratch on the Falcon 3 pretraining corpus and is shared across the transformer base models. The architecture is intentionally compatible with Llama-style implementations, which means existing Hugging Face Transformers code paths, vLLM, llama.cpp, and Ollama can serve Falcon 3 transformer checkpoints with no model-specific changes.
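The role of the RoPE base can be seen by computing per-dimension rotation wavelengths with the standard formula, where pair `i` of a head with dimension `d` rotates at frequency `theta ** (-2i/d)`. The sketch below assumes that standard formulation with the head dimension of 256 and theta of 1,000,042 quoted above, and compares it with the base of 10,000 used in the original RoPE formulation. It is illustrative and not taken from TII's code.

```python
import math
from typing import List


def rope_inverse_frequencies(head_dim: int, theta: float) -> List[float]:
    """Rotation frequency for each of the head_dim / 2 dimension pairs."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def rope_wavelengths(head_dim: int, theta: float) -> List[float]:
    """Number of positions each dimension pair needs for one full rotation."""
    return [2 * math.pi / f for f in rope_inverse_frequencies(head_dim, theta)]


HEAD_DIM = 256
falcon3 = rope_wavelengths(HEAD_DIM, theta=1_000_042)
classic = rope_wavelengths(HEAD_DIM, theta=10_000)

print(f"shortest wavelength: {falcon3[0]:.2f} positions")
print(f"longest wavelength, theta=1,000,042: {falcon3[-1]:,.0f} positions")
print(f"longest wavelength, theta=10,000:    {classic[-1]:,.0f} positions")
```

A larger base leaves the fastest-rotating pair unchanged but stretches the slow pairs, so more dimensions vary gradually and distinguishably across a 32K window.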
The Mamba variant uses a different design. Falcon3-Mamba-7B-Base is a 64-layer state space model with a hidden width of 4,096 and a state dimension of 16. It uses the original Mamba1 selective scan mechanism rather than the Mamba 2 update that appeared later in 2024. Because the SSM mechanism has constant memory and constant-time generation per token, the Mamba variant is particularly well-suited for long-context inference where transformer KV cache memory would dominate.
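The constant-memory property can be shown with a toy linear recurrence. This is a deliberately simplified, non-selective state space update for a single scalar channel: real Mamba layers make the update parameters depend on the input and run many channels in parallel. The decay value and the per-slot input weights are arbitrary assumptions; only the state dimension of 16 comes from the text above.

```python
from typing import List, Tuple

STATE_DIM = 16  # state dimension quoted for Falcon3-Mamba-7B-Base


def ssm_generate(inputs: List[float], decay: float = 0.9) -> Tuple[List[float], List[float]]:
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c . h_t over a sequence."""
    state = [0.0] * STATE_DIM
    outputs = []
    for x in inputs:
        # Toy parameters: a = decay for every slot, b_i = (i + 1) / STATE_DIM.
        state = [decay * h + (i + 1) / STATE_DIM * x for i, h in enumerate(state)]
        # Toy readout: c_i = 1 / STATE_DIM for every slot.
        outputs.append(sum(state) / STATE_DIM)
    return outputs, state


short_out, short_state = ssm_generate([1.0] * 10)
long_out, long_state = ssm_generate([1.0] * 10_000)

print(len(short_state), len(long_state))   # state size after 10 vs 10,000 steps
print(len(short_out), len(long_out))       # one output per input token
```

Each step reads and overwrites the same fixed-size state, so the work and memory per generated token do not depend on how many tokens came before, whereas a transformer's KV cache grows by one entry per token.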
The Falcon3-7B-Base, which serves as the foundation for the 10B, 3B, and 1B variants, was trained on 14 trillion tokens. According to TII, this was more than double the corpus used for the previous generation of Falcon models. The 14T mix combined web data, code, STEM content, curated high-quality sources, and multilingual data covering English, French, Spanish, and Portuguese.
Training ran on 1,024 NVIDIA H100 GPUs, which TII operates at its compute facility in Abu Dhabi. The 14T-token base run for the 7B model represented the bulk of the compute spent on the family. The 10B variant required an additional 2 trillion tokens of continued pretraining on the depth-upscaled network, while the Mamba variant added 1.5 trillion tokens on top of the existing Falcon Mamba 7B checkpoint. The 1B and 3B distilled variants needed less than 100 billion tokens each because they inherited representations from the 7B teacher.
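The relative cost of these runs can be approximated with the common rule of thumb that training a dense transformer takes about 6 floating-point operations per parameter per token. The sketch below applies that heuristic to the token budgets quoted above, using nominal parameter counts. It is an order-of-magnitude estimate, not a figure reported by TII, and it leaves out the Mamba run because the heuristic is usually stated for dense transformers.

```python
def training_flops(num_params: float, num_tokens: float) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb."""
    return 6 * num_params * num_tokens


# Nominal parameter counts and the token budgets quoted in the text.
base_7b = training_flops(7e9, 14e12)        # 7B pretraining, 14T tokens
upscale_10b = training_flops(10e9, 2e12)    # 10B continued pretraining, 2T tokens

print(f"7B base run:       {base_7b:.2e} FLOPs")
print(f"10B continuation:  {upscale_10b:.2e} FLOPs")
print(f"ratio:             {upscale_10b / base_7b:.2f}")
```

Under this approximation the 10B continuation costs roughly one fifth of the 7B base run, which is consistent with the 7B run accounting for the bulk of the compute.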
TII has not publicly disclosed the full data composition or the proportion of synthetic data in the corpus. The instruction-tuned variants were aligned using a combination of supervised fine-tuning and direct preference optimization on publicly available and custom-curated instruction datasets, with TII publishing chat templates and recommended system prompts on each model card.
TII reported benchmark numbers across the standard small LLM evaluation suite, including MMLU and MMLU-PRO for general knowledge, GSM8K and MATH for grade school and competition math, BIG-Bench Hard (BBH) for chain-of-thought reasoning, ARC Challenge and GPQA for science questions, HumanEval and MBPP for code, and IFEval for instruction following. The base model results below are taken from the official Hugging Face model cards.
| Benchmark | Falcon3-1B-Base | Falcon3-3B-Base | Falcon3-7B-Base | Falcon3-10B-Base | Falcon3-Mamba-7B-Base |
|---|---|---|---|---|---|
| MMLU (5-shot) | n/a | n/a | 67.5 | 73.1 | n/a |
| MMLU-PRO (5-shot) | n/a | 29.7 | 39.2 | 42.5 | 22.6 |
| IFEval | 54.4 | n/a | 34.3 | 36.4 | n/a |
| GSM8K (5-shot) | n/a | n/a | 76.2 | 81.4 | 65.9 |
| MATH Lvl-5 (4-shot) | n/a | 19.9 | 18.0 | 22.9 | 15.6 |
| ARC Challenge (25-shot) | n/a | n/a | 59.6 | 66.8 | n/a |
| GPQA (0-shot) | n/a | n/a | 35.5 | 34.1 | 10.6 |
| MUSR (0-shot) | 40.7 | n/a | 47.3 | 44.2 | n/a |
| BBH (3-shot) | n/a | n/a | 51.0 | 59.7 | n/a |
| PIQA (0-shot) | n/a | n/a | 77.7 | 79.4 | n/a |
| SciQ (0-shot) | 86.8 | n/a | 95.3 | 93.5 | n/a |
| Winogrande (0-shot) | n/a | n/a | 71.0 | 73.6 | n/a |
| OpenbookQA (0-shot) | n/a | n/a | 31.4 | 45.0 | n/a |
The instruction-tuned versions improve on the base model numbers in instruction-following-heavy tasks. Falcon3-10B-Instruct reaches 78 on IFEval and 83.1 on GSM8K, the latter a 1.7-point gain over the 10B base score of 81.4. On the Hugging Face Open LLM Leaderboard v2, which combines IFEval, BBH, MATH Level 5, GPQA, MuSR, and MMLU-PRO into a single average score, the 10B base posted an average of 27.59, with sub-scores of 36.48 on IFEval, 41.38 on BBH, 24.77 on MATH Level 5, 12.75 on GPQA, 14.17 on MuSR, and 36.00 on MMLU-PRO. TII claimed that the Falcon 3 7B and 10B checkpoints topped both the base and instruct rankings for models under 13 billion parameters on the leaderboard at the time of release.
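The reported leaderboard average can be checked against the six sub-scores. The sketch below assumes the overall figure is an unweighted mean of the sub-scores as quoted; the dictionary name is illustrative.

```python
# Leaderboard sub-scores quoted above for Falcon3-10B-Base.
SUB_SCORES = {
    "IFEval": 36.48,
    "BBH": 41.38,
    "MATH Level 5": 24.77,
    "GPQA": 12.75,
    "MuSR": 14.17,
    "MMLU-PRO": 36.00,
}

average = sum(SUB_SCORES.values()) / len(SUB_SCORES)
print(round(average, 2))  # 27.59
```

The unweighted mean reproduces the reported 27.59, which suggests that each of the six benchmarks contributes equally to the headline score.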
The Mamba variant showed substantial improvement over its predecessor. Compared to the original Falcon Mamba 7B from mid-2024, Falcon3-Mamba-7B-Base improved GSM8K from 51.3 to 65.9, MATH from 3.6 to 15.6, MMLU-PRO from 14.5 to 22.6, and GPQA from 8.1 to 10.6. The Mamba variant remains below the transformer 7B on most knowledge and reasoning benchmarks, which is consistent with broader findings that pure SSM models lag attention-based models on tasks that require in-context retrieval.
Falcon 3 is released under the TII Falcon-LLM License 2.0. The license is derived from Apache 2.0 and permits commercial use, modification, and redistribution of the model weights and outputs, with a few additional terms. Users must comply with an Acceptable Use Policy that prohibits illegal activity, generation of harmful content, weapons development, and certain other categories. The Acceptable Use Policy is hosted at a URL maintained by TII and may be updated over time, which means licensees are expected to comply with the current version rather than the version in effect at download time.
A notable carve-out applies to hosted commercial services. Parties that wish to offer shared instances of Falcon 3 as a managed inference or fine-tuning service are not automatically covered by the standard license and must enter into a separate agreement with TII. This is similar in spirit to clauses found in the Llama 3.1 community license, though the specific thresholds and definitions differ. Individual users and organizations running the model on their own infrastructure for internal or product use generally fall within the standard license.
The license has drawn some discussion in the open source community. Critics note that the additional restrictions on hosted services and the modifiable Acceptable Use Policy mean Falcon 3 does not meet the strict Open Source Initiative definition of open source software, despite TII marketing the family as open. Supporters point out that for the vast majority of practical use cases, including commercial deployment, the license is more permissive than the licenses used by some competing model families.
Falcon 3 was released into a crowded sub-13B open weight model market. TII positioned the 7B and 10B sizes against Llama 3.1 8B from Meta, Qwen 2.5 7B from Alibaba, Mistral 7B and Mistral NeMo from Mistral AI, Gemma 2 9B from Google, and Phi-3 from Microsoft. The smaller 1B and 3B variants were positioned against SmolLM2 from Hugging Face, Gemma 2 2B, Llama 3.2 1B, Qwen 2.5 1.5B, and Minitron 4B from NVIDIA.
| Model | Params | Context | Training tokens | License |
|---|---|---|---|---|
| Falcon3-7B-Base | 7B | 32K | 14T | TII Falcon-LLM 2.0 |
| Falcon3-10B-Base | 10B | 32K | 14T + 2T | TII Falcon-LLM 2.0 |
| Llama 3.1 8B | 8B | 128K | 15T | Llama 3.1 Community |
| Qwen 2.5 7B | 7B | 128K | 18T | Apache 2.0 |
| Mistral 7B v0.3 | 7B | 32K | not disclosed | Apache 2.0 |
| Gemma 2 9B | 9B | 8K | 8T | Gemma Terms |
| Falcon3-3B-Base | 3B | 32K | 14T (parent) + <100B | TII Falcon-LLM 2.0 |
| Llama 3.2 3B | 3B | 128K | 9T | Llama 3.2 Community |
| Qwen 2.5 3B | 3B | 32K | 18T | Qwen Research (non-commercial) |
On benchmark numbers reported by the respective developers, Falcon3-10B-Base is broadly competitive with Llama 3.1 8B and Qwen 2.5 7B on MMLU and MMLU-PRO, and ahead on several reasoning and math benchmarks at its weight class. Independent comparisons noted that the 10B model's MMLU score of 73.1 placed it among the top open models under 13B parameters at the time of release. The 7B base model lagged Qwen 2.5 7B on some math benchmarks but matched or exceeded it on knowledge tasks. The 3B model outperformed Llama 3.1 8B on several benchmarks despite having well under half the parameter count, a result consistent with TII's pruning and distillation strategy.
Falcon 3's context window of 32K tokens is shorter than the 128K windows shipped by Llama 3.1 and Qwen 2.5, which can matter for retrieval-augmented generation and long-document workloads. The vocabulary size of 131K is similar in scale to Llama 3.1 (128K) and Qwen 2.5 (151K). Licensing differs across the three: Qwen 2.5 7B ships under plain Apache 2.0 (the 3B checkpoint uses a non-commercial research license), while Falcon 3 and Llama 3.1 use custom commercial-use licenses with restrictions on hosted services or scale.
In May 2025, TII released Falcon-H1, a follow-up family that uses a hybrid attention plus state space architecture rather than a pure transformer or pure SSM. The Falcon-H1 release included six base models at 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, and 34B, each with an instruction-tuned variant. The 34B model is the largest Falcon model released since Falcon 180B, but with a very different architectural approach focused on efficiency and long-context capability rather than raw scale.
Falcon-H1 builds on the lessons of the Falcon 3 family, including the depth up-scaling and distillation pipelines used to produce the 10B and the small Falcon 3 variants. The hybrid architecture interleaves attention layers with Mamba 2 SSM blocks, which TII says combines the in-context retrieval strength of attention with the efficient long-context handling of SSMs. According to TII benchmarks, the 1.5B-Deep variant performs on par with leading 7B to 10B transformer models, and the 34B variant matches or exceeds Qwen3-32B, Llama 4 Scout 17B/109B, and Gemma 3 27B on several benchmarks.
Falcon-H1 effectively supersedes Falcon 3 as TII's flagship open model line, although the Falcon 3 weights remain available on Hugging Face and continue to be used in production deployments. The lineage from Falcon Mamba 7B through Falcon3-Mamba-7B-Base to Falcon-H1 represents a sustained TII investment in non-transformer architectures that few other major model providers have matched.
Industry coverage of Falcon 3 at launch focused on three themes: the small-model push from a previously large-model lab, the technical novelty of depth up-scaling and the 1.58-bit quantization, and the geopolitical implication of a competitive open LLM family released from the United Arab Emirates rather than the United States or China. Coverage in TechCrunch, VentureBeat, MarkTechPost, and Middle East AI News emphasized that Falcon 3 was deliberately sized for laptop and edge deployment.
Reviewers noted that the 10B model's benchmark numbers placed it at or near the top of the under-13B segment of the Hugging Face Open LLM Leaderboard v2, though some independent evaluations found that the gap to Qwen 2.5 7B and Llama 3.1 8B varied depending on the specific benchmark suite. The 1.58-bit quantized release attracted interest from researchers working on extreme low-precision inference, since most prior 1-bit and 1.58-bit work had been confined to smaller research models. Practitioners deploying Falcon 3 in production reported that the Llama-compatible architecture made integration straightforward across existing inference stacks.
The Mamba variant received attention from researchers working on state space models. The 14.6-point GSM8K gain over the original Falcon Mamba 7B (from 51.3 to 65.9) suggested that continued pretraining on math-heavy data can substantially close the gap between SSMs and transformers on reasoning tasks, although the Mamba variant still trailed the transformer Falcon3-7B on most knowledge benchmarks.
Criticism of Falcon 3 centered on the license. Some commentators argued that the TII Falcon-LLM License 2.0 should not be marketed as open source given the carve-out for hosted services and the modifiable Acceptable Use Policy. Others noted that the 32K context window was conservative compared to the 128K windows offered by competing families, which limited Falcon 3's suitability for long-document workloads without further fine-tuning. The four-language focus, while broader than English-only models, was narrower than Qwen 2.5's language coverage and limited Falcon 3's appeal for deployments in languages outside Western Europe. TII addressed the language coverage gap in subsequent releases, including a dedicated Arabic-language Falcon 3 variant added in 2025.