Falcon 3

Updated Jun 28, 2026

Falcon 3 is a family of open-weight large language models released on December 17, 2024 by the Technology Innovation Institute (TII), an applied research center based in Abu Dhabi, United Arab Emirates ^[1]^[3]. The family spans five base models from 1B to 10B parameters (Falcon3-1B, Falcon3-3B, Falcon3-7B, Falcon3-10B, plus a Falcon3-Mamba-7B state space variant), each paired with an instruction-tuned version, with the flagship Falcon3-7B-Base trained on 14 trillion tokens ^[1]^[4]. TII designed Falcon 3 to deliver strong performance while running on light infrastructure, including laptops, and reported that Falcon3-10B-Base topped the under-13B-parameter class on the Hugging Face Open LLM Leaderboard at release ^[2]^[3].

The release marked a significant shift in the Falcon series, which had previously emphasized very large pretrained models like the 180-billion-parameter Falcon 180B. With Falcon 3, TII moved toward the small language model segment that became commercially important during 2024, when efficient sub-13B models from Meta, Alibaba, Mistral, Google, and Microsoft demonstrated that careful training could outperform much larger systems on common tasks. According to TII, the Falcon 3 family included roughly 30 model checkpoints at launch, with base and instruct variants distributed in GGUF, GPTQ-Int4, GPTQ-Int8, AWQ, and 1.58-bit quantized formats for low-resource deployment ^[10].

Falcon 3 models are distributed under the TII Falcon-LLM License 2.0, a permissive license derived from Apache 2.0 with an acceptable use policy and additional terms for hosted commercial services ^[9]. The family was trained on up to 14 trillion tokens using 1,024 NVIDIA H100 GPUs and supports four languages in its main transformer variants: English, French, Spanish, and Portuguese ^[1]^[3]. A successor family, Falcon-H1, which uses a hybrid attention plus state space architecture, was released in May 2025 ^[12].

ELI5: what is Falcon 3 in plain terms?

Falcon 3 is a set of free, downloadable AI text models made by a research lab in Abu Dhabi. They come in small sizes (from about 1 billion to 10 billion "parameters," the internal dials a model learns), small enough that even the largest ones can run on a good laptop instead of needing a data center. The lab trained the main model by showing it about 14 trillion tokens (small pieces of text and code), then released it so anyone can use it, study it, or build products on top of it at no cost, with a few rules about reselling it as a hosted service.

What is Falcon 3?

Falcon 3 is the third major generation of TII's open Falcon line of large language models, positioned as a family of small but capable models rather than a single large one. The lineup at launch consisted of four transformer base models (1B, 3B, 7B, and 10B parameters) and one state space model (Falcon3-Mamba-7B), each with an instruction-tuned chat variant. TII positioned the family as competitive with similarly sized systems such as Llama 3.1 8B and Qwen 2.5 7B on standard reasoning, math, and code benchmarks, while being small enough to run on consumer hardware ^[1]^[4].

TII summarized the goal in its announcement, stating that the release "sets new performance standards for small LLMs and democratizes access to advanced artificial intelligence by enabling the model to operate efficiently on light infrastructures, including laptops" ^[3]. Dr. Hakim Hacid, Chief Researcher of TII's AI and Digital Science Research Center, said: "Falcon 3 pushes the boundaries of small LLMs further, contributing to the open-source community by providing access to a better-performing AI" ^[3].

Who built Falcon 3, and why?

The Technology Innovation Institute is part of the Advanced Technology Research Council, a research entity established by the Abu Dhabi government in 2020. TII operates a dedicated AI and Digital Science Research Center that develops the Falcon series. The lab announced the first Falcon models in 2023, including Falcon 7B, Falcon 40B, and the 180-billion-parameter Falcon 180B, which was at the time one of the largest openly released LLMs. Those early models used a permissive license derived from Apache 2.0 with an acceptable use policy ^[1].

By late 2024, the open LLM landscape had shifted. Meta released the Llama 3 series, including 8B, 70B, and 405B parameter models, while Alibaba's Qwen 2.5 family covered checkpoints from 0.5B up to 72B. Smaller, well-trained models had begun to match or exceed the performance of larger predecessors on reasoning benchmarks, and quantization research showed that 7B to 13B models could run usefully on laptops, single GPUs, and even mobile hardware. TII's prior 180B model required server-grade infrastructure to run at acceptable speeds, which limited its real-world adoption. The Falcon 3 release explicitly targeted this gap, with TII describing the family as designed to run on "light infrastructures, including laptops" ^[3]. Dr. Najwa Aaraj, CEO of TII, framed the result as one that "exemplifies our pursuit of scientific excellence, offering enhanced efficiency and setting new benchmarks in AI technology" ^[3].

TII had also released Falcon Mamba 7B in mid-2024, an attention-free state space model based on the original Mamba architecture. That model demonstrated that SSM-based LLMs could remain competitive with transformer designs at the 7B scale. The Falcon3-Mamba-7B-Base released in December 2024 continues that line, with additional pretraining on 1.5 trillion tokens layered on top of the original Falcon Mamba weights ^[6].

What sizes does Falcon 3 come in?

The initial release on December 17, 2024 included five base models, each with a corresponding instruction-tuned variant. The transformer-based models share a Llama-compatible architecture, which means they can be loaded by any inference engine that supports Llama-style decoder-only models, including llama.cpp, vLLM, and Ollama. The Mamba variant uses a different attention-free architecture and requires Mamba-aware runtimes ^[1]^[4].

Model	Parameters	Architecture	Layers	Context	Vocab	Training method
Falcon3-1B-Base	1B	Transformer (decoder)	18	8K	131K	Pruning + distillation from 7B
Falcon3-3B-Base	3B	Transformer (decoder)	22	32K	131K	Pruning + distillation from 7B
Falcon3-7B-Base	7B	Transformer (decoder)	28	32K	131K	Pretraining from scratch, 14T tokens
Falcon3-10B-Base	10B	Transformer (decoder)	40	32K	131K	Depth up-scaled from 7B + 2T tokens
Falcon3-Mamba-7B-Base	7B	Mamba SSM	64	32K	65K	Continued pretraining of Falcon Mamba 7B + 1.5T tokens

The smallest model, Falcon3-1B-Base, has an 8K token context window, while the 3B, 7B, and 10B transformer models and the Mamba variant all support 32K tokens ^[2]. The transformer models share a vocabulary size of 131,072 tokens, while the Mamba model uses a smaller 65K vocabulary inherited from Falcon Mamba 7B ^[2]. All models were released in BF16 precision, with additional GGUF, GPTQ-Int4, GPTQ-Int8, and AWQ builds and a 1.58-bit quantization aimed at very low memory environments ^[1]^[10].
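To make the low-memory claim concrete, the following is a rough sketch of how weight storage scales with bits per weight. It uses the nominal parameter counts from the table above (the exact counts differ slightly) and ignores quantization overhead such as scales and unquantized layers, so the figures are lower bounds for illustration only. The helper name `weight_gib` is hypothetical.

```python
# Nominal parameter counts taken from the size table (assumption: real counts differ slightly).
NOMINAL_PARAMS = {
    "Falcon3-1B": 1e9,
    "Falcon3-3B": 3e9,
    "Falcon3-7B": 7e9,
    "Falcon3-10B": 10e9,
}

# Bits stored per weight for a few of the released formats.
# Assumption: overhead for scales, zero points, and unquantized layers is ignored.
BITS_PER_WEIGHT = {"BF16": 16, "GPTQ-Int8": 8, "GPTQ-Int4": 4, "1.58-bit": 1.58}


def weight_gib(params: float, bits: float) -> float:
    """Approximate weight storage in GiB for a given parameter count and bit width."""
    return params * bits / 8 / 2**30


if __name__ == "__main__":
    for model, params in NOMINAL_PARAMS.items():
        row = ", ".join(
            f"{fmt}: {weight_gib(params, bits):.2f} GiB"
            for fmt, bits in BITS_PER_WEIGHT.items()
        )
        print(f"{model}: {row}")
```

Under these assumptions, the 7B model needs about 13 GiB for BF16 weights, and a 4-bit build needs roughly a quarter of that.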

The 10B model is notable for how it was built. Rather than train a new 10B network from scratch, TII used a technique called depth up-scaling: starting from the trained Falcon3-7B-Base, duplicating a subset of its decoder layers to expand the network from 28 to 40 blocks, then continuing pretraining on an additional 2 trillion tokens of curated data ^[1]^[2]. This approach reuses the compute that went into the 7B model and avoids the cold start cost of a fresh training run.
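The layer-duplication step can be sketched as a mapping from each block of the deeper network back to the source layer whose weights initialize it. The sources do not say which of the 28 layers TII duplicated, so the contiguous middle block chosen here (the `start=8` argument) and the function name `upscale_layer_plan` are purely hypothetical.

```python
def upscale_layer_plan(n_src: int, n_dst: int, start: int) -> list[int]:
    """Return, for each block of the deeper network, the index of the source
    layer that initializes it. Layers start .. start + extra - 1 appear twice."""
    extra = n_dst - n_src
    if extra < 0 or start < 0 or start + extra > n_src:
        raise ValueError("cannot duplicate that many layers from this start index")
    plan = []
    for i in range(n_src):
        plan.append(i)
        if start <= i < start + extra:
            plan.append(i)  # duplicated copy placed right after the original
    return plan


# Hypothetical choice: duplicate 12 middle layers (8..19) to go from 28 to 40 blocks.
plan = upscale_layer_plan(n_src=28, n_dst=40, start=8)

if __name__ == "__main__":
    print(len(plan), plan)
```

After this initialization, the article's sources describe continued pretraining on 2 trillion tokens, which lets the duplicated layers diverge from their originals.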

The 1B and 3B models took the opposite path. TII produced them by pruning the 7B base model and applying knowledge distillation, using less than 100 billion tokens of curated training data on top of the pruned initialization ^[1]^[2]. The pruned-and-distilled approach gave the small models stronger starting representations than a from-scratch pretrain on the same token budget would have produced.
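For readers unfamiliar with knowledge distillation, here is a minimal sketch of the commonly used objective: the student is trained to match the teacher's temperature-softened output distribution. This is the generic textbook formulation, not TII's disclosed recipe; the temperature value and the function names are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: generic formulation; the temperature of 2.0 is illustrative.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature**2 * kl


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    print(distillation_loss([4.0, 1.0, 0.5], teacher))  # matches the teacher
    print(distillation_loss([0.5, 1.0, 4.0], teacher))  # disagrees with the teacher
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is why a pruned student can recover quality with comparatively few tokens.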

What architecture does Falcon 3 use?

The Falcon 3 transformer variants share a common decoder-only design with several specific choices that distinguish them from earlier Falcon models. As documented on the Falcon3-7B-Base model card, each transformer block uses grouped query attention (GQA) with 12 query heads and 4 key-value heads, a 3-to-1 query-to-KV ratio that shrinks the KV cache and reduces inference-time memory bandwidth without significantly hurting model quality ^[4]. The attention head dimension is fixed at 256 across all transformer sizes, a choice TII says was made to enable efficient FlashAttention-3 kernels ^[2].
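The memory effect of grouped query attention can be checked with simple arithmetic. The sketch below estimates KV cache size for a single sequence, using the 28 layers, 4 key-value heads, and head dimension of 256 quoted above for the 7B model. BF16 cache values (2 bytes) and a batch size of 1 are assumptions, and the function name is hypothetical.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """KV cache size for one sequence: a key and a value tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


SEQ_LEN = 32 * 1024  # full 32K context

# GQA as described: 4 key-value heads shared by 12 query heads.
gqa = kv_cache_bytes(n_layers=28, n_kv_heads=4, head_dim=256, seq_len=SEQ_LEN)
# Hypothetical multi-head baseline with one KV head per query head.
mha = kv_cache_bytes(n_layers=28, n_kv_heads=12, head_dim=256, seq_len=SEQ_LEN)

if __name__ == "__main__":
    print(gqa / 2**30)  # 3.5 (GiB)
    print(mha / 2**30)  # 10.5 (GiB)
```

Under these assumptions the cache at full context drops from 10.5 GiB to 3.5 GiB, the same 3-to-1 ratio as the head counts.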

The models use SwiGLU activation in the feed-forward layers, rotary position embeddings (RoPE) with a high theta value of 1,000,042 to extend usable context to 32K tokens, and a tied input and output embedding scheme for the smaller variants ^[2]^[4]. The 131K-token tokenizer was retrained from scratch on the Falcon 3 pretraining corpus and is shared across the transformer base models. The architecture is intentionally compatible with Llama-style implementations, which means existing Hugging Face Transformers code paths, vLLM, llama.cpp, and Ollama can serve Falcon 3 transformer checkpoints with no model-specific changes.
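To illustrate what a larger RoPE theta changes, the sketch below computes the standard rotary frequencies, theta^(-2i/d) for each pair of dimensions, and the wavelength of the slowest-rotating pair. It uses the head dimension of 256 and theta of 1,000,042 quoted above; the comparison value of 10,000 is the default from the original RoPE formulation, and the function names are hypothetical.

```python
import math


def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """One rotation frequency per pair of dimensions: theta^(-2i/d)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim: int, theta: float) -> float:
    """Number of positions before the slowest-rotating pair completes a full turn."""
    return 2 * math.pi / rope_inv_freq(head_dim, theta)[-1]


falcon3 = longest_wavelength(256, 1_000_042)
baseline = longest_wavelength(256, 10_000)

if __name__ == "__main__":
    print(f"theta=1,000,042: {falcon3:,.0f} positions")
    print(f"theta=10,000:    {baseline:,.0f} positions")
```

A larger theta stretches the slow frequencies, so distant positions remain distinguishable over a longer span, which is the usual motivation for raising it in long-context models.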

The Mamba variant uses a different design. Falcon3-Mamba-7B-Base is a 64-layer state space model with a hidden width of 4,096 and a state dimension of 16 ^[6]. It uses the original Mamba 1 selective scan mechanism rather than the Mamba 2 update that appeared later in 2024. Because an SSM carries a fixed-size state from token to token, its memory use and per-token generation time stay constant as the sequence grows, which makes the Mamba variant well suited to long-context inference where a transformer's KV cache would dominate memory.
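The constant-memory property can be seen in a toy recurrence. The sketch below runs a diagonal linear state space model one token at a time with a state of 16 values, matching the state dimension quoted above. It is deliberately simplified: Mamba's selective scan makes the parameters depend on the input, which this example does not do, and all names and parameter values are illustrative.

```python
def ssm_generate(inputs, a, b, c):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = sum(c * h_t), token by token.

    The state has len(a) entries no matter how many tokens are processed.
    """
    state = [0.0] * len(a)
    outputs = []
    for x in inputs:
        state = [ai * hi + bi * x for ai, hi, bi in zip(a, state, b)]
        outputs.append(sum(ci * hi for ci, hi in zip(c, state)))
    return outputs, state


STATE_DIM = 16
A = [0.9] * STATE_DIM   # illustrative decay per state channel
B = [0.1] * STATE_DIM   # illustrative input projection
C = [1.0] * STATE_DIM   # illustrative output projection

if __name__ == "__main__":
    for n_tokens in (10, 1000):
        _, final_state = ssm_generate([1.0] * n_tokens, A, B, C)
        print(n_tokens, "tokens -> state size", len(final_state))
```

A transformer, by contrast, appends one key and one value per layer for every token it has seen, so its cache grows linearly with sequence length.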

How was Falcon 3 trained?

The Falcon3-7B-Base, which serves as the foundation for the 10B, 3B, and 1B variants, was trained on 14 trillion tokens. According to TII, this was more than double the 5.5 trillion tokens used for the previous generation of Falcon models ^[1]^[3]. The 14T mix combined web data, code, STEM content, curated high-quality sources, and multilingual data covering English, French, Spanish, and Portuguese ^[1].

Training ran on 1,024 NVIDIA H100 GPUs, which TII operates at its compute facility in Abu Dhabi ^[1]. The 14T-token base run for the 7B model represented the bulk of the compute spent on the family. The 10B variant required an additional 2 trillion tokens of continued pretraining on the depth-upscaled network, while the Mamba variant added 1.5 trillion tokens on top of the existing Falcon Mamba 7B checkpoint ^[1]^[6]. The 1B and 3B distilled variants needed less than 100 billion tokens each because they inherited representations from the 7B teacher ^[1].
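The claim that the 7B run accounts for the bulk of the compute can be sanity-checked with the widely used rule of thumb of about 6 FLOPs per parameter per training token. The sketch below applies it to the token counts above with nominal parameter counts. This is a back-of-the-envelope estimate, not a figure reported by TII: the rule is derived for dense transformers (so the Mamba entry is only indicative), the 1B and 3B entries use the 100-billion-token upper bound, and teacher forward passes during distillation are ignored.

```python
def approx_train_flops(params: float, tokens: float) -> float:
    """Rule of thumb: about 6 FLOPs per parameter per training token."""
    return 6.0 * params * tokens


# (nominal parameters, training tokens) per run; all values are approximations.
RUNS = {
    "Falcon3-7B-Base (from scratch)": (7e9, 14e12),
    "Falcon3-10B-Base (continued)": (10e9, 2e12),
    "Falcon3-Mamba-7B-Base (continued)": (7e9, 1.5e12),
    "Falcon3-3B-Base (upper bound)": (3e9, 100e9),
    "Falcon3-1B-Base (upper bound)": (1e9, 100e9),
}

flops = {name: approx_train_flops(p, t) for name, (p, t) in RUNS.items()}
total = sum(flops.values())
share_7b = flops["Falcon3-7B-Base (from scratch)"] / total

if __name__ == "__main__":
    for name, value in flops.items():
        print(f"{name}: {value:.2e} FLOPs")
    print(f"7B share of estimated total: {share_7b:.0%}")
```

Under these assumptions the from-scratch 7B run accounts for roughly three quarters of the estimated total.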

TII has not publicly disclosed the full data composition or the proportion of synthetic data in the corpus. The instruction-tuned variants were aligned using a combination of supervised fine-tuning and direct preference optimization on publicly available and custom-curated instruction datasets, with TII publishing chat templates and recommended system prompts on each model card ^[4].
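As background on the preference-optimization step, here is a minimal sketch of the standard direct preference optimization loss for a single pair of responses. It shows the published general formula only; TII's hyperparameters and data are not disclosed, so the beta value and the example log-probabilities are hypothetical.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log(sigmoid(beta * margin)) for one preference pair.

    Inputs are summed log-probabilities of each response under the policy being
    trained and under a frozen reference model. beta=0.1 is an illustrative value.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = beta * margin
    # Stable form of log(1 + exp(-z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # Policy identical to the reference: no preference learned yet.
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    # Policy has shifted probability toward the chosen response.
    print(dpo_loss(-10.0, -18.0, -12.0, -15.0))
```

The loss starts at log 2 when the policy equals the reference and falls as the policy raises the chosen response's likelihood relative to the rejected one.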

How does Falcon 3 perform?

TII reported benchmark numbers across the standard small LLM evaluation suite, including MMLU and MMLU-PRO for general knowledge, GSM8K and MATH for grade school and competition math, BIG-Bench Hard (BBH) for chain-of-thought reasoning, ARC Challenge and GPQA for science questions, HumanEval and MBPP for code, and IFEval for instruction following. The base model results below are taken from the official Hugging Face model cards ^[4]^[5]^[6]^[7]^[8].

Benchmark	Falcon3-1B-Base	Falcon3-3B-Base	Falcon3-7B-Base	Falcon3-10B-Base	Falcon3-Mamba-7B-Base
MMLU (5-shot)	n/a	n/a	67.5	73.1	n/a
MMLU-PRO (5-shot)	n/a	29.7	39.2	42.5	22.6
IFEval	54.4	n/a	34.3	36.4	n/a
GSM8K (5-shot)	n/a	n/a	76.2	81.4	65.9
MATH Lvl-5 (4-shot)	n/a	19.9	18.0	22.9	15.6
ARC Challenge (25-shot)	n/a	n/a	59.6	66.8	n/a
GPQA (0-shot)	n/a	n/a	35.5	34.1	10.6
MUSR (0-shot)	40.7	n/a	47.3	44.2	n/a
BBH (3-shot)	n/a	n/a	51.0	59.7	n/a
PIQA (0-shot)	n/a	n/a	77.7	79.4	n/a
SciQ (0-shot)	86.8	n/a	95.3	93.5	n/a
Winogrande (0-shot)	n/a	n/a	71.0	73.6	n/a
OpenbookQA (0-shot)	n/a	n/a	31.4	45.0	n/a

The instruction-tuned versions improve on the base model numbers in instruction-following-heavy tasks. Falcon3-10B-Instruct reaches 78 on IFEval and 83.1 on GSM8K, the latter a gain of roughly 2 points over the 10B base ^[3]^[5]. On the Hugging Face Open LLM Leaderboard v2, which averages normalized IFEval, BBH, MATH Level 5, GPQA, MuSR, and MMLU-PRO results into a single score, the 10B base posted an average of 27.59, with sub-scores of 36.48 on IFEval, 41.38 on BBH, 24.77 on MATH Level 5, 12.75 on GPQA, 14.17 on MuSR, and 36.00 on MMLU-PRO ^[5]^[13]. These leaderboard sub-scores are not directly comparable to the raw model-card figures in the table above. TII stated that "Falcon3-10B-Base stands as the state-of-the-art achieving strong results in the under-13B category," and that "Falcon3-7B-Instruct and Falcon3-10B-Instruct outperform all instruct models under the 13B scale on the open leaderboard" at the time of release ^[2].
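The reported leaderboard average can be reproduced from the six sub-scores quoted above with an unweighted mean. This short check assumes equal weighting, which is consistent with the numbers given here.

```python
# Leaderboard v2 sub-scores for Falcon3-10B-Base as quoted in the text.
SUBSCORES = {
    "IFEval": 36.48,
    "BBH": 41.38,
    "MATH Lvl 5": 24.77,
    "GPQA": 12.75,
    "MuSR": 14.17,
    "MMLU-PRO": 36.00,
}

# Unweighted mean across the six benchmarks.
average = sum(SUBSCORES.values()) / len(SUBSCORES)

if __name__ == "__main__":
    print(round(average, 2))  # 27.59
```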

The Mamba variant showed substantial improvement over its predecessor. Compared to the original Falcon Mamba 7B from mid-2024, Falcon3-Mamba-7B-Base improved GSM8K from 51.3 to 65.9, MATH from 3.6 to 15.6, MMLU-PRO from 14.5 to 22.6, and GPQA from 8.1 to 10.6 ^[6]. The Mamba variant remains below the transformer 7B on most knowledge and reasoning benchmarks, which is consistent with broader findings that pure SSM models lag attention-based models on tasks that require in-context retrieval.

Is Falcon 3 open source, and what is its license?

Falcon 3 is released under the TII Falcon-LLM License 2.0. The license is derived from Apache 2.0 and permits commercial use, modification, and redistribution of the model weights and outputs, with a few additional terms ^[9]. Users must comply with an Acceptable Use Policy that prohibits illegal activity, generation of harmful content, weapons development, and certain other categories. The Acceptable Use Policy is hosted at a URL maintained by TII and may be updated over time, which means licensees are expected to comply with the current version rather than the version in effect at download time ^[9].

A notable carve-out applies to hosted commercial services. Parties that wish to offer shared instances of Falcon 3 as a managed inference or fine-tuning service are not automatically covered by the standard license and must enter into a separate agreement with TII. This is similar in spirit to clauses found in the Llama 3.1 community license, though the specific thresholds and definitions differ. Individual users and organizations running the model on their own infrastructure for internal or product use generally fall within the standard license ^[9].

The license has drawn some discussion in the open source community. Critics note that the additional restrictions on hosted services and the modifiable Acceptable Use Policy mean Falcon 3 does not meet the strict Open Source Initiative definition of open source software, despite TII marketing the family as open. Supporters point out that for the vast majority of practical use cases, including commercial deployment, the license is more permissive than the licenses used by some competing model families.

How does Falcon 3 compare to Llama, Qwen, and Gemma?

Falcon 3 was released into a crowded sub-13B open weight model market. TII positioned the 7B and 10B sizes against Llama 3.1 8B from Meta, Qwen 2.5 7B from Alibaba, Mistral 7B and Mistral NeMo from Mistral AI, Gemma 2 9B from Google, and Phi-3 from Microsoft. The smaller 1B and 3B variants were positioned against SmolLM2 from Hugging Face, Gemma 2 2B, Llama 3.2 1B, Qwen 2.5 1.5B, and Minitron 4B from NVIDIA.

Model	Params	Context	Training tokens	License
Falcon3-7B-Base	7B	32K	14T	TII Falcon-LLM 2.0
Falcon3-10B-Base	10B	32K	14T + 2T	TII Falcon-LLM 2.0
Llama 3.1 8B	8B	128K	15T	Llama 3.1 Community
Qwen 2.5 7B	7B	128K	18T	Apache 2.0
Mistral 7B v0.3	7B	32K	not disclosed	Apache 2.0
Gemma 2 9B	9B	8K	8T	Gemma Terms
Falcon3-3B-Base	3B	32K	14T (parent) + <100B	TII Falcon-LLM 2.0
Llama 3.2 3B	3B	128K	9T	Llama 3.2 Community
Qwen 2.5 3B	3B	32K	18T	Qwen Research (non-commercial)

On benchmark numbers reported by the respective developers, Falcon3-10B-Base is broadly competitive with Llama 3.1 8B and Qwen 2.5 7B on MMLU and MMLU-PRO, and ahead on several reasoning and math benchmarks at its weight class. Independent comparisons noted that the 10B model's MMLU score of 73.1 placed it among the top open models under 13B parameters at the time of release ^[11]^[15]. The 7B base model lagged Qwen 2.5 7B on some math benchmarks but matched or exceeded it on knowledge tasks. The 3B model outperformed Llama 3.1 8B on several benchmarks despite having roughly one third the parameter count, a result consistent with TII's pruning and distillation strategy.

Falcon 3's context window of 32K tokens is shorter than the 128K windows shipped by Llama 3.1 and Qwen 2.5, which can matter for retrieval-augmented generation and long-document workloads. The vocabulary size of 131K is similar in scale to Llama 3.1 (128K) and Qwen 2.5 (151K). All three families permit commercial use of their 7B to 10B class models, but on different terms: Falcon 3 and Llama 3.1 use custom licenses with conditions on hosted services or scale, while Qwen 2.5 7B is released under plain Apache 2.0 and Qwen 2.5 3B under a non-commercial research license.

How does Falcon 3 relate to Falcon-Mamba and Falcon-H1?

In May 2025, TII released Falcon-H1, a follow-up family that uses a hybrid attention plus state space architecture rather than a pure transformer or pure SSM. The Falcon-H1 release included six base models at 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, and 34B, each with an instruction-tuned variant ^[12]. The 34B model is the largest Falcon model released since Falcon 180B, but with a very different architectural approach focused on efficiency and long-context capability rather than raw scale.

Falcon-H1 builds on the lessons of the Falcon 3 family, including the depth up-scaling and distillation pipelines used to produce the 10B and the small Falcon 3 variants. The hybrid architecture interleaves attention layers with Mamba 2 SSM blocks, which TII says combines the in-context retrieval strength of attention with the efficient long-context handling of SSMs. According to TII benchmarks, the 1.5B-Deep variant performs on par with leading 7B to 10B transformer models, and the 34B variant matches or exceeds Qwen3-32B, Llama 4 Scout 17B/109B, and Gemma 3 27B on several benchmarks ^[12].

Falcon-H1 effectively supersedes Falcon 3 as TII's flagship open model line, although the Falcon 3 weights remain available on Hugging Face and continue to be used in production deployments. The lineage from Falcon Mamba 7B through Falcon3-Mamba-7B-Base to Falcon-H1 represents a sustained TII investment in non-transformer architectures that few other major model providers have matched.

How was Falcon 3 received?

Industry coverage of Falcon 3 at launch focused on three themes: the small-model push from a previously large-model lab, the technical novelty of depth up-scaling and the 1.58-bit quantization, and the geopolitical implication of a competitive open LLM family released from the United Arab Emirates rather than the United States or China. Coverage in TechCrunch, VentureBeat, MarkTechPost, and Middle East AI News emphasized that Falcon 3 was deliberately sized for laptop and edge deployment ^[10]^[11].

Reviewers noted that the 10B model's benchmark numbers placed it at or near the top of the under-13B segment of the Hugging Face Open LLM Leaderboard v2, though some independent evaluations found that the gap to Qwen 2.5 7B and Llama 3.1 8B varied depending on the specific benchmark suite ^[13]^[15]. The 1.58-bit quantized release attracted interest from researchers working on extreme low-precision inference, since most prior 1-bit and 1.58-bit work had been confined to smaller research models. Practitioners deploying Falcon 3 in production reported that the Llama-compatible architecture made integration straightforward across existing inference stacks.

The Mamba variant received attention from researchers working on state space models. The roughly 15-point GSM8K gain over the original Falcon Mamba 7B (from 51.3 to 65.9) suggested that continued pretraining on math-heavy data can substantially close the gap between SSMs and transformers on reasoning tasks, although the Mamba variant still trailed the transformer Falcon3-7B on most knowledge benchmarks ^[6].

Criticism of Falcon 3 centered on the license. Some commentators argued that the TII Falcon-LLM License 2.0 should not be marketed as open source given the carve-out for hosted services and the modifiable Acceptable Use Policy. Others noted that the 32K context window was conservative compared to the 128K windows offered by competing families, which limited Falcon 3's suitability for long-document workloads without further fine-tuning. The four-language focus, while broader than English-only models, was narrower than Qwen 2.5's language coverage and limited Falcon 3's appeal for deployments in languages outside Western Europe. TII addressed the language coverage gap in subsequent releases, including a dedicated Arabic-language Falcon 3 variant added in 2025 ^[14].

References

[1] Falcon-LLM Team. "Welcome to the Falcon 3 Family of Open Models!" Hugging Face Blog, December 17, 2024. https://huggingface.co/blog/falcon3
[2] Falcon-LLM Team. "Welcome to the Falcon 3 Family of Open Models!" Falcon LLM Blog, December 17, 2024. https://falcon-lm.github.io/blog/falcon-3/
[3] Technology Innovation Institute. "Falcon 3: UAE's Technology Innovation Institute Launches World's most Powerful Small AI Models that can also be run on Light Infrastructures, including Laptops." Press release, December 17, 2024. https://www.tii.ae/news/falcon-3-uaes-technology-innovation-institute-launches-worlds-most-powerful-small-ai-models
[4] "tiiuae/Falcon3-7B-Base." Hugging Face model card. https://huggingface.co/tiiuae/Falcon3-7B-Base
[5] "tiiuae/Falcon3-10B-Base." Hugging Face model card. https://huggingface.co/tiiuae/Falcon3-10B-Base
[6] "tiiuae/Falcon3-Mamba-7B-Base." Hugging Face model card. https://huggingface.co/tiiuae/Falcon3-Mamba-7B-Base
[7] "tiiuae/Falcon3-1B-Base." Hugging Face model card. https://huggingface.co/tiiuae/Falcon3-1B-Base
[8] "tiiuae/Falcon3-3B-Base." Hugging Face model card. https://huggingface.co/tiiuae/Falcon3-3B-Base
[9] "TII Falcon-LLM License 2.0." Technology Innovation Institute. https://falconllm.tii.ae/falcon-terms-and-conditions.html
[10] MarkTechPost. "Technology Innovation Institute TII-UAE Just Released Falcon 3: A Family of Open-Source AI Models with 30 New Model Checkpoints from 1B to 10B." December 17, 2024. https://www.marktechpost.com/2024/12/17/technology-innovation-institute-tii-uae-just-released-falcon-3-a-family-of-open-source-ai-models-with-30-new-model-checkpoints-from-1b-to-10b/
[11] Middle East AI News. "TII launches world's most powerful SLMs under 13B parameters." December 2024. https://www.middleeastainews.com/p/tii-launches-most-powerful-slm
[12] Falcon-LLM Team. "Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance." Falcon LLM Blog, May 2025. https://falcon-lm.github.io/blog/falcon-h1/
[13] "Hugging Face Open LLM Leaderboard v2." Hugging Face. https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
[14] Middle East AI News. "Falcon 3 LLM series gets first Arabic model." 2025. https://www.middleeastainews.com/p/falcon-3-arabic-ai-model-released
[15] Maginative. "UAE's TII Launches Falcon 3: High-Performance Small AI Models." December 2024. https://www.maginative.com/article/uaes-tii-launches-falcon-3-high-performance-small-ai-models/
