Falcon is a family of open-source large language models developed by the Technology Innovation Institute (TII), the applied research pillar of the Abu Dhabi government's Advanced Technology Research Council (ATRC) in the United Arab Emirates. First released in May 2023, the Falcon series gained immediate recognition for its strong performance relative to model size, briefly claiming the top position on the Hugging Face Open LLM Leaderboard. The family has expanded through several generations, from the original Falcon 7B and 40B to Falcon 180B, Falcon 2, Falcon 3, Falcon Mamba, Falcon H1, and Falcon H1R, establishing TII as one of the most significant non-Western contributors to open-source AI development [1].
Falcon models are notable for their reliance on the RefinedWeb dataset, a massive web-only corpus that demonstrated that high-quality language models can be trained without expensive curated datasets. The models have been released under permissive licenses, primarily the TII Falcon License (based on Apache 2.0), making them freely available for both research and commercial use. Over the course of three years, TII has steadily expanded the Falcon family to include multimodal models, state-space architectures, hybrid Mamba-Transformer designs, specialized Arabic language models, and ultra-compact reasoning models.
The Technology Innovation Institute was founded in May 2020 as part of Abu Dhabi's broader strategy to establish the UAE as a global hub for advanced technology research. TII operates under the Advanced Technology Research Council (ATRC), a government body that oversees the emirate's research and development ecosystem [2].
TII comprises dedicated research centers, each focused on a priority technology domain: Advanced Materials, AI and Digital Science, Autonomous Robotics, Cryptography, Directed Energy, Propulsion and Space, Quantum, Renewable and Sustainable Energy, Secure Systems, and Biotechnology. The AI and Digital Science Research Center is responsible for the Falcon project [2].
Funding for TII comes directly from the Abu Dhabi government, which views AI as a strategic priority for the UAE's economic diversification and technological sovereignty. While specific budget figures have not been publicly disclosed, TII has committed to multimillion-dollar research partnerships, including a three-year collaboration with the California Institute of Technology covering autonomous systems, quantum algorithms, AI, biotechnology, and propulsion systems [2]. The institute has recruited researchers and scientists from around the world, building a globally competitive team.
The Falcon project represents TII's most visible and widely adopted contribution to the global AI landscape, placing a Middle Eastern government research lab in direct competition with major Western tech companies and academic institutions. Its success has inspired similar national AI initiatives in Saudi Arabia, South Korea, and other countries seeking technological sovereignty in artificial intelligence.
In 2023, TII spun off AI71, a commercial entity dedicated to deploying Falcon models in enterprise settings. AI71 offers API access to Falcon models and works with businesses across healthcare, finance, education, and government sectors to integrate the technology into their operations. In partnership with Amazon Web Services (AWS), AI71 has made Falcon models available through Amazon Bedrock Marketplace and Amazon SageMaker, enabling enterprises to integrate Falcon into their applications through pay-as-you-go APIs [3].
A central innovation behind the Falcon models is the RefinedWeb dataset, described in a paper presented at NeurIPS 2023 [4]. RefinedWeb is a massive English-language pretraining corpus containing approximately five trillion tokens, extracted entirely from CommonCrawl web data. The dataset covers roughly 10 billion web documents and was built using a pipeline called MacroData Refinement (MDR).
The MDR pipeline applies extensive filtering and deduplication to raw web crawl data. The process begins by filtering URLs to remove adult content using a blocklist and scoring system, then extracting content from web pages using the trafilatura library, and performing language identification with fastText. After initial extraction, heuristics from MassiveWeb are applied along with line-wise corrections to further clean the data. The pipeline combines both exact and fuzzy deduplication techniques at very large scale, ultimately removing nearly 90% of the original web content to produce a clean, high-quality training corpus [4].
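The overall shape of such a pipeline can be illustrated with a toy sketch. The filters below are deliberately crude stand-ins: the real MDR implementation uses trafilatura for extraction, fastText for language identification, MassiveWeb-derived heuristics, and large-scale fuzzy deduplication, none of which are reproduced here.

```python
import hashlib

# Hypothetical stand-ins for the real pipeline components.
BLOCKED_TERMS = {"casino", "xxx"}        # toy URL blocklist
MIN_WORDS, MAX_WORDS = 5, 100_000        # toy document-length heuristics

def url_filter(url: str) -> bool:
    """Reject URLs containing blocklisted terms (stand-in for the real scoring)."""
    return not any(term in url.lower() for term in BLOCKED_TERMS)

def quality_filter(text: str) -> bool:
    """Crude length heuristic standing in for the MassiveWeb-style rules."""
    n = len(text.split())
    return MIN_WORDS <= n <= MAX_WORDS

def exact_dedup(docs):
    """Drop documents whose normalized text hashes identically."""
    seen, out = set(), []
    for doc in docs:
        h = hashlib.sha256(doc["text"].strip().lower().encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            out.append(doc)
    return out

def refine(docs):
    docs = [d for d in docs if url_filter(d["url"]) and quality_filter(d["text"])]
    return exact_dedup(docs)

corpus = [
    {"url": "https://example.com/a", "text": "A clean article about falcons and language models."},
    {"url": "https://example.com/b", "text": "A clean article about falcons and language models."},
    {"url": "https://casino.example/c", "text": "Spammy gambling page with enough words to pass the length check."},
    {"url": "https://example.com/d", "text": "too short"},
]
print(len(refine(corpus)))  # → 1
```

Even in this miniature form, three of four documents are dropped, echoing the roughly 90% removal rate reported for the real pipeline.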
The key finding of the RefinedWeb paper was that properly filtered and deduplicated web data alone could produce models that outperform those trained on popular curated datasets like The Pile. This was a significant result for the field. Prior to RefinedWeb, the conventional wisdom held that training the best language models required carefully curated multi-source datasets. TII demonstrated that the scale and diversity of web data, when properly cleaned, could match or exceed the quality of curated alternatives. The development process was iterative: the team measured zero-shot performance of models trained on development versions of the dataset, with the main goal of maximizing performance while manually auditing samples to identify filtering improvements [4].
A 600-billion-token extract of RefinedWeb was publicly released to the research community on Hugging Face, along with smaller models (1.3B and 7.5B parameters) trained on it [4].
| Dataset | Tokens | Sources | Key Feature |
|---|---|---|---|
| RefinedWeb | ~5 trillion | CommonCrawl (web only) | MDR pipeline with extensive deduplication |
| The Pile | ~825 billion | 22 curated sources | Academic papers, books, code, web |
| ROOTS (BLOOM) | ~350 billion unique | 498 sources, 59 languages | Multilingual, curated |
| RedPajama | ~1.2 trillion | 7 source categories | Open reproduction of LLaMA training data |
The Falcon architecture is based on the decoder-only transformer design that has become standard for autoregressive language models. Across the family, the key architectural features reflect design choices that were becoming standard for high-performance language models in 2023.
Attention Mechanism. The original Falcon models (7B and 40B) use multi-query attention (MQA), which reduces memory bandwidth requirements during inference by sharing a single set of key and value heads across all query heads. This is particularly important for serving large models efficiently. Falcon 2 11B adopted grouped-query attention (GQA) with eight key-value heads, offering a middle ground between full multi-head attention and the more aggressive sharing in MQA [5][6].
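The practical effect on the inference-time KV cache can be seen with simple arithmetic. The dimensions below are illustrative of a generic 7B-class transformer, not Falcon's exact configuration:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
    # 2 tensors (K and V) per layer, each of shape [seq_len, n_kv_heads, head_dim]
    return n_layers * 2 * n_kv_heads * head_dim * seq_len * dtype_bytes

# Illustrative config: 32 layers, 32 query heads, head_dim 128, 8K context, fp16 cache.
LAYERS, Q_HEADS, HEAD_DIM, SEQ = 32, 32, 128, 8192

mha = kv_cache_bytes(LAYERS, Q_HEADS, HEAD_DIM, SEQ)  # full multi-head: 32 KV heads
gqa = kv_cache_bytes(LAYERS, 8, HEAD_DIM, SEQ)        # grouped-query: 8 KV heads
mqa = kv_cache_bytes(LAYERS, 1, HEAD_DIM, SEQ)        # multi-query: 1 KV head

print(f"MHA: {mha / 2**30:.1f} GiB, GQA: {gqa / 2**30:.2f} GiB, MQA: {mqa / 2**30:.3f} GiB")
```

For this toy configuration the cache shrinks from 4 GiB (MHA) to 1 GiB (GQA, 8 heads) to 0.125 GiB (MQA), which is why KV-head sharing matters so much for serving throughput.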
Positional Encoding. The models use rotary position embeddings (RoPE), which encode positional information through rotation matrices applied to the query and key vectors. RoPE allows for better extrapolation to longer sequence lengths than were seen during training.
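RoPE's defining property, that attention scores depend only on the relative offset between positions, can be checked with a minimal pure-Python implementation (a sketch of the standard formulation, not Falcon's specific code):

```python
import math

def rope_rotate(vec, pos, theta=10000.0):
    """Apply RoPE to a vector of even length: rotate each pair
    (x_i, x_{i+1}) by the angle pos * theta^(-i/d), i stepping by 2."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = pos * theta ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q, k = [0.3, -1.2, 0.7, 0.5], [1.1, 0.4, -0.2, 0.9]
# The score depends only on the relative offset (here 4 in both cases):
score_a = dot(rope_rotate(q, pos=7), rope_rotate(k, pos=3))
score_b = dot(rope_rotate(q, pos=107), rope_rotate(k, pos=103))
print(abs(score_a - score_b) < 1e-9)  # → True
```

Because each 2D rotation satisfies R(m)q · R(n)k = q · R(n−m)k, shifting both positions by the same amount leaves the score unchanged, which is the property that helps extrapolation to longer sequences.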
Activation Function. Falcon 3 models adopted the SwiGLU activation function, replacing the standard GELU used in earlier Falcon models. SwiGLU has been shown to improve training efficiency and model quality in a number of modern LLM architectures.
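A minimal scalar sketch of a SwiGLU feed-forward layer follows; real models use weight matrices and vectors, and the toy weights here are purely illustrative:

```python
import math

def silu(x):
    """SiLU/Swish activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    """Minimal SwiGLU feed-forward on scalars:
    down( silu(gate(x)) * up(x) ). The gating path (silu) modulates
    the up-projection path element-wise before the down-projection."""
    hidden = [silu(wg * x) * (wu * x) for wg, wu in zip(w_gate, w_up)]
    return sum(h * wd for h, wd in zip(hidden, w_down))

# Toy weights (illustrative only)
y = swiglu_ffn(1.5, w_gate=[0.2, -0.4], w_up=[1.0, 0.3], w_down=[0.5, 0.8])
print(round(y, 4))
```

The gated product is what distinguishes SwiGLU from a plain GELU MLP: half the hidden width acts as a learned, input-dependent gate on the other half.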
Head Dimension. Falcon 3 models use a head dimension of 256, which was specifically chosen to be optimized for FlashAttention-3, improving training and inference throughput [7].
Vocabulary. Falcon 3 transformer models use a vocabulary of approximately 131,000 tokens, significantly larger than the vocabulary used in Falcon 1, which allows better handling of multilingual text and code [7].
FlashAttention. The training process has incorporated FlashAttention from the beginning, an IO-aware exact attention algorithm that speeds up transformer training by reducing memory reads and writes between GPU high-bandwidth memory and on-chip SRAM.
Training Infrastructure. Falcon models were trained on large GPU clusters. Falcon 180B, for example, was trained on up to 4,096 A100 GPUs simultaneously, consuming roughly seven million GPU-hours and making it one of the most resource-intensive open model training runs at the time of its release. Falcon 3 7B was trained on 1,024 H100 GPUs [6][7].
Llama Compatibility. Starting with Falcon 3, TII made the models compatible with the Llama architecture format, allowing users to leverage the extensive tooling and ecosystem built around Meta's models without modification [7].
TII released the first Falcon models in May 2023: Falcon 7B and Falcon 40B. Both were decoder-only transformer models trained on subsets of the RefinedWeb dataset. Falcon 40B was trained on one trillion tokens, while Falcon 7B was trained on 1.5 trillion tokens [1][5].
Upon release, Falcon 40B achieved the top position on the Hugging Face Open LLM Leaderboard, making it the best-performing openly available pretrained model at that time. Falcon 7B also led its weight class, edging out MosaicML's MPT-7B as the best pretrained model at the 7-billion-parameter scale [8]. Both models were released under the Apache 2.0 license, which was a notable move since Meta's competing LLaMA models at the time carried more restrictive license terms.
Instruction-tuned variants, Falcon 7B Instruct and Falcon 40B Instruct, were also released. These fine-tuned versions were designed for conversational and instruction-following tasks.
| Model | Parameters | Training Tokens | Attention | License | Release |
|---|---|---|---|---|---|
| Falcon 7B | 7 billion | 1.5 trillion | Multi-query | Apache 2.0 | May 2023 |
| Falcon 40B | 40 billion | 1 trillion | Multi-query | Apache 2.0 | May 2023 |
| Falcon 7B Instruct | 7 billion | 1.5T + fine-tuning | Multi-query | Apache 2.0 | May 2023 |
| Falcon 40B Instruct | 40 billion | 1T + fine-tuning | Multi-query | Apache 2.0 | May 2023 |
On September 6, 2023, TII released Falcon 180B, a dramatically scaled-up version with 180 billion parameters trained on 3.5 trillion tokens from the RefinedWeb dataset. At the time of its release, Falcon 180B was the largest openly available language model in the world [9].
The model achieved a score of 68.74 on the Hugging Face Open LLM Leaderboard, making it the highest-scoring openly released pretrained model at that time. Performance was estimated to fall between GPT-3.5 and GPT-4 depending on the task, and the model consistently matched or surpassed Google's PaLM 2 Medium on widely used benchmarks including HellaSwag, LAMBADA, WebQuestions, and Winogrande [9][10].
Falcon 180B was 2.5 times larger than Meta's LLaMA 2 70B and was trained with approximately four times more compute. On the MMLU benchmark, it outperformed both LLaMA 2 70B and GPT-3.5, although it fell short of GPT-4 [10].
The model was released under TII's Falcon 180B license, which was more restrictive than the Apache 2.0 license used for the smaller models. While still permitting research and commercial use, the license included some limitations that prevented it from being classified as fully open-source by some definitions. This distinction drew attention in the open-source AI community, where license terms are closely scrutinized [9].
| Benchmark | Falcon 180B | LLaMA 2 70B | GPT-3.5 | GPT-4 |
|---|---|---|---|---|
| MMLU (5-shot) | 70.4 | 68.9 | 70.0 | 86.4 |
| HellaSwag (10-shot) | 88.0 | 87.3 | 85.5 | 95.3 |
| ARC (25-shot) | 69.8 | 67.3 | 85.2 | 96.3 |
| HF Open LLM Score | 68.74 | ~67 | N/A | N/A |
In May 2024, TII released Falcon 2, representing the second generation of the model family. This release focused on a smaller, more efficient model size while adding multimodal capabilities [11].
Falcon 2 11B was trained on 5.5 trillion tokens, a significant increase over the original Falcon models. The model uses grouped-query attention (GQA) with eight key-value heads. Despite having only 11 billion parameters, the model outperformed Meta's LLaMA 3 8B and performed on par with Google's Gemma 7B on the Hugging Face Open LLM Leaderboard [11].
The more notable release was Falcon 2 11B VLM (Vision Language Model), TII's first multimodal model. The VLM integrates the pretrained CLIP ViT-L/14 vision encoder with the Falcon 2 11B chat-fine-tuned model. Training was carried out in two stages: a pretraining stage where the LLM was frozen and only the multimodal projector was trained on 558,000 image-caption pairs, followed by a fine-tuning stage where both the projector and LLM weights were trained on 1.2 million image-text instruction data from public datasets. The VLM employs a dynamic encoding mechanism at high resolution for image inputs, similar to the approach used in LLaVA-Next. Training was performed on 16 A100 80GB GPUs with ZeRO and Flash-Attention 2 [12].
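The two-stage recipe can be sketched as a toy parameter-freezing loop. The parameter names, gradients, and SGD step below are illustrative assumptions, not TII's training code:

```python
def train_stage(params, trainable_names, grads, lr=0.1):
    """Apply a toy SGD step only to parameters marked trainable, mimicking
    stage 1 (projector only, LLM frozen) vs stage 2 (projector + LLM)."""
    return {
        name: (value - lr * grads[name]) if name in trainable_names else value
        for name, value in params.items()
    }

params = {"llm.w": 1.0, "projector.w": 0.5}   # hypothetical weights
grads = {"llm.w": 0.2, "projector.w": 0.4}    # hypothetical gradients

# Stage 1: LLM frozen, only the multimodal projector updates.
stage1 = train_stage(params, {"projector.w"}, grads)
# Stage 2: both projector and LLM weights update.
stage2 = train_stage(stage1, {"projector.w", "llm.w"}, grads)
print(stage1["llm.w"], round(stage1["projector.w"], 2))  # → 1.0 0.46
```

The point of the frozen first stage is to let the projector learn to map vision features into the LLM's embedding space before the language model itself is perturbed.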
Falcon 2 was trained on data spanning 11 languages (English, German, Spanish, French, Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish), expanding the model's multilingual capabilities beyond the primarily English focus of the original Falcon models. Both variants were released under the TII Falcon License [11].
In August 2024, TII released Falcon Mamba 7B, a model based on the state space model (SSM) architecture rather than the traditional transformer. It was the first open-source pure state-space language model (SSLM) at this scale, and Hugging Face independently verified it as the top-performing open-source SSLM globally at the time of release [13].
The model uses the original Mamba architecture with additional RMS normalization layers added for stable large-scale training. Unlike transformer models, which use attention mechanisms that scale quadratically with sequence length, Falcon Mamba processes sequences with constant-time token generation regardless of context size. This means memory requirements remain constant (only the recurrent state is stored), rather than growing linearly with the key-value cache as transformers do [13][14].
Falcon Mamba 7B was trained on approximately 5.5 trillion tokens, using a mix of RefinedWeb data, high-quality technical data, code data from public sources, and curated data in the final decay stage. On the original Hugging Face Open LLM Leaderboard (v1), the model achieved an average score of 64.09, which was competitive with Falcon 2 11B (64.28) despite having fewer parameters. Specific scores included 80.82 on HellaSwag, 62.11 on MMLU, and 62.03 on ARC [14].
A key practical advantage of Falcon Mamba is that it can process sequences of arbitrary length on a single NVIDIA A10 GPU with 24GB of memory, while transformer models of similar size would be constrained by the growing KV cache. On an H100 GPU, the model maintains constant throughput and steady CUDA memory usage even at context lengths of 130,000 tokens, whereas transformer models degrade in speed and consume increasing memory [14].
| Aspect | Falcon Mamba (SSM) | Transformer Models |
|---|---|---|
| Attention | None (selective state spaces) | Full attention mechanism |
| Token Generation | Constant time regardless of context | Scales with sequence length |
| Memory | Constant (recurrent state only) | Linear scaling (KV caches) |
| Arbitrary Length | Supported on single A10 24GB GPU | Limited by KV cache size |
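The memory contrast in the table above can be made concrete with simple counting. The dimensions below are illustrative, not Falcon Mamba's actual configuration:

```python
def transformer_cache_entries(n_layers, n_kv_heads, head_dim, context_len):
    """KV cache grows linearly with the number of tokens processed."""
    return n_layers * 2 * n_kv_heads * head_dim * context_len

def ssm_state_entries(n_layers, d_state, d_model):
    """The recurrent state is a fixed-size tensor, independent of context length."""
    return n_layers * d_state * d_model

for ctx in (1_000, 130_000):
    kv = transformer_cache_entries(32, 8, 128, ctx)     # illustrative transformer dims
    ssm = ssm_state_entries(64, 16, 4096)               # illustrative Mamba-style dims
    print(f"context={ctx:>7}: KV cache {kv:>13,} entries, SSM state {ssm:,} entries")
```

Note that `ssm_state_entries` takes no context-length argument at all: the state size is fixed, which is exactly why the 130,000-token H100 result reported above shows flat memory usage.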
Falcon Mamba 7B was released under the TII Falcon Mamba 7B License 1.0, with both base and instruct-tuned variants available on Hugging Face [13].
On December 17, 2024, TII released Falcon 3, the third generation of the model family. Falcon 3 shifted focus toward small language models (SLMs) designed to run efficiently on consumer hardware, including laptops and single GPUs [15].
The Falcon 3 series includes five models: Falcon3-1B, Falcon3-3B, Falcon3-Mamba-7B (an updated state-space variant), Falcon3-7B, and Falcon3-10B. All transformer models were trained on approximately 14 trillion tokens, more than double the 5.5 trillion tokens used for Falcon 2. The training data comprised a mix of web content, code, STEM material, curated high-quality corpora, and multilingual data. Models are available in both Base and Instruct variants, totaling over 30 model checkpoints [7][15].
The different model sizes were produced using distinct training strategies. Falcon3-7B-Base was the primary model, trained from scratch on 1,024 H100 GPUs. Falcon3-10B-Base was created through depth up-scaling from the 7B model: layers were duplicated, and the model received an additional 2 trillion tokens of continued pre-training on high-quality data. Falcon3-1B and Falcon3-3B were derived from the larger models through pruning and knowledge distillation, trained on less than 100 billion tokens of curated high-quality data. Falcon3-Mamba-7B-Base received an additional 1.5 trillion tokens of high-quality training data on top of the original Falcon Mamba 7B, enhancing its reasoning and mathematical capabilities [7].
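The depth up-scaling step can be sketched as layer duplication. The duplication pattern and helper below are hypothetical, since TII has not published the exact recipe at this level of detail:

```python
import copy

def depth_upscale(layers, dup_every=2):
    """Hypothetical depth up-scaling sketch: duplicate every `dup_every`-th
    block to deepen the network; the deepened model is then given
    continued pre-training so the duplicated blocks can specialize."""
    out = []
    for i, layer in enumerate(layers):
        out.append(layer)
        if (i + 1) % dup_every == 0:
            out.append(copy.deepcopy(layer))  # duplicate starts from the same weights
    return out

# Illustrative 28-block base model (block count is hypothetical).
base = [{"name": f"block_{i}", "w": [0.1 * i]} for i in range(28)]
scaled = depth_upscale(base, dup_every=2)
print(len(base), "->", len(scaled))  # → 28 -> 42
```

The appeal of the approach is cost: the deeper model inherits trained weights everywhere, so only a comparatively short continued-pretraining run (2 trillion tokens here, versus 14 trillion from scratch) is needed.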
Upon release, the Falcon3-10B model achieved the number one position on the Hugging Face Open LLM Leaderboard for models under 13 billion parameters. The 3B model notably outperformed larger models like LLaMA 3.1 8B and NVIDIA Minitron 4B on several benchmarks. The 1B model surpassed SmolLM2-1.7B and matched Gemma-2-2B despite being smaller. Falcon 3 supports English, French, Spanish, and Portuguese, with a context length of up to 32,000 tokens (8,000 for the 1B model) [7][15].
| Model | Parameters | Training Tokens | Method | Context Length |
|---|---|---|---|---|
| Falcon3-1B | 1 billion | <100B (distilled) | Pruning + knowledge distillation | 8K |
| Falcon3-3B | 3 billion | <100B (distilled) | Pruning + knowledge distillation | 32K |
| Falcon3-Mamba-7B | 7 billion | ~7T total | Continued training of Falcon Mamba | 32K |
| Falcon3-7B | 7 billion | 14 trillion | Trained from scratch | 32K |
| Falcon3-10B | 10 billion | 14T + 2T (up-scaling) | Depth up-scaling from 7B | 32K |
| Benchmark | Falcon3-1B-Instruct | Falcon3-3B-Instruct | Falcon3-7B-Base | Falcon3-10B-Base |
|---|---|---|---|---|
| MMLU | - | - | 67.4 | 73.1 |
| MMLU-PRO | - | 29.7 | 39.2 | 42.5 |
| ARC Challenge | - | - | 65.9 | - |
| GSM8K | - | - | 79.1 | 83.1 |
| BBH | - | - | 51.0 | 59.7 |
| IFEval | 54.4 | - | - | 78.0 |
| MATH Lvl5 | - | 19.9 | - | 22.9 |
| MBPP | - | - | - | 73.8 |
All Falcon 3 models were released under the TII Falcon License 2.0, a permissive Apache 2.0-based license with an acceptable use policy that encourages responsible AI development [15].
On May 21, 2025, TII released Falcon-H1, a new family of hybrid language models that combine traditional transformer attention mechanisms with Mamba-2 state-space model components operating in parallel within each block. The "H" in the name stands for "hybrid." This architectural approach seeks to combine the strong contextual understanding of attention with the linear-time efficiency of SSMs, particularly for long sequences [16].
Falcon-H1 covers six model sizes: 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, and 34B, each available in base and instruct-tuned variants. The models support a context window of up to 256,000 tokens and natively handle 18 languages (Arabic, Czech, German, English, Spanish, French, Hindi, Italian, Japanese, Korean, Dutch, Polish, Portuguese, Romanian, Russian, Swedish, Urdu, and Chinese), with scalability to over 100 languages [16].
The architectural design places attention and SSM heads in parallel within each block, and their outputs are concatenated before being passed through the block's output projection. The ratio of attention to SSM heads is adjustable, and the models use an unusually large base-frequency scaling for their RoPE positional embeddings, a setting TII reports is particularly suited to hybrid models. TII developed a customized Maximal Update Parametrization (muP) with 35 parameter groups and fine-tuned multipliers, enabling hyperparameter transfer across model sizes and allowing all six models to be trained in parallel [16].
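The parallel-block design can be shown in miniature. The scalar "heads" and projection below are hypothetical stand-ins, not the real Falcon-H1 operators:

```python
def hybrid_block(x, attn_heads, ssm_heads, w_out):
    """Sketch of a parallel hybrid block: attention heads and SSM heads each
    process the same input, their outputs are concatenated channel-wise,
    and the result goes through the block's output projection."""
    attn_out = [h(x) for h in attn_heads]
    ssm_out = [h(x) for h in ssm_heads]
    concat = attn_out + ssm_out
    return [sum(c * w for c, w in zip(concat, row)) for row in w_out]

# Toy heads: each maps the scalar input to one channel (illustrative only).
attn_heads = [lambda x: 2.0 * x, lambda x: -1.0 * x]   # 2 "attention" heads
ssm_heads = [lambda x: 0.5 * x]                        # 1 "SSM" head (ratio is adjustable)
w_out = [[1.0, 1.0, 1.0], [0.0, 1.0, 2.0]]             # output projection rows
print(hybrid_block(3.0, attn_heads, ssm_heads, w_out))  # → [4.5, 0.0]
```

Because the two branches run in parallel rather than in alternating layers, the attention-to-SSM ratio becomes a tunable hyperparameter per block, as the text above describes.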
A key training innovation was "reverse curriculum learning," where complex data is shown from the start of training rather than being introduced gradually. The team also employed smart data reuse, where high-quality samples are reused more frequently without harming generalization, guided by memorization window estimation [16].
Performance results were strong across the size range. Falcon-H1-0.5B achieved results comparable to typical 7B models from 2024. Falcon-H1-1.5B-Deep rivaled 7B to 10B transformer models. At the top end, Falcon-H1-34B-Instruct was competitive with models two to four times larger, including Qwen 2.5-72B and LLaMA 3.3-70B. For inference efficiency, Falcon-H1-34B achieved up to four times faster input throughput and eight times faster output throughput compared to Qwen 2.5-32B on longer contexts, thanks to the SSM component's linear scaling [16].
| Model | Parameters | Context | Comparable Performance To |
|---|---|---|---|
| Falcon-H1-0.5B | 0.5 billion | 256K | Typical 7B models (2024) |
| Falcon-H1-1.5B | 1.5 billion | 256K | 7B-10B transformer models |
| Falcon-H1-1.5B-Deep | 1.5 billion | 256K | 7B-10B transformer models |
| Falcon-H1-3B | 3 billion | 256K | Larger transformer models |
| Falcon-H1-7B | 7 billion | 256K | Larger transformer models |
| Falcon-H1-34B | 34 billion | 256K | Qwen 2.5-72B, LLaMA 3.3-70B |
Alongside Falcon-H1, TII released Falcon Arabic on May 21, 2025, the first Arabic-specific model in the Falcon series. Built on top of the Falcon 3-7B architecture, Falcon Arabic was trained on a high-quality native (non-translated) Arabic dataset spanning Modern Standard Arabic and regional dialects, capturing the full linguistic diversity of the Arab world [17].
According to the Open Arabic LLM Leaderboard benchmarks, Falcon Arabic outperformed all other regionally available Arabic language models at the time of release, matching the performance of models up to ten times its size. The model supports a context length of 32,000 tokens, enabling it to handle long documents and advanced applications like retrieval-augmented generation (RAG), content creation, and knowledge-intensive tasks. It is designed to excel in general knowledge, Arabic grammar, mathematical reasoning, complex problem solving, and understanding Arabic dialects [17].
On January 5, 2026, TII released Falcon H1R 7B (where "R" stands for reasoning), a specialized reasoning model built on the Falcon H1-7B hybrid architecture. The model combines the hybrid Transformer-Mamba2 backbone with a training recipe that mixes supervised long-form reasoning with reinforcement learning using Group Relative Policy Optimization (GRPO) [18].
Falcon H1R 7B supports a 256,000-token context window and was designed specifically for test-time scaling, using a technique TII calls "DeepConf" (Deep Think with Confidence). DeepConf runs many chains of thought in parallel, then uses the model's own next-token confidence scores to filter noisy traces and keep only high-quality candidates [18].
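The DeepConf idea can be sketched as confidence-filtered majority voting. The scoring scheme, the keep-fraction, and the trace format below are illustrative assumptions, not TII's published algorithm:

```python
from collections import Counter

def deepconf_select(traces, keep_fraction=0.5):
    """Hypothetical sketch of confidence-filtered self-consistency: run many
    reasoning traces, keep the most confident fraction (here, by a per-trace
    mean token confidence), then majority-vote over the surviving answers."""
    ranked = sorted(traces, key=lambda t: t["confidence"], reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_fraction))]
    votes = Counter(t["answer"] for t in kept)
    return votes.most_common(1)[0][0]

traces = [
    {"answer": "42", "confidence": 0.91},
    {"answer": "42", "confidence": 0.88},
    {"answer": "17", "confidence": 0.35},   # low-confidence noisy trace
    {"answer": "17", "confidence": 0.30},
    {"answer": "42", "confidence": 0.84},
    {"answer": "17", "confidence": 0.29},
]
print(deepconf_select(traces))  # → 42
```

In this toy example an unfiltered vote would be tied 3–3; discarding the low-confidence traces first is what lets the vote converge, which is the intuition behind filtering noisy chains before aggregation.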
Despite having only 7 billion parameters, the model demonstrated performance competitive with or surpassing models several times its size. On mathematics benchmarks, it achieved 88.1% accuracy on AIME-24, ahead of ServiceNow AI's Apriel 1.5 (15B) at 86.2%. On LiveCodeBench v6 (coding), it scored 68.6%, higher than Qwen3-32B. TII reported that the model matches or approaches the performance of Microsoft's Phi 4 Reasoning Plus (14B) while using only half the parameters. For inference speed, Falcon H1R 7B reaches approximately 1,500 tokens per second per GPU at batch size 64, nearly double the throughput of Qwen3-8B [18].
Also on January 5, 2026, TII released Falcon-H1-Arabic, a newly developed model built on the hybrid Mamba-Transformer architecture specifically for Arabic language tasks. Available in 3B, 7B, and 34B parameter sizes, the model established itself as the highest-performing system on the Open Arabic LLM Leaderboard (OALL), outperforming models several times larger [19].
On January 16, 2026, TII released Falcon-H1-Tiny, a series of 15 extremely small yet powerful open-source language models. The family includes 90-million-parameter models for English, 100-million-parameter models for multilingual applications, and a 600-million-parameter reasoning model (Falcon-H1-Tiny-R-0.6B). The reasoning model was pretrained directly on reasoning data and then trained with a GRPO stage, showing strong performance on AIME-24, AIME-25, LiveCodeBench, and Math500 benchmarks. Specialized variants include a coder model (90M) for code generation and fill-in-the-middle tasks, and a tool-calling model (90M) for function-calling tasks [20].
| Generation | Release Date | Models | Parameters | Training Tokens | Key Innovation |
|---|---|---|---|---|---|
| Falcon 1 | May 2023 | 7B, 40B | 7B, 40B | 1-1.5T | RefinedWeb; Apache 2.0 open release |
| Falcon 180B | Sep 2023 | 180B | 180B | 3.5T | Largest open model at release |
| Falcon 2 | May 2024 | 11B, 11B VLM | 11B | 5.5T | First multimodal Falcon; 11 languages |
| Falcon Mamba | Aug 2024 | Mamba 7B | 7B | ~5.5T | First open-source pure SSLM at scale |
| Falcon 3 | Dec 2024 | 1B, 3B, Mamba 7B, 7B, 10B | 1B-10B | Up to 14T | SLMs for edge devices; 30+ checkpoints |
| Falcon H1 | May 2025 | 0.5B to 34B | 0.5B-34B | - | Hybrid Mamba-Transformer; 256K context |
| Falcon Arabic | May 2025 | 7B | 7B | - | First native Arabic Falcon model |
| Falcon H1R | Jan 2026 | 7B | 7B | - | Reasoning model; DeepConf test-time scaling |
| Falcon H1-Arabic | Jan 2026 | 3B, 7B, 34B | 3B-34B | - | Top Arabic AI model on OALL |
| Falcon H1-Tiny | Jan 2026 | 15 models | 90M-600M | - | Ultra-compact models; reasoning at 600M |
TII's approach to licensing has evolved across Falcon generations, generally moving toward greater permissiveness.
| Model | License | Key Terms |
|---|---|---|
| Falcon 7B, 40B (2023) | Apache 2.0 | Fully permissive; commercial use allowed |
| Falcon 180B (2023) | TII Falcon 180B License | More restrictive; some commercial limitations |
| Falcon 2 11B (2024) | TII Falcon License | Permissive; Apache 2.0 based |
| Falcon Mamba 7B (2024) | TII Falcon Mamba 7B License 1.0 | Open access |
| Falcon 3 (2024) | TII Falcon License 2.0 | Apache 2.0 based with acceptable use policy |
| Falcon H1 (2025) | TII Falcon License | Apache 2.0 based; permissive |
| Falcon H1R (2026) | TII Falcon License | Open source |
The early decision to use Apache 2.0 for the original 7B and 40B models was strategically significant. At the time, Meta's competing LLaMA models carried restrictive license terms, and Falcon's permissive licensing gave it an advantage for commercial adoption. The more restrictive license for Falcon 180B drew some criticism, but subsequent releases returned to permissive terms. The TII Falcon License 2.0, used for Falcon 3 and later models, is based on Apache 2.0 with an added acceptable use policy promoting responsible AI development [8][15].
The Falcon family operates in a competitive landscape alongside several other prominent open-source LLM families.
| Aspect | Falcon (TII) | LLaMA (Meta) | Mistral (Mistral AI) | Qwen (Alibaba) |
|---|---|---|---|---|
| Developer | TII (UAE government) | Meta AI (US) | Mistral AI (France) | Alibaba Cloud (China) |
| First Release | May 2023 | February 2023 | September 2023 | August 2023 |
| Max Parameters | 180B | 405B | 141B (Mixtral) | 235B (Qwen 3 MoE) |
| Training Data | RefinedWeb (web-only) | Mix of web, academic, code | Undisclosed | Undisclosed |
| License | Apache 2.0 / TII Falcon | LLaMA Community License | Apache 2.0 / commercial | Apache 2.0 |
| Architectural Innovation | Hybrid Mamba-Transformer (H1) | Standard transformer | Sliding window attention | MoE variants |
| Multilingual | 18 languages (H1) | Strong multilingual | Primarily English, French | 29+ languages |
| Community Adoption | Moderate | Very high | High | Very high |
Meta's LLaMA models have generally achieved broader community adoption, partly due to Meta's larger ecosystem and research reputation. However, Falcon models carved out a significant niche, particularly when the original LLaMA carried restrictive licensing terms. In terms of raw performance, Falcon 180B briefly held the advantage over LLaMA 2 70B on several benchmarks, but Meta's subsequent releases (including LLaMA 3.1 with models up to 405B parameters) have generally surpassed Falcon on most tasks at comparable or larger scales [8][21].
Mistral AI, a French startup, emerged as another strong competitor starting in late 2023. Mistral's models are known for their efficiency, with the 7B model delivering performance comparable to LLaMA 2 13B. Alibaba's Qwen series has become particularly prominent in 2025, with Qwen 3 offering models from 0.6B to 235B parameters and supporting 29+ languages [22].
Falcon's competitive advantage increasingly lies in its architectural innovation. The hybrid Mamba-Transformer approach of Falcon H1 offers fundamentally better scaling for long contexts compared to pure transformer models, and the Falcon H1R reasoning model demonstrated that small hybrid models can compete with much larger transformer-only alternatives on complex reasoning tasks.
Falcon's significance extends beyond its benchmark scores. The project demonstrated several important points about the state of AI development.
Web Data Sufficiency. The RefinedWeb paper and the Falcon models proved that web data alone, when properly filtered and deduplicated, could produce state-of-the-art language models. This finding influenced subsequent training data strategies across the industry [4].
Government-Led AI Research. Falcon showed that government-funded research institutes outside the traditional US-China AI axis could produce globally competitive models. TII's success inspired similar national AI initiatives in countries seeking technological sovereignty in AI.
Open-Source Competition. By releasing strong models under permissive licenses, TII contributed to the competitive pressure that pushed Meta and other companies toward more permissive licensing of their own models. The open-source AI ecosystem benefited from having multiple strong options rather than a single dominant provider.
Efficient Training. TII's approach to achieving competitive performance through data quality (with RefinedWeb) rather than simply scaling model size or compute budget offered a practical path for organizations with more limited resources.
Architectural Diversity. With Falcon Mamba and Falcon H1, TII has been at the forefront of exploring alternatives to the pure transformer architecture. The hybrid Mamba-Transformer approach in particular has shown promising results for long-context efficiency, and may influence the broader industry's approach to model architecture.
Arabic AI. The Falcon Arabic and Falcon H1-Arabic models represent a significant investment in Arabic-language AI, filling a gap in a region where most available language models were primarily trained on English and other Western languages.
As of early 2026, TII continues rapid development of the Falcon family. The release cadence has accelerated significantly, with major new models arriving every few months. The January 2026 releases of Falcon H1R, Falcon H1-Arabic, and Falcon H1-Tiny demonstrate TII's multi-pronged strategy: pushing the boundaries of reasoning performance with small efficient models, expanding Arabic-language capabilities, and exploring the limits of how small a useful language model can be.
The Falcon project remains one of the most visible examples of a non-Western, government-funded initiative producing globally competitive AI models. While the competitive landscape has grown considerably since 2023, with Meta's LLaMA, Mistral AI's models, Alibaba's Qwen, and DeepSeek's offerings all commanding significant market share, TII has differentiated itself through architectural innovation (the hybrid Mamba-Transformer approach), aggressive open licensing, and a focus on Arabic-language AI.
Through AI71 and the AWS partnership, Falcon models are available for enterprise deployment via pay-as-you-go APIs on Amazon Bedrock Marketplace and Amazon SageMaker, reducing the barrier to commercial adoption [3].