Falcon is a family of open-source large language models developed by the Technology Innovation Institute (TII), the applied research pillar of the Abu Dhabi government's Advanced Technology Research Council (ATRC) in the United Arab Emirates. First released in May 2023, the Falcon series gained immediate recognition for its strong performance relative to model size, briefly claiming the top position on the Hugging Face Open LLM Leaderboard. The family has expanded through several generations, from the original Falcon 7B and 40B to Falcon 180B, Falcon 2, Falcon 3, Falcon Mamba, Falcon H1, and Falcon H1R, establishing TII as one of the most significant non-Western contributors to open-source AI development [1].
Falcon models are notable for their reliance on the RefinedWeb dataset, a massive web-only corpus that demonstrated that high-quality language models can be trained without expensive curated datasets. The models have been released under permissive licenses, primarily the TII Falcon License (based on Apache 2.0), making them freely available for both research and commercial use. Over the course of three years, TII has steadily expanded the Falcon family to include multimodal models, state-space architectures, hybrid Mamba-Transformer designs, specialized Arabic language models, and ultra-compact reasoning models.
The Technology Innovation Institute was founded in May 2020 as part of Abu Dhabi's broader strategy to establish the UAE as a global hub for advanced technology research. TII operates under the Advanced Technology Research Council (ATRC), a government body that oversees the emirate's research and development ecosystem [2].
TII comprises dedicated research centers, each focused on a priority technology domain: Advanced Materials, AI and Digital Science, Autonomous Robotics, Cryptography, Directed Energy, Propulsion and Space, Quantum, Renewable and Sustainable Energy, Secure Systems, and Biotechnology. The AI and Digital Science Research Center is responsible for the Falcon project [2].
Funding for TII comes directly from the Abu Dhabi government, which views AI as a strategic priority for the UAE's economic diversification and technological sovereignty. While specific budget figures have not been publicly disclosed, TII has committed to multimillion-dollar research partnerships, including a three-year collaboration with the California Institute of Technology covering autonomous systems, quantum algorithms, AI, biotechnology, and propulsion systems [2]. The institute has recruited researchers and scientists from around the world, building a globally competitive team.
The Falcon project represents TII's most visible and widely adopted contribution to the global AI landscape, placing a Middle Eastern government research lab in direct competition with major Western tech companies and academic institutions. Its success has inspired similar national AI initiatives in Saudi Arabia, South Korea, and other countries seeking technological sovereignty in artificial intelligence.
In 2023, TII spun off AI71, a commercial entity dedicated to deploying Falcon models in enterprise settings. AI71 offers API access to Falcon models and works with businesses across healthcare, finance, education, and government sectors to integrate the technology into their operations. In partnership with Amazon Web Services (AWS), AI71 has made Falcon models available through Amazon Bedrock Marketplace and Amazon SageMaker, enabling enterprises to integrate Falcon into their applications through pay-as-you-go APIs [3].
A central innovation behind the Falcon models is the RefinedWeb dataset, described in a paper presented at NeurIPS 2023 [4]. RefinedWeb is a massive English-language pretraining corpus containing approximately five trillion tokens, extracted entirely from CommonCrawl web data. The dataset covers roughly 10 billion web documents and was built using a pipeline called MacroData Refinement (MDR).
The MDR pipeline applies extensive filtering and deduplication to raw web crawl data. The process begins by filtering URLs to remove adult content using a blocklist and scoring system, then extracting content from web pages using the trafilatura library, and performing language identification with fastText. After initial extraction, heuristics from MassiveWeb are applied along with line-wise corrections to further clean the data. The pipeline combines both exact and fuzzy deduplication techniques at very large scale, ultimately removing nearly 90% of the original web content to produce a clean, high-quality training corpus [4].
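The overall shape of such a pipeline can be illustrated with a toy sketch. The filters below are deliberately crude stand-ins: the real MDR implementation uses trafilatura for extraction, fastText for language identification, MassiveWeb-derived heuristics, and large-scale fuzzy deduplication, none of which are reproduced here.

```python
import hashlib

# Hypothetical stand-ins for the real pipeline components.
BLOCKED_TERMS = {"casino", "xxx"}        # toy URL blocklist
MIN_WORDS, MAX_WORDS = 5, 100_000        # toy document-length heuristics

def url_filter(url: str) -> bool:
    """Reject URLs containing blocklisted terms (stand-in for the real scoring)."""
    return not any(term in url.lower() for term in BLOCKED_TERMS)

def quality_filter(text: str) -> bool:
    """Crude length heuristic standing in for the MassiveWeb-style rules."""
    n = len(text.split())
    return MIN_WORDS <= n <= MAX_WORDS

def exact_dedup(docs):
    """Drop documents whose normalized text hashes identically."""
    seen, out = set(), []
    for doc in docs:
        h = hashlib.sha256(doc["text"].strip().lower().encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            out.append(doc)
    return out

def refine(docs):
    docs = [d for d in docs if url_filter(d["url"]) and quality_filter(d["text"])]
    return exact_dedup(docs)

corpus = [
    {"url": "https://example.com/a", "text": "A clean article about falcons and language models."},
    {"url": "https://example.com/b", "text": "A clean article about falcons and language models."},
    {"url": "https://casino.example/c", "text": "Spammy gambling page with enough words to pass the length check."},
    {"url": "https://example.com/d", "text": "too short"},
]
print(len(refine(corpus)))  # → 1
```

Even in this miniature form, three of four documents are dropped, echoing the roughly 90% removal rate reported for the real pipeline.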
The key finding of the RefinedWeb paper was that properly filtered and deduplicated web data alone could produce models that outperform those trained on popular curated datasets like The Pile. This was a significant result for the field. Prior to RefinedWeb, the conventional wisdom held that training the best language models required carefully curated multi-source datasets. TII demonstrated that the scale and diversity of web data, when properly cleaned, could match or exceed the quality of curated alternatives. The development process was iterative: the team measured zero-shot performance of models trained on development versions of the dataset, with the main goal of maximizing performance while manually auditing samples to identify filtering improvements [4].
A 600-billion-token extract of RefinedWeb was publicly released to the research community on Hugging Face, along with smaller models (1.3B and 7.5B parameters) trained on it [4].
| Dataset | Tokens | Sources | Key Feature |
|---|---|---|---|
| RefinedWeb | ~5 trillion | CommonCrawl (web only) | MDR pipeline with extensive deduplication |
| The Pile | ~825 billion | 22 curated sources | Academic papers, books, code, web |
| ROOTS (BLOOM) | ~350 billion unique | 498 sources, 59 languages | Multilingual, curated |
| RedPajama | ~1.2 trillion | 7 source categories | Open reproduction of LLaMA training data |
The Falcon architecture is based on the decoder-only transformer design that has become standard for autoregressive language models. Across the family, the key architectural features reflect design choices that were becoming standard for high-performance language models in 2023.
Attention Mechanism. The original Falcon models (7B and 40B) use multi-query attention (MQA), which reduces memory bandwidth requirements during inference by sharing a single set of key and value heads across all query heads. This is particularly important for serving large models efficiently. Falcon 2 11B adopted grouped-query attention (GQA) with eight key-value heads, offering a middle ground between full multi-head attention and the more aggressive sharing in MQA [5][6].
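The practical effect on the inference-time KV cache can be seen with simple arithmetic. The dimensions below are illustrative of a generic 7B-class transformer, not Falcon's exact configuration:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
    # 2 tensors (K and V) per layer, each of shape [seq_len, n_kv_heads, head_dim]
    return n_layers * 2 * n_kv_heads * head_dim * seq_len * dtype_bytes

# Illustrative config: 32 layers, 32 query heads, head_dim 128, 8K context, fp16 cache.
LAYERS, Q_HEADS, HEAD_DIM, SEQ = 32, 32, 128, 8192

mha = kv_cache_bytes(LAYERS, Q_HEADS, HEAD_DIM, SEQ)  # full multi-head: 32 KV heads
gqa = kv_cache_bytes(LAYERS, 8, HEAD_DIM, SEQ)        # grouped-query: 8 KV heads
mqa = kv_cache_bytes(LAYERS, 1, HEAD_DIM, SEQ)        # multi-query: 1 KV head

print(f"MHA: {mha / 2**30:.1f} GiB, GQA: {gqa / 2**30:.2f} GiB, MQA: {mqa / 2**30:.3f} GiB")
```

For this toy configuration the cache shrinks from 4 GiB (MHA) to 1 GiB (GQA, 8 heads) to 0.125 GiB (MQA), which is why KV-head sharing matters so much for serving throughput.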
Positional Encoding. The models use rotary position embeddings (RoPE), which encode positional information through rotation matrices applied to the query and key vectors. RoPE allows for better extrapolation to longer sequence lengths than were seen during training.
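RoPE's defining property, that attention scores depend only on the relative offset between positions, can be checked with a minimal pure-Python implementation (a sketch of the standard formulation, not Falcon's specific code):

```python
import math

def rope_rotate(vec, pos, theta=10000.0):
    """Apply RoPE to a vector of even length: rotate each pair
    (x_i, x_{i+1}) by the angle pos * theta^(-i/d), i stepping by 2."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = pos * theta ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q, k = [0.3, -1.2, 0.7, 0.5], [1.1, 0.4, -0.2, 0.9]
# The score depends only on the relative offset (here 4 in both cases):
score_a = dot(rope_rotate(q, pos=7), rope_rotate(k, pos=3))
score_b = dot(rope_rotate(q, pos=107), rope_rotate(k, pos=103))
print(abs(score_a - score_b) < 1e-9)  # → True
```

Because each 2D rotation satisfies R(m)q · R(n)k = q · R(n−m)k, shifting both positions by the same amount leaves the score unchanged, which is the property that helps extrapolation to longer sequences.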
Activation Function. Falcon 3 models adopted the SwiGLU activation function, replacing the standard GELU used in earlier Falcon models. SwiGLU has been shown to improve training efficiency and model quality in a number of modern LLM architectures.
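A minimal scalar sketch of a SwiGLU feed-forward layer follows; real models use weight matrices and vectors, and the toy weights here are purely illustrative:

```python
import math

def silu(x):
    """SiLU/Swish activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    """Minimal SwiGLU feed-forward on scalars:
    down( silu(gate(x)) * up(x) ). The gating path (silu) modulates
    the up-projection path element-wise before the down-projection."""
    hidden = [silu(wg * x) * (wu * x) for wg, wu in zip(w_gate, w_up)]
    return sum(h * wd for h, wd in zip(hidden, w_down))

# Toy weights (illustrative only)
y = swiglu_ffn(1.5, w_gate=[0.2, -0.4], w_up=[1.0, 0.3], w_down=[0.5, 0.8])
print(round(y, 4))
```

The gated product is what distinguishes SwiGLU from a plain GELU MLP: half the hidden width acts as a learned, input-dependent gate on the other half.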
Head Dimension. Falcon 3 models use a head dimension of 256, which was specifically chosen to be optimized for FlashAttention-3, improving training and inference throughput [7].
Vocabulary. Falcon 3 transformer models use a vocabulary of approximately 131,000 tokens, significantly larger than the vocabulary used in Falcon 1, which allows better handling of multilingual text and code [7].
FlashAttention. The training process has incorporated FlashAttention from the beginning, an IO-aware exact attention algorithm that speeds up transformer training by reducing memory reads and writes between GPU high-bandwidth memory and on-chip SRAM.
Training Infrastructure. Falcon models were trained on large GPU clusters. Falcon 180B, for example, was trained on up to 4,096 A100 GPUs simultaneously, consuming roughly seven million GPU-hours and making it one of the most resource-intensive open model training runs at the time of its release. Falcon 3 7B was trained on 1,024 H100 GPUs [6][7].
Llama Compatibility. Starting with Falcon 3, TII made the models compatible with the Llama architecture format, allowing users to leverage the extensive tooling and ecosystem built around Meta's models without modification [7].
TII released the first Falcon models in May 2023: Falcon 7B and Falcon 40B. Both were decoder-only transformer models trained on subsets of the RefinedWeb dataset. Falcon 40B was trained on one trillion tokens, while Falcon 7B was trained on 1.5 trillion tokens [1][5].
Upon release, Falcon 40B achieved the top position on the Hugging Face Open LLM Leaderboard, making it the best-performing openly available pretrained model at that time. Falcon 7B also led its weight class, edging out MosaicML's MPT-7B as the best pretrained model at the 7-billion-parameter scale [8]. Both models were released under the Apache 2.0 license, which was a notable move since Meta's competing LLaMA models at the time carried more restrictive license terms.
Instruction-tuned variants, Falcon 7B Instruct and Falcon 40B Instruct, were also released. These fine-tuned versions were designed for conversational and instruction-following tasks.
| Model | Parameters | Training Tokens | Attention | License | Release |
|---|---|---|---|---|---|
| Falcon 7B | 7 billion | 1.5 trillion | Multi-query | Apache 2.0 | May 2023 |
| Falcon 40B | 40 billion | 1 trillion | Multi-query | Apache 2.0 | May 2023 |
| Falcon 7B Instruct | 7 billion | 1.5T + fine-tuning | Multi-query | Apache 2.0 | May 2023 |
| Falcon 40B Instruct | 40 billion | 1T + fine-tuning | Multi-query | Apache 2.0 | May 2023 |
On September 6, 2023, TII released Falcon 180B, a dramatically scaled-up version with 180 billion parameters trained on 3.5 trillion tokens from the RefinedWeb dataset. At the time of its release, Falcon 180B was the largest openly available language model in the world [9].
The model achieved a score of 68.74 on the Hugging Face Open LLM Leaderboard, making it the highest-scoring openly released pretrained model at that time. Performance was estimated to fall between GPT-3.5 and GPT-4 depending on the task, and the model consistently matched or surpassed Google's PaLM 2 Medium on widely used benchmarks including HellaSwag, LAMBADA, WebQuestions, and Winogrande [9][10].
Falcon 180B was 2.5 times larger than Meta's LLaMA 2 70B and was trained with approximately four times more compute. On the MMLU benchmark, it outperformed both LLaMA 2 70B and GPT-3.5, although it fell short of GPT-4 [10].
The model was released under TII's Falcon 180B license, which was more restrictive than the Apache 2.0 license used for the smaller models. While still permitting research and commercial use, the license included some limitations that prevented it from being classified as fully open-source by some definitions. This distinction drew attention in the open-source AI community, where license terms are closely scrutinized [9].
| Benchmark | Falcon 180B | LLaMA 2 70B | GPT-3.5 | GPT-4 |
|---|---|---|---|---|
| MMLU (5-shot) | 70.4 | 68.9 | 70.0 | 86.4 |
| HellaSwag (10-shot) | 88.0 | 87.3 | 85.5 | 95.3 |
| ARC (25-shot) | 69.8 | 67.3 | 85.2 | 96.3 |
| HF Open LLM Score | 68.74 | ~67 | N/A | N/A |
In May 2024, TII released Falcon 2, representing the second generation of the model family. This release focused on a smaller, more efficient model size while adding multimodal capabilities [11].
Falcon 2 11B was trained on 5.5 trillion tokens, a significant increase over the original Falcon models. The model uses grouped-query attention (GQA) with eight key-value heads. Despite having only 11 billion parameters, the model outperformed Meta's LLaMA 3 8B and performed on par with Google's Gemma 7B on the Hugging Face Open LLM Leaderboard [11].
The more notable release was Falcon 2 11B VLM (Vision Language Model), TII's first multimodal model. The VLM integrates the pretrained CLIP ViT-L/14 vision encoder with the Falcon 2 11B chat-fine-tuned model. Training was carried out in two stages: a pretraining stage where the LLM was frozen and only the multimodal projector was trained on 558,000 image-caption pairs, followed by a fine-tuning stage where both the projector and LLM weights were trained on 1.2 million image-text instruction data from public datasets. The VLM employs a dynamic encoding mechanism at high resolution for image inputs, similar to the approach used in LLaVA-Next. Training was performed on 16 A100 80GB GPUs with ZeRO and Flash-Attention 2 [12].
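The two-stage recipe can be sketched as a toy parameter-freezing loop. The parameter names, gradients, and SGD step below are illustrative assumptions, not TII's training code:

```python
def train_stage(params, trainable_names, grads, lr=0.1):
    """Apply a toy SGD step only to parameters marked trainable, mimicking
    stage 1 (projector only, LLM frozen) vs stage 2 (projector + LLM)."""
    return {
        name: (value - lr * grads[name]) if name in trainable_names else value
        for name, value in params.items()
    }

params = {"llm.w": 1.0, "projector.w": 0.5}   # hypothetical weights
grads = {"llm.w": 0.2, "projector.w": 0.4}    # hypothetical gradients

# Stage 1: LLM frozen, only the multimodal projector updates.
stage1 = train_stage(params, {"projector.w"}, grads)
# Stage 2: both projector and LLM weights update.
stage2 = train_stage(stage1, {"projector.w", "llm.w"}, grads)
print(stage1["llm.w"], round(stage1["projector.w"], 2))  # → 1.0 0.46
```

The point of the frozen first stage is to let the projector learn to map vision features into the LLM's embedding space before the language model itself is perturbed.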
Falcon 2 was trained on data spanning 11 languages (English, German, Spanish, French, Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish), expanding the model's multilingual capabilities beyond the primarily English focus of the original Falcon models. Both variants were released under the TII Falcon License [11].
In August 2024, TII released Falcon Mamba 7B, a model based on the state space model (SSM) architecture rather than the traditional transformer. It was the first open-source pure state-space language model (SSLM) at this scale, and Hugging Face independently verified it as the top-performing open-source SSLM globally at the time of release [13].
The model uses the original Mamba architecture with additional RMS normalization layers added for stable large-scale training. Unlike transformer models, which use attention mechanisms that scale quadratically with sequence length, Falcon Mamba processes sequences with constant-time token generation regardless of context size. This means memory requirements remain constant (only the recurrent state is stored), rather than growing linearly with the key-value cache as transformers do [13][14].
Falcon Mamba 7B was trained on approximately 5.5 trillion tokens, using a mix of RefinedWeb data, high-quality technical data, code data from public sources, and curated data in the final decay stage. On the original Hugging Face Open LLM Leaderboard (v1), the model achieved an average score of 64.09, which was competitive with Falcon 2 11B (64.28) despite having fewer parameters. Specific scores included 80.82 on HellaSwag, 62.11 on MMLU, and 62.03 on ARC [14].
A key practical advantage of Falcon Mamba is that it can process sequences of arbitrary length on a single NVIDIA A10 GPU with 24GB of memory, while transformer models of similar size would be constrained by the growing KV cache. On an H100 GPU, the model maintains constant throughput and steady CUDA memory usage even at context lengths of 130,000 tokens, whereas transformer models degrade in speed and consume increasing memory [14].
| Aspect | Falcon Mamba (SSM) | Transformer Models |
|---|---|---|
| Attention | None (selective state spaces) | Full attention mechanism |
| Token Generation | Constant time regardless of context | Scales with sequence length |
| Memory | Constant (recurrent state only) | Linear scaling (KV caches) |
| Arbitrary Length | Supported on single A10 24GB GPU | Limited by KV cache size |
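The memory contrast in the table above can be made concrete with simple counting. The dimensions below are illustrative, not Falcon Mamba's actual configuration:

```python
def transformer_cache_entries(n_layers, n_kv_heads, head_dim, context_len):
    """KV cache grows linearly with the number of tokens processed."""
    return n_layers * 2 * n_kv_heads * head_dim * context_len

def ssm_state_entries(n_layers, d_state, d_model):
    """The recurrent state is a fixed-size tensor, independent of context length."""
    return n_layers * d_state * d_model

for ctx in (1_000, 130_000):
    kv = transformer_cache_entries(32, 8, 128, ctx)     # illustrative transformer dims
    ssm = ssm_state_entries(64, 16, 4096)               # illustrative Mamba-style dims
    print(f"context={ctx:>7}: KV cache {kv:>13,} entries, SSM state {ssm:,} entries")
```

Note that `ssm_state_entries` takes no context-length argument at all: the state size is fixed, which is exactly why the 130,000-token H100 result reported above shows flat memory usage.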
Falcon Mamba 7B was released under the TII Falcon Mamba 7B License 1.0, with both base and instruct-tuned variants available on Hugging Face [13].
On December 17, 2024, TII released Falcon 3, the third generation of the model family. Falcon 3 shifted focus toward small language models (SLMs) designed to run efficiently on consumer hardware, including laptops and single GPUs [15].
The Falcon 3 series includes five models: Falcon3-1B, Falcon3-3B, Falcon3-Mamba-7B (an updated state-space variant), Falcon3-7B, and Falcon3-10B. All transformer models were trained on approximately 14 trillion tokens, more than double the 5.5 trillion tokens used for Falcon 2. The training data comprised a mix of web content, code, STEM material, curated high-quality corpora, and multilingual data. Models are available in both Base and Instruct variants, totaling over 30 model checkpoints [7][15].
The different model sizes were produced using distinct training strategies. Falcon3-7B-Base was the primary model, trained from scratch on 1,024 H100 GPUs. Falcon3-10B-Base was created through depth up-scaling from the 7B model: layers were duplicated, and the model received an additional 2 trillion tokens of continued pre-training on high-quality data. Falcon3-1B and Falcon3-3B were derived from the larger models through pruning and knowledge distillation, trained on less than 100 billion tokens of curated high-quality data. Falcon3-Mamba-7B-Base received an additional 1.5 trillion tokens of high-quality training data on top of the original Falcon Mamba 7B, enhancing its reasoning and mathematical capabilities [7].
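The depth up-scaling step can be sketched as layer duplication. The duplication pattern and helper below are hypothetical, since TII has not published the exact recipe at this level of detail:

```python
import copy

def depth_upscale(layers, dup_every=2):
    """Hypothetical depth up-scaling sketch: duplicate every `dup_every`-th
    block to deepen the network; the deepened model is then given
    continued pre-training so the duplicated blocks can specialize."""
    out = []
    for i, layer in enumerate(layers):
        out.append(layer)
        if (i + 1) % dup_every == 0:
            out.append(copy.deepcopy(layer))  # duplicate starts from the same weights
    return out

# Illustrative 28-block base model (block count is hypothetical).
base = [{"name": f"block_{i}", "w": [0.1 * i]} for i in range(28)]
scaled = depth_upscale(base, dup_every=2)
print(len(base), "->", len(scaled))  # → 28 -> 42
```

The appeal of the approach is cost: the deeper model inherits trained weights everywhere, so only a comparatively short continued-pretraining run (2 trillion tokens here, versus 14 trillion from scratch) is needed.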
Upon release, the Falcon3-10B model achieved the number one position on the Hugging Face Open LLM Leaderboard for models under 13 billion parameters. The 3B model notably outperformed larger models like LLaMA 3.1 8B and NVIDIA Minitron 4B on several benchmarks. The 1B model surpassed SmolLM2-1.7B and matched Gemma-2-2B despite being smaller. Falcon 3 supports English, French, Spanish, and Portuguese, with a context length of up to 32,000 tokens (8,000 for the 1B model) [7][15].
| Model | Parameters | Training Tokens | Method | Context Length |
|---|---|---|---|---|
| Falcon3-1B | 1 billion | <100B (distilled) | Pruning + knowledge distillation | 8K |
| Falcon3-3B | 3 billion | <100B (distilled) | Pruning + knowledge distillation | 32K |
| Falcon3-Mamba-7B | 7 billion | ~7T total | Continued training of Falcon Mamba | 32K |
| Falcon3-7B | 7 billion | 14 trillion | Trained from scratch | 32K |
| Falcon3-10B | 10 billion | 14T + 2T (up-scaling) | Depth up-scaling from 7B | 32K |
| Benchmark | Falcon3-1B-Instruct | Falcon3-3B-Instruct | Falcon3-7B-Base | Falcon3-10B-Base |
|---|---|---|---|---|
| MMLU | - | - | 67.4 | 73.1 |
| MMLU-PRO | - | 29.7 | 39.2 | 42.5 |
| ARC Challenge | - | - | 65.9 | - |
| GSM8K | - | - | 79.1 | 83.1 |
| BBH | - | - | 51.0 | 59.7 |
| IFEval | 54.4 | - | - | 78.0 |
| MATH Lvl5 | - | 19.9 | - | 22.9 |
| MBPP | - | - | - | 73.8 |
All Falcon 3 models were released under the TII Falcon License 2.0, a permissive Apache 2.0-based license with an acceptable use policy that encourages responsible AI development [15].
On May 21, 2025, TII released Falcon-H1, a new family of hybrid language models that combine traditional transformer attention mechanisms with Mamba-2 state-space model components operating in parallel within each block. The "H" in the name stands for "hybrid." This architectural approach seeks to combine the strong contextual understanding of attention with the linear-time efficiency of SSMs, particularly for long sequences [16].
Falcon-H1 covers six model sizes: 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, and 34B, each available in base and instruct-tuned variants. The models support a context window of up to 256,000 tokens and natively handle 18 languages (Arabic, Czech, German, English, Spanish, French, Hindi, Italian, Japanese, Korean, Dutch, Polish, Portuguese, Romanian, Russian, Swedish, Urdu, and Chinese), with scalability to over 100 languages [16].
The architectural design places attention and SSM heads in parallel within each block, and their outputs are concatenated before being passed through the block's output projection. The ratio of attention to SSM heads is adjustable, and the models use an unusually large base-frequency scaling for their RoPE positional embeddings, a setting TII reports is particularly suited to hybrid models. TII developed a customized Maximal Update Parametrization (muP) with 35 parameter groups and fine-tuned multipliers, enabling hyperparameter transfer across model sizes and allowing all six models to be trained in parallel [16].
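The parallel-block design can be shown in miniature. The scalar "heads" and projection below are hypothetical stand-ins, not the real Falcon-H1 operators:

```python
def hybrid_block(x, attn_heads, ssm_heads, w_out):
    """Sketch of a parallel hybrid block: attention heads and SSM heads each
    process the same input, their outputs are concatenated channel-wise,
    and the result goes through the block's output projection."""
    attn_out = [h(x) for h in attn_heads]
    ssm_out = [h(x) for h in ssm_heads]
    concat = attn_out + ssm_out
    return [sum(c * w for c, w in zip(concat, row)) for row in w_out]

# Toy heads: each maps the scalar input to one channel (illustrative only).
attn_heads = [lambda x: 2.0 * x, lambda x: -1.0 * x]   # 2 "attention" heads
ssm_heads = [lambda x: 0.5 * x]                        # 1 "SSM" head (ratio is adjustable)
w_out = [[1.0, 1.0, 1.0], [0.0, 1.0, 2.0]]             # output projection rows
print(hybrid_block(3.0, attn_heads, ssm_heads, w_out))  # → [4.5, 0.0]
```

Because the two branches run in parallel rather than in alternating layers, the attention-to-SSM ratio becomes a tunable hyperparameter per block, as the text above describes.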
A key training innovation was "reverse curriculum learning," where complex data is shown from the start of training rather than being introduced gradually. The team also employed smart data reuse, where high-quality samples are reused more frequently without harming generalization, guided by memorization window estimation [16].
Performance results were strong across the size range. Falcon-H1-0.5B achieved results comparable to typical 7B models from 2024. Falcon-H1-1.5B-Deep rivaled 7B to 10B transformer models. At the top end, Falcon-H1-34B-Instruct was competitive with models two to four times larger, including Qwen 2.5-72B and LLaMA 3.3-70B. For inference efficiency, Falcon-H1-34B achieved up to four times faster input throughput and eight times faster output throughput compared to Qwen 2.5-32B on longer contexts, thanks to the SSM component's linear scaling [16].
| Model | Parameters | Context | Comparable Performance To |
|---|---|---|---|
| Falcon-H1-0.5B | 0.5 billion | 256K | Typical 7B models (2024) |
| Falcon-H1-1.5B | 1.5 billion | 256K | 7B-10B transformer models |
| Falcon-H1-1.5B-Deep | 1.5 billion | 256K | 7B-10B transformer models |
| Falcon-H1-3B | 3 billion | 256K | Larger transformer models |
| Falcon-H1-7B | 7 billion | 256K | Larger transformer models |
| Falcon-H1-34B | 34 billion | 256K | Qwen 2.5-72B, LLaMA 3.3-70B |
Alongside Falcon-H1, TII released Falcon Arabic on May 21, 2025, the first Arabic-specific model in the Falcon series. Built on top of the Falcon 3-7B architecture, Falcon Arabic was trained on a high-quality native (non-translated) Arabic dataset spanning Modern Standard Arabic and regional dialects, capturing the full linguistic diversity of the Arab world [17].
According to the Open Arabic LLM Leaderboard benchmarks, Falcon Arabic outperformed all other regionally available Arabic language models at the time of release, matching the performance of models up to ten times its size. The model supports a context length of 32,000 tokens, enabling it to handle long documents and advanced applications like retrieval-augmented generation (RAG), content creation, and knowledge-intensive tasks. It is designed to excel in general knowledge, Arabic grammar, mathematical reasoning, complex problem solving, and understanding Arabic dialects [17].
On January 5, 2026, TII released Falcon H1R 7B (where "R" stands for reasoning), a specialized reasoning model built on the Falcon H1-7B hybrid architecture. The model combines the hybrid Transformer-Mamba2 backbone with a training recipe that mixes supervised long-form reasoning with reinforcement learning using Group Relative Policy Optimization (GRPO) [18].
Falcon H1R 7B supports a 256,000-token context window and was designed specifically for test-time scaling, using a technique TII calls "DeepConf" (Deep Think with Confidence). DeepConf runs many chains of thought in parallel, then uses the model's own next-token confidence scores to filter noisy traces and keep only high-quality candidates [18].
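The DeepConf idea can be sketched as confidence-filtered majority voting. The scoring scheme, the keep-fraction, and the trace format below are illustrative assumptions, not TII's published algorithm:

```python
from collections import Counter

def deepconf_select(traces, keep_fraction=0.5):
    """Hypothetical sketch of confidence-filtered self-consistency: run many
    reasoning traces, keep the most confident fraction (here, by a per-trace
    mean token confidence), then majority-vote over the surviving answers."""
    ranked = sorted(traces, key=lambda t: t["confidence"], reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_fraction))]
    votes = Counter(t["answer"] for t in kept)
    return votes.most_common(1)[0][0]

traces = [
    {"answer": "42", "confidence": 0.91},
    {"answer": "42", "confidence": 0.88},
    {"answer": "17", "confidence": 0.35},   # low-confidence noisy trace
    {"answer": "17", "confidence": 0.30},
    {"answer": "42", "confidence": 0.84},
    {"answer": "17", "confidence": 0.29},
]
print(deepconf_select(traces))  # → 42
```

In this toy example an unfiltered vote would be tied 3–3; discarding the low-confidence traces first is what lets the vote converge, which is the intuition behind filtering noisy chains before aggregation.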
Despite having only 7 billion parameters, the model demonstrated performance competitive with or surpassing models several times its size. On mathematics benchmarks, it achieved 88.1% accuracy on AIME-24, ahead of ServiceNow AI's Apriel 1.5 (15B) at 86.2%. On LiveCodeBench v6 (coding), it scored 68.6%, higher than Qwen3-32B. TII reported that the model matches or approaches the performance of Microsoft's Phi 4 Reasoning Plus (14B) while using only half the parameters. For inference speed, Falcon H1R 7B reaches approximately 1,500 tokens per second per GPU at batch size 64, nearly double the throughput of Qwen3-8B [18].
Also on January 5, 2026, TII released Falcon-H1-Arabic, a newly developed model built on the hybrid Mamba-Transformer architecture specifically for Arabic language tasks. Available in 3B, 7B, and 34B parameter sizes, the model established itself as the highest-performing system on the Open Arabic LLM Leaderboard (OALL), outperforming models several times larger [19].
On January 16, 2026, TII released Falcon-H1-Tiny, a series of 15 extremely small yet powerful open-source language models. The family includes 90-million-parameter models for English, 100-million-parameter models for multilingual applications, and a 600-million-parameter reasoning model (Falcon-H1-Tiny-R-0.6B). The reasoning model was pretrained directly on reasoning data and then trained with a GRPO stage, showing strong performance on AIME-24, AIME-25, LiveCodeBench, and Math500 benchmarks. Specialized variants include a coder model (90M) for code generation and fill-in-the-middle tasks, and a tool-calling model (90M) for function-calling tasks [20].
| Generation | Release Date | Models | Parameters | Training Tokens | Key Innovation |
|---|---|---|---|---|---|
| Falcon 1 | May 2023 | 7B, 40B | 7B, 40B | 1-1.5T | RefinedWeb; Apache 2.0 open release |
| Falcon 180B | Sep 2023 | 180B | 180B | 3.5T | Largest open model at release |
| Falcon 2 | May 2024 | 11B, 11B VLM | 11B | 5.5T | First multimodal Falcon; 11 languages |
| Falcon Mamba | Aug 2024 | Mamba 7B | 7B | ~5.5T | First open-source pure SSLM at scale |
| Falcon 3 | Dec 2024 | 1B, 3B, Mamba 7B, 7B, 10B | 1B-10B | Up to 14T | SLMs for edge devices; 30+ checkpoints |
| Falcon H1 | May 2025 | 0.5B to 34B | 0.5B-34B | - | Hybrid Mamba-Transformer; 256K context |
| Falcon Arabic | May 2025 | 7B | 7B | - | First native Arabic Falcon model |
| Falcon H1R | Jan 2026 | 7B | 7B | - | Reasoning model; DeepConf test-time scaling |
| Falcon H1-Arabic | Jan 2026 | 3B, 7B, 34B | 3B-34B | - | Top Arabic AI model on OALL |
| Falcon H1-Tiny | Jan 2026 | 15 models | 90M-600M | - | Ultra-compact models; reasoning at 600M |
TII's approach to licensing has evolved across Falcon generations, generally moving toward greater permissiveness.
| Model | License | Key Terms |
|---|---|---|
| Falcon 7B, 40B (2023) | Apache 2.0 | Fully permissive; commercial use allowed |
| Falcon 180B (2023) | TII Falcon 180B License | More restrictive; some commercial limitations |
| Falcon 2 11B (2024) | TII Falcon License | Permissive; Apache 2.0 based |
| Falcon Mamba 7B (2024) | TII Falcon Mamba 7B License 1.0 | Open access |
| Falcon 3 (2024) | TII Falcon License 2.0 | Apache 2.0 based with acceptable use policy |
| Falcon H1 (2025) | TII Falcon License | Apache 2.0 based; permissive |
| Falcon H1R (2026) | TII Falcon License | Open source |
The early decision to use Apache 2.0 for the original 7B and 40B models was strategically significant. At the time, Meta's competing LLaMA models carried restrictive license terms, and Falcon's permissive licensing gave it an advantage for commercial adoption. The more restrictive license for Falcon 180B drew some criticism, but subsequent releases returned to permissive terms. The TII Falcon License 2.0, used for Falcon 3 and later models, is based on Apache 2.0 with an added acceptable use policy promoting responsible AI development [8][15].
The Falcon family operates in a competitive landscape alongside several other prominent open-source LLM families.
| Aspect | Falcon (TII) | LLaMA (Meta) | Mistral (Mistral AI) | Qwen (Alibaba) |
|---|---|---|---|---|
| Developer | TII (UAE government) | Meta AI (US) | Mistral AI (France) | Alibaba Cloud (China) |
| First Release | May 2023 | February 2023 | September 2023 | August 2023 |
| Max Parameters | 180B | 405B | 141B (Mixtral) | 235B (Qwen 3 MoE) |
| Training Data | RefinedWeb (web-only) | Mix of web, academic, code | Undisclosed | Undisclosed |
| License | Apache 2.0 / TII Falcon | LLaMA Community License | Apache 2.0 / commercial | Apache 2.0 |
| Architectural Innovation | Hybrid Mamba-Transformer (H1) | Standard transformer | Sliding window attention | MoE variants |
| Multilingual | 18 languages (H1) | Strong multilingual | Primarily English, French | 29+ languages |
| Community Adoption | Moderate | Very high | High | Very high |
Meta's LLaMA models have generally achieved broader community adoption, partly due to Meta's larger ecosystem and research reputation. However, Falcon models carved out a significant niche, particularly when the original LLaMA carried restrictive licensing terms. In terms of raw performance, Falcon 180B briefly held the advantage over LLaMA 2 70B on several benchmarks, but Meta's subsequent releases (including LLaMA 3.1 with models up to 405B parameters) have generally surpassed Falcon on most tasks at comparable or larger scales [8][21].
Mistral AI, a French startup, emerged as another strong competitor starting in late 2023. Mistral's models are known for their efficiency, with the 7B model delivering performance comparable to LLaMA 2 13B. Alibaba's Qwen series has become particularly prominent in 2025, with Qwen 3 offering models from 0.6B to 235B parameters and supporting 29+ languages [22].
Falcon's competitive advantage increasingly lies in its architectural innovation. The hybrid Mamba-Transformer approach of Falcon H1 offers fundamentally better scaling for long contexts compared to pure transformer models, and the Falcon H1R reasoning model demonstrated that small hybrid models can compete with much larger transformer-only alternatives on complex reasoning tasks.
Falcon's significance extends beyond its benchmark scores. The project demonstrated several important points about the state of AI development.
Web Data Sufficiency. The RefinedWeb paper and the Falcon models proved that web data alone, when properly filtered and deduplicated, could produce state-of-the-art language models. This finding influenced subsequent training data strategies across the industry [4].
Government-Led AI Research. Falcon showed that government-funded research institutes outside the traditional US-China AI axis could produce globally competitive models. TII's success inspired similar national AI initiatives in countries seeking technological sovereignty in AI.
Open-Source Competition. By releasing strong models under permissive licenses, TII contributed to the competitive pressure that pushed Meta and other companies toward more permissive licensing of their own models. The open-source AI ecosystem benefited from having multiple strong options rather than a single dominant provider.
Efficient Training. TII's approach to achieving competitive performance through data quality (with RefinedWeb) rather than simply scaling model size or compute budget offered a practical path for organizations with more limited resources.
Architectural Diversity. With Falcon Mamba and Falcon H1, TII has been at the forefront of exploring alternatives to the pure transformer architecture. The hybrid Mamba-Transformer approach in particular has shown promising results for long-context efficiency, and may influence the broader industry's approach to model architecture.
Arabic AI. The Falcon Arabic and Falcon H1-Arabic models represent a significant investment in Arabic-language AI, filling a gap in a region where most available language models were primarily trained on English and other Western languages.
As of early 2026, TII continues rapid development of the Falcon family. The release cadence has accelerated significantly, with major new models arriving every few months. The January 2026 releases of Falcon H1R, Falcon H1-Arabic, and Falcon H1-Tiny demonstrate TII's multi-pronged strategy: pushing the boundaries of reasoning performance with small efficient models, expanding Arabic-language capabilities, and exploring the limits of how small a useful language model can be.
The Falcon project remains one of the most visible examples of a non-Western, government-funded initiative producing globally competitive AI models. While the competitive landscape has grown considerably since 2023, with Meta's LLaMA, Mistral AI's models, Alibaba's Qwen, and DeepSeek's offerings all commanding significant market share, TII has differentiated itself through architectural innovation (the hybrid Mamba-Transformer approach), aggressive open licensing, and a focus on Arabic-language AI.
Through AI71 and the AWS partnership, Falcon models are available for enterprise deployment via pay-as-you-go APIs on Amazon Bedrock Marketplace and Amazon SageMaker, reducing the barrier to commercial adoption [3].