Falcon-H1
Last reviewed
Jun 8, 2026
Sources
7 citations
Review status
Source-backed
Revision
v1 · 1,547 words
Falcon-H1 is a family of open-weight large language models released in 2025 by the Technology Innovation Institute (TII), the applied-research arm of Abu Dhabi's Advanced Technology Research Council in the United Arab Emirates. It is built on a "hybrid-head" architecture that runs classical Transformer self-attention and Mamba-2 state space model (SSM) heads in parallel inside the same block, aiming to combine the recall and in-context learning strengths of attention with the linear-time efficiency and long-context scaling of SSMs.[1][2] The family spans six sizes from 0.5 billion to 34 billion parameters, each offered in base and instruction-tuned variants. It supports a context window of up to roughly 256,000 tokens and handles 18 languages natively. TII positions Falcon-H1 as a parameter-efficient line in which each model is designed to match or exceed the quality of conventional Transformer models at least twice its size, with the flagship Falcon-H1-34B reported to rival 70B-class models.[1][3]
Falcon-H1 was announced on 20 May 2025 as the first hybrid attention-SSM entry in TII's Falcon series.[2][4] It was released with open weights on Hugging Face under the tiiuae organization and on TII's Falcon LLM site, alongside base, instruction-tuned, and quantized checkpoints. A detailed technical report, titled "Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance," followed and was posted to arXiv on 31 July 2025 (arXiv:2507.22448).[3][5]
The central design goal of Falcon-H1 is parameter efficiency. According to TII, the smallest member, Falcon-H1-0.5B, delivers quality comparable to typical 7B Transformer models from 2024, while the Falcon-H1-1.5B-Deep variant is claimed to rival many leading 7B to 10B models. More broadly, the developers state that each model is engineered to match or surpass models at least twice its size.[1] These are vendor claims tied to TII's own evaluation suite and should be read as such.
The line later gained a reasoning-focused extension. In January 2026, TII released Falcon-H1R 7B, a reasoning-tuned 7B hybrid model that the institute said could out-reason models several times its size.[6] Falcon-H1R builds on the same hybrid-head foundation introduced by Falcon-H1.
The Technology Innovation Institute is a government-backed research organization based in Abu Dhabi, established in 2020 under the emirate's Advanced Technology Research Council. TII's AI and digital science teams develop the open Falcon series of language models, which began with the 2023 release of Falcon 7B, 40B, and 180B Transformer models that were among the most capable open-weight LLMs of their time.[2] TII subsequently shipped Falcon 3, a 2024 generation of efficient Transformer models, and Falcon Mamba 7B, a pure SSM model that contained no attention layers. Falcon-H1 represents a convergence of these two lines, pairing the attention mechanism of the original Falcon and Falcon 3 with the Mamba-style state-space modeling explored in Falcon Mamba.[1][2]
Falcon-H1 uses a parallel hybrid, sometimes called a hybrid-head, mixer block. Within each block the self-attention module and a Mamba-2 SSM module operate concurrently on the same input, and their outputs are concatenated before being passed through the block's output projection.[1][3] This contrasts with sequential hybrids such as Jamba, which interleave whole Transformer layers and Mamba layers in a stack, and is closely related to the hybrid-head idea introduced by NVIDIA's Hymba, which also fuses attention and SSM heads inside a layer.
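The parallel layout can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not TII's implementation: the SSM branch is a simple diagonal linear recurrence standing in for Mamba-2, attention is single-head, and every name and dimension (`parallel_hybrid_mixer`, `init_params`, `d_attn`, `d_ssm`) is hypothetical. The only point it shows is that both branches read the same input and their outputs are concatenated before a single output projection.

```python
import numpy as np


def causal_attention(x, wq, wk, wv):
    """Single-head causal softmax attention over a (seq_len, d_model) input."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    # Mask out future positions so token t only attends to tokens <= t.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def diagonal_ssm(x, w_in, decay):
    """Toy recurrence h_t = decay * h_{t-1} + u_t (a stand-in, not Mamba-2)."""
    u = x @ w_in
    h = np.zeros(u.shape[-1])
    out = np.empty_like(u)
    for t in range(u.shape[0]):  # one constant-cost step per token
        h = decay * h + u[t]
        out[t] = h
    return out


def parallel_hybrid_mixer(x, params):
    """Run both branches on the same input, concatenate, then project."""
    attn_out = causal_attention(x, params["wq"], params["wk"], params["wv"])
    ssm_out = diagonal_ssm(x, params["w_ssm"], params["decay"])
    fused = np.concatenate([attn_out, ssm_out], axis=-1)
    return fused @ params["w_out"]


def init_params(d_model, d_attn, d_ssm, seed=0):
    """Random weights for illustration; the widths are hypothetical."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(d_model)
    return {
        "wq": rng.normal(size=(d_model, d_attn)) * scale,
        "wk": rng.normal(size=(d_model, d_attn)) * scale,
        "wv": rng.normal(size=(d_model, d_attn)) * scale,
        "w_ssm": rng.normal(size=(d_model, d_ssm)) * scale,
        "decay": rng.uniform(0.5, 0.99, size=d_ssm),
        "w_out": rng.normal(size=(d_attn + d_ssm, d_model)) * scale,
    }
```

Because `d_attn` and `d_ssm` are separate arguments, the width given to each branch can be changed without adding or removing blocks, which is what distinguishes this layout from a stack that alternates whole attention and SSM layers.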
A defining property of the parallel design is that the proportion of attention channels versus SSM channels can be tuned independently of the layer count. TII reports that a relatively small fraction of attention is sufficient for strong performance, and that increasing the share of attention channels tended to degrade results in their ablations, whereas balancing the SSM and feed-forward (MLP) allocations produced robust gains.[3] The default Falcon-H1 configuration allocates channels to the SSM, attention, and MLP components in roughly a 2:1:5 ratio, reflecting that the three component types scale their parameter counts differently with model width.[3]
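As a rough illustration of what a 2:1:5 proportional split means, the sketch below divides a channel budget among the three component types. The function name `split_channels` and the budget of 8,192 are hypothetical and chosen only so the arithmetic is easy to follow; they are not Falcon-H1's actual dimensions.

```python
def split_channels(total, ratio=(2, 1, 5)):
    """Split a channel budget across SSM, attention, and MLP by a fixed ratio.

    The default ratio is the approximate 2:1:5 figure reported for Falcon-H1;
    the budget passed in is purely illustrative.
    """
    unit, remainder = divmod(total, sum(ratio))
    if remainder:
        raise ValueError("total must be divisible by the sum of the ratio")
    names = ("ssm", "attention", "mlp")
    return {name: unit * part for name, part in zip(names, ratio)}


# A hypothetical budget of 8,192 channels splits into eight units of 1,024.
allocation = split_channels(8192)
```

Under this split, attention receives one eighth of the budget, half of what the SSM receives and a fifth of what the MLP receives.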
Other notable architecture and training choices reported in the technical report include:
The hybrid architecture has drawn third-party engineering attention. NVIDIA published a technical guide on implementing the Falcon-H1 hybrid architecture in its Megatron-Core training framework, indicating interest in the parallel attention-SSM block beyond TII itself.[7]
The family comprises six model scales, each released in both a base (pretrained) and an instruction-tuned variant. The sizes are 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, and 34B parameters.[1][2] All models share the parallel hybrid-head architecture and the long-context capability; the 1.5B-Deep model differs from the standard 1.5B model in being deeper rather than wider.
The table below summarizes the family as described by TII. Benchmark-derived efficiency claims are TII's own and are attributed accordingly.
| Attribute | Detail |
|---|---|
| Developer | Technology Innovation Institute (TII), Abu Dhabi, UAE |
| Announced | 20 May 2025 |
| Technical report | arXiv:2507.22448 (31 July 2025) |
| Architecture | Parallel hybrid: Transformer attention plus Mamba-2 SSM heads in the same block |
| Model sizes | 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, 34B parameters |
| Variants | Base and instruction-tuned (plus quantized releases) |
| Context window | Up to about 256K tokens (reported as up to 262K) |
| Languages | 18 supported natively, with stated scalability to 100+ |
| Channel ratio (default) | Approximately 2:1:5 for SSM, attention, MLP |
| Training data | Up to roughly 18 trillion tokens (from a ~20T-token corpus) |
| License | TII Falcon LLM license (permissive, based on Apache 2.0) |
| Availability | Open weights on Hugging Face (tiiuae) and FalconLLM.tii.ae |
The 256K-token context window (TII materials also cite a figure of 262K, reflecting the exact 262,144-token value) targets long-document processing, retrieval, and extended multi-turn dialogue, applications where the linear-time SSM component is expected to be most advantageous over quadratic attention.[1][2]
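The two context figures and the scaling argument can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch: the helper names are hypothetical, and the counts ignore constants, head counts, and hardware effects, so they indicate growth rates rather than measured cost.

```python
# 256K in binary units is 256 * 1024 tokens, i.e. the 262,144 figure.
CONTEXT_TOKENS = 256 * 1024


def causal_attention_pairs(n_tokens):
    """Query-key pairs scored by causal attention: each token sees itself and its past."""
    return n_tokens * (n_tokens + 1) // 2


def recurrent_steps(n_tokens):
    """A recurrent SSM scan performs one state update per token."""
    return n_tokens


def growth_when_doubling(cost_fn, n_tokens):
    """Factor by which a cost grows when the sequence length doubles."""
    return cost_fn(2 * n_tokens) / cost_fn(n_tokens)
```

Doubling the sequence length doubles the recurrent step count but nearly quadruples the number of attention pairs, which is why the gap widens as contexts approach the full window.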
TII evaluated Falcon-H1 across a suite of 23 benchmarks covering general knowledge, reasoning, mathematics, coding, instruction following, and multilingual and long-context tasks, comparing each size against contemporary Transformer models of similar or larger scale.[1][3] As with all developer-reported results, the following figures are TII's own claims.
For the flagship, TII reports that Falcon-H1-34B-Instruct surpasses or matches 70B-scale models, naming Qwen2.5-72B and Llama 3.3-70B, and that it is competitive with or ahead of other strong models in its comparison set including Qwen3-32B, Qwen2.5-32B, Gemma3-27B, and Llama-4-Scout (17B active, 16-expert).[3][4] On long-context evaluations, TII compared Falcon-H1-34B-Instruct directly against Qwen2.5-72B-Instruct across retrieval, recall, and document question-answering tasks.[1] The institute also reported efficiency advantages for the hybrid design, citing up to a 4x speedup in input throughput and up to 8x in output throughput at longer sequence lengths relative to Qwen2.5-32B.[1]
For smaller members, TII states that Falcon-H1-1.5B-Deep performs on par with leading 7B to 10B Transformer models, and that Falcon-H1-0.5B reaches the level of typical 2024-era 7B models, illustrating the family's "punch above its weight" positioning.[1]
The reasoning-tuned successor, Falcon-H1R 7B, reported strong scores on reasoning-heavy benchmarks in January 2026, including about 83.1% on AIME 2025 (competition mathematics) and about 68.6% on LiveCodeBench v6 (coding), while sustaining high throughput; these figures pertain to Falcon-H1R rather than the original Falcon-H1 release.[6]
Falcon-H1 is released under the TII Falcon LLM license, a permissive open-weight license that TII describes as based on Apache 2.0 and intended to encourage responsible use.[1][2] Open weights for every size are distributed on Hugging Face and via TII's Falcon LLM site, enabling local deployment, fine-tuning, and research.
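For local experimentation, a checkpoint can in principle be loaded through the Hugging Face `transformers` library. The sketch below is a minimal example under stated assumptions: the repository naming pattern (`tiiuae/Falcon-H1-<size>-Instruct` or `-Base`) is assumed rather than taken from the text above and should be checked against the model cards, the helpers `repo_id` and `generate` are hypothetical, and a `transformers` version with Falcon-H1 support plus PyTorch is assumed to be installed.

```python
ORG = "tiiuae"
SIZES = ("0.5B", "1.5B", "1.5B-Deep", "3B", "7B", "34B")


def repo_id(size, instruct=True):
    """Build a Hugging Face repo id; the naming pattern is an assumption."""
    if size not in SIZES:
        raise ValueError(f"unknown Falcon-H1 size: {size}")
    variant = "Instruct" if instruct else "Base"
    return f"{ORG}/Falcon-H1-{size}-{variant}"


def generate(prompt, size="0.5B", max_new_tokens=64):
    """Download a checkpoint and generate a completion (needs network access)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = repo_id(size)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Explain state space models in one sentence.")` would download the smallest instruction-tuned checkpoint on first use; the larger sizes need correspondingly more memory.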
Falcon-H1 is significant as one of the most prominent open hybrid attention-SSM language model families to emerge from the Gulf region, and as a demonstration that parallel hybrid-head architectures can scale to the tens-of-billions-of-parameters range, with TII claiming parity with much larger pure-Transformer models. It sits within a broader 2024 to 2025 wave of hybrid SSM-Transformer designs that includes AI21 Labs' Jamba, NVIDIA's Hymba and Nemotron-H, and Zyphra's Zamba, each combining attention with Mamba-style state space model components in different ways.[3] Within TII's own lineup, Falcon-H1 unifies the attention-based Falcon and Falcon 3 lines with the pure-SSM Falcon Mamba, and provides the architectural foundation later extended by the reasoning-focused Falcon-H1R.[1][6]