# Falcon-H1

> Source: https://aiwiki.ai/wiki/falcon_h1
> Updated: 2026-06-28
> Categories: AI Models, Large Language Models, Open Source AI
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**Falcon-H1** is a family of open-weight large language models released in 2025 by the [Technology Innovation Institute](/wiki/technology_innovation_institute) (TII), the applied-research arm of Abu Dhabi's Advanced Technology Research Council in the United Arab Emirates. Falcon-H1 uses a parallel "hybrid-head" architecture that runs classical Transformer self-attention and [Mamba](/wiki/mamba)-2 [state space model](/wiki/state_space_model) (SSM) heads concurrently inside every block, combining the recall and in-context learning strengths of attention with the linear-time efficiency and long-context scaling of SSMs.[1][2] The family spans six sizes from 0.5 billion to 34 billion parameters, each offered in base and instruction-tuned variants. It supports a context window of up to 256K (262,144) tokens and natively handles 18 languages. TII positions Falcon-H1 as a parameter-efficient line in which each model is designed to match or exceed conventional Transformer models at least twice its size, with the flagship Falcon-H1-34B reported to rival 70B-class models such as Qwen2.5-72B and Llama 3.3-70B.[1][3]

## What is Falcon-H1?

Falcon-H1 was announced on 20 May 2025 as the first hybrid attention-SSM entry in TII's Falcon series.[2][4] The weights were published openly on Hugging Face under the `tiiuae` organization and on TII's Falcon LLM site. The release comprised more than 30 checkpoints across base, instruction-tuned, and quantized variants.[3] A detailed technical report, titled "Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance," followed and was posted to arXiv on 31 July 2025 (arXiv:2507.22448); its abstract opens by stating that Falcon-H1 is "a new series of large language models (LLMs) featuring hybrid architecture designs optimized for both high performance and efficiency across diverse use cases."[3][5]

The central design goal of Falcon-H1 is parameter efficiency. According to TII, the smallest member, Falcon-H1-0.5B, delivers quality comparable to typical 7B Transformer models from 2024, while the Falcon-H1-1.5B-Deep variant is claimed to rival many leading 7B to 10B models. More broadly, the developers state that each model is engineered to match or surpass models at least twice its size.[1] These are vendor claims tied to TII's own evaluation suite and should be read as such.

The line later gained a reasoning-focused extension. In January 2026, TII released Falcon-H1R 7B, a reasoning-tuned 7B hybrid model that the institute said could out-reason models up to seven times its size.[6] Falcon-H1R builds on the same hybrid-head foundation introduced by Falcon-H1.

## Who built Falcon-H1?

The Technology Innovation Institute is a government-backed research organization based in Abu Dhabi, established in 2020 under the emirate's Advanced Technology Research Council. TII's AI and digital science teams develop the open Falcon series of language models, which began with the 2023 release of [Falcon](/wiki/falcon) 7B, 40B, and 180B Transformer models that were among the most capable open-weight LLMs of their time.[2] TII subsequently shipped [Falcon 3](/wiki/falcon_3), a 2024 generation of efficient Transformer models, and Falcon Mamba 7B, a pure SSM model that contained no attention layers. Falcon-H1 represents a convergence of these two lines, pairing the attention mechanism of the original Falcon and Falcon 3 with the Mamba-style state-space modeling explored in Falcon Mamba.[1][2]

## What is the hybrid attention-Mamba architecture?

Falcon-H1 uses a parallel hybrid mixer block, sometimes called a hybrid-head block. Within each block the self-attention module and a Mamba-2 SSM module operate concurrently on the same input, and their outputs are concatenated before being passed through the block's output projection.[1][3] The Falcon team summarizes the core idea directly: "We combine attention and Mamba-2 heads in parallel within our hybrid mixer block."[1] This contrasts with sequential hybrids such as [Jamba](/wiki/jamba), which interleave whole Transformer layers and Mamba layers in a stack. It is closely related to the hybrid-head idea introduced by NVIDIA's [Hymba](/wiki/hymba), which also fuses attention and SSM heads inside a layer.
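The data flow described above (two branches on the same input, concatenation, then an output projection) can be illustrated with a toy sketch in plain Python. This is not TII's implementation: the function names, the single unparameterized attention head, the fixed-decay recurrence standing in for Mamba-2, and the hand-written projection matrix `w_out` are all assumptions chosen for readability.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(seq):
    """Toy single-head causal attention; queries, keys and values are the raw inputs."""
    d = len(seq[0])
    out = []
    for t, q in enumerate(seq):
        # Token t may only attend to positions 0..t.
        scores = [sum(a * b for a, b in zip(q, seq[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        out.append([sum(w * seq[s][i] for s, w in enumerate(weights))
                    for i in range(d)])
    return out

def linear_recurrence(seq, decay=0.5):
    """Assumed stand-in for an SSM branch: h_t = decay * h_{t-1} + x_t."""
    state = [0.0] * len(seq[0])
    out = []
    for x in seq:
        state = [decay * h + xi for h, xi in zip(state, x)]
        out.append(list(state))
    return out

def hybrid_mixer(seq, w_out):
    """Run both branches on the same input, concatenate per token, then project."""
    attn = causal_attention(seq)
    ssm = linear_recurrence(seq)
    mixed = []
    for a, s in zip(attn, ssm):
        concat = a + s  # per-token feature vector of length 2 * d
        mixed.append([sum(c * w for c, w in zip(concat, row)) for row in w_out])
    return mixed

# Three tokens with two features each; w_out (hypothetical) averages the branches.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_out = [[0.5, 0.0, 0.5, 0.0],
         [0.0, 0.5, 0.0, 0.5]]
out = hybrid_mixer(tokens, w_out)
print(out)
```

The attention branch revisits every earlier token at each step, while the recurrence carries a fixed-size state forward. In a sequential hybrid the two would instead sit in separate layers, one feeding the next.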

A defining property of the parallel design is that the proportion of attention channels versus SSM channels can be tuned independently of the layer count. TII reports that a relatively small fraction of attention is sufficient for strong performance, and that increasing the share of attention channels tended to degrade results in their ablations, whereas balancing the SSM and feed-forward (MLP) allocations produced robust gains.[3] The default Falcon-H1 configuration allocates channels to the SSM, attention, and MLP components in roughly a 2:1:5 ratio, reflecting that the three component types scale their parameter counts differently with model width.[3]
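As a rough illustration of what a 2:1:5 split means, the sketch below divides a channel budget across the three components. The function name `split_channels` and the 8,192-channel budget are hypothetical; the actual per-model dimensions are given in the technical report and are not reproduced here.

```python
def split_channels(total, ratio=(2, 1, 5)):
    """Split `total` channels across (ssm, attention, mlp) in the given ratio."""
    names = ("ssm", "attention", "mlp")
    parts = sum(ratio)
    shares = {name: total * r // parts for name, r in zip(names, ratio)}
    # Assumption: any rounding remainder is assigned to the MLP share.
    shares["mlp"] += total - sum(shares.values())
    return shares

budget = 8192  # hypothetical budget, not a Falcon-H1 hyperparameter
shares = split_channels(budget)
for name, count in shares.items():
    print(f"{name}: {count} channels ({count / budget:.1%})")
```

Under this ratio attention receives one eighth of the budget, the SSM a quarter, and the MLP the remaining five eighths, which matches the report's finding that a small attention share is sufficient.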

Other notable architecture and training choices reported in the technical report include:

- Use of Maximal Update Parametrization (muP) to make hyperparameters transfer more predictably across model sizes.[3][4]
- A 1.5B-Deep variant that trades width for depth, using 66 layers to extract more capability from a small parameter budget.[3]
- Training on a curated corpus of high-quality web data, multilingual text, code spanning many programming languages, and mathematics, with the larger models trained on up to roughly 18 trillion tokens drawn from a corpus on the order of 20 trillion tokens.[3]

The hybrid architecture has drawn third-party engineering attention. NVIDIA published a technical guide on implementing the Falcon-H1 hybrid architecture in its Megatron-Core training framework, indicating interest in the parallel attention-SSM block beyond TII itself.[7]

## What models are in the Falcon-H1 family?

The family comprises six model scales, each released in both a base (pretrained) and an instruction-tuned variant. The sizes are 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, and 34B parameters.[1][2] All models share the parallel hybrid-head architecture and the long-context capability; the 1.5B-Deep model differs from the standard 1.5B model in being deeper rather than wider.

The table below summarizes the family as described by TII. Benchmark-derived efficiency claims are TII's own and are attributed accordingly.

| Attribute | Detail |
| --- | --- |
| Developer | Technology Innovation Institute (TII), Abu Dhabi, UAE |
| Announced | 20 May 2025 |
| Technical report | arXiv:2507.22448 (31 July 2025) |
| Architecture | Parallel hybrid: Transformer attention plus Mamba-2 SSM heads in the same block |
| Model sizes | 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, 34B parameters |
| Variants | Base and instruction-tuned (plus quantized releases; 30+ checkpoints) |
| Context window | Up to 256K tokens (262,144; also cited as 262K) |
| Languages | 18 supported natively, with stated scalability to 100+ |
| Channel ratio (default) | Approximately 2:1:5 for SSM, attention, MLP |
| Training data | Up to roughly 18 trillion tokens (from a ~20T-token corpus) |
| License | TII Falcon LLM license (permissive, based on Apache 2.0) |
| Availability | Open weights on Hugging Face (`tiiuae`) and FalconLLM.tii.ae; also Amazon Bedrock Marketplace and SageMaker JumpStart |

The 256K-token context window (TII materials also cite a figure of 262K, reflecting the exact 262,144-token value) targets long-document processing, retrieval, and extended multi-turn dialogue, applications where the linear-time SSM component is expected to be most advantageous over quadratic attention.[1][2]
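The two context figures and the scaling argument can be checked with simple arithmetic. The sketch below counts abstract work units only (query-key pairs for causal attention, one state update per token for an SSM) and measures nothing about Falcon-H1 itself. Because the model keeps attention heads in every block, its real cost is not purely linear. The 4,096-token baseline and all function names are assumptions.

```python
CONTEXT_TOKENS = 256 * 1024  # 262,144, the value behind both "256K" and "262K"

def attention_pair_count(n):
    """Query-key pairs scored by causal attention over n tokens: n(n+1)/2."""
    return n * (n + 1) // 2

def ssm_step_count(n):
    """State updates performed by a recurrent SSM over n tokens: one per token."""
    return n

def growth(cost_fn, short, long):
    """How many times more work cost_fn needs at length `long` than at `short`."""
    return cost_fn(long) / cost_fn(short)

baseline = 4096  # hypothetical short context used for comparison
print(CONTEXT_TOKENS)
print(growth(ssm_step_count, baseline, CONTEXT_TOKENS))
print(growth(attention_pair_count, baseline, CONTEXT_TOKENS))
```

Growing the context 64-fold multiplies the recurrent work by 64 but the attention pair count by roughly 4,095, which is the gap the SSM branch is meant to narrow at long sequence lengths.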

## What languages does Falcon-H1 support?

Falcon-H1 natively supports 18 languages: Arabic, Czech, German, English, Spanish, French, Hindi, Italian, Japanese, Korean, Dutch, Polish, Portuguese, Romanian, Russian, Swedish, Urdu, and Chinese. TII states that the architecture and tokenizer are designed to scale to more than 100 languages.[2][8] The native Arabic coverage is a particular emphasis for TII as a Gulf-based developer; in a later release the institute also shipped Arabic-focused Falcon-H1 configurations at the 3B, 7B, and 34B scales.[2]

## How does Falcon-H1 perform?

TII evaluated Falcon-H1 across a suite of 23 benchmarks covering general knowledge, reasoning, mathematics, coding, instruction following, and multilingual and long-context tasks, comparing each size against contemporary Transformer models of similar or larger scale.[1][3] As with all developer-reported results, the following figures are TII's own claims.

For the flagship, TII reports that Falcon-H1-34B-Instruct matches or surpasses 70B-scale models, naming Qwen2.5-72B and Llama 3.3-70B, and that it is competitive with or ahead of other strong models in its comparison set, including Qwen3-32B, Qwen2.5-32B, Gemma3-27B, and Llama-4-Scout (17B active parameters, 16 experts).[3][4] On long-context evaluations, TII compared Falcon-H1-34B-Instruct directly against Qwen2.5-72B-Instruct across retrieval, recall, and document question-answering tasks.[1] The institute also reported efficiency advantages for the hybrid design, citing up to a 4x speedup in input throughput and up to 8x in output throughput at longer sequence lengths relative to Qwen2.5-32B.[1]

For smaller members, TII states that Falcon-H1-1.5B-Deep performs on par with leading 7B to 10B Transformer models, and that Falcon-H1-0.5B reaches the level of typical 2024-era 7B models, illustrating the family's "punch above its weight" positioning.[1]

The reasoning-tuned successor, Falcon-H1R 7B, reported strong scores on reasoning-heavy benchmarks in January 2026, including about 83.1% on AIME 2025 (competition mathematics) and about 68.6% on LiveCodeBench v6 (coding), while sustaining high throughput; these figures pertain to Falcon-H1R rather than the original Falcon-H1 release.[6]

## Is Falcon-H1 open source, and where can you run it?

Falcon-H1 is released under the TII Falcon LLM license, a permissive open-weight license that TII describes as based on Apache 2.0 and intended to encourage responsible use.[1][2] Open weights for every size are distributed on Hugging Face and via TII's Falcon LLM site, enabling local deployment, fine-tuning, and research. The instruction-tuned models are also offered as managed endpoints on Amazon Bedrock Marketplace and for deployment through Amazon SageMaker JumpStart, broadening access for cloud users.[8]
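For local use, a minimal sketch with the Hugging Face `transformers` library might look like the following. The repository naming pattern `tiiuae/Falcon-H1-<size>-<Base|Instruct>` is an assumption based on the `tiiuae` organization named above and should be checked against the model hub. The helper names `repo_id` and `generate` are hypothetical, and a `transformers` version with Falcon-H1 support plus PyTorch is assumed.

```python
SIZES = ("0.5B", "1.5B", "1.5B-Deep", "3B", "7B", "34B")

def repo_id(size, instruct=True):
    """Build a Hugging Face repo ID under the assumed naming pattern."""
    if size not in SIZES:
        raise ValueError(f"unknown Falcon-H1 size: {size!r}")
    suffix = "Instruct" if instruct else "Base"
    return f"tiiuae/Falcon-H1-{size}-{suffix}"

def generate(prompt, size="0.5B", max_new_tokens=64):
    """Download a checkpoint and generate a completion (needs network access)."""
    # Imported lazily so repo_id() stays usable without these packages installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = repo_id(size)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Six sizes times two variants gives the twelve unquantized checkpoints.
all_ids = [repo_id(s, instruct) for s in SIZES for instruct in (False, True)]
print(len(all_ids), all_ids[0])
```

Calling `generate("Explain state space models in one sentence.")` would download the smallest instruction-tuned checkpoint on first use; the larger sizes need correspondingly more memory.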

Falcon-H1 is significant as one of the most prominent open hybrid attention-SSM language model families to emerge from the Gulf region. It also serves as evidence that parallel hybrid-head architectures can scale to tens of billions of parameters, a scale at which TII claims parity with much larger pure-Transformer models. It sits within a broader 2024 to 2025 wave of hybrid SSM-Transformer designs that includes AI21 Labs' [Jamba](/wiki/jamba), NVIDIA's [Hymba](/wiki/hymba) and Nemotron-H, and Zyphra's Zamba, each combining attention with [Mamba](/wiki/mamba)-style [state space model](/wiki/state_space_model) components in different ways.[3] Within TII's own lineup, Falcon-H1 unifies the attention-based [Falcon](/wiki/falcon) and [Falcon 3](/wiki/falcon_3) lines with the pure-SSM Falcon Mamba, and provides the architectural foundation later extended by the reasoning-focused Falcon-H1R.[1][6]

## References

1. Falcon LLM Team. "Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance." Falcon blog, 20 May 2025. https://falcon-lm.github.io/blog/falcon-h1/
2. Technology Innovation Institute. "Falcon H1." FalconLLM.tii.ae. https://falconllm.tii.ae/falcon-h1.html
3. Falcon LLM Team. "Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance." arXiv:2507.22448, 31 July 2025. https://arxiv.org/abs/2507.22448
4. "Technology Innovation Institute (TII) Releases Falcon-H1: Hybrid Transformer-SSM Language Models for Scalable, Multilingual, and Long-Context Understanding." MarkTechPost, 21 May 2025. https://www.marktechpost.com/2025/05/21/technology-innovation-institute-tii-releases-falcon-h1-hybrid-transformer-ssm-language-models-for-scalable-multilingual-and-long-context-understanding/
5. "Falcon LLM Team Releases Falcon-H1 Technical Report: A Hybrid Attention-SSM Model That Rivals 70B LLMs." MarkTechPost, 1 August 2025. https://www.marktechpost.com/2025/08/01/falcon-llm-team-releases-falcon-h1-technical-report-a-hybrid-attention-ssm-model-that-rivals-70b-llms/
6. "TII's Falcon H1R 7B can out-reason models up to 7x its size, and it's (mostly) open." VentureBeat, January 2026. https://venturebeat.com/technology/tiis-falcon-h1r-7b-can-out-reason-models-up-to-7x-its-size-and-its-mostly
7. NVIDIA. "Implementing Falcon-H1 Hybrid Architecture in NVIDIA Megatron Core." NVIDIA Technical Blog. https://developer.nvidia.com/blog/implementing-falcon-h1-hybrid-architecture-in-nvidia-megatron-core/
8. Amazon Web Services. "TII Falcon-H1 models now available on Amazon Bedrock Marketplace and Amazon SageMaker JumpStart." AWS Machine Learning Blog, 2025. https://aws.amazon.com/blogs/machine-learning/tii-falcon-h1-models-now-available-on-amazon-bedrock-marketplace-and-amazon-sagemaker-jumpstart/