Jamba2

Updated Jul 16, 2026

Jamba2 is the second generation of hybrid State Space Model and Transformer language models released by AI21 Labs on January 8, 2026.^[1] The family extends the Jamba architecture introduced in March 2024, combining Mamba selective state space layers with Transformer attention layers and, in the larger variant, Mixture of Experts routing.^[5] Two models were released at launch: a dense 3B parameter model intended for on device use and a Mini Mixture of Experts model with roughly 12 billion active parameters out of 52 billion total. Both ship under the Apache 2.0 licence with a 256K token context window.^[1]

AI21 positions Jamba2 as an open source family optimised for enterprise reliability and steerability rather than as a frontier reasoning model. The release blog frames the design goal as a model that is faithful to provided context, follows instructions reliably, and maintains high throughput as the input window scales toward 100K tokens, all without the cost overhead of an extended reasoning pipeline.^[1] Headline claims at launch include category leading scores on the IFBench, IFEval, and Collie instruction following benchmarks and on the FACTS grounding benchmark, along with statistically significant human evaluation wins against Mistral's Ministral 3 14B on a set of 100 real world enterprise prompts.^[1]

Background

AI21 Labs was founded in Tel Aviv in November 2017 by Yoav Shoham, Ori Goshen, and Amnon Shashua.^[12]^[13] The company built its early reputation on the Wordtune writing assistant and the Jurassic family of dense Transformer language models, which were among the first commercially deployed alternatives to OpenAI's GPT models.^[13] AI21 became a unicorn in 2023 and has raised more than 600 million dollars across multiple rounds.^[13] The company's commercial focus has been on enterprise natural language processing rather than consumer products, with most production deployments routed through the AI21 Studio platform and, since September 2024, through Amazon Bedrock.^[11]

The Jamba lineage began in March 2024 with the release of the original Jamba model, a 52B total parameter Mixture of Experts model with 12B active parameters.^[5]^[14] Jamba was the first large scale language model to interleave State Space Model layers with Transformer attention blocks,^[15] and at the time it shipped with what AI21 described as the longest open context window available, supporting up to 256K tokens.^[5] The architecture answered a specific problem in long context serving: the quadratic cost of self attention with respect to sequence length had made very long contexts expensive on pure Transformer backbones, while pure Mamba models tended to underperform on recall heavy tasks. Interleaving the two block types gave Jamba most of the throughput advantage of an SSM with most of the recall accuracy of attention.^[8]
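The scaling argument can be made concrete with a rough operation count. The sketch below is illustrative only: the hidden size and SSM state size are assumed values rather than published Jamba dimensions, constant factors are ignored, and the two helper functions are hypothetical names introduced for this example.

```python
def attention_score_ops(seq_len: int, d_model: int) -> int:
    """Rough multiply-add count for the QK^T and attention-times-V products of one layer."""
    return 2 * seq_len * seq_len * d_model


def ssm_scan_ops(seq_len: int, d_model: int, d_state: int) -> int:
    """Rough multiply-add count for one recurrent state update per token and channel."""
    return seq_len * d_model * d_state


D_MODEL = 4096  # assumed hidden size, for illustration only
D_STATE = 16    # assumed SSM state size per channel, for illustration only

for n in (4_096, 32_768, 262_144):
    ratio = attention_score_ops(n, D_MODEL) / ssm_scan_ops(n, D_MODEL, D_STATE)
    print(f"{n:>7} tokens: attention / SSM op ratio ~ {ratio:,.0f}")
```

Under these assumptions the ratio grows linearly with sequence length, because the attention term is quadratic in the number of tokens while the scan term is linear.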

A Jamba 1.5 update followed in August 2024 with two variants, Jamba 1.5 Mini at the original 52B total parameter scale and Jamba 1.5 Large at roughly 398B total and 94B active parameters.^[6] The 1.5 release introduced an instruction tuned chat variant and was distributed under the Jamba Open Model License, a custom community licence with a commercial use threshold.^[6] Jamba2 reverses that licensing decision and ships under Apache 2.0 across the board, reflecting a broader 2025 shift among open weight Transformer-Mamba hybrids toward permissive licensing.^[1]

The gap between Jamba 1.5 and Jamba2 also covers the period in which other hybrid and pure SSM architectures gained traction. Mamba 2, the successor to the original Mamba paper from Tri Dao and Albert Gu, arrived in mid 2024 with a more efficient state space duality formulation. The RWKV project shipped successive generations through RWKV-6 and RWKV-7, the latter introducing a generalised delta rule. Mistral, Microsoft, Nvidia, and several Chinese labs released their own hybrid Transformer-Mamba experiments in 2025. AI21 positions Jamba2 as the product of these years of accumulated hybrid architecture research, rather than as a radically new design.^[9]

Jamba2 family

The Jamba2 family at launch spans two parameter scales. The 3B model is dense and is intended to run on consumer hardware, including iPhones, Android phones, Macs, and Windows PCs, through quantised builds.^[1] The Mini model is a Mixture of Experts model that keeps the parameter count of Jamba 1.5 Mini. Both models share the same context length, training pipeline, and licence.

Variant	Parameters	Active parameters	Context window	Released	License	HF repository
Jamba2 3B	3 billion	3 billion (dense)	256K tokens	January 8, 2026	Apache 2.0	`ai21labs/AI21-Jamba2-3B`
Jamba2 Mini	52 billion total	12 billion	256K tokens	January 8, 2026	Apache 2.0	`ai21labs/AI21-Jamba2-Mini`
Jamba2 Mini FP8	52 billion total	12 billion	256K tokens	January 8, 2026	Apache 2.0	`ai21labs/AI21-Jamba2-Mini-FP8`

The FP8 build of Jamba2 Mini is a post training quantised version intended for vLLM and similar high throughput serving stacks. It is not a separate model in the architectural sense, just a precision conversion of the base Mini checkpoint. AI21 distributes both the base Mini and the FP8 build through the same Hugging Face collection.^[4]
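A back-of-the-envelope calculation shows why a precision conversion matters for serving. The sketch assumes the base checkpoint stores weights in a 16 bit format and that the FP8 build stores every weight in one byte; both are simplifying assumptions, and the estimate ignores the key value cache, activations, and any layers kept at higher precision.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


TOTAL_PARAMS = 52e9  # total parameter count of Jamba2 Mini, as listed in the table above

print("16 bit weights:", weight_memory_gb(TOTAL_PARAMS, 2), "GB")  # assumed base precision
print("FP8 weights:   ", weight_memory_gb(TOTAL_PARAMS, 1), "GB")
```

The total parameter count is used rather than the active count because every expert has to be resident in memory, even though only a subset runs for any given token.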

At the time of the January 8 release, AI21 did not publish a Jamba2 Large model. The Jamba 1.5 Large remained the largest publicly available checkpoint in the Jamba family. The release blog does not commit to a specific timeline for a Jamba2 Large, although the team has indicated that the architectural work for scaling Jamba beyond 100B active parameters is ongoing.^[1]

Architecture

Jamba2 inherits the basic block layout of the original Jamba architecture. The model interleaves three component types in a fixed pattern: Mamba selective state space layers, Transformer attention layers, and feed forward layers (some of which are replaced with Mixture of Experts layers in the Mini variant). Each layer within a Jamba block consists of either an attention or a Mamba module followed by a feed forward or MoE component. The published Jamba 1 and 1.5 papers describe the interleaving ratio as one Transformer attention layer for every seven Mamba layers, with MoE applied every two layers at the Mini scale.^[8]^[16] AI21 has not published a separate technical report for Jamba2, and the model cards describe the architecture as the same Mamba-Transformer hybrid established by the earlier generations.^[2]^[7]
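The interleaving pattern can be sketched as a simple layer schedule. This is a minimal illustration, not AI21's implementation: the total depth of 32 layers, the position of the attention layer inside each group of eight, and the choice of which layers carry MoE are assumptions made for the example. It applies MoE on every second layer, following the original Jamba paper's description.

```python
from collections import Counter


def jamba_style_schedule(num_layers=32, attn_period=8, attn_offset=4,
                         moe_period=2, moe_offset=1):
    """Return a list of (mixer, feed_forward) pairs, one per layer.

    One layer in every `attn_period` uses attention (1 attention : 7 Mamba when
    attn_period is 8); one layer in every `moe_period` uses MoE instead of a
    dense MLP. All default values are illustrative assumptions.
    """
    schedule = []
    for i in range(num_layers):
        mixer = "attention" if i % attn_period == attn_offset else "mamba"
        feed_forward = "moe" if i % moe_period == moe_offset else "mlp"
        schedule.append((mixer, feed_forward))
    return schedule


layers = jamba_style_schedule()
print(Counter(mixer for mixer, _ in layers))
print(Counter(ff for _, ff in layers))
print(layers[:8])  # the first group of eight layers
```

For a dense variant such as the 3B model, the same schedule would apply with every feed forward slot set to a plain MLP.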

The attention layers use grouped query attention with low rank adaptation friendly projections, a choice that keeps the key value cache compact for long context serving. The Mamba layers retain the selective scan mechanism from the original Mamba paper, with state passing optimisations introduced in a dedicated training stage so that the recurrent state generalises cleanly to context lengths well past the training horizon. In the Mini variant, a router sends each token to a small number of the 16 experts in each MoE layer, allowing the model to draw on the full 52 billion parameter pool while keeping active compute at around 12 billion parameters per forward pass.^[2]
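The gap between total and active parameters comes from sparse routing, which can be sketched in a few lines. The example below assumes top 2 routing over 16 experts with softmax weights renormalised over the selected experts; the value of k, the made-up router logits, and the function name are assumptions for illustration rather than details confirmed for Jamba2.

```python
import math


def top_k_route(router_logits, k=2):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]  # shifted for stability
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Made-up router scores for one token across 16 experts.
logits = [0.1 * i for i in range(16)]
for expert, weight in top_k_route(logits):
    print(f"expert {expert}: weight {weight:.3f}")
```

Only the selected experts' feed forward networks run for that token, so compute per token tracks the active parameter count while the remaining experts stay idle.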

The 256K token context window is supported natively. Unlike position interpolation approaches that extend a shorter trained window through post hoc rescaling, the Jamba architecture combines the fixed size recurrent state of the Mamba layers, which can be carried forward across a sequence of any length, with the local windowed attention of the Transformer layers. Together these yield a model whose effective context length is governed by the SSM state rather than by an explicit position encoding ceiling. AI21 reports that the model maintains its accuracy on recall and instruction following tasks well into the 100K range, which the company frames as the operationally relevant region for enterprise question answering on documents, contracts, and codebases.^[1]^[10]
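One practical consequence for long context serving is the size of the key value cache, which only attention layers need. The sketch below compares a hypothetical all-attention stack with a hybrid that keeps one attention layer in eight. The layer count, number of key value heads, head dimension, and 16 bit cache precision are all assumed values for illustration, not published Jamba2 dimensions.

```python
def kv_cache_gib(attn_layers: int, kv_heads: int, head_dim: int,
                 seq_len: int, bytes_per_value: int = 2) -> float:
    """Key value cache size in GiB for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    """
    total_bytes = 2 * attn_layers * kv_heads * head_dim * seq_len * bytes_per_value
    return total_bytes / 2**30


SEQ_LEN = 262_144  # 256K tokens
KV_HEADS = 8       # assumed grouped query attention setting
HEAD_DIM = 128     # assumed head dimension

print("all-attention, 32 layers:", kv_cache_gib(32, KV_HEADS, HEAD_DIM, SEQ_LEN), "GiB")
print("hybrid, 4 attention layers:", kv_cache_gib(4, KV_HEADS, HEAD_DIM, SEQ_LEN), "GiB")
```

The Mamba layers contribute a fixed size state that does not grow with sequence length, so under these assumptions the cache shrinks in proportion to the share of attention layers.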

Quantisation support is broad. The Mini model has been validated on the vLLM 0.12.0 serving stack with INT8 experts quantisation, on Hugging Face Transformers, on SGLang, and through Docker images published by AI21.^[2] Community ports to llama.cpp, Ollama, LM Studio, and Jan handle the 3B model for on device use.^[3] The on device profile of the 3B variant is the most explicit user facing difference between Jamba2 and the previous Jamba generations, which did not ship a sub 10 billion parameter checkpoint.

Training

Jamba2 was built on top of the Jamba 1.5 pretraining foundation rather than trained from scratch. AI21 added a 500 billion token mid training phase using a curated mixture with elevated representation of mathematics, code, and long documents.^[2] The model cards describe the mixture as deliberately tilted toward content that the team believed would improve grounded reasoning and instruction following in enterprise question answering settings.

The mid training phase was followed by what AI21 calls a state passing optimisation stage, in which the Mamba recurrent state was trained for explicit context length generalisation, and then by a three step post training pipeline. Supervised fine tuning on instruction following and reasoning data produced an initial instructable checkpoint. Direct preference optimisation refined the model against curated pairwise preference data. A final stage of on policy reinforcement learning combined verifiable rewards (where a programmatic checker scored the output) with model based rewards (where a separate reward model scored the output), so the policy received both deterministic and learned feedback signals.^[2]
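Two pieces of this pipeline lend themselves to a short sketch: the preference loss used in direct preference optimisation and the mixing of deterministic and learned rewards. The DPO loss below is the standard published formulation. The reward mix is a hypothetical weighted sum, since AI21 has not described how the two signals are combined, and the function names, `beta`, and `checker_weight` are assumptions for illustration.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair, from sequence log-probabilities."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def combined_reward(passed_checker: bool, reward_model_score: float,
                    checker_weight: float = 0.5) -> float:
    """Hypothetical mix of a programmatic pass/fail check and a learned score in [0, 1]."""
    verifiable = 1.0 if passed_checker else 0.0
    return checker_weight * verifiable + (1.0 - checker_weight) * reward_model_score


# The loss falls when the policy favours the chosen answer more than the reference does.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
print(dpo_loss(-3.0, -1.0, -2.0, -2.0))
print(combined_reward(passed_checker=True, reward_model_score=0.4))
```

A weighted sum is only one way to blend the signals; gating the learned score on the checker passing would be another reasonable design.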

AI21 has not published the total post training token budget, the size of the supervised fine tuning dataset, or the specific reward model used for the reinforcement learning phase. The company's communication around Jamba2 emphasises the role of grounding and instruction following data in the post training mix, consistent with the enterprise reliability framing of the release. The Hugging Face cards note that the same training pipeline was used for both the 3B and the Mini variants, with the 3B model trained as a smaller scale dense version of the Mini.^[2]

Benchmark performance

AI21 reports category leading results for Jamba2 on several instruction following and grounding benchmarks at launch. The release blog highlights four headline benchmarks where the company claims a leadership position among comparably sized open models. AI21 publishes chart figures for the IFBench, IFEval, Collie, and FACTS results but does not list the exact numerical scores in the blog text or in the Hugging Face model cards.^[1] The headline qualitative result is that Jamba2 Mini leads on instruction following and grounding metrics in the category of open models in the same parameter range.

Benchmark	Category	Result claim
IFBench	Instruction following	Category leading for Jamba2 Mini and 3B
IFEval	Instruction following	Category leading for Jamba2 Mini and 3B
Collie	Instruction following (constrained generation)	Category leading for Jamba2 Mini and 3B
FACTS	Factual grounding	Category leading for Jamba2 Mini and 3B

A separate human evaluation compared Jamba2 Mini against Mistral's Ministral 3 14B on 100 real world enterprise task prompts covering question answering, instruction heavy developer prompts, summarisation, and drafting. The evaluation used pairwise human preference judgements with annotators told to weight factuality, style, constraint adherence, instruction following, and helpfulness. AI21 reports that Jamba2 Mini achieved a statistically significant win rate over Ministral 3 14B on the aggregate score and on the factuality and instruction following subscores in particular. The exact win rate percentages are not published in the release blog.^[1]
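To show what a statistically significant win rate means on a set of 100 pairwise comparisons, the sketch below runs a one-sided sign test. The counts are hypothetical and chosen only for illustration, since AI21 has not published the actual win rates, and the source does not say which statistical test was used.

```python
from math import comb


def one_sided_sign_test(wins: int, losses: int) -> float:
    """P-value for seeing at least `wins` wins if both models were equally preferred.

    Ties are assumed to have been dropped before counting.
    """
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2**n


# Hypothetical outcome: 62 wins and 38 losses across 100 prompts (not AI21's data).
p_value = one_sided_sign_test(wins=62, losses=38)
print(f"one-sided p-value: {p_value:.4f}")
print("significant at the 0.05 level" if p_value < 0.05 else "not significant at the 0.05 level")
```

With only 100 prompts, a modest preference gap can fall short of significance, which is why the sample size matters when reading the claim.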

Throughput claims are tied to the long context behaviour of the SSM-Transformer hybrid. AI21 reports that Jamba2 Mini maintains high enterprise grade throughput as context scales toward 100K tokens, which the company contrasts with pure Transformer baselines, whose attention cost grows quadratically with input length and whose throughput therefore degrades at long inputs. The published reliability to throughput chart in the release blog shows Jamba2 Mini in the upper right quadrant relative to a set of comparable open models, although the underlying numerical values are not transcribed in the blog text.^[1]

The absence of headline mathematics and reasoning benchmark scores is consistent with AI21's positioning of Jamba2 as an enterprise reliability model rather than a frontier reasoning model. The release blog explicitly says that Jamba2 is designed for precise question answering and grounded workflows without the reasoning model overhead found in models such as DeepSeek R1, OpenAI's o3, or Anthropic's extended thinking variants of Claude.^[1] A user who needs Olympiad mathematics or long chain reasoning is steered toward a different class of model.

Licensing

Both Jamba2 variants ship under the Apache 2.0 licence.^[1] Apache 2.0 permits commercial use, modification, and redistribution provided that redistributions include a copy of the licence, retain the existing copyright and attribution notices, and mark modified files as changed. The licence also carries an express patent grant from contributors. There is no user count threshold, no acceptable use policy, and no commercial gate trigger of the kind found in Meta's Llama Community Licence or in the Jamba Open Model License that AI21 used for Jamba 1.5.

The choice of Apache 2.0 represents a meaningful shift in AI21's open release policy. Jamba 1.5 had been released under the Jamba Open Model License, a custom community licence that permitted research and commercial use up to a defined revenue threshold but required separate licensing above that line.^[6] Jamba2 removes the threshold entirely. AI21 frames the change as a response to enterprise procurement teams that preferred the legal predictability of a standard permissive licence over a vendor specific one.^[1] The shift also aligns AI21 with the licensing posture of the wider permissive open weight community, including Mistral, Allen AI's OLMo 3, and the OLMoE and SmolLM families released throughout 2025.

AI21 publishes a separate Responsible Use Guidelines document alongside the model cards. The document is advisory rather than contractually binding. The Hugging Face repositories include standard model card disclosures covering training data composition, evaluation results, and known limitations, but the underlying training data has not been released.^[2] In this respect Jamba2 sits in the open weights category rather than the fully open category occupied by OLMo 3 and the Dolma 3 corpus.

Comparison to peers

Jamba2 arrives in a January 2026 landscape that includes a maturing set of hybrid Mamba-Transformer designs, several pure SSM models, and the usual lineup of dense Transformer flagships. The peer set most relevant to the Mini scale includes Mistral's Ministral 3 14B (the explicit comparison target in AI21's release blog), Microsoft's Phi-4 family, the Qwen 3 series, and other small Mixture of Experts models.^[1] The peer set relevant to the 3B scale includes SmolLM 3, Phi-4 Mini, OLMo 3 7B at the next size up, and Gemma 3 4B.

Model	Parameters	Active parameters	Architecture	Context window	License
Jamba2 Mini	52B total	12B	Mamba-Transformer-MoE hybrid	256K	Apache 2.0
Jamba2 3B	3B	3B (dense)	Mamba-Transformer hybrid	256K	Apache 2.0
Jamba 1.5 Mini	52B total	12B	Mamba-Transformer-MoE hybrid	256K	Jamba Open Model License
Jamba 1.5 Large	398B total	94B	Mamba-Transformer-MoE hybrid	256K	Jamba Open Model License
Ministral 3 14B	14B	14B (dense)	Transformer	128K	Apache 2.0
Mamba 2 base	1.3B to 2.8B	dense	Pure SSM	not specified	Apache 2.0
RWKV-7	varied	dense	Recurrent (RNN style)	variable	Apache 2.0
OLMo 3 Base 7B	7B	7B (dense)	Transformer	65K	Apache 2.0
Phi-4 Mini	3.8B	3.8B (dense)	Transformer	128K	MIT
SmolLM 3	3B	3B (dense)	Transformer	128K	Apache 2.0

The comparison to Mamba 2 is largely architectural. Mamba 2 is the academic state space model of record as of January 2026 and the direct successor to the Mamba design whose selective scan the Jamba lineage builds on, but it has not been scaled into a general purpose chat model at the parameter count of Jamba2 Mini. RWKV-7 is the strongest contender among non Transformer recurrent designs at small to mid scale, with its delta rule formulation providing a different efficiency profile than Mamba's selective scan, although it is not a hybrid model. AI21's bet with Jamba2 is that the hybrid approach, with attention layers retained for recall heavy operations and SSM layers handling the bulk of the sequence processing, dominates either pure design at the context lengths enterprise customers actually use.

Against the Ministral 3 14B and Phi-4 peer group, Jamba2 Mini has more total parameters and an MoE architecture, but it runs at roughly the same active parameter count as a 12 to 14 billion parameter dense Transformer. The throughput advantage at long context is the main differentiator that AI21 emphasises. On general knowledge and mathematics benchmarks like MMLU or GSM8K, where Mistral, Microsoft, and Qwen models have published strong numbers, AI21 does not publish Jamba2 results, which limits the available head to head comparison to the instruction following and grounding suites that AI21 has chosen to highlight.

Against the original Jamba and Jamba 1.5 Mini, the Jamba2 Mini occupies the same architectural footprint but adds a new mid training mixture, a state passing optimisation stage, and the updated post training pipeline with DPO and on policy reinforcement learning. The new licence is the other visible change. The 3B variant is the first sub 10 billion parameter checkpoint in the Jamba family.

Reception

Reception of Jamba2 in the first weeks after launch followed two main threads. The first thread covered the architecture and the long context throughput claims. Independent commentary on the rise of hybrid LLMs in early 2026 placed Jamba2 alongside Nvidia's Nemotron 3, Mistral's hybrid experiments, and several Chinese open weight releases as evidence that the hybrid Mamba-Transformer approach had moved from research artefact to production category.^[9] The Hugging Face community discussions focused on the on device profile of the 3B variant and on the quantisation properties of the Mini FP8 build, both of which were seen as evidence that AI21 was targeting deployment economics as much as benchmark performance.

The second thread covered the licence change. The shift from the Jamba Open Model License to Apache 2.0 was treated by several open source AI commentators as a notable policy decision, particularly given that AI21 had previously argued for the value of a custom open model licence as a way to retain commercial leverage. Coverage suggested that the company concluded the friction with enterprise procurement and the optics of being slightly less permissive than Mistral, OLMo 3, and other peers outweighed any commercial value of retaining the bespoke licence. The decision was generally welcomed by the open source AI community.

The enterprise reliability framing of the launch attracted more measured commentary. Several reviewers noted that the absence of mathematics and reasoning benchmark scores made it hard to place Jamba2 against the frontier reasoning models that had dominated the second half of 2025, and that AI21's choice to compare only against Ministral 3 14B was a relatively narrow benchmark target. Defenders of the framing argued that enterprise customers tended to value instruction following, factual grounding, and predictable throughput at long context more than they valued Olympiad mathematics scores, and that AI21's positioning was aligned with that customer profile rather than with the academic leaderboard chase.

As of mid 2026 the Mini model had accumulated several thousand downloads on Hugging Face and the 3B model had become a common choice for on device retrieval augmented generation prototypes through the llama.cpp and Ollama distribution paths. Production deployments on Amazon Bedrock and on the AI21 Studio platform followed the same pattern that Jamba 1.5 had established, with enterprise customers in financial services, legal, and healthcare cited by AI21 as early adopters.

References

1. AI21 Labs. "Introducing Jamba2: The open source model family for enterprise reliability and efficiency." January 8, 2026. https://www.ai21.com/blog/introducing-jamba2/
2. AI21 Labs. "ai21labs/AI21-Jamba2-Mini." Hugging Face model card. https://huggingface.co/ai21labs/AI21-Jamba2-Mini
3. AI21 Labs. "ai21labs/AI21-Jamba2-3B." Hugging Face model card. https://huggingface.co/ai21labs/AI21-Jamba2-3B
4. AI21 Labs. "ai21labs/AI21-Jamba2-Mini-FP8." Hugging Face model card. https://huggingface.co/ai21labs/AI21-Jamba2-Mini-FP8
5. AI21 Labs. "Introducing Jamba: AI21's Groundbreaking SSM-Transformer Model." March 28, 2024. https://www.ai21.com/blog/announcing-jamba/
6. AI21 Labs. "The Jamba 1.5 Open Model Family: The Most Powerful and Efficient Long Context Models." August 22, 2024. https://www.ai21.com/blog/announcing-jamba-model-family/
7. AI21 Labs. "Jamba: A Hybrid Transformer-Mamba Language Model." Research paper page. https://www.ai21.com/research/jamba-a-hybrid-transformer-mamba-language-model/
8. Lieber, Opher; Lenz, Barak; et al. "Jamba: A Hybrid Transformer-Mamba Language Model." arXiv preprint arXiv:2403.19887. March 28, 2024. https://arxiv.org/abs/2403.19887
9. AI21 Labs. "Attention was never enough: Tracing the rise of hybrid LLMs." AI21 blog. https://www.ai21.com/blog/rise-of-hybrid-llms/
10. AI21 Labs. "Jamba LLMs: The Best Long Context Models for Secure Enterprise Deployment." https://www.ai21.com/jamba/
11. AI21 Labs. "Jamba foundation models." AI21 Studio documentation. https://docs.ai21.com/docs/jamba-foundation-models
12. AI21 Labs. "About AI21." Company about page. https://www.ai21.com/about/
13. Wikipedia contributors. "AI21 Labs." Wikipedia. https://en.wikipedia.org/wiki/AI21_Labs
14. SiliconANGLE. "AI21 Labs' Jamba infuses Mamba to bring more context to transformer-based LLMs." March 28, 2024. https://siliconangle.com/2024/03/28/ai21-labs-jamba-infuses-mamba-bring-context-transformer-based-llms/
15. Maginative. "AI21 Labs Unveils Jamba: The First Production-Grade Mamba-Based AI Model." 2024. https://www.maginative.com/article/ai21-labs-unveils-jamba-the-first-production-grade-mamba-based-ai-model/
16. EngineersOfAI. "Hybrid Architectures: Jamba and Beyond." https://engineersofai.com/docs/llms/state-space-models/hybrid-architectures-jamba