IBM Granite

AI Models Large Language Models Open Source AI

13 min read

Updated Jun 27, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 27, 2026

Fact-checked

In review queue

Sources

18 citations

Revision

v2 · 2,610 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

IBM Granite is a family of open foundation models built by IBM for enterprise use, released under the permissive Apache 2.0 license and delivered through IBM's watsonx platform. The family spans general purpose large language models, code models, safety classifiers, speech recognition, document conversion, text embeddings, and time series forecasting, with model sizes ranging from roughly 350 million parameters up to a 32B mixture of experts. Granite emphasizes enterprise trust: IBM discloses its training data sources, ships IP indemnification through watsonx, and, with the Granite 4.0 generation released on 2 October 2025, made it the only open language model family to earn ISO 42001 certification. ^[1]^[2]^[3]

Granite is IBM's answer to a specific question that large companies kept asking: can you get a model that is open enough to inspect and self-host, small enough to run cheaply, and documented enough to pass a compliance review. The family is best known for two things. The first is IBM's heavy emphasis on transparency and governance, including disclosure of training data sources and, with the 4.0 generation, the ISO 42001 certification and cryptographic signing of model checkpoints. The second is the 4.0 release, which moved most of the family onto a hybrid architecture that mixes Mamba state space layers with a small number of transformer attention layers to cut memory use sharply. ^[3]^[4]

What is IBM Granite?

Granite started inside IBM's watsonx push in 2023. The brand covered a set of proprietary enterprise models trained on business, legal, and technical text, and the first public Granite paper appeared in September 2023. ^[5] Over 2024 and 2025 the focus shifted toward open weights, and the lineup grew into a broad toolkit rather than a single chatbot model.

IBM positions Granite for work like summarization, classification and extraction, question answering over private documents, code generation, function calling, and multilingual dialog. The models target English plus a set of other languages including German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. ^[6] The pitch is not that any single Granite model beats the largest frontier systems on every leaderboard. It is that a company can deploy a small, well documented, openly licensed model on its own hardware, fine tune it on internal data, and know where the training data came from. That combination matters more in a bank or a hospital than a few extra points on a public benchmark.

Is IBM Granite open source?

Granite weights are released under the Apache 2.0 license, which permits commercial use, modification, and redistribution. IBM frames this as a deliberate stance against restrictive licensing: it describes the choice as "bucking the recent trend of closed models or open-weight models released under idiosyncratic proprietary licensing agreements." ^[1] IBM also publishes technical reports and lists the data sources used for training, which is more than many other open weight releases offer.

The openness is real but worth stating precisely. The full training datasets and the training code are not released in their entirety, so Granite is open weight with strong documentation rather than fully open source in the strictest sense. ^[2]^[7] What sets it apart on the trust side is governance: IBM says Granite is the first major open-source AI model family to earn ISO 42001 certification, the world's first international standard for accountability, explainability, data privacy, and reliability in AI management systems, awarded after a months-long external audit by the certification body Schellman. ^[3]^[16]

What are the IBM Granite model generations?

The open Granite line came together quickly across 2024 and 2025.

Granite Code arrived first, in May 2024. IBM released base and instruction tuned code models at 3B, 8B, 20B, and 34B parameters under Apache 2.0. These are decoder only models trained on code from 116 programming languages, aimed at code generation, explanation, and repair. The smaller models saw roughly four trillion tokens of code data in a first phase, followed by a second phase that mixed in technical, math, and web text. ^[8]^[9]

Granite 3.0 followed in October 2024 as the first broad general purpose open release. It included dense base and instruct models at 2B and 8B, two mixture of experts models for low latency serving, and the first Granite Guardian safety models. The dense models launched with a 4K context window. IBM reported training on a large multi trillion token corpus and published the data documentation alongside the weights. ^[1]

Granite 3.1 came in December 2024. Its main change was context length, extended from 4K to 128K tokens through a progressive training recipe, along with general quality improvements and an embedding model release. ^[10]

Granite 3.2, released on 26 February 2025, added optional chain of thought reasoning that developers can switch on or off, so the model only spends extra compute when a task needs it. It also brought Granite Vision 3.2 2B, a compact vision language model aimed at document understanding such as tables and charts. ^[11]

Granite 3.3, released on 16 April 2025, introduced Granite Speech 3.3 8B, IBM's first official speech to text model with translation, plus fill in the middle support for code and a set of retrieval focused LoRA adapters. ^[12]

Granite 4.0, released on 2 October 2025, is the headline generation and is covered in its own section below. A follow on 4.1 release later refined the lineup with dense models at 3B, 8B, and 30B and updated speech, vision, guardian, and embedding components. IBM reported that the 4.1 8B instruct model "consistently matches or outperforms the Granite 4.0 32B Mixture-of-Experts model" while using a simpler architecture, and that the family trained on roughly 15 trillion tokens with context extended toward 512K. ^[3]

The timeline below summarizes the open releases.

Generation	Released	Headline change
Granite Code	May 2024	First open release: code models 3B-34B, Apache 2.0
Granite 3.0	October 2024	First broad general purpose open models (2B, 8B dense; MoE); Granite Guardian
Granite 3.1	December 2024	Context extended from 4K to 128K tokens
Granite 3.2	26 February 2025	Toggleable chain of thought reasoning; Granite Vision
Granite 3.3	16 April 2025	Granite Speech 3.3 8B; fill in the middle; RAG LoRAs
Granite 4.0	2 October 2025	Hybrid Mamba 2 / transformer architecture; ISO 42001
Granite 4.1	Late 2025	Dense 3B/8B/30B; ~15T tokens; context toward 512K

How does the Granite 4.0 hybrid architecture work?

Granite 4.0 is where the family changed shape. Most Granite 4.0 models drop the pure transformer design in favor of a hybrid that interleaves Mamba 2 state space layers with a small number of standard attention layers. In the H-Small model card, the stack runs 36 Mamba 2 layers to 4 attention layers, which is close to the 9 to 1 ratio IBM describes as canonical for the hybrid design. The models also drop explicit positional encodings, a setup IBM labels NoPE. ^[4]^[13]

The motivation is memory. As IBM puts it, "many enterprise use cases, especially those involving large-scale deployment, agentic AI in complex environments, or RAG systems, entail lengthy context, batch inferencing of several concurrent model instances at once, or both." ^[3] A transformer's attention cost grows with the square of the sequence length, so doubling the context roughly quadruples the work and the memory tied up in the key value cache. Mamba layers scale linearly instead, so doubling the context only doubles their cost. By making most layers Mamba and keeping a few attention layers for the precision that attention is good at, Granite 4.0-H can cut the RAM needed for long inputs and many concurrent requests by more than 70 percent compared with a conventional transformer of similar quality. That translates into running the same workload on cheaper GPUs, which is the entire point for IBM's enterprise buyers. ^[3]^[4]^[14]

The 4.0 lineup ships in several sizes. The hybrid mixture of experts models are H-Small at 32B total with about 9B active and H-Tiny at 7B total with about 1B active. H-Micro is a 3B hybrid dense model. IBM also shipped a conventional transformer Micro at 3B for developers who want the familiar architecture, plus a Nano series at roughly 350M and 1B that is small enough to run in a web browser or on edge hardware. The models carry a 128K context window in deployment and were trained on samples up to 512K tokens. Training ran on an NVIDIA GB200 cluster, and all of it is Apache 2.0. ^[13]^[14]^[15]

Granite 4.0 also leaned hard into governance. IBM says it is the first open model family to earn ISO 42001 certification, the international standard for AI management systems covering accountability, data privacy, and reliability. IBM cryptographically signs every 4.0 checkpoint on Hugging Face so that users can verify provenance, and it ran a bug bounty program with HackerOne offering up to 100,000 dollars for successful jailbreaks of guarded Granite deployments. ^[3]^[16]^[17]

What can Granite models do? Family members

Granite is a toolkit, and the specialized members are part of why IBM pitches it for end to end enterprise work.

Granite Code covers code generation, explanation, repair, and fill in the middle completion across many programming languages. ^[8]

Granite Guardian is a set of safety classifiers used as input and output guardrails. They flag categories such as harmful content, jailbreak attempts, and, in RAG settings, groundedness and relevance of retrieved context. They are meant to sit around a generation model rather than replace its own alignment. ^[1]

Granite Speech provides speech to text with translation. The 3.3 8B model can process longer audio than some fixed window systems, and IBM later reported a 5.33 percent word error rate for a 4.1 speech variant. ^[12]^[3]

Granite Docling is a compact 258M parameter vision language model for converting documents into machine readable form while preserving layout, tables, equations, and lists. IBM says its quality rivals systems several times its size, and it is released under Apache 2.0. ^[18]

Granite Embedding supplies text embedding models for semantic search and RAG, with later multilingual versions covering more than 200 languages. ^[3]^[10]

Granite Time Series provides compact models for forecasting on enterprise time series data, a domain most language model families ignore. ^[11]

The table below summarizes the main families and representative sizes.

Family	Type	Representative sizes	License
Granite 4.0 language (H)	Hybrid Mamba 2 / transformer, dense and MoE	H-Small 32B (9B active), H-Tiny 7B (1B active), H-Micro 3B	Apache 2.0
Granite 4.0 Micro / Nano	Conventional transformer	Micro 3B, Nano ~350M and ~1B	Apache 2.0
Granite 3.x language	Transformer, dense and MoE	2B, 8B dense; 1B and 3B MoE	Apache 2.0
Granite Code	Code, decoder only	3B, 8B, 20B, 34B	Apache 2.0
Granite Guardian	Safety classifier	2B, 8B	Apache 2.0
Granite Speech	Speech to text and translation	~2B, 8B	Apache 2.0
Granite Vision	Document focused VLM	~2B	Apache 2.0
Granite Docling	Document conversion VLM	258M	Apache 2.0
Granite Embedding	Text embeddings	small, multilingual	Apache 2.0
Granite Time Series	Forecasting	compact	Apache 2.0

How does Granite perform on benchmarks?

IBM publishes evaluation tables on the Hugging Face model cards. The figures below come from the Granite 4.0 H-Small card and cover instruction following, reasoning, math, code, and tool use. They are self reported, so treat them as IBM's measurements rather than independent results.

Benchmark	Setting	Granite 4.0 H-Small
MMLU	5-shot	78.44
MMLU-Pro	5-shot, CoT	55.47
BBH	3-shot, CoT	81.62
GPQA	0-shot, CoT	40.63
IFEval	average, strict	87.55
AlpacaEval 2.0	-	42.48
GSM8K	8-shot	87.27
HumanEval	pass@1	88
MBPP	pass@1	84
BFCL v3	tool calling	64.69
SALAD-Bench	safety	97.3

On instruction following, IBM reported that H-Small trailed only Meta's much larger Llama 4 Maverick among open models on IFEval at launch, and that on the Berkeley Function Calling Leaderboard it kept pace with larger systems at a lower serving cost. IBM also noted that the smallest 4.0 models outperformed the previous generation Granite 3.3 8B despite being less than half its size. The recurring theme across these numbers is capability per dollar rather than a top of the chart score. ^[3]^[14]^[15]

Why does Granite matter for enterprise open models?

Granite sits in a specific spot in the open model landscape. It does not chase the largest possible model or the top of general leaderboards. It competes on small footprints, permissive licensing, and a paper trail that a risk team can actually read. IBM frames the 4.1 generation around exactly this trade: it says Granite "delivers competitive instruction-following and tool-calling performance without relying on long chains of thought, offering predictable latency, stable token usage, and lower operational cost." ^[3]

The 4.0 generation pushed the positioning further on two fronts at once. The hybrid Mamba and transformer design attacked the memory cost that makes long context and high concurrency expensive, and the ISO 42001 certification with checkpoint signing addressed the governance questions that slow enterprise adoption. ^[3]^[4] That positioning has earned Granite comparisons to efficient open families from other vendors, with some coverage casting Granite 4.0 as a Western counterpart to the small, efficient open models coming out of China. ^[14] Whether Granite wins broad developer mindshare is a separate matter from whether it succeeds inside IBM's accounts, where the model is bundled with watsonx tooling, partner cloud availability, and IBM's support contracts. For organizations that need an openly licensed model they can host, audit, and certify, Granite is one of the few families built from the start around those requirements. ^[1]^[2]

References

IBM. "IBM Granite 3.0: open, state-of-the-art enterprise models." IBM, October 2024. https://www.ibm.com/new/announcements/ibm-granite-3-0-open-state-of-the-art-enterprise-models ↩
Red Hat. "What are Granite models?" Red Hat. https://www.redhat.com/en/topics/ai/what-are-granite-models ↩
IBM Research. "Introducing the IBM Granite 4.1 family of models." IBM Research blog. https://research.ibm.com/blog/granite-4-1-ai-foundation-models ↩
MarkTechPost. "IBM Released new Granite 4.0 Models with a Novel Hybrid Mamba-2/Transformer Architecture." 2 October 2025. https://www.marktechpost.com/2025/10/02/ibm-released-new-granite-4-0-models-with-a-novel-hybrid-mamba-2-transformer-architecture-drastically-reducing-memory-use-without-sacrificing-performance/ ↩
Wikipedia. "IBM Granite." https://en.wikipedia.org/wiki/IBM_Granite ↩
Hugging Face. "ibm-granite/granite-4.0-h-small model card." https://huggingface.co/ibm-granite/granite-4.0-h-small ↩
IBM. "Granite." IBM product page. https://www.ibm.com/granite ↩
IBM Research. "IBM's Granite code model family is going open source." IBM Research blog, May 2024. https://research.ibm.com/blog/granite-code-models-open-source ↩
GitHub. "ibm-granite/granite-code-models." https://github.com/ibm-granite/granite-code-models ↩
IBM. "IBM Granite 3.1: powerful performance, longer context and more." IBM, December 2024. https://www.ibm.com/new/announcements/ibm-granite-3-1-powerful-performance-long-context-and-more ↩
IBM. "IBM Granite 3.2: open source reasoning and vision." IBM, 26 February 2025. https://www.ibm.com/new/announcements/ibm-granite-3-2-open-source-reasoning-and-vision ↩
IBM. "IBM Granite 3.3: Speech recognition, refined reasoning, and RAG LoRAs." IBM, 16 April 2025. https://www.ibm.com/new/announcements/ibm-granite-3-3-speech-recognition-refined-reasoning-rag-loras ↩
VentureBeat. "Western Qwen: IBM wows with Granite 4 LLM launch and hybrid Mamba/Transformer architecture." 2 October 2025. https://venturebeat.com/ai/western-qwen-ibm-wows-with-granite-4-llm-launch-and-hybrid-mamba-transformer ↩
InfoWorld. "IBM launches Granite 4.0 to cut AI infra costs with hybrid Mamba-transformer models." https://www.infoworld.com/article/4067691/ibm-launches-granite-4-0-to-cut-ai-infra-costs-with-hybrid-mamba-transformer-models.html ↩
IBM. "Granite 4.0 bets big on small models." IBM Think. https://www.ibm.com/think/news/granite-4-bets-big-on-small-models ↩
IBM. "IBM becomes first major open-source AI model developer to earn ISO 42001 certification." IBM. https://www.ibm.com/new/announcements/ibm-granite-iso-42001 ↩
IBM Research. "IBM further strengthens Granite for enterprise deployment with HackerOne." IBM Research blog. https://research.ibm.com/blog/granite-hackerone-bug-bounty ↩
IBM. "IBM Granite-Docling: End-to-end document understanding." https://www.ibm.com/new/announcements/granite-docling-end-to-end-document-conversion ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

1 revision by 1 contributors · full history

Suggest edit

What links here

IBM Granite 4.0 IBM watsonx Jamba Reasoning 3B Neuromorphic computing

What is IBM Granite?

Is IBM Granite open source?

What are the IBM Granite model generations?

How does the Granite 4.0 hybrid architecture work?

What can Granite models do? Family members

How does Granite perform on benchmarks?

Why does Granite matter for enterprise open models?

References

Improve this article

Related Articles

Llama 3

OLMo

DeepSeek V4

Kimi K2

DeepSeek V3

Hunyuan

What links here

Related Articles

Llama 3

OLMo

DeepSeek V4

Kimi K2

DeepSeek V3

Hunyuan

What links here