OLMo

Updated Jun 23, 2026

OLMo (Open Language Model) is a family of fully open large language models built by the Allen Institute for AI (Ai2) and first released on February 1, 2024.^[1] Unlike most so-called "open" models that release only weights, OLMo ships the complete pretraining data (Dolma), the training and evaluation code, intermediate checkpoints, the step-by-step training logs, and the post-training recipes under a permissive Apache 2.0 license.^[1] Ai2 calls this distinction "truly open" rather than "open weight," and frames OLMo as a way to make the science of language modeling fully reproducible.^[1] The family scales from 1B parameters in early 2024 up to a 32B model that, in March 2025, Ai2 described as "the first fully-open model (all data, code, weights, and details are freely available) to outperform GPT3.5-Turbo and GPT-4o mini."^[7] The third generation, OLMo 3, was announced on November 20, 2025 with 7B and 32B reasoning, instruct, and base variants trained on the Dolma 3 corpus and supporting a 65,536-token context window.^[8]

Infobox

Field	Value
Developer	Allen Institute for AI (Ai2)
Initial release	February 1, 2024
Latest version	OLMo 3 (November 20, 2025); OLMo 3.1 (December 12, 2025)
Parameter sizes	1B, 7B, 13B, 32B (dense); 1B active / 7B total (MoE)
Type	Decoder-only transformer language model
License	Apache 2.0
Training data	Dolma (v1, v1.5, v1.7), OLMo-Mix-1124, Dolmino-Mix-1124, Dolma 3
Tokenizer	Modified BPE based on GPT-NeoX-20B
Source	github.com/allenai/OLMo
Weights	huggingface.co/allenai

When was OLMo released?

The Allen Institute for AI announced OLMo on February 1, 2024 with a paper titled "OLMo: Accelerating the Science of Language Models," led by Dirk Groeneveld and a team of 43 authors.^[1] The first release included a 1B model and four 7B variants trained on at least 2 trillion tokens of the Dolma corpus.^[1] The release was unusual because it included not only model weights but also the complete training data, training and evaluation code, intermediate checkpoints saved every 1,000 training steps (more than 500 per base model), and per-step training logs.^[1] That breadth later made OLMo one of the few language models to meet the Open Source Initiative's Open Source AI Definition, which was finalized in late 2024.

On April 17, 2024, Ai2 released OLMo 1.7-7B (later renamed OLMo 7B 0424).^[10] The update extended the context window from 2,048 to 4,096 tokens, switched to the new Dolma 1.7 dataset (2.3 trillion tokens), and used a two-stage training curriculum whose second stage decayed the learning rate to zero over a final 50-billion-token "annealing" phase on a curated subset.^[10] The result was a 24-point jump in MMLU over the original OLMo 7B, putting the new model ahead of Llama 2 7B on MMLU and ahead of Llama 2 13B on GSM8K.^[10]
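
The annealing phase can be pictured with a short sketch. It assumes a linear decay (the text only says the rate decays to zero), treats the first stage as a constant rate for simplicity, and uses hypothetical names and a hypothetical peak learning rate; none of these choices come from the OLMo release.

```python
def annealed_lr(tokens_seen: float, total_tokens: float,
                anneal_tokens: float, peak_lr: float) -> float:
    """Learning rate for a two-stage run: flat, then decayed to zero.

    Assumptions (not from the source): stage one is held at `peak_lr`,
    and the decay over the final `anneal_tokens` is linear.
    """
    anneal_start = total_tokens - anneal_tokens
    if tokens_seen <= anneal_start:
        return peak_lr
    remaining = max(total_tokens - tokens_seen, 0.0)
    return peak_lr * remaining / anneal_tokens


TOTAL = 2.05e12   # total training tokens listed for OLMo 7B 0424
ANNEAL = 50e9     # final 50B-token annealing phase
PEAK = 3e-4       # hypothetical peak learning rate, for illustration only

for seen in (1.0e12, TOTAL - ANNEAL, TOTAL - ANNEAL / 2, TOTAL):
    print(f"{seen:.3e} tokens -> lr {annealed_lr(seen, TOTAL, ANNEAL, PEAK):.2e}")
```

The rate stays at its peak until the last 50 billion tokens, is halved midway through the annealing phase, and reaches zero on the final token.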

A further refresh followed on July 31, 2024 as OLMo 7B 0724, scoring 52 on MMLU and improving GSM8K and code benchmarks through additional data mixing and post training work.^[13] On September 4, 2024, Ai2 and Contextual AI released OLMoE (1B active / 7B total), the first sparse mixture of experts entry in the family.^[4]

The second generation, OLMo 2, launched on November 26, 2024 with 7B and 13B base and instruct models trained on up to 5 trillion tokens.^[6] The 32B variant arrived on March 13, 2025 and the 1B model on May 1, 2025.^[7] OLMo 3 followed on November 20, 2025 with 7B and 32B Base, Think, Instruct, and RL Zero variants, plus a 65,536 token context window, sixteen times the context length of OLMo 2.^[8] An incremental OLMo 3.1 update appeared on December 12, 2025 with extended reinforcement learning training for the 32B Think and Instruct models.^[8]

What makes OLMo "fully open" rather than open weight?

The distinguishing feature of OLMo is what the team calls "truly open" rather than merely "open weight."^[1] Most widely cited "open source" models, including Llama 2 and 3 from Meta and the Mistral base models, release weights under custom licenses without disclosing the training data, the data filtering code, or the training infrastructure. OLMo releases all of those, plus the optimizer state at intermediate checkpoints and the Weights and Biases training logs that record loss curves, gradient norms, and hyperparameter settings step by step.^[1]

With OLMo 3, Ai2 extended this idea into what it calls the "model flow," which it defines as "the full lifecycle of an LM: every stage, checkpoint, dataset, and dependency required to create and modify it."^[8] In the OLMo 3 announcement the team argues that "true openness in AI isn't just about access, it's about trust, accountability, and shared progress," and that "by exposing this complete process, the goal is to engender greater trust and enable more effective adaptation, collaboration, and innovation."^[8]

This matters for two reasons. First, it lets outside researchers reproduce the work or run scientific experiments that need the full pipeline, for example studying how data composition affects emergent capabilities or auditing the training corpus for memorized personally identifiable information. Second, it puts OLMo on the right side of the Open Source Initiative's Open Source AI Definition, which requires release of "data information" sufficient for a skilled person to recreate a substantially equivalent system. This positions OLMo as a worked example in the broader debate over open-source AI, where most models marketed as open fail that data information test.

The Apache 2.0 license applied to all OLMo artifacts is permissive enough to allow commercial use, redistribution, and derivative works, with no field of use restrictions of the kind that appear in the Llama Community License.
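
Because the weights and intermediate checkpoints are published on Hugging Face, loading a model takes only a few lines with the `transformers` library. The sketch below is an illustration rather than Ai2's documented procedure: it assumes PyTorch and a recent `transformers` release with OLMo 2 support are installed, uses the model ID from the OLMo 2 7B model card, and leaves `revision` as a placeholder for a checkpoint branch name that would have to be looked up on the model repository.

```python
from typing import Optional


def load_olmo(model_id: str = "allenai/OLMo-2-1124-7B",
              revision: Optional[str] = None):
    """Load tokenizer and model; `revision` selects a checkpoint branch if given."""
    # Imported lazily so the function can be defined without the library present.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(model_id, revision=revision)
    return tokenizer, model


def complete(prompt: str, max_new_tokens: int = 32) -> str:
    """Greedy completion of `prompt` with the default model (downloads weights)."""
    tokenizer, model = load_olmo()
    inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("Language modeling is ")` would download the 7B weights on first use, so it is best tried on a machine with enough disk space and memory.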

Model family

Model	Release	Params	Training tokens	Context	License	Notes
OLMo 1B	Feb 2024	1B	2T	2,048	Apache 2.0	Initial release
OLMo 7B	Feb 2024	7B	2.46T	2,048	Apache 2.0	Four variants on different hardware
OLMo 7B Twin 2T	Feb 2024	7B	2T	2,048	Apache 2.0	Reproducibility experiment
OLMo 7B 0424 (1.7)	Apr 2024	7B	2.05T (Dolma 1.7)	4,096	Apache 2.0	Two stage curriculum, 24 point MMLU gain
OLMo 7B 0724	Jul 2024	7B	~2.75T	4,096	Apache 2.0	MMLU 52, improved fine tunes
OLMoE 1B-7B-0924	Sep 2024	1B active / 7B total	5.1T	4,096	Apache 2.0	Mixture of experts, 64 experts, 8 active
OLMo 2 7B	Nov 2024	7B	~4T	4,096	Apache 2.0	Two stage training on OLMo-Mix and Dolmino
OLMo 2 13B	Nov 2024	13B	~5T	4,096	Apache 2.0	Outperforms Qwen 2.5 14B Instruct
OLMo 2 32B	Mar 2025	32B	6T	4,096	Apache 2.0	First fully open model to beat GPT-3.5 and GPT-4o mini
OLMo 2 1B	May 2025	1B	4T	4,096	Apache 2.0	Smallest OLMo 2; beats Gemma 3 1B and Llama 3.2 1B
OLMo 3 7B / 32B	Nov 2025	7B, 32B	~5.9T (Dolma 3 Mix)	65,536	Apache 2.0	Base, Think, Instruct, RL Zero variants
OLMo 3.1 32B	Dec 2025	32B	(extended RL)	65,536	Apache 2.0	Reasoning gains on AIME, IFBench

Each release also ships matching SFT, DPO, RM, and Instruct (RLVR) checkpoints under the same license.

Architecture

OLMo is a decoder-only transformer in the same family as GPT-2 and Llama, with several specific design choices documented in the original paper.^[1] The 7B base model has 32 layers, a hidden size of 4,096, and 32 attention heads.^[1] The 1B uses 16 layers, a hidden size of 2,048, and 16 heads.^[1] Across sizes the architecture removes all bias terms (a choice borrowed from PaLM that is said to improve training stability), uses non parametric layer normalization, applies the SwiGLU activation in the feed forward network, and replaces absolute positional embeddings with RoPE (rotary positional embeddings).^[1] The vocabulary is 50,280 tokens, drawn from a modified BPE tokenizer based on GPT-NeoX-20B that masks personally identifiable information.^[1]
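
As a rough cross-check on these dimensions, the parameter count of a bias-free decoder with SwiGLU feed-forward blocks can be estimated from the layer count and hidden size. The sketch below is a back-of-envelope estimate, not a figure from the paper: the feed-forward width (`ffn_hidden=11008`), full multi-head attention, and untied input and output embeddings are assumptions not stated above, and the non-parametric layer norm and RoPE are taken to contribute no weights.

```python
def estimate_params(n_layers: int, d_model: int, ffn_hidden: int,
                    vocab_size: int, tied_embeddings: bool) -> int:
    """Approximate weight count for a bias-free decoder-only transformer."""
    attention = 4 * d_model * d_model        # Q, K, V and output projections
    swiglu_ffn = 3 * d_model * ffn_hidden    # gate, up and down projections
    per_layer = attention + swiglu_ffn       # non-parametric layer norm adds nothing
    embeddings = vocab_size * d_model * (1 if tied_embeddings else 2)
    return n_layers * per_layer + embeddings


# 7B dimensions from the text; ffn_hidden and untied embeddings are assumptions.
olmo_7b = estimate_params(n_layers=32, d_model=4096, ffn_hidden=11008,
                          vocab_size=50280, tied_embeddings=False)
print(f"estimated parameters: {olmo_7b / 1e9:.2f}B")
```

Under these assumptions the estimate lands a little under 7 billion, consistent with the model's "7B" label.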

OLMo 2 reworked several stability features after the original models showed loss spikes during long training runs.^[2] The team replaced the non-parametric layer norm with RMSNorm, reordered the layer norm position relative to the residual stream, added QK-Norm to the attention computation, and added a Z-loss regularization term on the output logits.^[2] Initialization scales for activations and gradients were also retuned. None of these changes is a headline architectural innovation on its own, but together they made it possible to train the 13B and 32B models without the instability that earlier versions could not overcome at that scale.^[2]
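
Two of these changes are compact enough to show directly. The sketch below implements RMSNorm and a Z-loss penalty on plain Python lists; it follows the commonly published definitions of both techniques rather than the OLMo 2 code, and the `eps` and `coeff` values are illustrative placeholders, not the settings Ai2 used.

```python
import math
from typing import List


def rms_norm(x: List[float], gain: List[float], eps: float = 1e-6) -> List[float]:
    """Divide x by its root mean square, then apply a learned per-element gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]


def z_loss(logits: List[float], coeff: float = 1e-4) -> float:
    """Penalty coeff * log(Z)^2, where Z is the softmax normaliser of the logits."""
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return coeff * log_z ** 2


print(rms_norm([3.0, 4.0], gain=[1.0, 1.0]))
print(z_loss([0.0, 0.0]), z_loss([10.0, 10.0]))
```

The penalty grows as the logits drift toward large magnitudes, which is the behaviour the regulariser is meant to discourage.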

OLMo 3 keeps the dense decoder-only design but adds long-context support up to 65,536 tokens through extended position embedding scaling and continued pretraining on long documents.^[8] The Think variants gain chain-of-thought-style reasoning traces during post-training rather than through architectural changes.^[8]

What data is OLMo trained on? Dolma

Dolma is Ai2's open pretraining corpus and the source of every OLMo model's pretraining data.^[3] The first public Dolma release (August 2023) contained roughly 3 trillion tokens drawn from seven sources, curated down from about 200 TB of raw text to roughly 11 TB after filtering.^[3] The dominant source is web text from Common Crawl, supplemented by the C4 web corpus, code from The Stack (permissively licensed GitHub repositories), academic papers from peS2o (a Semantic Scholar derivative), books from Project Gutenberg, social media from Reddit, and encyclopedic content from English Wikipedia and Wikibooks.^[3]

Dolma v1 composition (approximate, as used for original OLMo)

Source	Documents	Tokens
Common Crawl (web)	3.4B	2,006B
The Stack (code)	210M	411B
C4 (web)	365M	174B
Reddit	377M	80B
peS2o (papers)	38.8M	57B
Project Gutenberg	0.05M	5.3B
Wikipedia / Wikibooks	6.2M	3.7B
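
The relative weight of each source is easier to see as a share of the total. The short script below simply sums the token counts in the table above and prints each source's percentage; the figures are the table's approximate values, and the variable names are illustrative.

```python
# Token counts in billions, copied from the Dolma v1 composition table.
dolma_v1_tokens = {
    "Common Crawl (web)": 2006.0,
    "The Stack (code)": 411.0,
    "C4 (web)": 174.0,
    "Reddit": 80.0,
    "peS2o (papers)": 57.0,
    "Project Gutenberg": 5.3,
    "Wikipedia / Wikibooks": 3.7,
}

total = sum(dolma_v1_tokens.values())
shares = {name: count / total for name, count in dolma_v1_tokens.items()}

print(f"total: {total:,.0f}B tokens")
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24}{share:6.1%}")
```

The listed rows sum to about 2.7 trillion tokens, with Common Crawl alone accounting for roughly 73 percent.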

Dolma 1.7, used for OLMo 7B 0424, expanded to 2.3 trillion tokens with additional sources including Refined Web, OpenWebMath, Stack Exchange, and StarCoder.^[10] The OLMo 2 7B and 13B models trained on OLMo-Mix-1124 (3.9T tokens) for the bulk of training, then on Dolmino-Mix-1124 (843B tokens) for a final stage that mixes 50 percent high quality web data with academic content, math, instruction style data, and other curated sources.^[2] OLMo 3 trains on Dolma 3 Mix (about 5.9T tokens) drawn from the larger 9.3 trillion token Dolma 3 corpus, with heavier representation of code, math, and scientific PDFs.^[8]

The data pipeline that produced these mixes is itself open source as the dolma toolkit on GitHub, including the language identification, quality classifiers, deduplication code, and content filters.^[17] This is unusual; most labs treat their data filtering pipeline as a trade secret.
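
To make those stages concrete, here is a toy filter chain in the same spirit: a length-based quality heuristic followed by exact deduplication on a content hash. It is a generic illustration written for this article, not the dolma toolkit's API or its actual filters, and the function names and the `min_words` threshold are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator, List


def passes_quality(text: str, min_words: int = 5) -> bool:
    """Toy quality heuristic: keep documents with at least `min_words` words."""
    return len(text.split()) >= min_words


def dedupe_exact(docs: Iterable[str]) -> Iterator[str]:
    """Yield each document once, comparing whitespace-normalised SHA-256 digests."""
    seen = set()
    for doc in docs:
        normalised = " ".join(doc.split()).lower()
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield doc


def filter_corpus(docs: Iterable[str]) -> List[str]:
    """Apply the quality heuristic, then exact deduplication."""
    return list(dedupe_exact(d for d in docs if passes_quality(d)))


sample = [
    "The quick brown fox jumps over the lazy dog.",
    "the quick  brown fox jumps over the lazy dog.",  # duplicate after normalising
    "Too short.",
]
print(filter_corpus(sample))
```

Only the first sample document survives: the second is a duplicate once case and spacing are normalised, and the third fails the length check.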

Training infrastructure

The original OLMo 7B was trained in two parallel runs on different hardware to test reproducibility.^[1] One run used up to 256 nodes of the LUMI supercomputer in Finland, each node containing 4 AMD MI250X GPUs (128 GB memory each).^[1] The other used 27 nodes of an NVIDIA A100 (40 GB) cluster provided by MosaicML.^[1] Both runs used PyTorch's Fully Sharded Data Parallel framework with the ZeRO optimizer strategy and bfloat16 mixed precision.^[1] Batch sizes were around 4 million tokens.^[1] The fact that the two runs produced models with similar quality is one of the demonstrations the OLMo paper offers for the reproducibility claim.^[1]

Later releases moved to NVIDIA H100 hardware. OLMoE was trained on 256 H100 GPUs.^[4] OLMo 2 32B was trained on Augusta, a 160 node Google Cloud AI Hypercomputer with 8 H100 GPUs per node connected via GPUDirect-TCPXO, achieving about 1,800 tokens per second per GPU and roughly 38 percent model FLOPs utilization.^[2] Ai2 reports that "OLMo 2 32B takes only one third of the cost of training Qwen 2.5 32B while reaching similar performance."^[7]
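
These throughput figures can be sanity-checked with the common approximation of about 6 FLOPs per parameter per training token. The sketch below is a back-of-envelope estimate, not Ai2's calculation: the H100 peak of 989 dense BF16 TFLOPS is an assumption taken from NVIDIA's published specification, the 32B parameter count is nominal, the 6N rule leaves out attention FLOPs, and the cluster-wide rate assumes all 160 × 8 GPUs were in use.

```python
def approx_mfu(n_params: float, tokens_per_sec_per_gpu: float,
               peak_flops_per_gpu: float) -> float:
    """Model FLOPs utilisation using the 6 * N FLOPs-per-token approximation."""
    achieved = 6.0 * n_params * tokens_per_sec_per_gpu
    return achieved / peak_flops_per_gpu


N_PARAMS = 32e9           # nominal size of OLMo 2 32B
TOKENS_PER_SEC = 1800.0   # per GPU, as reported
H100_PEAK = 989e12        # assumed dense BF16 peak, FLOPs per second
N_GPUS = 160 * 8          # nodes * GPUs per node

mfu = approx_mfu(N_PARAMS, TOKENS_PER_SEC, H100_PEAK)
cluster_tokens_per_sec = N_GPUS * TOKENS_PER_SEC

print(f"approximate MFU: {mfu:.1%}")
print(f"cluster throughput: {cluster_tokens_per_sec:,.0f} tokens/s")
```

The estimate comes out in the mid-30-percent range, a little under the reported figure, which is the direction expected when attention FLOPs are excluded.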

Post training: the Tulu series

Post training (instruction tuning, preference optimization, and reinforcement learning) for OLMo is handled by the Tulu family, also from Ai2.^[5] The current standard is Tulu 3, released in November 2024 alongside the OLMo 2 base models.^[5] Tulu 3 introduced a recipe with three stages: supervised fine tuning on a curated 939,344 prompt mix (57 percent public, 43 percent synthetic with persona conditioning), Direct Preference Optimization (DPO) on a preference pair dataset, and a novel Reinforcement Learning with Verifiable Rewards (RLVR) stage for tasks like math, code, and instruction following where correctness can be checked automatically.^[5]
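
The idea behind a verifiable reward is that a program, rather than a learned reward model, decides whether an answer is right. The sketch below shows a minimal binary reward for math-style prompts; it illustrates the concept only and is not the Tulu 3 or open-instruct implementation. The function names and the convention of taking the last number in the completion as the answer are assumptions.

```python
import re
from typing import Optional

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_final_number(completion: str) -> Optional[float]:
    """Return the last number in the text, ignoring thousands separators."""
    matches = _NUMBER.findall(completion.replace(",", ""))
    return float(matches[-1]) if matches else None


def verifiable_reward(completion: str, reference: float, tol: float = 1e-6) -> float:
    """1.0 if the extracted answer matches the reference, else 0.0."""
    answer = extract_final_number(completion)
    if answer is None:
        return 0.0
    return 1.0 if abs(answer - reference) <= tol else 0.0


print(verifiable_reward("3 boxes of 12 eggs is 36 eggs. The answer is 36.", 36))
print(verifiable_reward("I think the answer is 35", 36))
```

A reinforcement learning loop would feed this score back as the reward for each sampled completion; code and instruction-following tasks would swap in their own checkers, such as unit tests or format validators.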

The RLVR stage specifically improved GSM8K, MATH, and IFEval scores for the 7B and 13B Instruct models.^[5] A 405B Tulu 3 model was released in January 2025 as a demonstration that the recipe scales.^[11] For OLMo 3 the post training corpus was renamed Dolci and split into separate mixes for SFT, DPO, and RLVR.^[8] Tulu 3 also serves as a standalone post training recipe that can be applied to other base models, and the codebase (open-instruct) is widely used by researchers who do not want to start their post training pipeline from scratch.^[18]

How does OLMo perform on benchmarks?

Reported benchmark numbers from the OLMo 2 7B model card give a sense of how the family performs on standard academic evaluations.^[12] The 7B and 13B columns are zero- or few-shot base-model scores, while the 32B column reports the Instruct model:

Benchmark	OLMo 2 7B	OLMo 2 13B	OLMo 2 32B (Instruct)
MMLU	63.7	67.5	78.0 (approximate)
GSM8K	67.5	75.1	78.4
ARC Challenge	79.8	83.5	(not reported)
HellaSwag	83.8	86.4	(not reported)
DROP	60.8	63.4	(not reported)
Natural Questions	36.9	46.5	(not reported)
TriviaQA	78.0	81.9	(not reported)
AGIEval	50.4	54.2	(not reported)
MMLU-Pro	31.0	35.1	53.9

For OLMo 3 Think 32B, Ai2 reports MATH 96.1, AIME 2024 76.8, BigBenchHard 89.8, HumanEval+ 91.4, and IFEval 89.0, putting the model in the same ballpark as much larger frontier reasoning models.^[8] Ai2 describes it as "the strongest fully open thinking model we're aware of," noting that it narrows the gap to open weight models of similar scale such as Qwen 3 32B "while training on roughly 6x fewer tokens."^[8]

The historically interesting datapoint is the original OLMo 7B from February 2024, which scored only 28.3 on MMLU.^[1] The 24 point jump to roughly 52 in OLMo 7B 0424 came almost entirely from data and training improvements rather than architectural changes, and is often cited as evidence that data quality dominates capability gains at the 7B scale.^[10]

How does OLMo differ from Llama, Mistral, and other open models?

Model	Params	License	Open data	Open code	Open checkpoints	MMLU
OLMo 2 7B	7B	Apache 2.0	Yes	Yes	Yes (every 1k steps)	63.7
OLMo 2 13B	13B	Apache 2.0	Yes	Yes	Yes	67.5
OLMo 2 32B	32B	Apache 2.0	Yes	Yes	Yes	~78
Llama 3.1 8B	8B	Llama 3.1 Community License	No	Partial	No	61.8
Mistral 7B	7B	Apache 2.0	No	No	No	60.1
Mistral Nemo 12B	12B	Apache 2.0	No	No	No	66.9
Pythia 6.9B	6.9B	Apache 2.0	Yes (The Pile)	Yes	Yes	25.7
Qwen 2.5 7B	7B	Apache 2.0 / Tongyi Qianwen	No	Partial	No	74.2
Falcon 7B	7B	Apache 2.0	Partial	Partial	No	27.8

Of the well-known open-weight families, only Pythia from EleutherAI matches OLMo on the openness axis. Pythia was released in 2023 specifically as a research artifact and tops out at 12B parameters, which leaves OLMo 2 32B as the largest fully open model in this comparison. Llama and Mistral release weights and inference code but treat their training data and training code as proprietary, which is why they are marked "No" in the open data and open checkpoints columns.

Reception and impact

OLMo has been treated by parts of the research community as the open language model the field actually needs, even when its raw benchmarks lag behind closed or weight only competitors. The original release in February 2024 was covered as a milestone by GeekWire and the academic press, partly because it was published just as Meta was preparing the Llama 3 launch and partly because Ai2 explicitly framed the release in terms of openness rather than benchmark records.^[15]

The Open Source Initiative's Open Source AI Definition, finalized in late 2024, made the contrast between OLMo style "data information available" releases and Llama style "weights only" releases central to the policy debate over what "open AI" should mean. OLMo, Pythia, and a handful of other releases were the main worked examples of compliance.

For practitioners, OLMo's value is not always the model itself but the artifacts around it. The Tulu 3 post training recipe and the open-instruct codebase have been adopted by other groups doing post training on third party base models.^[18] The Dolma toolkit is used as a reference implementation for data filtering pipelines.^[17] The training logs and intermediate checkpoints have appeared in dozens of follow up papers studying loss curves, emergent capabilities, and data efficiency.

OLMo 2 32B, released in March 2025, was widely noted as the first fully open model to outperform GPT-3.5 Turbo and GPT-4o mini on a multi skill academic benchmark suite.^[7] The release got coverage in MarkTechPost, AIwire, and the Ai2 blog as a proof point that an open lab on a comparatively modest budget could reach the capability tier that closed labs occupied two years earlier.^[7]

Funding for the project comes from Ai2, the nonprofit research institute founded in 2014 by Microsoft cofounder Paul Allen and now headquartered in Seattle. The institute's broader mandate is open research in AI for the common good, and OLMo is the language model leg of a portfolio that also includes the Molmo multimodal models, the Aristo science reasoning system, and the Semantic Scholar academic search engine.

References

1. Groeneveld, Dirk et al. "OLMo: Accelerating the Science of Language Models." arXiv:2402.00838, February 1, 2024. https://arxiv.org/abs/2402.00838
2. OLMo 2 Team. "2 OLMo 2 Furious." arXiv:2501.00656, December 31, 2024. https://arxiv.org/abs/2501.00656
3. Soldaini, Luca et al. "Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research." arXiv:2402.00159, 2024. https://arxiv.org/abs/2402.00159
4. Muennighoff, Niklas et al. "OLMoE: Open Mixture-of-Experts Language Models." arXiv:2409.02060, September 4, 2024. https://arxiv.org/abs/2409.02060
5. Lambert, Nathan et al. "Tulu 3: Pushing Frontiers in Open Language Model Post-Training." arXiv:2411.15124, November 2024. https://arxiv.org/abs/2411.15124
6. Allen Institute for AI. "OLMo 2: The best fully open language model to date." Ai2 Blog, November 26, 2024. https://allenai.org/blog/olmo2
7. Allen Institute for AI. "OLMo 2 32B: First fully open model to outperform GPT 3.5 and GPT 4o mini." Ai2 Blog, March 13, 2025. https://allenai.org/blog/olmo2-32b
8. Allen Institute for AI. "Olmo 3: Charting a path through the model flow to lead open-source AI." Ai2 Blog, November 20, 2025. https://allenai.org/blog/olmo3
9. Allen Institute for AI. "OLMoE: An open, small, and state-of-the-art mixture-of-experts model." Ai2 Blog, September 4, 2024. https://allenai.org/blog/olmoe-an-open-small-and-state-of-the-art-mixture-of-experts-model-c258432d0514
10. Allen Institute for AI. "OLMo 1.7-7B: A 24 point improvement on MMLU." Ai2 Blog, April 17, 2024. https://allenai.org/blog/olmo-1-7-7b-a-24-point-improvement-on-mmlu-92b43f7d269d
11. Allen Institute for AI. "Tulu 3: opens language model post-training up to more tasks and more people." Ai2 Blog, November 21, 2024. https://allenai.org/blog/tulu-3
12. Hugging Face model card: allenai/OLMo-2-1124-7B. https://huggingface.co/allenai/OLMo-2-1124-7B
13. Hugging Face model card: allenai/OLMo-7B-0424. https://huggingface.co/allenai/OLMo-7B-0424
14. Hugging Face model card: allenai/OLMo-2-0325-32B-Instruct. https://huggingface.co/allenai/OLMo-2-0325-32B-Instruct
15. GeekWire. "Allen Institute for AI promises new insights into large language models with OLMo release." February 2024. https://www.geekwire.com/2024/allen-institute-for-ai-promises-new-insights-into-large-language-models-with-olmo-release/
16. GitHub: allenai/OLMo. https://github.com/allenai/OLMo
17. GitHub: allenai/dolma. https://github.com/allenai/dolma
18. GitHub: allenai/open-instruct (Tulu 3 codebase). https://github.com/allenai/open-instruct
