BLOOM (BigScience Large Open-science Open-access Multilingual Language Model) is a 176-billion-parameter open-access large language model released on July 12, 2022. It was developed by the BigScience collaborative project, a year-long research workshop involving over 1,000 researchers from more than 70 countries and 250 institutions, coordinated by Hugging Face. BLOOM was trained on the ROOTS corpus, a curated multilingual dataset comprising 1.6 terabytes of text across 46 natural languages and 13 programming languages [1].
BLOOM holds a distinctive place in the history of AI as one of the first openly released language models with more than 100 billion parameters, and the first at that scale whose license permitted use beyond research. At the time of its release, the best-known models of comparable scale were GPT-3 (175 billion parameters, accessible only through an API) and Meta's OPT-175B (weights released under a restrictive research-only license). BLOOM was designed to demonstrate that the most powerful AI models could be developed collaboratively and transparently, and made available to the broader research community rather than remaining locked behind corporate walls [2].
BigScience was a one-year research workshop that ran from May 2021 to May 2022. It was modeled on large-scale scientific collaborations in physics, such as those at CERN, with the goal of bringing the same collaborative ethos to AI research. The project was initiated and coordinated by Hugging Face, with Thomas Wolf (Hugging Face co-founder and CSO) and other researchers serving as key organizers [3].
The workshop brought together over 1,000 researchers from more than 70 countries and approximately 250 institutions. Participants included researchers from academia, non-profit organizations, and private sector companies. The project was organized into approximately 30 working groups, each focused on a specific aspect of the model's development: data sourcing and governance, model architecture, training engineering, evaluation, ethical considerations, and licensing [1][3].
A defining characteristic of BigScience was its commitment to openness and documentation at every stage. Every decision about data collection, architecture choices, training procedures, and evaluation methods was discussed in open meetings and documented publicly. This level of transparency was unusual for a project of this scale and stood in contrast to the secrecy that typically surrounds large model development at companies like OpenAI and Google [3].
The compute for training BLOOM was provided through a grant worth an estimated 3 million euros from two French research agencies: the Centre National de la Recherche Scientifique (CNRS) and the Grand Équipement National de Calcul Intensif (GENCI). The training was conducted on the Jean Zay supercomputer, operated by IDRIS (Institut du Développement et des Ressources en Informatique Scientifique), south of Paris, France [4].
The training data for BLOOM is called the ROOTS corpus (Responsible Open-science Open-collaboration Transparent Sources). Building this dataset was one of the most labor-intensive aspects of the BigScience project, involving careful curation, documentation, and governance of every data source [5].
ROOTS comprises 1.6 terabytes of pre-processed text, converted into approximately 350 billion unique tokens. The corpus spans 46 natural languages and 13 programming languages, for a total of 59 languages. The data was drawn from 498 sources, including web crawls, academic publications, books, and code repositories [1][5].
The largest single component of ROOTS is data from the OSCAR corpus, which accounts for approximately 38% of the total dataset. OSCAR is a multilingual web corpus derived from CommonCrawl snapshots. The remainder of ROOTS consists of newly collected data extracted from a manually selected and documented list of language data sources [5].
| Language Category | Examples | Share of ROOTS |
|---|---|---|
| English | Web, books, academic papers | ~30% |
| Other European Languages | French, Spanish, Portuguese, German, Italian, Catalan | ~25% |
| Asian Languages | Chinese, Hindi, Arabic, Indonesian, Vietnamese | ~15% |
| African Languages | Swahili, Yoruba, Igbo, others | ~5% |
| Programming Languages | Python, Java, C++, JavaScript, others (13 total) | ~12% |
| Other | Miscellaneous sources and languages | ~13% |
The data governance process for ROOTS was unusually rigorous for a training dataset. Each source was individually reviewed, documented, and approved through a structured process. The BigScience Data Governance Working Group developed frameworks for assessing data provenance, consent, and potential harms. While the ROOTS corpus was not released in its entirety due to licensing constraints on some component datasets, detailed documentation of its composition and creation process was published [5].
BLOOM's multilingual coverage was a deliberate design choice. The BigScience team made a conscious effort to include languages from diverse language families and geographic regions, not just the high-resource European languages that dominate most training datasets. However, the distribution of data across languages was not uniform; English and other well-resourced languages had significantly more training data than lower-resource languages [1].
The inclusion of 13 programming languages (including Python, Java, C++, JavaScript, PHP, and others) followed the pattern established by earlier models like Codex and was intended to give BLOOM basic code generation and understanding capabilities.
BLOOM is a decoder-only transformer model, following the same general architecture as GPT-3 and other autoregressive language models. However, the BigScience team made several specific architectural decisions that differed from the standard GPT-style design [1].
| Specification | Value |
|---|---|
| Parameters | 176 billion |
| Layers | 70 |
| Hidden Dimension | 14,336 |
| Attention Heads | 112 |
| Vocabulary Size | 250,680 |
| Context Length | 2,048 tokens |
| Activation Function | GeLU |
| Position Encoding | ALiBi (Attention with Linear Biases) |
| Training Tokens | ~350 billion |
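The headline parameter count can be cross-checked against the table using the standard back-of-the-envelope estimate for decoder-only transformers: roughly 12 × d_model² weights per layer (4 d² for attention projections plus 8 d² for a 4×-expanded MLP), plus the embedding matrix. This is an approximation that ignores biases and LayerNorm parameters:

```python
# Rough parameter count from the specification table above.
layers, d_model, vocab = 70, 14_336, 250_680

per_layer = 12 * d_model ** 2          # attention (4 d^2) + MLP (8 d^2)
embeddings = vocab * d_model           # token embedding matrix
total = layers * per_layer + embeddings

print(f"{total / 1e9:.0f}B")           # lands close to the quoted 176B
```

The estimate comes out at roughly 176 billion, matching the table to within rounding.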
One of BLOOM's most notable architectural choices was the use of ALiBi (Attention with Linear Biases) for positional encoding instead of the learned or sinusoidal position embeddings used by GPT-3 and most other models at the time. ALiBi works by adding a linear bias to attention scores based on the distance between query and key positions, rather than adding positional information to the input embeddings [1][6].
The decision to use ALiBi was strategic: the main motivation was its demonstrated ability to extrapolate to sequence lengths longer than those seen during training. This mattered because BLOOM was trained with a context length of 2,048 tokens, and graceful generalization to longer sequences was desirable. The BigScience team also found that ALiBi improved downstream performance and led to a smoother training process compared to the alternatives tested during development [6].
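The bias computation is simple enough to sketch directly. The snippet below is a minimal NumPy illustration, not BLOOM's actual Megatron-DeepSpeed code: it builds the head-specific slopes and the distance-proportional penalty added to the attention logits before softmax. The slope formula is the ALiBi paper's geometric sequence for power-of-two head counts; BLOOM's 112 heads require the paper's interpolation scheme, which is omitted here:

```python
import numpy as np

def alibi_slopes(n_heads: int) -> np.ndarray:
    # Geometric sequence of per-head slopes (ALiBi paper, power-of-two heads):
    # slope_h = 2 ** (-8 * h / n_heads), for h = 1..n_heads.
    return np.array([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> np.ndarray:
    positions = np.arange(seq_len)
    # distance[i, j] = j - i: zero for the current token, increasingly
    # negative the further a key sits in the past of a causal query.
    distance = positions[None, :] - positions[:, None]
    slopes = alibi_slopes(n_heads)
    # Shape (heads, seq, seq); added to the attention logits, not the embeddings.
    return slopes[:, None, None] * distance[None, :, :]

bias = alibi_bias(n_heads=4, seq_len=8)
```

Because the bias depends only on relative distance, the same function can produce matrices for sequences longer than any seen in training, which is the source of ALiBi's extrapolation property.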
BLOOM uses an additional layer normalization applied to the word embedding layer immediately after the embedding lookup. This was a modification from the standard Megatron-LM GPT2 architecture on which BLOOM was based. The embedding normalization was found to stabilize training at the 176B parameter scale, helping to prevent the training instabilities that can occur with very large models [1].
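A minimal sketch of this embedding normalization (toy sizes, plain NumPy, and omitting the learned gain and bias that a full LayerNorm carries):

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Normalize each vector over the hidden dimension to zero mean, unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
vocab, hidden = 1000, 64                       # toy sizes; BLOOM: 250,680 x 14,336
embedding = rng.normal(size=(vocab, hidden)).astype(np.float32)

token_ids = np.array([3, 17, 42])
h = layer_norm(embedding[token_ids])           # applied right after the lookup
```

Normalizing immediately after the lookup bounds the scale of activations entering the first transformer block, which is the stabilizing effect the team observed at 176B scale.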
BLOOM was trained between March 11, 2022, and July 6, 2022, a period of approximately 117 days. The training was conducted on the Jean Zay supercomputer using 384 NVIDIA A100 80GB GPUs organized across 48 nodes [4].
The training framework combined elements from Megatron-LM (NVIDIA's library for training large transformer models) and DeepSpeed (Microsoft's library for distributed training optimization). This combination, sometimes referred to as Megatron-DeepSpeed, enabled efficient parallelism across multiple dimensions [4].
| Parallelism Type | Configuration | Purpose |
|---|---|---|
| Tensor Parallelism | 4-way | Splits individual layers across GPUs |
| Pipeline Parallelism | 12-way | Distributes layers across GPU groups |
| Data Parallelism | 8-way | Replicates model across GPU groups |
| Total GPUs | 384 (48 nodes × 8 NVIDIA A100 80GB) | Product of the three degrees: 4 × 12 × 8 |
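As a sanity check, the three degrees of parallelism in the table multiply out to the full GPU count:

```python
# How the 384 GPUs decompose across the three parallelism axes.
tensor_parallel = 4     # each layer's weight matrices split across 4 GPUs
pipeline_parallel = 12  # the 70 layers distributed over 12 pipeline stages
data_parallel = 8       # 8 full model replicas, each on a different data shard

gpus_per_replica = tensor_parallel * pipeline_parallel   # GPUs holding one copy
total_gpus = gpus_per_replica * data_parallel            # 48 nodes x 8 GPUs
```

Each model replica therefore occupies 48 GPUs (one node group), and eight such replicas run in data parallel.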
The training used mixed-precision (BF16) to reduce memory usage and increase throughput. The learning rate schedule followed a warmup period followed by cosine decay. Gradient checkpointing was used to trade compute for memory, enabling the large model to fit within the available GPU memory [4].
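The warmup-then-cosine schedule can be sketched as follows; the hyperparameter values used here are illustrative defaults, not BLOOM's published settings:

```python
import math

def lr_schedule(step: int,
                max_lr: float = 6e-5,
                min_lr: float = 6e-6,
                warmup_steps: int = 375,
                decay_steps: int = 100_000) -> float:
    """Linear warmup followed by cosine decay to a floor (illustrative values)."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to max_lr over the warmup period.
        return max_lr * step / warmup_steps
    # Cosine-anneal from max_lr down to min_lr, then hold at the floor.
    progress = min((step - warmup_steps) / (decay_steps - warmup_steps), 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The warmup phase avoids large early updates while optimizer statistics are still noisy; the cosine phase decays the rate smoothly so late training makes progressively smaller adjustments.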
Training a 176B-parameter model for 117 days presented numerous engineering challenges. The BigScience team documented several training instabilities, hardware failures, and infrastructure issues that required intervention during the training run. The Jean Zay supercomputer, while powerful, was a shared resource used by many research groups, which added scheduling and resource allocation complexity [4].
Despite these challenges, the training was completed successfully, and the BigScience team published detailed documentation of the entire training process, including logs, hyperparameters, and lessons learned. This documentation has been valuable to subsequent large-scale training efforts by other organizations.
Following the release of the base BLOOM model, the BigScience team released BLOOMZ, a version of BLOOM that was fine-tuned on the xP3 (Crosslingual Public Pool of Prompts) dataset. xP3 is a collection of prompts and tasks in multiple languages, designed to improve the model's ability to follow instructions and perform tasks in a zero-shot or few-shot setting across diverse languages [7].
BLOOMZ demonstrated significantly improved performance on downstream tasks compared to the base BLOOM model, particularly in multilingual settings. The fine-tuning process showed that relatively modest amounts of instruction tuning data could substantially improve a large model's usability, a finding that was consistent with concurrent work on instruction tuning by other research groups [7].
| Model | Fine-tuning Data | Key Improvement |
|---|---|---|
| BLOOM (base) | None (pretrained only) | Strong multilingual generation |
| BLOOMZ | xP3 (crosslingual prompts) | Better instruction following, zero-shot task performance |
| BLOOMZ-MT | xP3mt (machine-translated prompts) | Extended multilingual task coverage |
BLOOM was released under the BigScience Responsible AI License (RAIL), version 1.0. This license was a novel approach to AI model licensing that sought to balance open access with responsible use. Unlike traditional open-source licenses like Apache 2.0 or MIT, which impose minimal restrictions on use, the RAIL license includes specific behavioral restrictions designed to prevent harmful applications [8].
The RAIL license allows anyone to freely download, use, modify, and distribute BLOOM and its derivatives, but it prohibits certain uses that the BigScience community deemed harmful. Restricted uses include generating or disseminating verifiably false information with the intent of harming others, fully automated decision-making that adversely affects an individual's legal rights, and providing medical advice or interpreting medical results.
The development of the RAIL license was a significant contribution in its own right. It influenced subsequent model licensing approaches, including the licenses used by Meta for LLaMA and by Stability AI for Stable Diffusion. The concept of use restrictions layered on top of otherwise open access has become a common pattern in AI model licensing, sometimes referred to as "responsible AI licensing" or "ethical source" licensing [8].
BLOOM's performance was evaluated across a wide range of benchmarks, covering both English and multilingual tasks. At 176 billion parameters, BLOOM was roughly comparable in scale to GPT-3 (175B), making direct comparisons natural, though differences in training data, architecture, and evaluation methodology make exact comparisons complex [1].
On standard English-language benchmarks, BLOOM was competitive but generally trailed GPT-3 and OPT-175B. The base BLOOM model showed particular strength in generative and multilingual tasks, while its performance on knowledge-intensive benchmarks such as MMLU was somewhat weaker than GPT-3's [1].
| Benchmark | BLOOM (176B) | GPT-3 (175B) | OPT-175B |
|---|---|---|---|
| HellaSwag (0-shot) | 73.0 | 78.9 | 76.3 |
| LAMBADA (0-shot) | 67.2 | 76.2 | 74.7 |
| ARC Easy (0-shot) | 65.5 | 68.8 | 69.2 |
| WinoGrande (0-shot) | 64.4 | 70.2 | 68.5 |
However, after multitask fine-tuning (BLOOMZ), performance improved substantially across most benchmarks, often narrowing and in some cases closing the gap with GPT-3 [7].
BLOOM's primary differentiator was its multilingual capability. The model demonstrated the ability to generate coherent text, translate between languages, and perform tasks in languages that were underrepresented in other large models. Performance varied significantly across languages, with stronger results for languages that had more training data in the ROOTS corpus [1].
BLOOM was released during a period of rapid development in large language models. Understanding its position requires comparing it with the other major models available at the time.
| Feature | BLOOM | GPT-3 | OPT-175B | LLaMA (2023) |
|---|---|---|---|---|
| Parameters | 176B | 175B | 175B | 7B-65B |
| Release Date | July 2022 | June 2020 | May 2022 | February 2023 |
| Training Data | ROOTS (1.6TB, 59 languages) | Filtered CommonCrawl + books + Wikipedia | The Pile + custom | Publicly available data |
| Languages | 46 natural + 13 programming | Primarily English | Primarily English | Primarily English |
| Access | Open weights (RAIL license) | API only (closed) | Open weights (research license) | Open weights (research license) |
| Position Encoding | ALiBi | Learned | Learned | RoPE |
| Developer | BigScience (1000+ researchers) | OpenAI | Meta AI | Meta AI |
GPT-3, while closed-source, was the established benchmark for 175B-scale models. OPT-175B, released by Meta AI in May 2022, was the first openly released model at this scale, but its license restricted use to research purposes. BLOOM, released two months after OPT, was the first model at this scale with a license that permitted commercial use (subject to RAIL restrictions) [2].
Meta's LLaMA, released in February 2023, largely superseded both OPT and BLOOM in terms of community adoption. LLaMA achieved better performance with smaller model sizes (the LLaMA 65B model outperformed BLOOM 176B on many benchmarks despite having fewer than half the parameters), thanks to improvements in training data quality and training methodology [9].
BLOOM's significance extends well beyond its benchmark scores. The project demonstrated several important principles that have influenced the subsequent development of the AI field.
Collaborative Development at Scale: BigScience proved that a globally distributed collaboration of over 1,000 researchers could successfully develop and train a state-of-the-art language model. This model of development stood as an alternative to the concentrated corporate approach of OpenAI, Google, and Meta [3].
Multilingual Representation: BLOOM was the most linguistically diverse large language model at the time of its release. While subsequent models have expanded multilingual coverage, BLOOM set an important precedent for including underrepresented languages in large-scale AI training [1].
Responsible Licensing: The RAIL license pioneered the concept of behavioral restrictions in AI model licenses. This approach has been widely adopted and adapted by subsequent model releases, influencing how the industry thinks about responsible distribution of powerful AI systems [8].
Open Training Documentation: The BigScience team published extraordinarily detailed documentation of the entire training process, from data collection to final evaluation. This transparency has served as a resource for other organizations attempting large-scale model training and has set a standard for reproducibility in AI research [4].
Democratization of Access: By making a 176B-parameter model freely available (with responsible use restrictions), BLOOM expanded access to frontier-scale AI beyond the handful of well-funded corporations that could afford to train such models. This was particularly significant for researchers and institutions in developing countries that lacked the resources to build their own large models [2].
However, BLOOM also illustrated some of the limitations of the collaborative approach. The model's performance on English benchmarks lagged behind GPT-3 and later behind LLaMA, suggesting that the distributed development process may have introduced some efficiency costs compared to the more focused efforts of well-resourced corporate labs. The project's emphasis on documentation, governance, and inclusivity, while valuable in its own right, consumed resources that might otherwise have been directed toward model performance.
In addition to the full 176B model, the BigScience team released a series of smaller BLOOM models at various parameter scales. These smaller models were trained on the same ROOTS corpus with the same architecture (scaled down) and served as baselines for research and as more accessible options for users with limited compute.
| Model | Parameters | Use Case |
|---|---|---|
| BLOOM-560M | 560 million | Lightweight experimentation |
| BLOOM-1B1 | 1.1 billion | Research and prototyping |
| BLOOM-1B7 | 1.7 billion | Small-scale deployment |
| BLOOM-3B | 3 billion | Moderate capability tasks |
| BLOOM-7B1 | 7.1 billion | General-purpose use |
| BLOOM | 176 billion | Full-scale deployment |
As of early 2026, BLOOM remains available for download and use on the Hugging Face Model Hub. However, the model has been largely superseded by more recent open-source models that offer better performance at similar or smaller scales. Meta's LLaMA family, Mistral AI's models, and other open-weight alternatives have significantly advanced the state of the art since BLOOM's release in mid-2022 [9].
The BigScience workshop itself concluded after the release of BLOOM and its associated papers, and no BLOOM 2 or direct successor has been announced. The project's contributions live on primarily through the RAIL licensing framework, the documentation of large-scale training, and the ROOTS corpus methodology.
BLOOM's position in the history of AI is that of a pioneer. It was the first truly open large-scale language model, the first to take multilingual representation seriously at the 100B+ parameter scale, and the first to be developed through a genuinely collaborative international effort. While its raw performance has been eclipsed by subsequent models, the principles it established continue to shape how the AI community thinks about openness, collaboration, and responsibility in model development.