BLOOM (BigScience Large Open-science Open-access Multilingual Language Model) is a 176-billion-parameter open-access large language model released on July 12, 2022. It was developed by the BigScience collaborative project, a year-long research workshop involving over 1,000 researchers from more than 70 countries and 250 institutions, coordinated by Hugging Face. BLOOM was trained on the ROOTS corpus, a curated multilingual dataset comprising 1.6 terabytes of text across 46 natural languages and 13 programming languages [1].
BLOOM holds a distinctive place in the history of AI as one of the first openly released language models with more than 100 billion parameters, and the first at that scale whose license permitted use beyond research. At the time of its release, the best-known models of comparable scale were GPT-3 (175 billion parameters, accessible only through an API) and Meta's OPT-175B (weights released under a restrictive research-only license). BLOOM was designed to demonstrate that the most powerful AI models could be developed collaboratively and transparently, and made available to the broader research community rather than remaining locked behind corporate walls [2].
BigScience was a one-year research workshop that ran from May 2021 to May 2022. It was modeled on large-scale scientific collaborations in physics, such as those at CERN, with the goal of bringing the same collaborative ethos to AI research. The project was initiated and coordinated by Hugging Face, with Thomas Wolf (Hugging Face co-founder and CSO) and other researchers serving as key organizers [3].
The workshop brought together over 1,000 researchers from more than 70 countries and approximately 250 institutions. Participants included researchers from academia, non-profit organizations, and private sector companies. The project was organized into approximately 30 working groups, each focused on a specific aspect of the model's development: data sourcing and governance, model architecture, training engineering, evaluation, ethical considerations, and licensing [1][3].
A defining characteristic of BigScience was its commitment to openness and documentation at every stage. Every decision about data collection, architecture choices, training procedures, and evaluation methods was discussed in open meetings and documented publicly. This level of transparency was unusual for a project of this scale and stood in contrast to the secrecy that typically surrounds large model development at companies like OpenAI and Google [3].
The compute for training BLOOM was provided through a grant worth an estimated 3 million euros from two French research agencies: the Centre National de la Recherche Scientifique (CNRS) and the Grand Équipement National de Calcul Intensif (GENCI). The training was conducted on the Jean Zay supercomputer, operated by IDRIS (Institut du Développement et des Ressources en Informatique Scientifique), south of Paris, France [4].
The training data for BLOOM is called the ROOTS corpus (Responsible Open-science Open-collaboration Transparent Sources). Building this dataset was one of the most labor-intensive aspects of the BigScience project, involving careful curation, documentation, and governance of every data source [5].
ROOTS comprises 1.6 terabytes of pre-processed text, converted into approximately 350 billion unique tokens. The corpus spans 46 natural languages and 13 programming languages, for a total of 59 languages. The data was drawn from 498 sources, including web crawls, academic publications, books, and code repositories [1][5].
The largest single component of ROOTS is data from the OSCAR corpus, which accounts for approximately 38% of the total dataset. OSCAR is a multilingual web corpus derived from CommonCrawl snapshots. The remainder of ROOTS consists of newly collected data extracted from a manually selected and documented list of language data sources [5].
| Language Category | Examples | Share of ROOTS |
|---|---|---|
| English | Web, books, academic papers | ~30% |
| Other European Languages | French, Spanish, Portuguese, German, Italian, Catalan | ~25% |
| Asian Languages | Chinese, Hindi, Arabic, Indonesian, Vietnamese | ~15% |
| African Languages | Swahili, Yoruba, Igbo, others | ~5% |
| Programming Languages | Python, Java, C++, JavaScript, others (13 total) | ~12% |
| Other | Miscellaneous sources and languages | ~13% |
The data governance process for ROOTS was unusually rigorous for a training dataset. Each source was individually reviewed, documented, and approved through a structured process. The BigScience Data Governance Working Group developed frameworks for assessing data provenance, consent, and potential harms. While the ROOTS corpus was not released in its entirety due to licensing constraints on some component datasets, detailed documentation of its composition and creation process was published [5].
BLOOM's multilingual coverage was a deliberate design choice. The BigScience team made a conscious effort to include languages from diverse language families and geographic regions, not just the high-resource European languages that dominate most training datasets. However, the distribution of data across languages was not uniform; English and other well-resourced languages had significantly more training data than lower-resource languages [1].
The inclusion of 13 programming languages (including Python, Java, C++, JavaScript, PHP, and others) followed the pattern established by earlier models like Codex and was intended to give BLOOM basic code generation and understanding capabilities.
BLOOM is a decoder-only transformer model, following the same general architecture as GPT-3 and other autoregressive language models. However, the BigScience team made several specific architectural decisions that differed from the standard GPT-style design [1].
| Specification | Value |
|---|---|
| Parameters | 176 billion |
| Layers | 70 |
| Hidden Dimension | 14,336 |
| Attention Heads | 112 |
| Vocabulary Size | 250,680 |
| Context Length | 2,048 tokens |
| Activation Function | GeLU |
| Position Encoding | ALiBi (Attention with Linear Biases) |
| Training Tokens | ~350 billion |
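The headline parameter count can be cross-checked against the table using the standard back-of-the-envelope estimate for decoder-only transformers: roughly 12 × d_model² weights per layer (4 d² for attention projections plus 8 d² for a 4×-expanded MLP), plus the embedding matrix. This is an approximation that ignores biases and LayerNorm parameters:

```python
# Rough parameter count from the specification table above.
layers, d_model, vocab = 70, 14_336, 250_680

per_layer = 12 * d_model ** 2          # attention (4 d^2) + MLP (8 d^2)
embeddings = vocab * d_model           # token embedding matrix
total = layers * per_layer + embeddings

print(f"{total / 1e9:.0f}B")           # lands close to the quoted 176B
```

The estimate comes out at roughly 176 billion, matching the table to within rounding.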
One of BLOOM's most notable architectural choices was the use of ALiBi (Attention with Linear Biases) for positional encoding instead of the learned or sinusoidal position embeddings used by GPT-3 and most other models at the time. ALiBi works by adding a linear bias to attention scores based on the distance between query and key positions, rather than adding positional information to the input embeddings [1][6].
The decision to use ALiBi was strategic: the main motivation was its demonstrated ability to extrapolate to sequence lengths longer than those seen during training. This mattered because BLOOM was trained with a context length of 2,048 tokens, and graceful generalization to longer sequences was desirable. The BigScience team also found that ALiBi improved downstream performance and led to a smoother training process compared to the alternatives tested during development [6].
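The bias computation is simple enough to sketch directly. The snippet below is a minimal NumPy illustration, not BLOOM's actual Megatron-DeepSpeed code: it builds the head-specific slopes and the distance-proportional penalty added to the attention logits before softmax. The slope formula is the ALiBi paper's geometric sequence for power-of-two head counts; BLOOM's 112 heads require the paper's interpolation scheme, which is omitted here:

```python
import numpy as np

def alibi_slopes(n_heads: int) -> np.ndarray:
    # Geometric sequence of per-head slopes (ALiBi paper, power-of-two heads):
    # slope_h = 2 ** (-8 * h / n_heads), for h = 1..n_heads.
    return np.array([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> np.ndarray:
    positions = np.arange(seq_len)
    # distance[i, j] = j - i: zero for the current token, increasingly
    # negative the further a key sits in the past of a causal query.
    distance = positions[None, :] - positions[:, None]
    slopes = alibi_slopes(n_heads)
    # Shape (heads, seq, seq); added to the attention logits, not the embeddings.
    return slopes[:, None, None] * distance[None, :, :]

bias = alibi_bias(n_heads=4, seq_len=8)
```

Because the bias depends only on relative distance, the same function can produce matrices for sequences longer than any seen in training, which is the source of ALiBi's extrapolation property.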
BLOOM uses an additional layer normalization applied to the word embedding layer immediately after the embedding lookup. This was a modification from the standard Megatron-LM GPT2 architecture on which BLOOM was based. The embedding normalization was found to stabilize training at the 176B parameter scale, helping to prevent the training instabilities that can occur with very large models [1].
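A minimal sketch of this embedding normalization (toy sizes, plain NumPy, and omitting the learned gain and bias that a full LayerNorm carries):

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Normalize each vector over the hidden dimension to zero mean, unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
vocab, hidden = 1000, 64                       # toy sizes; BLOOM: 250,680 x 14,336
embedding = rng.normal(size=(vocab, hidden)).astype(np.float32)

token_ids = np.array([3, 17, 42])
h = layer_norm(embedding[token_ids])           # applied right after the lookup
```

Normalizing immediately after the lookup bounds the scale of activations entering the first transformer block, which is the stabilizing effect the team observed at 176B scale.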
BLOOM was trained between March 11, 2022, and July 6, 2022, a period of approximately 117 days. The training was conducted on the Jean Zay supercomputer using 384 NVIDIA A100 80GB GPUs organized across 48 nodes [4].
The training framework combined elements from Megatron-LM (NVIDIA's library for training large transformer models) and DeepSpeed (Microsoft's library for distributed training optimization). This combination, sometimes referred to as Megatron-DeepSpeed, enabled efficient parallelism across multiple dimensions [4].
| Parallelism Type | Configuration | Purpose |
|---|---|---|
| Tensor Parallelism | 4-way | Splits individual layers across GPUs |
| Pipeline Parallelism | 12-way | Distributes layers across GPU groups |
| Data Parallelism | 8-way | Replicates model across GPU groups |
| Total GPUs | 384 (48 nodes × 8 NVIDIA A100 80GB) | Product of the three degrees: 4 × 12 × 8 |
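As a sanity check, the three degrees of parallelism in the table multiply out to the full GPU count:

```python
# How the 384 GPUs decompose across the three parallelism axes.
tensor_parallel = 4     # each layer's weight matrices split across 4 GPUs
pipeline_parallel = 12  # the 70 layers distributed over 12 pipeline stages
data_parallel = 8       # 8 full model replicas, each on a different data shard

gpus_per_replica = tensor_parallel * pipeline_parallel   # GPUs holding one copy
total_gpus = gpus_per_replica * data_parallel            # 48 nodes x 8 GPUs
```

Each model replica therefore occupies 48 GPUs (one node group), and eight such replicas run in data parallel.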
The training used mixed-precision (BF16) to reduce memory usage and increase throughput. The learning rate schedule followed a warmup period followed by cosine decay. Gradient checkpointing was used to trade compute for memory, enabling the large model to fit within the available GPU memory [4].
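The warmup-then-cosine schedule can be sketched as follows; the hyperparameter values used here are illustrative defaults, not BLOOM's published settings:

```python
import math

def lr_schedule(step: int,
                max_lr: float = 6e-5,
                min_lr: float = 6e-6,
                warmup_steps: int = 375,
                decay_steps: int = 100_000) -> float:
    """Linear warmup followed by cosine decay to a floor (illustrative values)."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to max_lr over the warmup period.
        return max_lr * step / warmup_steps
    # Cosine-anneal from max_lr down to min_lr, then hold at the floor.
    progress = min((step - warmup_steps) / (decay_steps - warmup_steps), 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The warmup phase avoids large early updates while optimizer statistics are still noisy; the cosine phase decays the rate smoothly so late training makes progressively smaller adjustments.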
Training a 176B-parameter model for 117 days presented numerous engineering challenges. The BigScience team documented several training instabilities, hardware failures, and infrastructure issues that required intervention during the training run. The Jean Zay supercomputer, while powerful, was a shared resource used by many research groups, which added scheduling and resource allocation complexity [4].
Despite these challenges, the training was completed successfully, and the BigScience team published detailed documentation of the entire training process, including logs, hyperparameters, and lessons learned. This documentation has been valuable to subsequent large-scale training efforts by other organizations.
Following the release of the base BLOOM model, the BigScience team released BLOOMZ, a version of BLOOM that was fine-tuned on the xP3 (Crosslingual Public Pool of Prompts) dataset. xP3 is a collection of prompts and tasks in multiple languages, designed to improve the model's ability to follow instructions and perform tasks in a zero-shot or few-shot setting across diverse languages [7].
BLOOMZ demonstrated significantly improved performance on downstream tasks compared to the base BLOOM model, particularly in multilingual settings. The fine-tuning process showed that relatively modest amounts of instruction tuning data could substantially improve a large model's usability, a finding that was consistent with concurrent work on instruction tuning by other research groups [7].
| Model | Fine-tuning Data | Key Improvement |
|---|---|---|
| BLOOM (base) | None (pretrained only) | Strong multilingual generation |
| BLOOMZ | xP3 (crosslingual prompts) | Better instruction following, zero-shot task performance |
| BLOOMZ-MT | xP3mt (machine-translated prompts) | Extended multilingual task coverage |
BLOOM was released under the BigScience Responsible AI License (RAIL), version 1.0. This license was a novel approach to AI model licensing that sought to balance open access with responsible use. Unlike traditional open-source licenses like Apache 2.0 or MIT, which impose minimal restrictions on use, the RAIL license includes specific behavioral restrictions designed to prevent harmful applications [8].
The RAIL license allows anyone to freely download, use, modify, and distribute BLOOM and its derivatives, but it prohibits certain uses that the BigScience community deemed harmful. Restricted uses include generating or disseminating verifiably false information with the intent of harming others, fully automated decision-making that adversely affects an individual's legal rights, and providing medical advice or interpreting medical results.
The development of the RAIL license was a significant contribution in its own right. It influenced subsequent model licensing approaches, including the licenses used by Meta for LLaMA and by Stability AI for Stable Diffusion. The concept of use restrictions layered on top of otherwise open access has become a common pattern in AI model licensing, sometimes referred to as "responsible AI licensing" or "ethical source" licensing [8].
BLOOM's performance was evaluated across a wide range of benchmarks, covering both English and multilingual tasks. At 176 billion parameters, BLOOM was roughly comparable in scale to GPT-3 (175B), making direct comparisons natural, though differences in training data, architecture, and evaluation methodology make exact comparisons complex [1].
On standard English-language benchmarks, BLOOM was competitive but generally trailed GPT-3 and OPT-175B. The base BLOOM model showed particular strength in generative and multilingual tasks, while its performance on knowledge-intensive benchmarks such as MMLU was somewhat weaker than GPT-3's [1].
| Benchmark | BLOOM (176B) | GPT-3 (175B) | OPT-175B |
|---|---|---|---|
| HellaSwag (0-shot) | 73.0 | 78.9 | 76.3 |
| LAMBADA (0-shot) | 67.2 | 76.2 | 74.7 |
| ARC Easy (0-shot) | 65.5 | 68.8 | 69.2 |
| WinoGrande (0-shot) | 64.4 | 70.2 | 68.5 |
However, after multitask fine-tuning (BLOOMZ), performance improved substantially across most benchmarks, often narrowing and in some cases closing the gap with GPT-3 [7].
BLOOM's primary differentiator was its multilingual capability. The model demonstrated the ability to generate coherent text, translate between languages, and perform tasks in languages that were underrepresented in other large models. Performance varied significantly across languages, with stronger results for languages that had more training data in the ROOTS corpus [1].
BLOOM was released during a period of rapid development in large language models. Understanding its position requires comparing it with the other major models available at the time.
| Feature | BLOOM | GPT-3 | OPT-175B | LLaMA (2023) |
|---|---|---|---|---|
| Parameters | 176B | 175B | 175B | 7B-65B |
| Release Date | July 2022 | June 2020 | May 2022 | February 2023 |
| Training Data | ROOTS (1.6TB, 59 languages) | Filtered CommonCrawl + books + Wikipedia | The Pile + custom | Publicly available data |
| Languages | 46 natural + 13 programming | Primarily English | Primarily English | Primarily English |
| Access | Open weights (RAIL license) | API only (closed) | Open weights (research license) | Open weights (research license) |
| Position Encoding | ALiBi | Learned | Learned | RoPE |
| Developer | BigScience (1000+ researchers) | OpenAI | Meta AI | Meta AI |
GPT-3, while closed-source, was the established benchmark for 175B-scale models. OPT-175B, released by Meta AI in May 2022, was the first openly released model at this scale, but its license restricted use to research purposes. BLOOM, released two months after OPT, was the first model at this scale with a license that permitted commercial use (subject to RAIL restrictions) [2].
Meta's LLaMA, released in February 2023, largely superseded both OPT and BLOOM in terms of community adoption. LLaMA achieved better performance with smaller model sizes (the LLaMA 65B model outperformed BLOOM 176B on many benchmarks despite having fewer than half the parameters), thanks to improvements in training data quality and training methodology [9].
BLOOM's significance extends well beyond its benchmark scores. The project demonstrated several important principles that have influenced the subsequent development of the AI field.
Collaborative Development at Scale: BigScience proved that a globally distributed collaboration of over 1,000 researchers could successfully develop and train a state-of-the-art language model. This model of development stood as an alternative to the concentrated corporate approach of OpenAI, Google, and Meta [3].
Multilingual Representation: BLOOM was the most linguistically diverse large language model at the time of its release. While subsequent models have expanded multilingual coverage, BLOOM set an important precedent for including underrepresented languages in large-scale AI training [1].
Responsible Licensing: The RAIL license pioneered the concept of behavioral restrictions in AI model licenses. This approach has been widely adopted and adapted by subsequent model releases, influencing how the industry thinks about responsible distribution of powerful AI systems [8].
Open Training Documentation: The BigScience team published extraordinarily detailed documentation of the entire training process, from data collection to final evaluation. This transparency has served as a resource for other organizations attempting large-scale model training and has set a standard for reproducibility in AI research [4].
Democratization of Access: By making a 176B-parameter model freely available (with responsible use restrictions), BLOOM expanded access to frontier-scale AI beyond the handful of well-funded corporations that could afford to train such models. This was particularly significant for researchers and institutions in developing countries that lacked the resources to build their own large models [2].
However, BLOOM also illustrated some of the limitations of the collaborative approach. The model's performance on English benchmarks lagged behind GPT-3 and later behind LLaMA, suggesting that the distributed development process may have introduced some efficiency costs compared to the more focused efforts of well-resourced corporate labs. The project's emphasis on documentation, governance, and inclusivity, while valuable in its own right, consumed resources that might otherwise have been directed toward model performance.
In addition to the full 176B model, the BigScience team released a series of smaller BLOOM models at various parameter scales. These smaller models were trained on the same ROOTS corpus with the same architecture (scaled down) and served as baselines for research and as more accessible options for users with limited compute.
| Model | Parameters | Use Case |
|---|---|---|
| BLOOM-560M | 560 million | Lightweight experimentation |
| BLOOM-1B1 | 1.1 billion | Research and prototyping |
| BLOOM-1B7 | 1.7 billion | Small-scale deployment |
| BLOOM-3B | 3 billion | Moderate capability tasks |
| BLOOM-7B1 | 7.1 billion | General-purpose use |
| BLOOM | 176 billion | Full-scale deployment |
As of early 2026, BLOOM remains available for download and use on the Hugging Face Model Hub. However, the model has been largely superseded by more recent open-source models that offer better performance at similar or smaller scales. Meta's LLaMA family, Mistral AI's models, and other open-weight alternatives have significantly advanced the state of the art since BLOOM's release in mid-2022 [9].
The BigScience workshop itself concluded after the release of BLOOM and its associated papers, and no BLOOM 2 or direct successor has been announced. The project's contributions live on primarily through the RAIL licensing framework, the documentation of large-scale training, and the ROOTS corpus methodology.
BLOOM's position in the history of AI is that of a pioneer. It was the first truly open large-scale language model, the first to take multilingual representation seriously at the 100B+ parameter scale, and the first to be developed through a genuinely collaborative international effort. While its raw performance has been eclipsed by subsequent models, the principles it established continue to shape how the AI community thinks about openness, collaboration, and responsibility in model development.