EleutherAI is a non-profit artificial intelligence research laboratory focused on the development, study, and open release of large language models. Founded in July 2020 by Connor Leahy, Sid Black, and Leo Gao as a grassroots community on Discord, EleutherAI grew from an informal group of AI enthusiasts into one of the most influential organizations in the open-source AI movement. The name "EleutherAI" derives from the Greek word eleutheria, meaning liberty, reflecting the group's commitment to making powerful AI systems freely accessible to everyone.
EleutherAI is responsible for producing several widely used open-source language models, including GPT-Neo, GPT-J, GPT-NeoX-20B, and the Pythia model suite. The organization also created The Pile, an 825 GiB open-source text dataset that became one of the most commonly used training corpora for language model research, and the LM Evaluation Harness, which has become the standard framework for evaluating language models across the AI industry.
The story of EleutherAI begins in the summer of 2020, shortly after OpenAI released its landmark GPT-3 paper describing a 175-billion-parameter language model. On July 2, 2020, a user known as "Daj" (Connor Leahy) posted in Shawn Presser's AI-focused Discord server suggesting the community should "give OpenAI a run for their money" by attempting to replicate GPT-3 as an open-source project. Leahy had previously gained attention in 2019 for reverse-engineering GPT-2 in his bedroom.
The idea resonated with several members of the server, and on July 7, 2020, Leahy, Sid Black, and Leo Gao created a new Discord server under the tentative name "LibreAI." The server quickly attracted volunteers from around the world who were interested in democratizing access to large-scale AI. On July 28, 2020, the group rebranded from "LibreAI" to "EleutherAI," choosing the name as a reference to the Greek concept of eleutheria (liberty or freedom).
In its earliest days, EleutherAI operated entirely as a volunteer-driven collective. Contributors were independent researchers, students, hobbyists, and professionals who donated their time and skills to the project. The group coordinated its work through Discord channels, with no formal organizational structure, funding, or institutional backing.
The first major milestone came on December 31, 2020, when EleutherAI publicly released The Pile, an 825 GiB curated dataset of diverse English text assembled from 22 different sources. The dataset was primarily curated by Leo Gao and Stella Biderman, who would later become the organization's Executive Director. The Pile was designed to provide a high-quality, diverse training corpus that could serve as an alternative to proprietary datasets used by companies like OpenAI.
On March 21, 2021, EleutherAI released the GPT-Neo model family, consisting of models with 125 million, 1.3 billion, and 2.7 billion parameters. These were the first open-source models explicitly designed to replicate the GPT-3 architecture, built using the Mesh TensorFlow library for distributed training on TPUs. Although originally intended as a proof of concept, GPT-Neo attracted far more attention than the team anticipated. The models were trained on The Pile using compute resources from Google's TPU Research Cloud (TRC) program, which Connor Leahy had access to from a prior research allocation.
On June 9, 2021, the group released GPT-J-6B, a six-billion-parameter model that was, at the time of release, the largest publicly available GPT-3-style language model in the world. GPT-J was trained using Ben Wang's Mesh Transformer JAX library on a TPU v3-256 pod and achieved performance comparable to OpenAI's 6.7-billion-parameter Curie model. The release of GPT-J marked a turning point for the open-source AI community, demonstrating that volunteer-driven projects could produce models competitive with those built by well-funded corporations.
As EleutherAI's ambitions grew, so did its need for computational resources beyond what Google's TPU Research Cloud could provide. In early 2021, the group entered a partnership with CoreWeave, a cloud computing company that provided access to clusters of NVIDIA A100 GPUs at no financial cost. Additional compute support came from SpellML, a cloud infrastructure company.
EleutherAI developed a new training framework called GPT-NeoX, built on top of NVIDIA's Megatron language model framework and Microsoft's DeepSpeed library. This GPU-based codebase was designed to scale to hundreds of billions of parameters and beyond, overcoming the limitations of the earlier TPU-based Mesh TensorFlow approach.
On February 10, 2022, EleutherAI released GPT-NeoX-20B, a 20-billion-parameter autoregressive language model trained on The Pile. At the time, it was the largest dense language model with publicly available weights. The model was trained on CoreWeave's NVIDIA A100 GPU cluster and was released under the Apache 2.0 open-source license. The accompanying paper, authored by Sid Black, Stella Biderman, and 15 other contributors, was published at the BigScience Workshop at ACL 2022.
GPT-NeoX-20B introduced several architectural innovations compared to earlier models in the series. It used rotary positional embeddings (RoPE) instead of learned positional encodings and computed attention and feed-forward layers in parallel rather than sequentially, yielding roughly 15% greater training throughput.
In early 2023, EleutherAI formalized its structure by incorporating as a non-profit research institute, the EleutherAI Institute. The organization announced that it would be led by Stella Biderman as Executive Director and Head of Research, Curtis Huebner as Head of Alignment, and Shivanshu Purohit as Head of Engineering.
The non-profit received funding and support from a range of backers, including Stability AI, Hugging Face, Lambda Labs, Canva, the Mozilla Foundation, Open Philanthropy, the Omidyar Network, and individual donor Nat Friedman (former CEO of GitHub). CoreWeave and Google TRC continued to provide compute resources.
Alongside the incorporation, EleutherAI announced a strategic shift in focus. While the organization would continue to release open-source models, it would place greater emphasis on interpretability, AI alignment, and scientific research. The non-profit structure allowed EleutherAI to hire full-time staff for the first time, eventually growing to approximately two dozen full-time and part-time researchers with about a dozen regular volunteers and external collaborators.
In April 2023, EleutherAI released the Pythia model suite, a collection of 16 language models ranging from 70 million to 12 billion parameters. Unlike the earlier GPT-Neo and GPT-J releases, which were primarily aimed at providing open alternatives to proprietary models, Pythia was designed from the ground up as a research tool for studying how language models learn and develop over the course of training.
The Pythia paper was published at ICML 2023, with Stella Biderman and Hailey Schoelkopf as lead authors. The suite has since become a standard resource for research in mechanistic interpretability, learning dynamics, and AI ethics.
By 2025, EleutherAI had accumulated over 70 million model downloads and published more than 130 papers in top venues including NeurIPS, ICML, ICLR, EMNLP, and Nature. The organization also released the Common Pile v0.1 in June 2025, an 8-terabyte dataset composed entirely of public domain and openly licensed text, developed in partnership with Poolside, Hugging Face, and the US Library of Congress.
| Person | Role | Notes |
|---|---|---|
| Connor Leahy | Co-founder | Reverse-engineered GPT-2 in 2019. Later co-founded Conjecture, an AI safety company, in March 2022. Active advocate for AI regulation and existential risk mitigation. |
| Sid Black | Co-founder | Lead author on the GPT-NeoX-20B paper. Later co-founded Conjecture alongside Connor Leahy and Gabriel Alfour. |
| Leo Gao | Co-founder | Primary architect of The Pile dataset and the LM Evaluation Harness. Later joined OpenAI as a researcher in 2021, where he continued work on alignment research. |
| Stella Biderman | Executive Director, Head of Research | Joined the project early and became a central figure. Holds a background in mathematics, computer science, and philosophy. Has led the non-profit institute since 2023. |
| Curtis Huebner | Head of Alignment | Directs EleutherAI's research on AI alignment and safety. |
| Shivanshu Purohit | Head of Engineering | Leads engineering efforts, contributed to the Pythia project and GPT-NeoX development. |
| Ben Wang | Core Contributor | Created the Mesh Transformer JAX library used to train GPT-J-6B. |
| Hailey Schoelkopf | Researcher | Co-lead author on the Pythia paper. |
EleutherAI has produced several families of open-source language models, each representing a step forward in scale, capability, and research utility.
GPT-Neo was EleutherAI's first model release and the organization's initial attempt to replicate the GPT-3 architecture in an open-source setting. The model family consisted of three sizes: 125M, 1.3B, and 2.7B parameters. All three variants were trained on The Pile using the Mesh TensorFlow library, which enabled distributed training across Google's TPU pods.
The GPT-Neo architecture closely followed GPT-3's design but incorporated local attention in alternating layers for improved efficiency. In the local attention layers, the window size was set to 256 tokens. The 2.7B model used 32 layers with a hidden dimension of 2,560 and 20 attention heads. The 125M model was trained for 300 billion tokens over 572,300 steps.
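The alternating pattern can be illustrated as attention masks. The following minimal NumPy sketch follows the window size described above, but the mask-based formulation and the assignment of global versus local attention to even and odd layers are illustrative assumptions, not EleutherAI's Mesh TensorFlow implementation:

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Full causal mask: position i may attend to every j <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def local_causal_mask(seq_len: int, window: int = 256) -> np.ndarray:
    """Local causal mask: position i may attend only to the previous
    `window` positions, i.e. j in [i - window + 1, i]."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

# Alternate the two patterns across layers; which parity is global versus
# local here is an assumption for illustration.
layer_masks = [causal_mask(2048) if layer % 2 == 0 else local_causal_mask(2048)
               for layer in range(4)]
```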
All GPT-Neo models were released under the MIT open-source license, making them freely available for both research and commercial use. At the time of release, GPT-Neo 2.7B was the largest publicly available transformer language model trained on a curated, diverse dataset.
GPT-J-6B represented a significant leap in scale for EleutherAI. The model contained 6 billion parameters and was trained on The Pile using Ben Wang's Mesh Transformer JAX framework on a TPU v3-256 pod. Training consumed 402 billion tokens over 383,500 steps.
Architecturally, GPT-J consisted of 28 transformer layers with a model dimension of 4,096 and a feedforward dimension of 16,384. The model dimension was split across 16 attention heads, each with a dimension of 256. GPT-J was one of the first large language models to use Rotary Position Embeddings (RoPE), which were applied to 64 dimensions of each attention head. The model used the same BPE tokenizer as GPT-2 and GPT-3, with a vocabulary of 50,257 tokens.
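The effect of partial rotary embeddings can be shown in a short sketch. The example below rotates only the first 64 of each head's 256 dimensions, as described above, but uses the "rotate-half" convention for readability; GPT-J's actual implementation groups dimension pairs differently (interleaved), so this is a schematic of the idea rather than the model's exact code:

```python
import numpy as np

def rope(x: np.ndarray, rotary_dims: int = 64, base: float = 10000.0) -> np.ndarray:
    """Apply rotary embeddings to the first `rotary_dims` dimensions of x,
    where x has shape (seq_len, head_dim); remaining dims pass through."""
    seq_len, head_dim = x.shape
    x_rot, x_pass = x[:, :rotary_dims], x[:, rotary_dims:]
    half = rotary_dims // 2
    freqs = base ** (-2.0 * np.arange(half) / rotary_dims)   # per-pair frequencies
    angles = np.arange(seq_len)[:, None] * freqs[None, :]    # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x_rot[:, :half], x_rot[:, half:]
    rotated = np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
    return np.concatenate([rotated, x_pass], axis=-1)

q = np.random.randn(2048, 256)   # one head's queries: (seq_len, head_dim)
q_rot = rope(q)                  # positions now encoded in the first 64 dims
```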
At the time of its release, GPT-J-6B was the largest publicly available GPT-3-style model. Benchmarks showed it performing comparably to the similarly sized GPT-3 Curie variant (6.7B parameters). GPT-J was released under the Apache 2.0 license.
GPT-NeoX-20B was EleutherAI's largest model and marked the organization's transition from TPU-based training to GPU-based training. The model contained 20 billion parameters and was trained on The Pile using EleutherAI's custom GPT-NeoX framework, which combined NVIDIA's Megatron with Microsoft's DeepSpeed.
The architecture featured 44 transformer layers with a hidden dimension of 6,144 and 64 attention heads. The feedforward intermediate dimension was 24,576, and the vocabulary size was expanded to 50,432 tokens. Like GPT-J, GPT-NeoX-20B used rotary positional embeddings, applied to the first 25% of each head's embedding dimensions. The model computed attention and feed-forward sub-layers in parallel rather than sequentially, a design choice that improved training throughput by approximately 15%.
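The difference between the sequential and parallel formulations can be stated in a few lines of schematic Python, with `attention` and `feed_forward` standing in for the full sub-layers and layer norms omitted for brevity:

```python
def sequential_block(x, attention, feed_forward):
    # GPT-3-style ordering: the feed-forward sub-layer consumes the
    # attention output, so the two must run one after the other.
    x = x + attention(x)
    return x + feed_forward(x)

def parallel_block(x, attention, feed_forward):
    # GPT-J / GPT-NeoX-20B-style: both sub-layers read the same input,
    # so their computations can be overlapped.
    return x + attention(x) + feed_forward(x)
```

Because both sub-layers read the same input in the parallel formulation, their large matrix multiplications can be scheduled concurrently, which is the source of the throughput gain.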
GPT-NeoX-20B was trained on CoreWeave's cluster of NVIDIA A100-SXM4-40GB GPUs with a batch size of approximately 3.15 million tokens (1,538 sequences of 2,048 tokens each) for 150,000 steps. The model was released under the Apache 2.0 license and hosted on GooseAI, a managed inference service operated jointly by CoreWeave and Anlatan (the creators of NovelAI).
The Pythia suite represented a departure from EleutherAI's earlier work, which was primarily focused on pushing the boundaries of model scale. Instead, Pythia was designed as a controlled scientific resource for studying how language models learn and evolve during training.
The suite consisted of 16 models in eight sizes: 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, and 12B parameters. Each size was trained in two variants: one on the standard Pile dataset and one on a deduplicated version of The Pile. All models were trained on the exact same data in the exact same order, using a uniform batch size of 2,097,152 tokens (1,024 sequences of 2,048 tokens) for 143,000 steps, totaling approximately 300 billion tokens per model.
A distinguishing feature of Pythia was the release of 154 intermediate checkpoints for every model, saved at regular intervals throughout training. These checkpoints enabled researchers to study learning dynamics, memorization patterns, and the emergence of biases at different stages of training. The training setup was fully reproducible, and all results in the published paper were independently verified by at least one external laboratory.
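The checkpoints are published as revisions of the corresponding Hugging Face repositories, so a particular training step can be loaded by name. A minimal sketch is shown below; the step numbers are examples, and each model card lists the available revisions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-70m"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Revisions are named "step<N>"; compare the same model early and late in training.
early = AutoModelForCausalLM.from_pretrained(model_id, revision="step1000")
final = AutoModelForCausalLM.from_pretrained(model_id, revision="step143000")
```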
The Pythia paper presented case studies demonstrating novel findings in memorization behavior, the effect of term frequency on few-shot arithmetic performance, and methods for reducing gender bias during training.
| Model | Parameters | Release Date | Training Data | Training Framework | Hardware | Training Tokens | License |
|---|---|---|---|---|---|---|---|
| GPT-Neo 125M | 125M | March 2021 | The Pile | Mesh TensorFlow | TPU | ~300B | MIT |
| GPT-Neo 1.3B | 1.3B | March 2021 | The Pile | Mesh TensorFlow | TPU | ~300B | MIT |
| GPT-Neo 2.7B | 2.7B | March 2021 | The Pile | Mesh TensorFlow | TPU | ~300B | MIT |
| GPT-J-6B | 6B | June 2021 | The Pile | Mesh Transformer JAX | TPU v3-256 | ~402B | Apache 2.0 |
| GPT-NeoX-20B | 20B | February 2022 | The Pile | GPT-NeoX (Megatron + DeepSpeed) | NVIDIA A100 GPUs | ~473B | Apache 2.0 |
| Pythia 70M | 70M | April 2023 | The Pile (standard and deduplicated) | GPT-NeoX | GPU | ~300B | Apache 2.0 |
| Pythia 160M | 160M | April 2023 | The Pile (standard and deduplicated) | GPT-NeoX | GPU | ~300B | Apache 2.0 |
| Pythia 410M | 410M | April 2023 | The Pile (standard and deduplicated) | GPT-NeoX | GPU | ~300B | Apache 2.0 |
| Pythia 1B | 1B | April 2023 | The Pile (standard and deduplicated) | GPT-NeoX | GPU | ~300B | Apache 2.0 |
| Pythia 1.4B | 1.4B | April 2023 | The Pile (standard and deduplicated) | GPT-NeoX | GPU | ~300B | Apache 2.0 |
| Pythia 2.8B | 2.8B | April 2023 | The Pile (standard and deduplicated) | GPT-NeoX | GPU | ~300B | Apache 2.0 |
| Pythia 6.9B | 6.9B | April 2023 | The Pile (standard and deduplicated) | GPT-NeoX | GPU | ~300B | Apache 2.0 |
| Pythia 12B | 12B | April 2023 | The Pile (standard and deduplicated) | GPT-NeoX | GPU | ~300B | Apache 2.0 |
The Pile is an 825 GiB diverse, open-source English text dataset created by EleutherAI specifically for training large language models. It was publicly released on December 31, 2020, and was primarily curated by Leo Gao and Stella Biderman. The corresponding paper, "The Pile: An 800GB Dataset of Diverse Text for Language Modeling," was submitted to arXiv in January 2021.
The Pile was designed to address a gap in the availability of high-quality, diverse training data for language model research. At the time, most large language models were trained on proprietary datasets or on Common Crawl-derived corpora that, while large, lacked diversity across knowledge domains.
The Pile is composed of 22 smaller, high-quality subsets spanning a wide range of domains. Many of these subsets were newly constructed for the project, while others were existing datasets that were cleaned and reformatted. The following table lists all 22 component datasets:
| Subset | Size (GiB) | Description |
|---|---|---|
| Pile-CC | 227.12 | Filtered subset of Common Crawl with improved extraction quality |
| PubMed Central | 90.27 | Full-text biomedical and life sciences research articles |
| Books3 | 100.96 | Collection of books (later removed due to copyright concerns) |
| ArXiv | 56.21 | Academic preprints in physics, mathematics, computer science, and other fields |
| GitHub | 95.16 | Open-source code repositories |
| OpenWebText2 | 62.77 | Extension of the original OpenWebText dataset, web pages linked from Reddit |
| FreeLaw | 51.15 | Legal opinions from the Free Law Project |
| Stack Exchange | 32.20 | Questions and answers from the Stack Exchange network |
| USPTO Backgrounds | 22.90 | Patent application background sections from the US Patent and Trademark Office |
| PubMed Abstracts | 19.26 | Abstracts from biomedical literature |
| OpenSubtitles | 12.98 | Movie and television subtitles |
| Project Gutenberg (PG-19) | 10.88 | Public domain books |
| DM Mathematics | 7.75 | Algorithmically generated math problems from DeepMind |
| Wikipedia (en) | 6.38 | English Wikipedia articles |
| BookCorpus2 | 6.30 | Extension of the original BookCorpus dataset |
| Ubuntu IRC | 5.52 | Chat logs from Ubuntu support IRC channels |
| EuroParl | 4.59 | European Parliament proceedings in English |
| HackerNews | 3.90 | Comments from the Hacker News technology forum |
| YouTube Subtitles | 3.73 | Subtitles from YouTube videos |
| PhilPapers | 2.38 | Philosophy papers and abstracts |
| NIH ExPORTER | 1.89 | Abstracts from NIH-funded research grants |
| Enron Emails | 0.88 | The Enron email corpus |
The dataset was stored in jsonlines format compressed with zstandard. Models trained on The Pile showed significant performance improvements over models trained on raw Common Crawl data or CC-100, particularly on specialized domains like scientific literature, legal text, and code.
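A shard in this format can be streamed without decompressing the whole file. The sketch below assumes the record layout used in the released Pile files, in which each JSON line carries a "text" field and a "meta" object naming the source subset; the file name is a placeholder:

```python
import io
import json
import zstandard as zstd  # pip install zstandard

with open("pile_shard.jsonl.zst", "rb") as fh:       # placeholder file name
    reader = zstd.ZstdDecompressor().stream_reader(fh)
    for line in io.TextIOWrapper(reader, encoding="utf-8"):
        record = json.loads(line)
        text = record["text"]                        # the document itself
        subset = record["meta"]["pile_set_name"]     # e.g. "Pile-CC" (assumed layout)
        break  # first document only, for this sketch
```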
The Pile attracted controversy over the inclusion of copyrighted material, particularly through the Books3 subset, which contained books compiled from Bibliotik. In 2024, a class action lawsuit was filed by authors seeking damages over the use of their copyrighted works. In response, EleutherAI eventually removed the Books3 component. The copyright issues surrounding The Pile contributed to EleutherAI's decision to create the Common Pile v0.1 in 2025, which contained only public domain and openly licensed content.
The Language Model Evaluation Harness (lm-evaluation-harness) is an open-source framework developed by EleutherAI for evaluating generative language models across a wide variety of benchmarks. Originally created by Leo Gao, the tool has evolved since 2020 into what is widely considered the standard evaluation framework for large language models in both academic and industry settings.
The Evaluation Harness provides a unified codebase that allows any causal language model to be tested on the same inputs, ensuring that results from different models are directly comparable. It supports evaluation with publicly available prompts for full reproducibility, configurable few-shot settings, and compatibility with models hosted on Hugging Face, OpenAI APIs, vLLM, and custom local endpoints.
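In recent versions of the library, an evaluation can be launched in a few lines through the `simple_evaluate` entry point. The sketch below is indicative rather than definitive, as argument names have varied across releases:

```python
import lm_eval  # pip install lm-eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",  # any causal LM
    tasks=["lambada_openai"],
    num_fewshot=0,
)
print(results["results"])
```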
The framework serves as the backend for Hugging Face's Open LLM Leaderboard, one of the most widely referenced benchmarks for comparing language model performance. It has been used in hundreds of published research papers and is employed internally by organizations including NVIDIA, Cohere, BigScience, BigCode, Nous Research, and MosaicML.
All tasks in the current version of the harness are defined through YAML configuration files. Together with the codebase commit hash, these configuration files can be shared to enable precise replication of any evaluation setup. The framework supports advanced features such as output post-processing, answer extraction, LoRA adapter evaluation via Hugging Face's PEFT library, and data-parallel inference for faster evaluation.
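A task definition is compact. The following hypothetical YAML illustrates the general shape of a multiple-choice task in the current configuration format; the key names follow recent releases of the harness and may differ in detail between versions:

```yaml
# Hypothetical task definition; key names follow recent
# lm-evaluation-harness releases and may vary between versions.
task: example_boolq
dataset_path: super_glue        # Hugging Face datasets path
dataset_name: boolq
output_type: multiple_choice
validation_split: validation
doc_to_text: "{{passage}}\nQuestion: {{question}}?\nAnswer:"
doc_to_target: label
doc_to_choice: ["no", "yes"]
metric_list:
  - metric: acc
```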
EleutherAI's ability to train large language models despite lacking the resources of major technology companies is one of the more notable aspects of its story. The organization relied on a combination of donated compute, grant programs, and strategic partnerships to fund its research.
In its earliest phase, EleutherAI's primary source of compute was the Google TPU Research Cloud (TRC) program, which provides free access to Google's TPU hardware for research projects that commit to publishing their results publicly. Connor Leahy had an existing TRC allocation from prior work, and this became the foundation for training the GPT-Neo models and GPT-J-6B. The program's publication requirement dovetailed naturally with EleutherAI's open-source mission.
As EleutherAI's models grew in size, TPU resources alone became insufficient. In early 2021, the group established a partnership with CoreWeave, a cloud computing company that provided access to clusters of NVIDIA GPUs. This partnership was structured as a donation of compute resources rather than a financial transaction. CoreWeave's infrastructure powered the training of GPT-NeoX-20B and later served as a platform for ongoing experiments. Additional GPU support came from Stability AI, which provided access to a Slurm cluster.
When EleutherAI incorporated as a non-profit in 2023, it received grants and donations from Stability AI, Hugging Face, Lambda Labs, Canva, the Mozilla Foundation, Open Philanthropy, the Omidyar Network, and Nat Friedman. These funds supported hiring full-time staff and expanding research operations.
EleutherAI and Stability AI have had a close but informal relationship. Stability AI's founder, Emad Mostaque, began supporting EleutherAI during its early days and later provided both financial donations and compute resources. Stability AI is listed as a donor on EleutherAI's website, and the two organizations (along with LAION) collaborated on the development of Stable Diffusion. However, there is no formal organizational affiliation between the two entities, and Stability AI does not hold any intellectual property rights over EleutherAI's models.
Two of EleutherAI's three co-founders, Connor Leahy and Sid Black, went on to co-found Conjecture, an AI safety research company focused on the alignment problem, in March 2022. Conjecture grew directly out of the founders' experiences at EleutherAI, which had deepened their understanding of both the capabilities and the risks of large language models. Leahy has since become a prominent voice calling for regulation of frontier AI development, including proposals for a moratorium on large-scale training runs.
Leo Gao, the third EleutherAI co-founder, joined OpenAI as a researcher in 2021. His prior work at EleutherAI on The Pile, the LM Evaluation Harness, and alignment research informed his continued contributions to the field. Despite his move to OpenAI, Gao has continued to participate in alignment discussions within the EleutherAI community.
Many EleutherAI members participated in the BigScience research workshop, a large international collaboration coordinated by Hugging Face that produced BLOOM, a 176-billion-parameter multilingual language model released in July 2022. EleutherAI contributors played roles in the design, development, and evaluation of BLOOM and the related mT0 model. Before BigScience convened, EleutherAI was the only non-corporate entity outside China actively developing large language models.
Beyond its own models, EleutherAI members have contributed to a range of community AI projects, including VQGAN-CLIP (AI art generation), Stable Diffusion (text-to-image generation), and OpenFold (protein structure prediction). The organization's Discord server has served as a hub for open-source AI research, with ongoing discussions spanning topics from mechanistic interpretability to biological machine learning.
Since its pivot in 2023, EleutherAI has made mechanistic interpretability a primary research focus. The Pythia model suite was specifically designed to support interpretability research by providing consistent training conditions and intermediate checkpoints.
In July 2024, EleutherAI released an open-source pipeline for generating and evaluating natural-language explanations of sparse autoencoder (SAE) features using large language models. The organization has also published research on whether interpretability methods designed for transformer architectures transfer effectively to recurrent models like Mamba and RWKV.
In January 2025, EleutherAI researchers co-authored "Open Problems in Mechanistic Interpretability," a landmark paper bringing together 29 researchers from 18 organizations to formalize the goals and open questions in the field. In March 2025, the organization launched the "Interpreting Across Time" project, which studies how model internals evolve during training to identify potential interventions for shaping model behavior.
EleutherAI maintains an active alignment research program led by Curtis Huebner. In February 2025, the organization launched Alignment-MineTest, a project that uses the open-source Minetest voxel game engine to study alignment properties of reinforcement learning agents, with a focus on corrigibility and misgeneralization.
Through its Polyglot project, EleutherAI has extended its work to non-English languages, developing and releasing multilingual model variants.
EleutherAI's influence on the broader AI ecosystem extends well beyond the models and tools it has directly produced. The organization helped establish the principle that large language models could and should be developed openly, at a time when the field was trending toward closed, proprietary systems.
The GPT-Neo and GPT-J model releases in 2021 are widely credited with sparking a wave of open-source AI development. These models demonstrated that meaningful language modeling capabilities were achievable outside the confines of major technology companies, inspiring subsequent open-source efforts by organizations including Meta (with LLaMA), Mistral AI, and the broader Hugging Face community. As one industry analysis noted, EleutherAI's models "fueled an entirely new wave of startups."
The Pile became one of the most widely used training datasets in the field, adopted by researchers and companies well beyond EleutherAI's own projects. The LM Evaluation Harness established a common standard for comparing language model performance, helping to bring rigor and reproducibility to an area that had previously lacked consistent evaluation practices.
EleutherAI also served as a training ground for AI researchers. Several members went on to take prominent positions at leading AI organizations, and the collaborative, open-science culture fostered by the Discord community influenced how other groups approached open-source AI development.
EleutherAI has operated as a registered non-profit research institute since early 2023. The organization maintains approximately two dozen full-time and part-time research staff, along with roughly a dozen regular volunteers and external collaborators. Day-to-day coordination continues to take place on the organization's public Discord server, sustaining the open, community-driven culture of its origins.
The non-profit is governed by its leadership team, with Stella Biderman serving as Executive Director and Head of Research, Curtis Huebner as Head of Alignment, and Shivanshu Purohit as Head of Engineering. This structure enables EleutherAI to accept grants, hire staff, and enter formal partnerships while preserving its commitment to open research.