The Allen Institute for AI (abbreviated Ai2) is a 501(c)(3) non-profit research institute based in Seattle, Washington, dedicated to conducting high-impact artificial intelligence research and engineering for the common good. Founded in 2014 by Microsoft co-founder Paul Allen, Ai2 has grown into one of the most influential AI research organizations in the world, publishing hundreds of peer-reviewed papers and releasing a series of fully open language models, datasets, and tools that have shaped the trajectory of open-source AI development.
Ai2 distinguishes itself from other AI research labs by its commitment to full openness: releasing not only model weights but also training data, training code, evaluation code, and intermediate checkpoints. This philosophy stands in contrast to organizations that release model weights alone while keeping training data and procedures proprietary. As of 2026, Ai2 employs over 250 researchers, engineers, and support staff across its Seattle headquarters and a satellite office in Tel Aviv, Israel.
Paul Allen, the co-founder of Microsoft, had long been interested in advancing AI research. In September 2013, he appointed Oren Etzioni, a professor of computer science at the University of Washington, to serve as the founding CEO and to lead the research direction of a new institute. Etzioni took a leave of absence from the university in January 2014, and the Allen Institute for Artificial Intelligence formally began operations that year.
Allen's vision for Ai2 was rooted in addressing what he saw as a critical gap in AI development. He observed that AI systems lacked what most 10-year-olds possess: ordinary common sense. The institute's early research programs reflected this concern, with projects focused on machine reading, natural language understanding, and commonsense reasoning.
During its first few years, Ai2 organized its research around several flagship projects:

- Project Aristo, focused on machine reading and scientific question answering
- Project Euclid, focused on natural language understanding
- Project Plato, focused on computer vision
- Semantic Scholar, an AI-powered academic search engine
In 2017, Ai2 released the AllenNLP library, an open-source natural language processing research platform built on PyTorch. AllenNLP provided high-level abstractions for building NLP models and quickly became widely adopted in the research community, cited in hundreds of academic publications and used by over 800 open-source projects on GitHub.
On February 28, 2018, Paul Allen announced a commitment of $125 million over three years to support Ai2 and to launch Project Alexandria, a new research initiative focused on common sense AI. The project aimed to integrate knowledge developed in machine reading (Project Aristo), natural language understanding (Project Euclid), and computer vision (Project Plato) to create a unified and extensive knowledge source.
Project Alexandria set out to develop standard metrics for measuring the commonsense abilities of AI systems, novel crowdsourcing methods to acquire commonsense knowledge from people at scale, and applications that use common sense to improve performance across practical AI challenges.
Paul Allen died on October 15, 2018. His estate has continued to fund Ai2 through the Paul Allen Trust and related philanthropic vehicles. In 2025, the Paul Allen estate established the Fund for Science and Technology with an initial $3.1 billion endowment, pledging at least $500 million over four years to accelerate progress in bioscience, the environment, and AI for good.
Oren Etzioni served as CEO of Ai2 from its inception through late 2022, a tenure of approximately nine years. Under his leadership, the institute grew from zero to over 200 team members. Ai2 researchers published nearly 700 papers in top venues such as AAAI, ACL, CVPR, NeurIPS, and ICLR, with 24 of those papers earning special-recognition awards.
After Etzioni's departure, Peter Clark, a founding member of Ai2, served as interim CEO while the board conducted a search for a permanent successor. On June 20, 2023, Ai2 announced that Ali Farhadi, a professor at the University of Washington's Paul G. Allen School of Computer Science and Engineering, would become the next CEO, effective July 31, 2023.
Farhadi led Ai2 through a period of rapid growth in its open-source model releases, including the OLMo and Tulu model families. However, on March 12, 2026, it was announced that Farhadi was stepping down as CEO after approximately two and a half years in the role. Reports indicated that Farhadi wished to pursue frontier-scale AI research, while the board cited the financial realities of competing against tech giants at the largest scale of AI model development as a nonprofit. Chief Operating Officer Sophie Lebrecht also departed. Peter Clark returned to the interim CEO role as the board began searching for a new permanent leader.
Ai2's stated mission is to build breakthrough AI to solve the world's biggest problems through foundational research and innovation that delivers real-world impact. The organization prioritizes openness, scientific rigor, measurable impact, and collaborative partnerships.
A central tenet of Ai2's philosophy is full openness. While many organizations use the term "open" to describe releasing model weights, Ai2 defines openness as providing everything necessary for others to understand, reproduce, and modify the work. As Ali Farhadi articulated, the spirit of openness requires that others can understand a work "to the extent that [they] could change it to do [their] work." Anything that does not meet this standard, in Ai2's view, does not qualify as truly open source.
This philosophy is reflected in Ai2's model releases, which typically include model weights, full training data, all training code, training logs and metrics (such as Weights & Biases logs), evaluation code, and hundreds or thousands of intermediate checkpoints. This level of transparency exceeds what is offered by most other organizations, including Meta's Llama models (which release weights but not training data) and most commercial labs.
Ai2's models meet the Open Source Initiative's definition of open source AI, and the institute has been a vocal advocate for distinguishing between "open weight" and genuinely open-source AI.
Semantic Scholar is a free, AI-powered academic search engine and research tool developed by Ai2. Publicly launched in November 2015, it initially focused on computer science papers. By 2017, it had expanded to include biomedical literature, and by 2018 the corpus exceeded 40 million papers. As of 2025, Semantic Scholar indexes over 200 million academic papers sourced from publisher partnerships, data providers, and web crawls, with over 225 million paper records and 2.8 billion citation edges in its academic graph.
Semantic Scholar uses a combination of machine learning, natural language processing, and machine vision to go beyond traditional keyword-based search. Key features include:
| Feature | Description |
|---|---|
| TLDR Summaries | Automatic one-sentence summaries generated by AI to help researchers quickly assess a paper's relevance |
| Semantic Reader | An augmented PDF reader that provides rich contextual information, inline citations, and term definitions |
| Research Feeds | A personalized recommender system that learns user preferences and surfaces relevant new papers |
| Citation Analytics | Graphical representations of citation velocity and author influence scores for quality pre-assessment |
| Highly Influential Citations | Highlights references that meaningfully shaped a paper, distinguishing them from perfunctory citations |
| ScholarQA | An experimental question-answering tool that synthesizes answers from multiple documents with citations |
The Semantic Scholar Academic Graph (S2AG) dataset and APIs are freely available for researchers, and the Semantic Scholar Open Research Corpus (S2ORC) provides a general-purpose corpus for NLP and text mining research over scientific papers.
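The S2AG APIs are publicly accessible over HTTP. The sketch below builds a paper-search request against the Graph API; the endpoint shape follows the public API, but the field selection (`title`, `year`, `citationCount`) is illustrative — consult the API documentation for the full field list and rate limits.

```python
# Minimal sketch of querying the Semantic Scholar Graph API (stdlib only).
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.semanticscholar.org/graph/v1"

def build_search_url(query, fields=("title", "year", "citationCount"), limit=5):
    """Construct a paper-search URL for the Graph API."""
    params = urllib.parse.urlencode({
        "query": query,
        "fields": ",".join(fields),
        "limit": limit,
    })
    return f"{API_BASE}/paper/search?{params}"

def search_papers(query):
    """Fetch matching papers (requires network); returns the 'data' list."""
    with urllib.request.urlopen(build_search_url(query)) as resp:
        return json.load(resp).get("data", [])

# URL construction only; no network call:
print(build_search_url("open language models"))
```

The same base URL also serves paper-detail and citation-graph endpoints, which is how the 2.8 billion citation edges are exposed programmatically.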
OLMo (Open Language Model) is Ai2's family of fully open large language models. The OLMo project represents the institute's flagship effort to make every aspect of language model development transparent and reproducible.
Ai2 released the first OLMo models on February 1, 2024, making them available on Hugging Face and GitHub. The initial release included four variants at the 7 billion parameter scale (corresponding to different architectures, optimizers, and training hardware) and one model at the 1 billion parameter scale, all trained on at least 2 trillion tokens from the Dolma dataset.
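Because intermediate checkpoints are part of the release, they are addressable as git revisions on Hugging Face. A small sketch, assuming a `stepN-tokensMB` revision naming scheme (the exact revision names vary by model; check each model card):

```python
# Sketch of addressing an OLMo intermediate checkpoint by revision string.
# The repo id and revision format below are illustrative assumptions.
def checkpoint_revision(step, tokens_b):
    """Compose a revision name for an intermediate training checkpoint."""
    return f"step{step}-tokens{tokens_b}B"

# Loading would look like this (requires `transformers` and network access):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "allenai/OLMo-7B", revision=checkpoint_revision(1000, 4))

print(checkpoint_revision(1000, 4))  # → step1000-tokens4B
```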
What set OLMo apart from previous model releases was its unprecedented openness. Unlike most prior efforts that had only released model weights and inference code, OLMo was released alongside:

- the full Dolma pretraining dataset
- complete training and evaluation code
- training logs and metrics (including Weights & Biases logs)
- hundreds of intermediate training checkpoints
The accompanying paper, "OLMo: Accelerating the Science of Language Models" (arXiv:2402.00838), was published in February 2024.
Ai2 released OLMo 2 in late 2024, with 7B and 13B variants announced in November and a 32B variant following shortly after. A 1B variant was added later. Key training details:
| Model | Parameters | Tokens Trained | Epochs |
|---|---|---|---|
| OLMo 2 1B | 1 billion | 4 trillion | 1 epoch |
| OLMo 2 7B | 7 billion | 4 trillion | 1 epoch |
| OLMo 2 13B | 13 billion | 5 trillion | 1.3 epochs |
| OLMo 2 32B | 32 billion | 6 trillion | 1.5 epochs |
OLMo 2 7B outperformed Llama 3.1 8B on standard benchmarks, and OLMo 2 13B outperformed Qwen 2.5 7B despite lower total training compute. OLMo 2 32B became the first fully open model (with all data, code, weights, and details freely available) to outperform GPT-3.5 Turbo and GPT-4o mini on a suite of popular multi-skill academic benchmarks.
The OLMo 2 paper, titled "2 OLMo 2 Furious," was published on arXiv in January 2025 (arXiv:2501.00656).
Announced on November 20, 2025, OLMo 3 represented a further leap in capability and openness. The release included multiple variants at the 7B and 32B scales, spanning base models, instruction-tuned models, and the OLMo 3 Think reasoning models.
OLMo 3 supported context lengths of up to 65,000 tokens and was trained on Dolma 3, a dataset of nearly six trillion tokens. The base model was reported to be 2.5 times more efficient to train than Meta's Llama 3.1 (based on GPU-hours per token). All models were released under the Apache 2.0 license.
The OLMo 3 Think 32B variant was the first fully open 32B reasoning model that generates explicit chain-of-thought content.
OLMo 3.1 extended the reinforcement learning training from OLMo 3. The OLMo 3.1 Think 32B model resulted from resuming the best RL training run for an additional 21 days on 224 GPUs with extra epochs over the Dolci-Think-RL dataset. This yielded improvements of over 5 points on AIME, over 4 points on ZebraLogic and IFEval, and over 20 points on IFBench compared to OLMo 3.
Tulu is Ai2's family of instruction-tuned language models, designed to transform base language models into helpful assistants through a carefully designed post-training pipeline.
Tulu 2 was released in 7B and 70B parameter variants, built on top of Llama 2 base models. It included both supervised fine-tuned (SFT) and direct preference optimization (DPO) versions. Tulu 2 demonstrated that careful data curation and training methodology could produce strong instruction-following models from open base models.
Tulu 3 marked a significant advance in open post-training methodology. Initially released with 8B and 70B parameter variants built on Llama 3 base models, and later expanded with a 405 billion parameter version, Tulu 3 introduced a four-stage post-training recipe:

1. Careful curation and synthesis of training prompts
2. Supervised fine-tuning (SFT) on a targeted data mixture
3. Preference tuning with direct preference optimization (DPO)
4. Reinforcement learning with verifiable rewards (RLVR)
The RLVR innovation was a key differentiator. Rather than relying solely on human preference judgments, Tulu 3 used verifiable outcomes to provide training signal, producing more reliable improvements on tasks with objectively correct answers. The Tulu 3 405B model outperformed DeepSeek V3 and GPT-4o on certain benchmarks.
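To illustrate the idea behind RLVR — not Ai2's actual implementation, which lives in the open-instruct repository — a verifiable reward can be as simple as checking a programmatically extracted answer against a known ground truth, rather than asking a preference model to judge:

```python
# Toy sketch of a verifiable reward for math-style tasks.
# The answer-extraction heuristic here is an illustrative assumption.
import re

def extract_final_answer(completion):
    """Pull the last number out of a model completion, if any."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return matches[-1] if matches else None

def verifiable_reward(completion, ground_truth):
    """Binary reward: 1.0 if the extracted answer matches, else 0.0."""
    return 1.0 if extract_final_answer(completion) == ground_truth else 0.0

print(verifiable_reward("Adding them gives 42.", "42"))  # → 1.0
print(verifiable_reward("The answer is 7.", "42"))       # → 0.0
```

Because the reward is computed by a deterministic checker rather than a learned judge, it cannot be gamed by stylistic tricks, which is what makes the training signal reliable on objectively checkable tasks.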
All Tulu training code is available via the open-instruct repository on GitHub.
Dolma is an open dataset of 3 trillion tokens created as the primary training corpus for OLMo. Built from approximately 200 terabytes of raw text curated down to about 11 terabytes of processed data, Dolma encompasses over 4 billion documents from diverse sources:
| Source | Type |
|---|---|
| Common Crawl | Web pages from multiple snapshots |
| C4 | Curated web text |
| The Stack | Permissively licensed code from GitHub |
| peS2o | Scientific manuscripts from Semantic Scholar |
| Project Gutenberg | Public domain books |
| Reddit | Social media discussions |
| Wikipedia and Wikibooks | Encyclopedic material |
Dolma was originally released in August 2023 under the AI2 ImpACT license and was re-licensed under the Open Data Commons Attribution License (ODC-By) on April 15, 2024. The Dolma Toolkit provides tools for processing and curating large-scale datasets, supporting deduplication, quality filtering, and data inspection.
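To give a flavor of the operations the Dolma Toolkit supports, here is a toy sketch of exact deduplication by content hash and a word-count quality filter. Both heuristics are hypothetical simplifications; the real toolkit implements far more sophisticated versions of each step.

```python
# Toy sketch of two corpus-curation steps: exact dedup and quality filtering.
import hashlib

def dedupe(docs):
    """Drop exact duplicates, keyed by SHA-256 of the document text."""
    seen, out = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            out.append(doc)
    return out

def quality_filter(docs, min_words=5):
    """Keep only documents with at least `min_words` whitespace tokens."""
    return [d for d in docs if len(d.split()) >= min_words]

corpus = ["the quick brown fox jumps", "the quick brown fox jumps", "too short"]
print(quality_filter(dedupe(corpus)))  # → ['the quick brown fox jumps']
```

At Dolma's scale these steps run as distributed pipelines over hundreds of terabytes, but the logical shape — hash, filter, inspect — is the same.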
Dolma 3, released alongside OLMo 3, expanded the dataset to nearly six trillion tokens with improved quality filtering and deduplication.
Molmo is Ai2's family of open multimodal vision-language models. Released in 2024, Molmo was trained on PixMo, a novel dataset of over 712,000 images and approximately 1.3 million captions collected from human annotators using speech-based descriptions (annotators were asked to describe every detail of an image for 60 to 90 seconds).
The Molmo family included models at multiple scales, with the 72B variant outperforming proprietary models including Claude 3.5 Sonnet, Gemini 1.5 Pro, and Gemini 1.5 Flash, trailing only GPT-4o in its class.
Molmo 2, released subsequently, added video understanding capabilities, allowing the model to analyze videos and multiple images simultaneously. Molmo 2 (8B) showed consistent gains over both Molmo (7B) and Molmo (72B) on most core benchmarks, with especially large improvements on grounding and counting tasks.
WildBench is an automated evaluation framework developed by Ai2 for benchmarking large language models using challenging real-world user queries. The benchmark consists of 1,024 tasks carefully selected from over one million human-chatbot conversation logs collected through the WildChat project.
The framework employs two primary evaluation metrics:

- WB-Reward, based on fine-grained pairwise comparisons between a model's responses and those of baseline models
- WB-Score, an individual quality rating of each response that is faster and cheaper to compute
WildBench results demonstrate a strong correlation with human-voted Elo ratings from Chatbot Arena on difficult tasks, with WB-Reward achieving a Pearson correlation of 0.98 with top-ranking models. The paper was published at ICLR 2025.
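For readers unfamiliar with the statistic, the reported 0.98 figure is an ordinary Pearson correlation between two score lists. A toy computation with made-up numbers (not benchmark data):

```python
# Pearson correlation coefficient, computed from first principles.
import math

def pearson(xs, ys):
    """Return the Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Perfectly linearly related toy scores correlate at 1.0:
print(round(pearson([1, 2, 3, 4], [10, 20, 30, 40]), 6))  # → 1.0
```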
WildChat is a large-scale, multilingual corpus of real user-chatbot interaction logs. The dataset comprises over one million conversations (more than 2.5 million interaction turns) between anonymous users and ChatGPT models (GPT-3.5-Turbo and GPT-4). Collected between April 2023 and May 2024 via publicly accessible chatbot deployments on Hugging Face Spaces, WildChat covers 68 languages and includes demographic and geographic metadata from 204,736 unique users.
OLMES (Open Language Modeling Evaluation System) is Ai2's standardized evaluation framework consisting of a suite of 20 benchmarks designed to assess models' core capabilities, including knowledge recall, commonsense reasoning, general reasoning, and mathematical reasoning. OLMES provides consistent and reproducible evaluation methodology for the OLMo model family and other language models.
Beyond language models and NLP research, Ai2 has invested significantly in AI applications for environmental conservation:
EarthRanger is a real-time conservation monitoring platform that aggregates data from satellites, camera traps, acoustic sensors, and GPS-collared animals (including elephants, rhinos, and wild dogs). It overlays wildlife data on maps, sends alerts for suspicious activity, and helps conservation authorities manage protected areas. As of 2025, EarthRanger is deployed at 550 sites across 70 countries.
Skylight is a maritime monitoring platform that combats illegal fishing by analyzing over 10 million square kilometers of ocean data daily. Using computer vision and machine learning, Skylight detects vessels that attempt to evade monitoring systems and can send alerts to authorities within 15 minutes of detecting potentially illicit activity. The platform is offered free of charge to governments and conservation organizations.
Grover was a model developed by Ai2 researchers for detecting neural fake news. The key insight was that to detect AI-generated text reliably, the detector should itself be trained to generate similar text. Grover achieved 92% accuracy in distinguishing between human-written and AI-generated news, compared to 73% for the best previous discriminator. It could also classify 96% of GPT-2's outputs as machine-written without seeing any GPT-2 examples during training. The model was released in three variants: Grover-Base, Grover-Large, and Grover-Mega.
Macaw (Multi-angle c(q)uestion answering) was a general-purpose question-answering model built on top of T5 via the UnifiedQA framework. Available in 11B, 3B, and large variants, Macaw was trained in a "multi-angle" fashion, meaning it could handle flexible combinations of inputs and outputs (for example, generating a question from an answer, or producing explanations alongside answers). Macaw outperformed GPT-3 by over 10 percentage points on Challenge300, a suite of 300 challenge questions, despite being an order of magnitude smaller (11 billion vs. 175 billion parameters).
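A sketch of the multi-angle idea: the same slots (question, answer, explanation, and so on) can appear on either side of the input, so one model serves many input/output configurations. The slot syntax below approximates Macaw's published format and is illustrative rather than exact:

```python
# Illustrative sketch of composing a Macaw-style multi-angle prompt.
# Slot names and delimiter syntax are approximations, not the exact spec.
def build_macaw_input(output_slots, **input_slots):
    """List the slots to generate, then the slots given as input."""
    wants = " ; ".join(f"${s}$" for s in output_slots)
    givens = " ; ".join(f"${k}$ = {v}" for k, v in input_slots.items())
    return f"{wants} ; {givens}"

# Ask for an answer given a question:
print(build_macaw_input(["answer"], question="What gas do plants absorb?"))
# Reverse angle — ask for a question given an answer:
print(build_macaw_input(["question"], answer="carbon dioxide"))
```

Training on many such angles is what lets a single model generate questions from answers or produce explanations alongside answers without task-specific heads.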
AllenNLP was an open-source NLP research library built on PyTorch, launched in 2017. It provided a flexible data API for intelligent batching and padding, high-level abstractions for common text operations, and a modular experiment framework. The library included reference implementations of models for tasks such as named entity recognition, semantic role labeling, question answering, textual entailment, constituency parsing, and text classification. AllenNLP was widely adopted in the research community, cited in hundreds of academic publications.
The Ai2 Incubator is a startup studio and venture fund that was spun out of the Allen Institute for AI in 2022. It supports early-stage AI companies at the intersection of artificial intelligence and real-world applications. Oren Etzioni, the founding CEO of Ai2, serves as the founder and technical director of the incubator.
Over 50 companies have graduated from the Ai2 Incubator. Approximately 90% of incubated companies have gone on to raise venture funding, and roughly 24% of graduates have been acquired by companies including Apple, DocuSign, Thomson Reuters, and Baidu. Notable alumni include:
| Company | Focus | Outcome |
|---|---|---|
| Xnor.ai | Edge AI, efficient on-device inference | Acquired by Apple for approximately $200 million (January 2020) |
| Lexion | AI-powered contract management | Acquired by DocuSign |
| Ozette | Biotech data analysis | Raised venture funding |
| Yoodli | AI roleplay coaching | Raised venture funding |
| Vercept | Automated desktop workflows | Co-founded by Oren Etzioni |
| Casium | Immigration processing | Raised venture funding |
The Ai2 Incubator has raised multiple funds to support its portfolio companies. In 2023, it closed a $30 million fund. In 2024, it secured $200 million in compute resources to provide to its startups. In 2025, it launched an $80 million third fund to support approximately 70 new tech ventures over the following four years. Madrona Venture Group is a key backer of the incubator.
| Project / Model | Year | Type | Key Details |
|---|---|---|---|
| Semantic Scholar | 2015 | Academic search engine | 200M+ papers indexed; AI-powered features including TLDR, Semantic Reader, Research Feeds |
| AllenNLP | 2017 | NLP library | Open-source PyTorch library for NLP research; widely adopted in academia |
| Grover | 2019 | Fake news detection | 92% detection accuracy; trained to both generate and detect neural fake news |
| Macaw | 2021 | Question answering | Multi-angle QA model; outperformed GPT-3 on Challenge300 with 11B parameters |
| Dolma | 2023 | Training dataset | 3T tokens from web, code, books, scientific papers; open under ODC-By |
| OLMo 1 | Feb 2024 | Language model | 1B and 7B parameters; fully open with data, code, checkpoints |
| WildChat | 2024 | Conversation dataset | 1M+ real user-ChatGPT conversations across 68 languages |
| Molmo | 2024 | Vision-language model | Open multimodal models trained on PixMo; 72B outperformed Claude 3.5 Sonnet |
| Tulu 3 | Nov 2024 | Instruction-tuned model | 8B, 70B, 405B variants; introduced RLVR post-training method |
| OLMo 2 | Nov-Dec 2024 | Language model | 1B, 7B, 13B, 32B variants; 32B first fully open model to beat GPT-3.5 Turbo |
| WildBench | 2024 | Evaluation benchmark | 1,024 real-world tasks; 0.98 Pearson correlation with Chatbot Arena Elo; published at ICLR 2025 |
| EarthRanger | Ongoing | Conservation platform | Real-time wildlife monitoring at 550 sites in 70 countries |
| Skylight | Ongoing | Maritime monitoring | Illegal fishing detection across 10M+ sq km of ocean daily |
| OLMo 3 | Nov 2025 | Language model | 7B and 32B; first fully open 32B reasoning model; 65K context; Apache 2.0 |
| OLMo 3.1 | Dec 2025 | Language model | Extended RL training; improved reasoning on AIME, ZebraLogic, IFEval |
| Molmo 2 | 2025 | Vision-language model | Added video understanding and multi-image analysis |
Ai2 has developed and released a range of open-source tools to support its model training and evaluation pipeline:

- the Dolma Toolkit, for curating, deduplicating, and quality-filtering large-scale training datasets
- open-instruct, the repository containing the Tulu post-training code
- OLMES, the standardized evaluation framework used for the OLMo family
Ai2 occupies a distinctive position in the AI research landscape. As a non-profit, it competes for talent and attention with well-funded corporate labs such as OpenAI, Google DeepMind, Anthropic, and Meta AI, as well as with other open-source efforts from Mistral AI, Stability AI, and the BigScience consortium.
Ai2's approach differs from Meta's in a fundamental way. While Meta's Llama models are often described as "open," they release model weights without providing the full training data or detailed training procedures. Ai2 has argued that this distinction matters, because true scientific reproducibility requires access to the entire pipeline. OLMo's releases are designed so that any researcher with sufficient compute could reproduce the results from scratch.
The financial challenges of this approach are significant. Training frontier-scale models requires hundreds of millions of dollars in compute costs, and Ai2's non-profit status limits its ability to compete at the very largest scales. The departure of CEO Ali Farhadi in early 2026 underscored these tensions, as the board acknowledged the difficulty of matching the spending of for-profit tech giants.
Despite these constraints, Ai2's models have consistently demonstrated that careful data curation, training methodology, and post-training techniques can close much of the gap with larger, more expensively trained models. OLMo 2 32B's ability to outperform GPT-3.5 Turbo and GPT-4o mini, and Tulu 3's competitive performance against models many times its size, illustrate the value of Ai2's research-driven approach.
Ai2 is a 501(c)(3) non-profit organization. Its primary funding has historically come from the Paul Allen Trust and related entities associated with Paul Allen's estate. Additional funding comes from grants, philanthropic donations, government agencies, and partnerships with industry leaders and academic institutions.
The board of directors is chaired by Bill Hilf and includes Jody Allen (Paul Allen's sister and trustee of his estate), along with Ana Mari Cauce, Steve Hall, and Ed Lazowska. The institute's scientific advisory board includes notable figures such as Eric Horvitz, Tom Mitchell, and Adam Cheyer.
Ai2 maintains its headquarters at 3800 Latona Ave NE in Seattle's University District and has expanded its footprint with additional office space in the Northlake Commons building. The Tel Aviv, Israel office focuses on natural language processing, text mining, and advanced search technologies.