Arc Institute
Last reviewed
Sources
13 citations
Review status
Source-backed
Revision
v1 · 1,736 words
The Arc Institute is an independent, nonprofit biomedical research organization headquartered in Palo Alto, California. Founded in 2021, it pursues long-horizon basic science aimed at understanding and curing complex human diseases, operating in partnership with Stanford University, the University of California, Berkeley, and the University of California, San Francisco (UCSF) [1][2]. The institute is notable within artificial intelligence for its work at the intersection of machine learning and biology, particularly the Evo and Evo 2 DNA foundation models and the State virtual cell model. It was co-founded by bioengineer Patrick Hsu, biochemist Silvana Konermann, and Stripe chief executive Patrick Collison, and launched with roughly $650 million in committed philanthropic funding [3][4].
Arc describes its mission as accelerating discovery, uncovering the root causes of complex diseases such as cancer, neurodegeneration, and immune dysfunction, and closing the gap between scientific breakthroughs and real-world impact [1]. Rather than operating as a conventional university department or a traditional government-funded laboratory, Arc is structured as a standalone nonprofit research institute that collaborates closely with its three Bay Area university partners, allowing affiliated faculty to maintain academic appointments while conducting research at Arc [2][5].
The institute organizes its work around core laboratories led by principal investigators, technology centers that provide shared experimental and computational infrastructure, and institute-wide initiatives. By 2025 it employed more than 300 staff, internally referred to as "Arconauts" [1]. Its research spans genome engineering, neuroscience, immunology, and a growing body of computational and AI-driven work in genomics and cell biology.
Arc Institute was launched publicly on December 15, 2021, in collaboration with Stanford, UC Berkeley, and UCSF [4][5]. Its creation grew in part out of the founders' experience with Fast Grants, a rapid science-funding program operated during the COVID-19 pandemic, which highlighted inefficiencies in the conventional grant system [3].
The institute's defining feature is its "core funding" model. Instead of requiring scientists to write and renew federal grants on short cycles, Arc provides multi-year institutional support intended to free researchers to pursue ambitious, high-risk, high-reward projects without continual fundraising pressure [2][3]. Funding is provided through several tiers, summarized below.
| Program | Support | Duration |
|---|---|---|
| Core Investigators | Full institutional funding for a lab (up to roughly 20 people) | About 8 years, renewable |
| Innovation Investigators | Approximately $1 million | 5 years |
| Ignite Awards | Approximately $100,000 | 1 year |
Core Investigators run their own laboratories at Arc, while the Innovation Investigator and Ignite programs extend support to faculty at the partner universities [1][2]. This model is part of a broader movement of philanthropically funded "focused research organizations" and independent institutes that aim to complement, rather than replace, traditional academic and government science.
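The institute was co-founded by three people who continue to shape its scientific and organizational direction: bioengineer Patrick Hsu, biochemist Silvana Konermann, and Stripe chief executive Patrick Collison.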
Arc's board and executive team have expanded as the institute has grown. Reported additions include technology figures such as Nat Friedman and Reid Hoffman; the institute also appointed a chief technology officer in 2024 and named Megan van Overbeek chief scientific officer in 2026 [3].
Arc has become one of the more visible nonprofit players in AI for science, with a particular focus on biological foundation models that learn directly from genomic and cellular data.
In 2024, a team of Arc and Stanford researchers that included Brian Hie and Arc co-founder Patrick Hsu released Evo, a genomic foundation model trained at single-nucleotide resolution [8][9]. Evo has 7 billion parameters and a context length of about 131,000 tokens. It is built on the StripedHyena architecture, a hybrid design that combines attention layers with Hyena operators (gated long convolutions rooted in signal processing) to handle long sequences more efficiently than a standard Transformer [8][9].
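Training "at single-nucleotide resolution" means each base is its own token, so a context length measured in tokens is also a length in base pairs. The sketch below illustrates the idea under stated assumptions: it maps each nucleotide character to its ASCII byte value, a simple byte-level scheme chosen for illustration rather than a copy of Evo's released code. The names `tokenize`, `detokenize`, `chunk`, and `ALLOWED_BASES` are hypothetical, and `CONTEXT_TOKENS = 131_072` is an assumed exact value behind "about 131,000".

```python
# Minimal sketch of single-nucleotide (character-level) tokenization.
# Assumption: one token per base, encoded as the base's ASCII byte value.
# This is an illustration, not Evo's own tokenizer code.

# Assumed exact value behind the reported "about 131,000 tokens".
CONTEXT_TOKENS = 131_072

ALLOWED_BASES = set("ACGTN")  # hypothetical alphabet: four bases plus N for unknown


def tokenize(seq: str) -> list[int]:
    """Map each nucleotide character to one integer token (its ASCII code)."""
    seq = seq.upper()
    unexpected = set(seq) - ALLOWED_BASES
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return list(seq.encode("ascii"))


def detokenize(tokens: list[int]) -> str:
    """Invert tokenize(): turn byte values back into a DNA string."""
    return bytes(tokens).decode("ascii")


def chunk(tokens: list[int], size: int = CONTEXT_TOKENS) -> list[list[int]]:
    """Split a long token list into consecutive windows of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


if __name__ == "__main__":
    print(tokenize("ATGC"))  # [65, 84, 71, 67]
    genome = "ACGT" * 40_000  # 160,000 bp toy sequence
    windows = chunk(tokenize(genome))
    print([len(w) for w in windows])  # [131072, 28928]
```

With one token per base, any sequence longer than the context window has to be split or processed in pieces, which is why long-context architectures matter for whole-genome modeling.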
Evo was trained on roughly 2.7 million prokaryotic and bacteriophage genomes, compiled into an open dataset called OpenGenome of about 300 billion tokens [8]. In zero-shot evaluations it predicted molecular function across DNA, RNA, and protein modalities, matching or exceeding specialized models, and it generated experimentally validated CRISPR-Cas complexes and transposable systems, an early demonstration of protein-RNA and protein-DNA co-design with a language model [9]. The work was published in Science in November 2024 and was recognized in The New York Times "Good Tech Awards" for that year [3][9].
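Zero-shot function prediction with a sequence model usually means scoring a variant without task-specific training: the model assigns a log-likelihood to the mutant and to the wild-type sequence, and the difference serves as a fitness proxy. The sketch below illustrates that recipe under an explicit substitution: a tiny order-k Markov model (the hypothetical `MarkovDNAModel`) stands in for Evo, and `mutation_score` is an illustrative helper, not part of any released Evo API. The exact scoring details used in the Evo work are not reproduced here.

```python
import math
from collections import Counter


class MarkovDNAModel:
    """Toy order-k Markov model used as a stand-in for a genomic language model.

    Hypothetical: Evo is a 7-billion-parameter neural network; this class only
    mimics its role of assigning a log-likelihood to a DNA sequence.
    """

    def __init__(self, k: int = 2, alpha: float = 1.0):
        self.k = k
        self.alpha = alpha  # add-alpha smoothing: unseen transitions keep nonzero probability
        self.context_counts: Counter = Counter()
        self.transition_counts: Counter = Counter()

    def fit(self, sequences: list[str]) -> "MarkovDNAModel":
        """Count how often each base follows each k-base context."""
        for seq in sequences:
            for i in range(self.k, len(seq)):
                ctx = seq[i - self.k:i]
                self.context_counts[ctx] += 1
                self.transition_counts[(ctx, seq[i])] += 1
        return self

    def log_likelihood(self, seq: str) -> float:
        """Sum of log P(base_i | previous k bases); the first k bases are not scored."""
        total = 0.0
        for i in range(self.k, len(seq)):
            ctx = seq[i - self.k:i]
            numerator = self.transition_counts[(ctx, seq[i])] + self.alpha
            denominator = self.context_counts[ctx] + 4 * self.alpha  # 4 possible bases
            total += math.log(numerator / denominator)
        return total


def mutation_score(model: MarkovDNAModel, wild_type: str, pos: int, alt: str) -> float:
    """Zero-shot score: log-likelihood(mutant) minus log-likelihood(wild type).

    Negative values mean the model finds the mutant less plausible.
    """
    mutant = wild_type[:pos] + alt + wild_type[pos + 1:]
    return model.log_likelihood(mutant) - model.log_likelihood(wild_type)


if __name__ == "__main__":
    model = MarkovDNAModel(k=2).fit(["ATG" * 20] * 5)
    wild_type = "ATG" * 5
    print(mutation_score(model, wild_type, 7, "C"))  # negative: breaks the learned pattern
```

Swapping in a real genomic language model would only change `log_likelihood`; the likelihood-difference logic in `mutation_score` stays the same.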
In February 2025, Arc and NVIDIA, together with collaborators from Stanford, UC Berkeley, UCSF, the University of Washington, and other institutions, released Evo 2, a substantially larger DNA foundation model [10][11]. The open release on February 19, 2025, included code on GitHub and publicly available model weights, and the model was integrated into NVIDIA's BioNeMo platform [10][11].
Evo 2 was trained on approximately 9.3 trillion nucleotides drawn from more than 100,000 species and over 128,000 whole genomes spanning all domains of life, including bacteria, archaea, phages, plants, animals, and humans, roughly 30 times the training data of the original Evo [10][12]. The largest version contains 40 billion parameters (a smaller 7-billion-parameter version was also released), uses the StripedHyena 2 architecture, and can process context windows of up to about 1 million nucleotides [10][12]. Training was carried out on more than 2,000 NVIDIA H100 GPUs via NVIDIA DGX Cloud [11].
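The scale comparison above can be checked with simple arithmetic. Because both models read DNA one nucleotide per token, this calculation assumes Evo's roughly 300 billion training tokens correspond to roughly 300 billion nucleotides, takes Evo's context as 131,072 tokens (an assumed exact value behind "about 131,000"), and treats "about 1 million nucleotides" as exactly one million:

```latex
\frac{9.3 \times 10^{12}\ \text{nt (Evo 2 training data)}}{3.0 \times 10^{11}\ \text{nt (Evo training data)}} = 31 \approx 30,
\qquad
\frac{10^{6}\ \text{nt (Evo 2 context)}}{131{,}072\ \text{nt (Evo context)}} \approx 7.6
```

Under these assumptions the training corpus grew by about 31 times, consistent with the "roughly 30 times" figure, while the maximum context grew by a smaller factor of about 7.6.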
Evo 2 can both predict and design genetic sequences. Reported capabilities include identifying disease-associated mutations, with the model achieving over 90% accuracy in classifying BRCA1 variants as benign or potentially pathogenic, and generating genome-scale sequences [10][11]. In a widely noted demonstration, researchers used Evo 2 to design bacteriophage genomes; of 285 tested designs for a small, 11-gene organism, 16 successfully propagated and inhibited the growth of target bacterial strains, described as among the first functional AI-designed genomes of their kind [12]. The accompanying research paper was published in Nature on March 4, 2026 [12]. By that point, Arc reported that Evo 2 had been downloaded more than 88,000 times on GitHub and that its 7-billion- and 40-billion-parameter variants had together received millions of API requests through Hugging Face [12].
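A common zero-shot recipe for variant classification of this kind is to score each variant by how much it lowers the model's likelihood relative to the reference sequence, then flag variants below a cutoff as potentially pathogenic. The sketch below assumes such scores are already computed; the `Variant` record, the `predict_pathogenic` rule, the fixed threshold, and the toy scores are all hypothetical, and the reported BRCA1 figure may rest on a different procedure, such as a classifier trained on model embeddings.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    delta_ll: float   # model log-likelihood(variant) minus log-likelihood(reference)
    pathogenic: bool  # ground-truth label, e.g. from a functional assay


def predict_pathogenic(delta_ll: float, threshold: float) -> bool:
    """Flag a variant when the model penalizes it by more than `threshold` (in nats)."""
    return delta_ll < -threshold


def accuracy(variants: list[Variant], threshold: float) -> float:
    """Fraction of variants whose predicted label matches the ground truth."""
    correct = sum(predict_pathogenic(v.delta_ll, threshold) == v.pathogenic for v in variants)
    return correct / len(variants)


if __name__ == "__main__":
    # Made-up scores for illustration only; these are not Evo 2 outputs.
    toy = [
        Variant("var1", -4.2, True),
        Variant("var2", -3.1, True),
        Variant("var3", -2.5, True),
        Variant("var4", -2.2, False),  # misclassified at threshold 2.0
        Variant("var5", -0.2, False),
        Variant("var6", 0.5, False),
    ]
    print(f"accuracy: {accuracy(toy, threshold=2.0):.3f}")  # 0.833
```

In practice the threshold would be chosen on held-out data, since tuning it on the evaluation set itself inflates the measured accuracy.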
Beyond genomic sequence modeling, Arc has invested in "virtual cell" models that aim to predict how cells respond to drugs, signaling molecules, and genetic perturbations. In 2025 the institute released State, described as its first virtual cell model and one of the largest single-cell perturbation models to date [13]. State predicts shifts in gene expression given a starting transcriptome and a specified perturbation, and was trained on observational data from roughly 170 million cells together with perturbational data from more than 100 million cells across dozens of cell lines, drawing on the Arc Virtual Cell Atlas [13]. To spur progress in the field, Arc also organized a Virtual Cell Challenge in 2025, which drew thousands of registered participants from more than 100 countries [13].
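State's task, as described above, maps a starting transcriptome and a named perturbation to a predicted expression profile. A simple baseline of the kind commonly used for comparison adds the average expression shift observed for that perturbation in training. The sketch below implements that baseline only to make the input/output contract concrete; `MeanShiftBaseline`, `mean_profile`, and the toy data are hypothetical, and State's actual architecture is not reproduced here.

```python
from statistics import fmean


def mean_profile(cells: list[list[float]]) -> list[float]:
    """Per-gene mean across cells; each cell is a list of expression values."""
    return [fmean(gene_values) for gene_values in zip(*cells)]


class MeanShiftBaseline:
    """Predict a perturbed transcriptome as start + average shift seen in training.

    Hypothetical illustration of a perturbation model's inputs and outputs;
    it is not State's architecture.
    """

    def __init__(self) -> None:
        self.shifts: dict[str, list[float]] = {}

    def fit(self, control: list[list[float]],
            perturbed: dict[str, list[list[float]]]) -> "MeanShiftBaseline":
        """Store, per perturbation, the mean perturbed profile minus the mean control profile."""
        control_mean = mean_profile(control)
        for name, cells in perturbed.items():
            pert_mean = mean_profile(cells)
            self.shifts[name] = [p - c for p, c in zip(pert_mean, control_mean)]
        return self

    def predict(self, transcriptome: list[float], perturbation: str) -> list[float]:
        """Return the predicted expression after applying `perturbation`."""
        if perturbation not in self.shifts:
            raise KeyError(f"no training data for perturbation {perturbation!r}")
        return [x + d for x, d in zip(transcriptome, self.shifts[perturbation])]


if __name__ == "__main__":
    # Toy data: 3 genes; values are made up for illustration.
    control = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
    perturbed = {"knockout_geneA": [[0.0, 2.0, 1.0], [0.0, 4.0, 1.0]]}
    model = MeanShiftBaseline().fit(control, perturbed)
    print(model.predict([2.0, 2.0, 0.0], "knockout_geneA"))  # [0.0, 3.0, 1.0]
```

Because the shift is fixed per perturbation, this baseline cannot capture responses that depend on the starting cell state, which is one of the things a learned virtual cell model aims to do.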
Arc was launched with approximately $650 million in committed philanthropic funding, an unusually large endowment-style commitment for a new independent research institute [3][4]. Reported founding donors include Ethereum creator Vitalik Buterin, Stripe co-founders Patrick and John Collison, Dustin Moskovitz (co-founder of Facebook and Asana) and Cari Tuna, angel investor Ron Conway, and other technology entrepreneurs and investors [3]. The institute's annual budget was reported at roughly $80 million as of the mid-2020s [3].
Arc's funding base is philanthropic rather than commercial, and it does not raise venture capital, though it has formed research and technology partnerships with companies. Notable collaborations include its work with NVIDIA on Evo 2, announced in early 2025, and a 2025 partnership with 10x Genomics and Ultima Genomics to build large-scale single-cell datasets for virtual cell models [3][13].
The Arc Institute is frequently cited as a leading example of a new kind of scientific organization: a well-funded, independent nonprofit that combines long-term institutional support for basic biology with serious investment in machine learning. Its core funding model is part of a broader experiment in research funding intended to give scientists greater freedom to pursue ambitious, long-horizon projects [2][3].
In AI specifically, Arc has helped establish biological foundation models as a major research direction. Evo and Evo 2 are among the most prominent open DNA foundation models, and Arc's collaboration with NVIDIA placed it at the center of efforts to scale genomic AI [10][11]. Together with its virtual cell work, this positions the institute as one of the most influential nonprofit organizations operating at the intersection of artificial intelligence and biology, alongside efforts such as the Chan Zuckerberg Initiative and academic foundation-model groups. The institute is distinct from unrelated entities that share the "Arc" name, such as the ARC-AGI benchmark and the Arc web browser.