Basecamp Research

Updated Jun 8, 2026

Basecamp Research is a London-based artificial intelligence and biotechnology company that has assembled what it describes as the world's largest and most diverse database of biological sequences, sampled directly from nature, and uses that proprietary data to train protein design and biological foundation models. Founded in 2019 by Glen Gowers and Oliver Vince, the company collects DNA from extreme and biodiverse environments worldwide under benefit-sharing agreements aligned with the Nagoya Protocol, and argues that the diversity of its data, rather than the size of its models alone, is the decisive advantage for AI in biology. ^[1]^[2]^[3]

Basecamp's central thesis is that public sequence repositories such as UniProt are too narrow to support frontier generative models: by the company's accounting, roughly 70 percent of existing public sequence data derives from only about ten heavily studied species. Its response has been to build a proprietary biological dataset, branded BaseData and structured as a knowledge graph called BaseGraph, that it says is more than ten times larger than all public databases combined and that catalogues over one million previously unknown species. The company has used this resource to launch a protein structure model (BaseFold) and a family of generative biology models (EDEN), and to enter partnerships including a genetic medicine collaboration with the laboratory of David R. Liu at the Broad Institute of MIT and Harvard. ^[1]^[3]^[4]^[5]

Overview

Basecamp Research operates at the intersection of biodiversity discovery, data infrastructure, and machine learning. Its business model treats curated, contextualized, and ethically sourced biological data as a defensible asset, or "data moat," that can be used both to train the company's own models and to supply data and tools to pharmaceutical, industrial biotechnology, and AI partners. Chief executive Glen Gowers has framed the company's position by contrasting it with conventional AI drug discovery, arguing that "AI is not magic. It's a pattern recognition tool that has creativity, but within the limits of what it's already seen," and that better data, not just larger models, expands those limits. ^[2]^[6]

The company is headquartered in London and has built a presence in the Boston area to support its U.S. partnerships. Beyond the two co-founders, its leadership includes chief technology officer Phil Lorenz and chief commercial officer Anupama Hoey, a biopharma executive hired in 2024 who had previously held roles at companies including Sutro Biopharma and Second Genome. ^[2]^[4]^[5]

Founding

Basecamp Research traces its origins to an off-grid DNA sequencing expedition led by Gowers and Vince, who met as researchers at Imperial College London. In 2019 the pair ran a fully self-contained, solar-powered sequencing operation on the Vatnajökull ice cap in Iceland, sequencing microbial DNA in the field without grid power or internet connectivity. They found that roughly two-thirds of the organisms they sampled had never been recorded in any database, which convinced them that the planet's microbial diversity was vastly under-sampled and commercially untapped. The company was incorporated in 2019 around this insight. Gowers holds a doctorate in bioengineering and Vince a doctorate in synthetic biology; they serve as co-chief executives. ^[1]^[3]^[7]

The biodiversity database and ethical sourcing

Basecamp's core asset is its proprietary data. The company sends sampling expeditions to biodiverse and extreme environments, including acidic hot springs, Antarctic soils, and even World War II shipwrecks, and sequences the genetic material it recovers. Each data point is tagged with rich environmental and contextual metadata, such as the conditions and location where an organism was found, which the company says is critical for teaching models how proteins behave in their native context. The data is organized as a knowledge graph (BaseGraph) of sequences and the relationships between them. ^[3]^[8]
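As a rough illustration of what "sequences tagged with environmental metadata and linked in a knowledge graph" can mean in practice, the sketch below builds a toy graph in Python. It is not Basecamp's schema: the class names (`SequenceNode`, `SequenceGraph`), the metadata fields, the `similar_to` relation, and the example sequences are all hypothetical and chosen only to show the shape of the idea.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SequenceNode:
    """One sequence plus the context it was sampled in (all fields hypothetical)."""
    seq_id: str
    residues: str
    site: str             # free-text description of the sampling environment
    temperature_c: float  # temperature recorded at the sampling site
    ph: float             # acidity recorded at the sampling site


@dataclass
class SequenceGraph:
    """Toy knowledge graph: nodes keyed by id, typed edges between ids."""
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_node(self, node: SequenceNode) -> None:
        self.nodes[node.seq_id] = node

    def relate(self, src: str, relation: str, dst: str) -> None:
        # Refuse edges that point at sequences the graph does not know about.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before relating them")
        self.edges[src].add((relation, dst))

    def neighbours(self, seq_id: str, relation: str) -> list:
        # Return the ids reachable from seq_id over one edge of the given type.
        return sorted(dst for rel, dst in self.edges[seq_id] if rel == relation)


graph = SequenceGraph()
graph.add_node(SequenceNode("p1", "MKTAYIAK", "acidic hot spring", 78.0, 2.5))
graph.add_node(SequenceNode("p2", "MKTAYLAK", "Antarctic soil", -4.0, 6.8))
graph.relate("p1", "similar_to", "p2")

# Metadata makes context-aware queries possible, e.g. "sequences from hot sites".
hot = [n.seq_id for n in graph.nodes.values() if n.temperature_c > 60]
```

Keeping the environmental context on the node, rather than in a separate table, is what lets a query combine sequence relationships with sampling conditions in one pass.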

By June 2025 the company reported that its BaseData dataset contained 9.8 billion novel protein sequences and that, after removing redundant entries, it was over ten times larger than all public databases combined; it credited this work with the discovery of more than one million new species. The associated BaseGraph contained on the order of 5.5 to 6 billion biological relationships, which Basecamp describes as the largest such graph in existence. The company has consistently positioned this scale against the concentration of public data, noting that the bulk of public sequence information comes from a handful of model organisms. ^[3]^[5]^[9]
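The phrase "after removing redundant entries" refers to deduplication, and the sources cited here do not say which method Basecamp uses. The sketch below shows the simplest possible form, exact-match deduplication by hashing, purely as an illustration; large sequence collections are more commonly reduced by clustering at a similarity threshold, which this example does not attempt. The function name `deduplicate` and the sample sequences are hypothetical.

```python
import hashlib


def deduplicate(sequences):
    """Keep the first occurrence of each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for seq in sequences:
        # Normalise so that case and stray whitespace do not hide duplicates.
        normalised = seq.strip().upper()
        # Hash the sequence so the "seen" set stores fixed-size digests.
        digest = hashlib.sha256(normalised.encode("ascii")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(normalised)
    return unique


raw = ["MKTAYIAK", "mktayiak", "MKTAYLAK", "MKTAYIAK "]
unique = deduplicate(raw)
```

Here four raw entries collapse to two distinct sequences, which is why a deduplicated count is the more meaningful basis for comparing dataset sizes.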

A defining feature of Basecamp's approach is that it claims to source 100 percent of its data ethically, under access and benefit-sharing arrangements consistent with the Nagoya Protocol to the Convention on Biological Diversity. Every sample is linked to the consent of the relevant landowner or authority and to an agreement under which the country or community of origin, which the company calls "guardians" or biodiversity partners, receives a share of revenue or royalties when the resulting digital sequence information is used commercially or to train AI models. Basecamp distributed its first wave of royalties in 2024 to 37 communities and organizations across 13 countries, a figure it later reported growing to roughly 60 organizations across 21 countries. In August 2024, ahead of the COP16 UN biodiversity conference in Cali, Colombia, Basecamp and the government of Cameroon announced a benefit-sharing deal that the company described as the first such digital-sequence-information agreement with a central African nation, pairing royalty payments with scientific training and laboratory funding. ^[3]^[8]^[10]^[11]

The table below summarizes self-reported scale figures, which have grown over time and should be read as company claims rather than independently audited statistics.

Metric	Reported figure	Approximate date
Novel protein sequences (BaseData)	9.8 billion	June 2025
Biological relationships (BaseGraph)	5.5 to 6 billion	2024 to 2025
New species identified	over 1 million	June 2025
Sampling locations	150-plus	2025 to 2026
Countries sampled	26 to 28	2025 to 2026
Community and organization partners	125-plus (later 152-plus)	2025 to 2026
Size vs. all public databases combined	over 10 times larger	June 2025

AI models (BaseFold and EDEN)

Basecamp has released two notable model lines built on its proprietary data.

In March 2024 the company introduced BaseFold, a protein structure prediction model created by augmenting AlphaFold2 with BaseGraph data. In results posted to bioRxiv and evaluated against CASP15 and CAMEO benchmark proteins, Basecamp reported up to a sixfold improvement in structural accuracy over AlphaFold2 for certain large, complex proteins and up to a threefold improvement in modeling small-molecule (ligand) interactions with protein targets. CTO Phil Lorenz described BaseGraph as "the core driver of our advances in AI." ^[4]^[12]

The EDEN models are a family of generative biology foundation models developed in collaboration with NVIDIA. The largest disclosed member, EDEN-28B, is described as a GPT-4-scale model trained on roughly 9.7 trillion biological (nucleotide) tokens spanning about 10 billion genes drawn from more than one million species. In January 2026, Basecamp announced that it had used the EDEN models to achieve programmable gene insertion, the placement of large therapeutic DNA sequences at precise locations in the genome, going beyond the small edits typical of CRISPR. The company reported designing insertion tools active at over 10,000 disease-relevant genomic sites, demonstrating CAR-T integration with greater than 90 percent tumor-cell clearance in laboratory assays, and achieving a 97 percent (32 of 33) functional success rate for AI-designed antimicrobial peptides against pathogens on the World Health Organization's critical-priority list. The gene-insertion work built in part on technology licensed from Tome Biosciences. Vince characterized the milestone by saying, "You can debate whether it's perfect or not, but it will change how we develop medicine." ^[13]^[14]

Partnerships

In October 2024, alongside its Series B financing, Basecamp announced a multi-year genetic medicine collaboration with the laboratory of David R. Liu, a Howard Hughes Medical Institute investigator and core member of the Broad Institute of MIT and Harvard who is known for pioneering base editing and prime editing. The collaboration pairs the Liu Lab's gene editing and wet-lab expertise with Basecamp's proprietary datasets and in-house AI models to invent novel fusion proteins and other large molecules for "programmable" genetic medicines. A rationale cited for the partnership is that gene-editing tools can be mined from the natural "warfare" between bacteria and viruses, a domain that Basecamp's biodiversity sampling is well placed to capture. ^[1]^[2]^[6]

Basecamp's CEO has said the company holds partnerships with around 15 organizations in the biological sciences, including three large drugmakers. In March 2026 the company unveiled a "Trillion Gene Atlas" initiative, a moonshot to expand catalogued genetic diversity roughly 100-fold by gathering genomic data from more than 100 million species across thousands of sites. The initiative is being undertaken with Anthropic and the sequencing firms Ultima Genomics and PacBio, and runs on NVIDIA's AI infrastructure. The company has compared the effort in ambition to the Human Genome Project. It has also worked with Microsoft and NVIDIA on cloud and accelerated-computing infrastructure for its data pipeline. ^[6]^[8]^[15]

Funding

Basecamp Research has raised approximately $85 million in venture capital since its founding. The company closed a $20 million Series A round in December 2022, led by Systemiq Capital, with participation from investors including True Ventures and Hummingbird Ventures. In October 2024 it announced a $60 million Series B led by the Paris-based firm Singular, with participation from S32, redalpine, and several prominent individual investors, including Roche vice-chairman Andre Hoffmann, Royal Philips chair and former DSM chief executive Feike Sijbesma, and former Unilever chief executive Paul Polman, alongside returning investors True Ventures and Hummingbird Ventures. NVIDIA has separately made a strategic investment in the company to accelerate its AI work. ^[1]^[2]^[7]^[16]

Round	Amount	Date	Lead investor(s)
Series A	$20 million	December 2022	Systemiq Capital
Series B	$60 million	October 2024	Singular
Total raised	approximately $85 million	as of 2024 to 2026	includes NVIDIA strategic investment

Significance

Basecamp Research is frequently cited as a leading example of the thesis that proprietary, high-quality biological data is the binding constraint, and therefore the most durable competitive moat, for AI in the life sciences, much as web-scale text is for large language models. Where many computational biology efforts train on the same public repositories, Basecamp's wager is that systematically expanding the diversity and contextual richness of training data unlocks designs (enzymes, proteins, and gene-editing tools) that lie outside anything existing models have seen. ^[2]^[6]^[8]

The company is equally notable for attempting to make large-scale biodiscovery ethical and traceable by design. Its insistence on linking every data point to documented consent and a benefit-sharing agreement is presented as a template for how commercial use of genetic resources can comply with the Nagoya Protocol and emerging digital-sequence-information rules, including those negotiated under the high-seas BBNJ agreement, while still returning value to the countries and communities where the biology originates. Whether Basecamp's self-reported scale claims and model performance translate into clinically and commercially validated products remains to be demonstrated, but its partnerships with the Broad Institute, NVIDIA, and Anthropic have made it one of the more closely watched companies positioning data, rather than model architecture alone, as the frontier of AI for biology. ^[3]^[8]^[10]

References

1. BioSpace, "Basecamp Research Initiates Genetic Medicine Collaboration with the Liu Laboratory and Completes $60 Million Series B Financing," October 9, 2024, https://www.biospace.com/press-releases/basecamp-research-initiates-genetic-medicine-collaboration-with-the-liu-laboratory-and-completes-60-million-series-b-financing
2. BioPharma Dive, "AI startup Basecamp allies with the Broad to dream up 'programmable' genetic medicines," October 9, 2024, https://www.biopharmadive.com/news/basecamp-broad-david-liu-ai-genetic-medicine-series-b/729129/
3. GlobeNewswire / Basecamp Research, "Basecamp Research Announces Breakthrough Discovery of Over One Million New Species," June 11, 2025, https://www.globenewswire.com/news-release/2025/06/11/3097368/0/en/Basecamp-Research-Announces-Breakthrough-Discovery-of-Over-One-Million-New-Species-Yielding-Enormous-New-Database-Purpose-Built-for-Generative-Foundation-Models-in-Biology.html
4. PR Newswire, "Basecamp Research Launches BaseFold: A Breakthrough in 3D Protein Structure Prediction of Large, Complex Protein Structures," March 12, 2024, https://www.prnewswire.com/news-releases/basecamp-research-launches-basefold-a-breakthrough-in-3d-protein-structure-prediction-of-large-complex-protein-structures-302085262.html
5. HPCwire, "Graphing Biodiversity to Improve Drug Discovery," February 20, 2025, https://www.hpcwire.com/2025/02/20/graphing-biodiversity-to-improve-drug-discovery/
6. Fortune, "Could data from 100 million species help cure disease? One startup is betting on it," March 19, 2026, https://fortune.com/2026/03/19/basecamp-research-gathered-genomic-data-from-100-million-species-in-bet-they-can-build-ai-models-to-help-cure-disease/
7. NutraIngredients, "Basecamp Research raises $20M to map life on earth and design proteins to match," December 21, 2022, https://www.nutraingredients.com/Article/2022/12/21/Basecamp-Research-raises-20M-to-map-life-on-earth-and-design-proteins-to-match/
8. True Ventures, "Basecamp Research: Accelerating AI Driven Insights with NVIDIA to Build the World's Largest and Most Diverse Biological Sequence Dataset," https://trueventures.com/blog/basecamp-research-accelerating-ai-driven-insights-with-nvidia-to-build-the-worlds-largest-and-most-diverse-biological-sequence-dataset
9. Royal Society of Biology, "An insatiable appetite for sequences" (interview with Basecamp Research), https://www.rsb.org.uk/biologist-interviews/an-insatiable-appetite-for-sequences-3
10. EIN Presswire, "Basecamp Research and Cameroon announce pioneering access and benefit-sharing deal, highlighting new model before COP 16," August 14, 2024, https://www.einpresswire.com/article/735533702/basecamp-research-and-cameroon-announce-pioneering-access-benefit-sharing-deal-highlighting-new-model-before-cop-16
11. Basecamp Research (Medium), "The BBNJ Agreement, benefit sharing from biotechnology and Basecamp Research," https://medium.com/@basecamp-research/the-bbnj-treaty-benefit-sharing-from-biotechnology-and-basecamp-research-f0f8290f99d4
12. SynBioBeta, "Basecamp Research Unveils BaseFold: Revolutionary Deep Learning Model for Protein Structure Prediction," https://www.synbiobeta.com/read/basecamp-research-unveils-basefold-revolutionary-deep-learning-model-for-protein-structure-prediction
13. GEN (Genetic Engineering & Biotechnology News), "Basecamp Research Achieves Programmable Gene Insertion with EDEN AI Models," January 2026, https://www.genengnews.com/topics/artificial-intelligence/basecamp-research-achieves-programmable-gene-insertion-with-eden-ai-models/
14. PR Newswire, "Basecamp Research launches world-first AI models for programmable gene insertion," January 12, 2026, https://www.prnewswire.com/news-releases/basecamp-research-launches-world-first-ai-models-for-programmable-gene-insertion-302657979.html
15. Microsoft for Startups Blog, "Basecamp Research leverages Microsoft and NVIDIA AI for biodiversity research," https://www.microsoft.com/en-us/startups/blog/catalyst-basecamp-research-leverages-microsoft-and-nvidia-ai-to-unlock-secrets-of-biodiversity/
16. Business Wire, "Basecamp Research Raises $20M Series A to Design Protein Products Reflecting the World's Biodiversity," December 13, 2022, https://www.businesswire.com/news/home/20221213005269/en/Basecamp-Research-Raises-$20M-Series-A-to-Design-Protein-Products-Reflecting-the-Worlds-Biodiversity
