FutureHouse
Last reviewed
Sources
18 citations
Review status
Source-backed
Revision
v1 · 2,235 words
FutureHouse is a nonprofit research lab in San Francisco that is building an "AI Scientist," a set of automated systems intended to carry out scientific research with limited human supervision. The lab was founded in November 2023 by the physicist and bioengineer Sam Rodriques and the computational chemist Andrew White, and its first years were funded chiefly by the philanthropist and former Google chief executive Eric Schmidt. FutureHouse frames its work as a "moonshot" to automate and accelerate fundamental science, with a roughly ten-year focus on biology. Its public output includes a family of AI agents for literature search, data analysis, and chemistry, several open benchmarks and models, and a hosted platform that opened to the public in 2025. In May 2026 the lab published a peer-reviewed paper in Nature describing Robin, a multi-agent system it used to propose an existing drug as a candidate treatment for a common cause of blindness.
FutureHouse sits within a wider movement to apply AI for science, and it positions itself as a research organization rather than a product company. A for-profit spinout, Edison Scientific, was created in November 2025 to commercialize the technology for industry, while the nonprofit continues fundamental research and tool development.
FutureHouse was announced publicly on November 1, 2023, described at the time as an independent, philanthropically funded nonprofit research organization. Sam Rodriques is a co-founder and serves as director and chief executive. He completed a PhD in physics at MIT in 2019, working in Ed Boyden's group on tools for brain mapping and spatial transcriptomics, then ran an academic laboratory at the Francis Crick Institute in London before starting FutureHouse. Rodriques was named to TIME's 2025 list of the 100 most influential people in AI. Andrew White, a computational chemist who was a faculty member at the University of Rochester and who built one of the first large language model agents for chemistry (ChemCrow), is co-founder and Head of Science.
The lab states that its first two years were made possible by Eric and Wendy Schmidt. Eric Schmidt's broader science philanthropy is organized under Schmidt Sciences, a nonprofit he and Wendy Schmidt established to support research toward health, security, and scientific progress. FutureHouse has also reported support from Open Philanthropy, the US National Science Foundation, and the UK AI Safety Institute, among others. The organization is headquartered in San Francisco's Dogpatch neighborhood and describes itself as a 501(c)(3) nonprofit. Its early board included Tom Kalil, Adam Marblestone, and Tony Kulesa.
FutureHouse's stated mission is to "build an AI Scientist to automate and accelerate fundamental science that improves human health." The lab argues that scientific progress is bottlenecked less by raw ideas than by the human labor of reading the literature, designing experiments, and interpreting data, and that automating those steps could let individual scientists run many more experiments while broadening access to scientific expertise.
Two design commitments recur in the lab's writing. The first is a modular, agentic approach: rather than one monolithic model, FutureHouse builds specialized agents for distinct tasks and chains them together. The second is an emphasis on natural language as the working medium of science. Rodriques has said that "natural language is the real language of science," and the lab treats large language models and generative AI as reasoning engines that should be embedded in a larger system with tools, databases, and, where possible, wet lab validation. FutureHouse operates an in-house wet lab so that AI-generated hypotheses can be tested experimentally, and it has maintained a "Challenge Book" of concrete scientific tasks against which to measure progress.
FutureHouse's work spans benchmarks, open models, research agents, and a hosted platform. The lab's literature agents are built on retrieval-augmented generation, pairing language models with tools that search papers, extract passages, and traverse the citation graph.
PaperQA, and its successor PaperQA2, are the foundation. PaperQA2 is an open-source agent (Apache 2.0) for question answering over the scientific literature. In a September 2024 paper titled "Language agents achieve superhuman synthesis of scientific knowledge," the lab reported that PaperQA2 outperformed PhD and postdoctoral biologists on a literature retrieval benchmark called LitQA2, and that an associated system, WikiCrow, produced gene summaries judged by blinded experts to be more accurate on average than the corresponding Wikipedia articles. An earlier version had been used to draft articles for tens of thousands of human genes by drawing on roughly a million papers. These "superhuman" claims are the lab's own, measured on its benchmarks, and should be read as such.
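The retrieval step behind retrieval-augmented generation can be illustrated with a toy example. The sketch below is not PaperQA2's implementation or API; it assumes a three-passage placeholder corpus, simple bag-of-words cosine similarity in place of a real search index, and hypothetical function names (`retrieve`, `build_prompt`). It only shows the general pattern: rank passages against the question, then hand the top passages to a language model alongside their identifiers so the answer can cite them.

```python
import math
import re
from collections import Counter

# Placeholder passages (invented for illustration, not real paper text).
PASSAGES = {
    "doc-a": "notes on phagocytosis in retinal pigment epithelium cells",
    "doc-b": "notes on reinforcement learning for language models",
    "doc-c": "notes on retinal pigment epithelium and macular degeneration",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, passages, k=2):
    """Return the ids of the k passages most similar to the question."""
    q = Counter(tokenize(question))
    scored = [
        (cosine(q, Counter(tokenize(text))), doc_id)
        for doc_id, text in passages.items()
    ]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(question, passages, k=2):
    """Assemble a prompt that exposes only the retrieved passages, with ids."""
    ids = retrieve(question, passages, k)
    context = "\n".join(f"[{i}] {passages[i]}" for i in ids)
    return f"Answer using only the cited passages.\n{context}\nQuestion: {question}"


question = "What affects retinal pigment epithelium phagocytosis?"
print(build_prompt(question, PASSAGES))
```

A production system would replace the bag-of-words scoring with a real search index and would also follow citations between papers, as the description of the lab's agents suggests; the prompt-with-identifiers shape is the part this sketch is meant to convey.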
LAB-Bench, released in mid-2024, is an evaluation set of 2,457 questions across eight categories (with 30 subtasks) designed to test capabilities needed to do biology research rather than recall of facts, including literature extraction, database retrieval, figure and table reasoning, protocol troubleshooting, and DNA sequence manipulation. About 80 percent of the data is public, with a held-out test set to guard against training contamination.
Aviary, described in a December 2024 paper, is an open-source framework for training language agents on scientific tasks; the lab reported that it let comparatively small open models reach human level on some literature research tasks at modest compute cost.
ether0, released in June 2025, is a 24-billion-parameter open-weights chemistry reasoning model relevant to AI drug discovery and molecular design. Built on Mistral's 24B Instruct model and refined with reinforcement learning over hundreds of thousands of chemistry tasks, it takes natural language questions, reasons in natural language, and outputs molecular structures. FutureHouse reported that it outperformed several frontier general models and chemistry-specific tools on certain molecular design tasks, and released it under an Apache 2.0 license.
The FutureHouse Platform launched on May 1, 2025 at platform.futurehouse.org, offering web and API access to a set of named agents, several of them free to try. Robin, published later, is a multi-agent system that orchestrates several of these agents to attempt end-to-end discovery.
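To make the public versus held-out idea concrete, the sketch below shows one common way to carve a fixed private test set out of a benchmark. It is an illustration only: the source does not say how LAB-Bench's split was produced, and the question identifiers (`q0001` and so on), the hash-based assignment, and the `is_public` helper are all assumptions. The only figures taken from the article are the 2,457 questions and the roughly 80 percent public share.

```python
import hashlib

TOTAL_QUESTIONS = 2457   # figure from the article
PUBLIC_PERCENT = 80      # "about 80 percent" per the article


def is_public(question_id: str) -> bool:
    """Assign a question to the public set based on a hash of its id.

    Hashing makes the assignment deterministic: the same id always lands
    in the same set, so the held-out questions never leak into the public
    release when the benchmark is regenerated.
    """
    digest = hashlib.sha256(question_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < PUBLIC_PERCENT


# Hypothetical ids; the real benchmark's identifiers are not given here.
question_ids = [f"q{n:04d}" for n in range(1, TOTAL_QUESTIONS + 1)]
public = [q for q in question_ids if is_public(q)]
held_out = [q for q in question_ids if not is_public(q)]

print(f"public: {len(public)}, held out: {len(held_out)}")
```

An exact 80 percent share of 2,457 questions would be about 1,966 public questions and about 491 held out; a hash-based split like this one lands near, not exactly on, those numbers.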
| Agent or tool | Type | Released | Role |
|---|---|---|---|
| PaperQA / PaperQA2 | Open-source RAG agent | 2023 / Sept 2024 | High-accuracy question answering over scientific papers with citations |
| WikiCrow | Agent (built on PaperQA) | Dec 2023 | Generates cited, encyclopedia-style summaries (used for human genes) |
| LAB-Bench | Benchmark dataset | 2024 | 2,457 questions testing biology research skills, not just knowledge |
| Aviary | Training framework | Dec 2024 | Trains language agents on scientific tasks; open source |
| Crow | Platform agent | May 2025 | General literature search with concise, cited answers; API-friendly |
| Falcon | Platform agent | May 2025 | Deep literature review across many papers and databases such as OpenTargets |
| Owl | Platform agent | May 2025 | Answers "has anyone done this before," finding prior work and gaps |
| Phoenix | Platform agent | May 2025 | Experimental chemistry planning and reasoning; less thoroughly benchmarked |
| Finch | Platform agent | May 2025 | Analyzes biological datasets, writes and runs Python and R code |
| ether0 | Open-weights model | June 2025 | 24B chemistry reasoning model that outputs molecules |
| Robin | Multi-agent system | May 2025 preprint; Nature 2026 | Coordinates Crow, Falcon, and Finch for end-to-end discovery |
| Kosmos | AI Scientist (Edison Scientific) | Nov 2025 | Commercial co-scientist that iterates over data, literature, and hypotheses |
FutureHouse's most publicized scientific result is Robin. In a paper first posted as a preprint on May 20, 2025 and published in Nature in May 2026 ("A multi-agent system for automating scientific discovery"), the lab described using Robin to study dry age-related macular degeneration (dry AMD), a leading cause of blindness with limited treatment options. Robin's literature agents (Crow and Falcon) reviewed the field and proposed enhancing phagocytosis by retinal pigment epithelium (RPE) cells as a therapeutic strategy; the data analysis agent Finch then analyzed an RNA sequencing experiment the system had proposed. The work converged on ripasudil, a Rho kinase (ROCK) inhibitor already approved in Japan to treat glaucoma but not previously linked to dry AMD, and surfaced ABCA1, a lipid efflux pump, as a candidate mechanism. The lab reported that the hypothesis generation, experimental design, and analysis loop was driven by the AI agents, with humans executing the wet lab steps, over a span of roughly two and a half months.
This is a notable demonstration, and it has passed peer review at Nature, but it is important to be precise about what it shows. Ripasudil is a repurposing candidate, not a proven dry AMD therapy; the published evidence is preclinical, based on cell culture experiments rather than human clinical trials. Independent clinical validation of efficacy in patients has not been established by this work, and the lab's framing of Robin as performing "end to end" discovery refers to its role in proposing and analyzing experiments, not to fully autonomous laboratory work.
Beyond Robin, FutureHouse's recurring headline claim is that its literature agents reach or exceed expert human performance on specific benchmarks (LitQA2, RAG-QA Arena's science split, and head-to-head precision tests against frontier search models). These results come from the lab's own evaluations and are meaningful within those benchmarks, but they are not the same as a broad, independently audited finding that the agents are "superhuman" scientists.
FutureHouse operates as a nonprofit funded by philanthropy rather than by revenue or venture capital, which the lab presents as freeing it to pursue long-horizon, high-risk research. As industry interest grew after the May 2025 platform launch, the founders created a separate for-profit company to serve commercial customers. On November 7, 2025, FutureHouse announced Edison Scientific, a spinout focused on building and scaling its AI agents for pharmaceutical and biotech use. Reports placed Edison's seed round at about 70 million dollars at a valuation near 250 million dollars.
Alongside the spinout, Edison launched Kosmos, described as a more powerful "AI Scientist." Given a goal and one or more datasets, Kosmos iterates among data analysis, literature search, and hypothesis generation and returns a fully cited research report. The company reported that a single 20-cycle run could read on the order of 1,500 papers and write tens of thousands of lines of analysis code, and that collaborators estimated one run did the equivalent of about six months of their own research on average. An independent review reportedly found roughly 79 percent of statements in Kosmos reports to be accurate. As with Robin, these figures originate largely with the developers and their collaborators and warrant independent replication. The arrangement keeps the nonprofit lab and the commercial entity distinct, with the nonprofit continuing fundamental research and benchmarks and the company handling enterprise deployment.
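The iterate-and-report pattern described for Kosmos can be sketched in a few lines. This is a schematic under stated assumptions, not Edison Scientific's code: every function and class name below (`RunState`, `analyze_data`, `search_literature`, `propose_hypothesis`, `run`) is hypothetical, and each step is a stub that returns placeholder strings where a real system would call models and tools. The closing arithmetic simply divides the reported figures, about 1,500 papers over 20 cycles, to give a rough per-cycle reading load.

```python
from dataclasses import dataclass, field


@dataclass
class RunState:
    """Accumulates what a run has produced so far (hypothetical structure)."""
    goal: str
    findings: list = field(default_factory=list)
    citations: list = field(default_factory=list)


def analyze_data(state, cycle):
    # Stub: a real system would run analysis code over the datasets.
    return f"cycle {cycle}: analysis result (stub)"


def search_literature(state, cycle):
    # Stub: a real system would query a literature agent.
    return [f"ref-{cycle}"]


def propose_hypothesis(finding, refs):
    # Stub: ties each hypothesis to the references that support it.
    return f"hypothesis from '{finding}' citing {', '.join(refs)}"


def run(goal, cycles=20):
    """Alternate analysis, literature search, and hypothesis generation."""
    state = RunState(goal)
    for cycle in range(1, cycles + 1):
        finding = analyze_data(state, cycle)
        refs = search_literature(state, cycle)
        state.findings.append(propose_hypothesis(finding, refs))
        state.citations.extend(refs)
    return state


# Back-of-envelope figure from the reported numbers (order of magnitude only).
PAPERS_PER_RUN = 1500
CYCLES = 20
papers_per_cycle = PAPERS_PER_RUN / CYCLES

state = run("placeholder research goal", cycles=CYCLES)
print(len(state.findings), "findings;", papers_per_cycle, "papers per cycle")
```

Keeping the citations alongside each finding in the run state is what would let a final report be "fully cited" in the sense the article describes; how Kosmos actually tracks provenance is not stated in the source.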
FutureHouse is one of the more visible attempts to turn AI for science from a slogan into working systems. Its contributions are concrete and largely open: published benchmarks (LAB-Bench), open-source agents and frameworks (PaperQA2, Aviary), open-weights models (ether0), and a public platform that scientists can use directly. The Robin paper is an early peer-reviewed example of a multi-agent system contributing to a specific biomedical hypothesis that was then tested in the lab, and the lab's literature tools have drawn genuine interest from working researchers.
It is distinct from for-profit peers, such as drug discovery companies and the "AI co-scientist" efforts at large labs, in being structured as a nonprofit backed by science philanthropy, even as it has now incubated a commercial arm in Edison Scientific. The central open questions are those the field as a whole faces: how much of the lab's performance translates beyond its own benchmarks, how robust its automated discoveries are under independent scrutiny, and whether AI-proposed candidates like ripasudil hold up in rigorous clinical testing. The work is promising and unusually transparent, but its strongest claims remain to be validated by parties outside the organization.