Elicit (research tool)
Last reviewed
Jun 4, 2026
Sources
22 citations
Review status
Source-backed
Revision
v1 · 2,561 words
Elicit is an AI research assistant that helps researchers find, screen, summarize, and extract data from academic papers, with a particular focus on automating parts of the systematic review and literature review process. The product began as an internal project at Ought, a nonprofit machine learning research lab co-founded by Andreas Stuhlmüller and Jungwon Byun, and in 2023 it spun out into an independent public benefit corporation (PBC) of the same name. Elicit searches a corpus of well over 100 million scholarly papers, uses large language models to retrieve relevant studies even when they share no exact keywords with the query, and organizes findings into structured tables. By early 2025 the company reported that more than 400,000 researchers used the tool each month, positioning it as one of the most widely adopted products in the emerging category of AI tools for science.[1][2][3]
Elicit grew out of Ought, a San Francisco nonprofit research lab that Andreas Stuhlmüller started in 2017 and that received early general-support funding from Open Philanthropy in 2018.[4][5] Stuhlmüller holds a PhD from MIT, where he studied probabilistic programming and cognitive science with Josh Tenenbaum, and he did postdoctoral research at Stanford with Noah Goodman before founding the lab.[6][4] Jungwon Byun, who became co-founder and chief operating officer, previously led growth at the lending company Upstart and worked as a management consultant at Oliver Wyman; she holds a B.A. in economics from Yale.[6]
Ought's stated mission was to "scale up good reasoning," and the lab studied factored cognition and process supervision as approaches to building machine learning systems whose reasoning humans could oversee. Factored cognition asks whether complex cognitive tasks can be decomposed into smaller pieces, each handled with limited context, so that the overall process remains transparent and steerable. Out of this research agenda the lab built Elicit as a practical tool that applied language models to the open-ended reasoning task of reviewing scientific literature. Ought also released supporting infrastructure such as the Interactive Composition Explorer (ICE), an open-source Python library for compositional language model programs, and a Factored Cognition Primer.[7][4]
Early versions of Elicit, developed from roughly 2021 onward, used GPT-3 and searched the Semantic Scholar database. The tool let users type a research question in natural language and returned the papers most likely to answer it, along with one-sentence summaries of each abstract tailored to the question. Successive updates added the ability to extract data points across many papers and to discover concepts spanning a body of literature, for example surfacing all of the reported effects of a substance such as creatine.[8][1]
In September 2023, Elicit separated from Ought to become an independent public benefit corporation, a for-profit corporate form that legally commits a company to a stated social mission alongside shareholder returns. Ought said it believed the spin-out was the best way to pursue its mission, and the nonprofit retained a stake in the new entity. The transition was supported by a $9 million seed round announced on September 25, 2023, led by the deep-tech seed fund Fifty Years, with participation from Basis Set, Illusian, Mythos, and Julian. Angel investors in the round included GitHub co-founder Tom Preston-Werner, Dropbox co-founder Arash Ferdowsi, former Novartis chief executive Thomas Ebeling, and Google chief scientist Jeff Dean. At the time of the announcement the company said more than 200,000 researchers used Elicit each month, including users at Harvard, MIT, Stanford, Genentech, Novartis, and OpenAI.[1][8][2]
On February 26, 2025, Elicit announced a $22 million Series A round at a reported valuation of about $100 million, co-led by Spark Capital and Footwork, with existing investors Fifty Years, Basis Set, and Mythos also participating. The company said the funding would let it expand beyond academic research toward becoming a platform for evidence-based decision-making in fields such as pharmaceuticals, medtech, and consumer goods, and it cited customers including the diagnostics firm MicroGenDX and the German technical association VDI/VDE. The company reported that more than 400,000 researchers were using Elicit monthly by the time of the raise.[3][9]
| Round | Date | Amount | Lead investor(s) | Notable participants |
|---|---|---|---|---|
| Seed | September 25, 2023 | $9 million | Fifty Years | Basis Set, Illusian, Mythos, Julian; angels Tom Preston-Werner, Arash Ferdowsi, Thomas Ebeling, Jeff Dean |
| Series A | February 26, 2025 | $22 million (about $100M valuation) | Spark Capital, Footwork | Fifty Years, Basis Set, Mythos |
Elicit is built on top of large language models layered over a large index of scholarly literature. Rather than relying on exact keyword matching, it uses semantic search so that a query returns conceptually relevant papers even when they do not share vocabulary with the question. For each result, the system can generate a short summary of the abstract written specifically to address the user's question, and it can pull requested data points (such as sample size, study design, intervention, or outcome) out of many papers at once and lay them out as columns in a table that Elicit calls a research matrix or report.[8][10]
The underlying paper corpus draws on aggregators of academic metadata. Elicit's documentation and independent reviews describe it as searching across sources including Semantic Scholar, OpenAlex, and PubMed. The advertised size of the corpus has grown over time, from "over 125 million" papers around the 2023 spin-out to "138 million or more" papers by 2026.[11][12][2] Because Elicit depends on these databases, its coverage is bounded by what they index, a limitation that researchers have flagged when comparing it to traditional multi-database literature searches.[13]
The company has described its more advanced workflows as "research agents" that decompose a prompt into a structured program of steps (search, screen, extract, and synthesize) and then execute that program, an approach that echoes the factored-cognition ideas from its Ought origins. Elicit emphasizes that outputs are meant to be grounded in evidence, attaching supporting quotes and source citations so that users can verify each claim against the original paper rather than trusting an unsourced generation.[14][15]
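The search, screen, extract, and synthesize decomposition can be illustrated with a toy sketch. This is not Elicit's implementation: the `Paper` type, the function names, the in-memory corpus, and the keyword-based stand-ins for language-model calls are all assumptions made for illustration. What the sketch shows is the factored structure, in which each step receives only the context it needs and the final summary carries a pointer back to every source it used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    paper_id: str
    title: str
    abstract: str


# Hypothetical in-memory corpus; a real system would query a scholarly index.
CORPUS = [
    Paper("p1", "Creatine and strength",
          "A randomized trial with 40 participants found strength gains."),
    Paper("p2", "Creatine and cognition",
          "A review of observational studies on memory."),
    Paper("p3", "Caffeine and sleep",
          "A randomized trial with 25 participants measured sleep latency."),
]


def search(query: str, corpus: list[Paper]) -> list[Paper]:
    """Stand-in for retrieval: keep papers whose title mentions any query term."""
    terms = query.lower().split()
    return [p for p in corpus if any(t in p.title.lower() for t in terms)]


def screen(papers: list[Paper], criterion: str) -> list[Paper]:
    """Stand-in for screening: keep papers whose abstract mentions the criterion."""
    return [p for p in papers if criterion.lower() in p.abstract.lower()]


def extract(paper: Paper) -> dict[str, str]:
    """Stand-in for extraction: return the paper's ID with a supporting quote."""
    return {"paper_id": paper.paper_id, "supporting_quote": paper.abstract}


def synthesize(rows: list[dict[str, str]]) -> str:
    """Combine extracted rows into a summary that cites each source."""
    cited = ", ".join(f"[{row['paper_id']}]" for row in rows)
    return f"{len(rows)} included paper(s): {cited}"


def run_pipeline(query: str, criterion: str) -> str:
    """Run the four steps in order, passing each step only the previous output."""
    found = search(query, CORPUS)
    included = screen(found, criterion)
    rows = [extract(p) for p in included]
    return synthesize(rows)


report = run_pipeline("creatine", "randomized trial")
print(report)
```

Because each step is a separate function with a narrow input, a human can inspect or override the intermediate result (for example, the screened list) before the next step runs.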
In March 2024, Elicit introduced Notebooks, a workspace that combines papers from multiple searches and user uploads onto a single page, where researchers can chat with the collected papers, generate multiple summaries, and apply filters and columns to narrow a set down to the most relevant studies. Notebooks unified what had previously been separate steps into one continuously editable surface.[16][12]
Elicit announced its Reports feature and a dedicated Systematic Review product in February 2025. Elicit Reports generate a structured literature review for a question by searching, screening, extracting, and synthesizing the relevant papers. Elicit Systematic Review, announced on February 20, 2025, guides a user through the formal stages of a systematic review: it can supplement a traditional keyword search with up to several hundred semantically retrieved papers, automatically generate screening criteria from the research question, sort papers by how likely they are to meet those criteria, and produce extraction suggestions accompanied by supporting quotes and explanations. Users can review, override, or manually enter criteria at each step, export results to CSV, and re-run steps to maintain "living" reviews. The company says the tool can cut the time required for a systematic review by up to 80 percent, and it markets screening capacities that scale by plan into the tens of thousands of papers. The systematic review workflow is offered on the paid Pro, Team/Scale, and Enterprise tiers.[10][17][12]
In December 2025, Elicit launched a beta of agentic research workflows that extend the tool beyond peer-reviewed papers to sources such as clinical trial registries, regulatory documents, press releases, and product labels. The initial workflows covered competitive landscapes, research landscapes, clinical trial analyses (drawing on ClinicalTrials.gov), and open-ended topic exploration. As with the rest of the platform, the agent maintains citations for the claims it produces. Research Agents are available to Pro, Team, and Enterprise users.[14]
Elicit released an API, initially in preview, that lets developers query its corpus and request full reports programmatically. The search endpoint returns structured JSON with titles, authors, abstracts, citation counts, DOIs, and PDF links, with filtering by study type, year, and journal metrics; a reports endpoint runs the asynchronous search-screen-extract-synthesize pipeline and returns a literature review. The API is aimed at running research at scale and embedding Elicit into other tools and workflows, and it is available on the paid tiers.[15]
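As a rough illustration of consuming a structured search response of this kind, the sketch below parses a JSON payload into typed records and applies a simple year filter. The payload shape and every field name (`results`, `citation_count`, `pdf_url`, and so on) are guesses based on the fields listed above, not Elicit's documented schema, and no real endpoint is called; the DOIs and URL are placeholders.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical payload: field names are illustrative assumptions,
# not Elicit's documented response format.
SAMPLE_RESPONSE = """
{
  "results": [
    {"title": "Example trial A", "authors": ["A. Author"], "year": 2021,
     "citation_count": 12, "doi": "10.0000/example.a", "pdf_url": null},
    {"title": "Example review B", "authors": ["B. Author", "C. Author"],
     "year": 2018, "citation_count": 40, "doi": "10.0000/example.b",
     "pdf_url": "https://example.org/b.pdf"}
  ]
}
"""


@dataclass
class SearchResult:
    title: str
    authors: list[str]
    year: int
    citation_count: int
    doi: Optional[str]
    pdf_url: Optional[str]


def parse_results(raw: str) -> list[SearchResult]:
    """Parse a JSON search response into typed records.

    Optional fields fall back to a default when absent.
    """
    payload = json.loads(raw)
    return [
        SearchResult(
            title=item["title"],
            authors=item.get("authors", []),
            year=item["year"],
            citation_count=item.get("citation_count", 0),
            doi=item.get("doi"),
            pdf_url=item.get("pdf_url"),
        )
        for item in payload["results"]
    ]


results = parse_results(SAMPLE_RESPONSE)
recent = [r for r in results if r.year >= 2020]               # client-side year filter
with_pdf = [r.title for r in results if r.pdf_url is not None]  # records with a PDF link
```

Treating `doi` and `pdf_url` as optional reflects the fact that scholarly metadata is often incomplete; a client that assumes every record has a PDF link would fail on the first paper without one.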
In July 2024, Elicit added the ability to search and analyze clinical trials, drawing on a database of several hundred thousand studies, which broadened the tool's usefulness for medical and life-sciences research.[12]
Elicit's pricing has changed several times. In mid-2024 the company overhauled its plans to introduce an unlimited free tier for basic searches, summaries, and chats. The structure was later reorganized around heavier research and systematic-review usage. As of 2026 the published tiers are roughly as follows; limits such as the number of reports per month, table columns, and papers that can be screened scale up with each tier.[18][19]
| Plan | Price (monthly) | Intended for | Notable limits/features |
|---|---|---|---|
| Basic | Free | Casual users | Limited access to the research agent; a small number of automated reports per month; few table columns |
| Pro | $49 per user | Individual researchers and academics | Extended research-agent access; systematic-review screening up to several thousand papers; API access; many more reports and table columns |
| Scale (Team) | $169 per user | Research teams and labs | Full research-agent access; figure extraction; real-time collaboration; admin controls |
| Enterprise | Custom | Organizations | Largest screening capacity; SSO/SAML; dedicated support; custom integrations |
Annual billing carries a discount. The company described the unlimited free tier it introduced in mid-2024 as among the most generous of any research tool.[19][18]
Because Elicit is used for tasks where errors carry real consequences, its accuracy has been studied both internally and by outside researchers, with mixed but generally cautious conclusions.
Elicit has published its own evaluations. For Systematic Review it reported screening recall (sensitivity) of about 93.6 percent against 58 published systematic reviews, rising to roughly 96.4 percent when researchers verified the auto-generated screening criteria first, alongside specificity around 62.8 percent. For data extraction it reported internal accuracy around 94 percent against vetted gold-standard answers, and it cited external checks by VDI/VDE (about 99.4 percent) and by Australia's CSIRO comparing Elicit against a GPT-4-Turbo baseline and human reviewers.[17] In a later summary the company reported figures of 95 percent search recall, 97 percent abstract screening, 99 percent full-text screening, and 96 percent extraction.[12]
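Sensitivity and specificity are both computed by comparing the screener's include/exclude decisions with the decisions in a reference review. The sketch below shows the arithmetic with made-up counts; the numbers are illustrative assumptions, not figures from Elicit's evaluation. It also shows why high recall paired with moderate specificity still leaves a sizable manual workload.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of truly relevant papers that the screener kept (recall)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of truly irrelevant papers that the screener excluded."""
    return true_neg / (true_neg + false_pos)


# Illustrative counts only; not data from any published evaluation.
true_pos, false_neg = 90, 10    # 100 relevant papers, 90 kept
true_neg, false_pos = 540, 360  # 900 irrelevant papers, 360 kept by mistake

recall = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)

# Papers a human still has to read after automated screening.
to_review = true_pos + false_pos

print(f"recall={recall:.1%} specificity={spec:.1%} to_review={to_review}")
```

In this toy case the screener loses only 10 of 100 relevant papers, but 360 of the 450 papers it passes on are irrelevant, which is the usual trade-off when a screening tool is tuned to avoid missing studies.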
Independent studies have been more measured. A 2025 proof-of-concept published in a peer-reviewed journal compared Elicit's automated data extraction against human reviewers across 43 studies and 602 data points, finding overall accuracy of 81.4 percent for Elicit versus 86.7 percent for human reviewers, a difference the authors reported as not statistically significant; where Elicit and a human extracted the same field, the value was correct in essentially all cases, suggesting Elicit could serve as a useful second reviewer.[20] A separate study published in BMC Medical Research Methodology used Elicit to reproduce an umbrella review and found weaknesses: three repeated searches returned different numbers of results (246, 169, and 172), Elicit's final set overlapped only partially with the published review, and the traditional method surfaced 14 articles that Elicit missed. The authors attributed part of this to Elicit drawing on a single underlying database and concluded that tools like Elicit can be valuable complements but cannot, on their own, produce a rigorous systematic review without human oversight.[13] Reviewers broadly agree that Elicit works best for well-defined empirical questions and less well for theoretical or non-empirical topics.[21]
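One simple way to quantify the two problems described here, run-to-run variation and partial overlap with a reference review, is set arithmetic over paper identifiers. The sketch below uses Jaccard similarity and a set difference; the paper IDs are hypothetical and are not data from the cited study, and the study itself may have used different measures.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Intersection size divided by union size; 1.0 means identical sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical paper IDs for two repeated searches and a published review.
run_1 = {"p1", "p2", "p3", "p4"}
run_2 = {"p2", "p3", "p5"}
reference_review = {"p2", "p3", "p6"}

# Agreement between two runs of the same query.
run_stability = jaccard(run_1, run_2)

# Papers in the reference review that neither run retrieved.
missed = reference_review - (run_1 | run_2)

print(run_stability, sorted(missed))
```

A stability score well below 1.0 signals that a single run should not be treated as a reproducible search, and a non-empty `missed` set is the kind of gap that a supplementary multi-database search is meant to close.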
The name Elicit is sometimes confused with the unrelated AI-safety concept of "eliciting latent knowledge" (ELK). ELK is an open research problem in AI alignment, articulated by Paul Christiano and the Alignment Research Center (ARC), concerning how to get a trained model to honestly report facts that it internally "knows" but does not surface in its outputs. It has nothing to do with the Elicit research product, beyond the coincidental overlap in the everyday word "elicit." Elicit the company addresses literature review, not model interpretability or alignment.[22]
Elicit is frequently cited as a leading example of "AI for science," the use of generative AI and retrieval-augmented generation to accelerate research workflows. University libraries, including those at Arizona and Texas A&M, list it among recommended AI literature-search tools, and it is commonly compared with adjacent products such as the search-and-summary engine Consensus, the reading assistant SciSpace, and the general-purpose answer engine Perplexity. Coverage and academic guidance tend to praise its time savings for finding and summarizing papers while cautioning that its results should be verified and that it is not a substitute for comprehensive, multi-database searching in high-stakes evidence synthesis.[2][21][13]
The company positions itself within a broader thesis that AI agents will reshape knowledge work, and it has framed its long-term goal as becoming a standard platform for evidence-backed, "AI-native" decision-making rather than a literature tool alone. Its lineage out of an alignment-focused nonprofit gives it an unusual emphasis, for a commercial product, on process supervision and verifiable, citation-grounded outputs.[3][7]