AI in drug discovery

Updated Jun 23, 2026

AI in drug discovery is the application of artificial intelligence, machine learning, and deep learning across the pharmaceutical pipeline to identify drug targets, design new molecules, predict protein structures, screen compound libraries, and optimize clinical trials. As of early 2026, more than 173 AI-discovered drug programs are in clinical development (roughly 94 in Phase I, 56 in Phase II, and 15 in Phase III), and the first analysis of these programs found that AI-discovered molecules cleared Phase I trials at an 80% to 90% rate versus a historical industry average near 40% to 55%. ^[1]^[22] No AI-discovered drug has yet received regulatory approval, and the same analysis found that AI-discovered candidates succeed in Phase II at only about 40%, in line with traditional drugs, leaving open the central debate over whether AI improves success at every clinical stage or mainly at the earliest one. ^[1]

The field combines several techniques pioneered by AI for science: AlphaFold for protein structure prediction, generative models for de novo molecule design, machine-learning virtual screening, and AI-assisted clinical-trial design. Leading companies include Isomorphic Labs (a Google DeepMind spinout), Insilico Medicine, Recursion Pharmaceuticals, Schrodinger, and Relay Therapeutics.

What is the drug discovery pipeline and where does AI help?

Traditional drug discovery is a long, expensive, and failure-prone process. Developing a single new drug from initial concept to regulatory approval typically takes 10 to 15 years and costs an estimated $2.6 billion on average, including the cost of failed programs. ^[20] Approximately 90% of drug candidates that enter clinical trials fail before reaching approval, with the most common reasons being lack of clinical efficacy (40% to 50%), unmanageable toxicity (30%), poor drug-like properties (10% to 15%), and lack of commercial viability (10%). ^[19]

The drug development pipeline consists of several major stages:

Stage	Description	Typical timeline
Target identification	Finding a biological molecule (often a protein) involved in a disease that can be modulated by a drug	1 to 2 years
Target validation	Confirming that modulating the target produces a therapeutic benefit	1 to 2 years
Hit identification	Screening large compound libraries to find molecules that interact with the target	6 months to 1 year
Lead optimization	Refining hit compounds to improve potency, selectivity, and drug-like properties	1 to 3 years
Preclinical development	Laboratory and animal studies to assess safety and pharmacology	1 to 2 years
Phase I clinical trials	Testing safety and dosing in healthy volunteers (20 to 100 participants)	1 to 2 years
Phase II clinical trials	Evaluating efficacy and side effects in patients (100 to 300 participants)	1 to 3 years
Phase III clinical trials	Confirming efficacy in large patient populations (1,000 to 5,000 participants)	2 to 4 years
Regulatory review	FDA or EMA assessment of trial data for market approval	0.5 to 2 years

AI and machine learning can contribute to nearly every stage of this pipeline. In target identification, AI analyzes genomic and proteomic data to pinpoint disease-relevant proteins. In molecule generation and lead optimization, generative models design novel compounds with desired properties. In preclinical development, AI predicts absorption, distribution, metabolism, excretion, and toxicity (ADMET) profiles. In clinical trials, AI helps with patient recruitment, adaptive trial design, and endpoint prediction. Taken together, these contributions could reduce R&D costs by 25% to 40% and shorten development timelines by several years. ^[1]

Target identification and validation

Target identification is the first and arguably the most consequential step in drug discovery. Choosing the wrong target leads to years of wasted effort and resources. AI helps by mining vast biological datasets to uncover novel therapeutic targets that would be difficult or impossible to identify through manual analysis alone.

Genomics and multi-omics integration

Modern target discovery relies on the integration of multiple types of biological data, including genomics, transcriptomics, proteomics, and metabolomics. Machine learning models can analyze these multi-omics datasets to identify genes and proteins that play causal roles in disease processes. For example, large language models trained on biomedical literature can systematically analyze disease-associated biological pathways and surface potential targets by connecting genes, diseases, and compounds in knowledge graphs.

Insilico Medicine developed a target identification platform called PandaOmics, which integrates 22 multi-modal data sources to build disease-specific models. Using PandaOmics, Insilico identified TNIK (Traf2- and Nck-interacting kinase) as a novel regulator of lung fibrosis pathways, a target that had not been previously explored for idiopathic pulmonary fibrosis (IPF). ^[3] This AI-identified target became the basis for rentosertib, the first AI-designed drug to reach Phase II clinical trials. ^[4]

How did AlphaFold change structure-based drug design?

AlphaFold, developed by Google DeepMind, is one of the most significant breakthroughs in computational biology. AlphaFold2, presented in 2020 by a DeepMind team led by Demis Hassabis and John Jumper, is widely credited with solving the long-standing "protein folding problem" by accurately predicting three-dimensional protein structures from amino acid sequences alone. The AlphaFold Protein Structure Database, launched in July 2021, has grown to contain predicted structures for over 200 million proteins from more than one million species. Previously, obtaining a single protein structure through experimental methods such as X-ray crystallography could take months or years; AlphaFold can produce a prediction in minutes. ^[5]

In October 2024, Hassabis and Jumper were awarded the Nobel Prize in Chemistry for their work on protein structure prediction. David Baker shared the prize for his contributions to computational protein design. ^[6]

AlphaFold 3, released in May 2024 and co-developed by Google DeepMind and Isomorphic Labs, extended the capabilities of its predecessor by predicting the joint structure of complexes that include proteins, DNA, RNA, small molecules, ions, and modified residues. The model achieves at least a 50% improvement over existing methods for predicting protein interactions with other molecule types, and for some categories of interaction, accuracy has doubled compared to previous state-of-the-art approaches. ^[5]^[23] The authors describe the result in the Nature paper: "we show that high-accuracy modelling across biomolecular space is possible within a single unified deep-learning framework." ^[5] For drug discovery specifically, AlphaFold 3 can predict where and how a ligand binds to its target, which accelerates both target identification and the design of molecules that bind effectively. It does not directly output binding affinities, so potency still has to be estimated with other computational or experimental methods.

Isomorphic Labs, a spin-off from Google DeepMind focused on applying AlphaFold to drug discovery, announced in February 2026 a Drug Design Engine (IsoDDE) that more than doubled the performance of AlphaFold 3 on a protein-ligand structure prediction generalization benchmark. On the hardest cases, those with less than 20% similarity to training data, IsoDDE achieved 50% accuracy where AlphaFold 3 reached 23.3%. ^[7]^[29]

Molecule generation and de novo drug design

Once a target has been identified and validated, the next challenge is to find or create molecules that modulate it. Traditional high-throughput screening involves testing millions of compounds from physical libraries, a process that is expensive and limited to existing chemical space. AI-driven de novo drug design instead generates entirely new molecules from scratch, exploring a vastly larger chemical space.

Generative chemistry

Several generative model architectures have been adapted for molecular design. These include variational autoencoders (VAEs), generative adversarial networks, normalizing flows, and diffusion models. These models learn the statistical distribution of known drug-like molecules and can then sample from that distribution to generate novel compounds with specified properties. ^[24]

A typical generative chemistry workflow involves encoding molecular structures as SMILES strings or molecular graphs, training a generative model on datasets of known bioactive compounds, and then sampling new molecules while optimizing for properties such as binding affinity, selectivity, solubility, and synthetic accessibility. Active learning cycles can iteratively refine the generated molecules using feedback from docking simulations, QSAR models, or even experimental assays.

Insilico Medicine's Chemistry42 platform uses generative models to design novel small molecules for a given target. Using this platform, Insilico designed rentosertib (ISM001-055) in a fraction of the time that traditional medicinal chemistry would require. The compound went from target identification to preclinical candidate nomination in approximately 18 months, compared to the industry average of 4 to 5 years. ^[3]

Exscientia, a UK-based company that merged with Recursion Pharmaceuticals in November 2024, developed the Centaur AI platform, which combines generative design with automated experimentation. Exscientia reported that its platform reduced the drug design timeline from the industry average of 4.5 years to just 12 to 15 months while decreasing capital costs by up to 80%. ^[13] Six molecules designed using Exscientia's AI entered clinical trials, including EXS21546, an A2A receptor antagonist for oncology discovered in just eight months.

Multi-objective optimization

A key challenge in drug design is that molecules must satisfy many conflicting requirements simultaneously. They need high potency against the target, selectivity over related proteins, favorable pharmacokinetic properties, low toxicity, and practical synthetic routes. AI methods excel at multi-objective optimization, using techniques such as Pareto optimization and reinforcement learning to navigate these trade-offs and identify molecules that represent the best compromises across all criteria.
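
To make the idea of a Pareto trade-off concrete, the following is a minimal sketch rather than any company's actual method. The molecule names and property values are invented for illustration, and the three objectives (potency, selectivity, predicted toxicity) are assumed to be pre-computed scores between 0 and 1. A candidate is kept only if no other candidate is at least as good on every objective and strictly better on at least one.

```python
from typing import Dict, List, Tuple

# Hypothetical candidates with made-up scores. Higher potency and selectivity
# are better; lower predicted toxicity is better.
candidates: Dict[str, Dict[str, float]] = {
    "mol_A": {"potency": 0.90, "selectivity": 0.40, "toxicity": 0.30},
    "mol_B": {"potency": 0.70, "selectivity": 0.80, "toxicity": 0.20},
    "mol_C": {"potency": 0.60, "selectivity": 0.70, "toxicity": 0.40},
    "mol_D": {"potency": 0.95, "selectivity": 0.30, "toxicity": 0.60},
}


def as_maximization(props: Dict[str, float]) -> Tuple[float, float, float]:
    """Flip the sign of toxicity so that every objective is 'higher is better'."""
    return (props["potency"], props["selectivity"], -props["toxicity"])


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(cands: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    scored = {name: as_maximization(p) for name, p in cands.items()}
    return sorted(
        name
        for name, score in scored.items()
        if not any(dominates(other, score)
                   for other_name, other in scored.items() if other_name != name)
    )


print(pareto_front(candidates))
```

In this toy set, mol_C drops out because mol_B is better on all three objectives, while the other three each represent a different compromise. Choosing among the survivors still requires human judgment or an explicit weighting of the objectives.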

Virtual screening

Virtual screening uses computational methods to evaluate large libraries of compounds and identify those most likely to bind to a target protein. There are two main approaches: structure-based virtual screening, which uses the three-dimensional structure of the target, and ligand-based virtual screening, which uses information about known active molecules.
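
As a rough illustration of the ligand-based approach, the sketch below ranks library compounds by Tanimoto similarity to a known active. It assumes each molecule has already been reduced to a fingerprint, represented here as a set of integer feature IDs; the compound names and fingerprints are invented, and a real workflow would compute fingerprints with a cheminformatics toolkit.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient: shared features divided by total distinct features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical fingerprints (sets of "on" feature IDs), for illustration only.
known_active: FrozenSet[int] = frozenset({1, 4, 7, 9, 12})
library: Dict[str, FrozenSet[int]] = {
    "cmpd_1": frozenset({1, 4, 7, 9, 13}),
    "cmpd_2": frozenset({2, 3, 5}),
    "cmpd_3": frozenset({1, 4, 7, 9, 12, 15}),
}


def rank_by_similarity(query: FrozenSet[int],
                       lib: Dict[str, FrozenSet[int]]) -> List[Tuple[str, float]]:
    """Sort library compounds from most to least similar to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in lib.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


for name, score in rank_by_similarity(known_active, library):
    print(f"{name}: {score:.2f}")
```

Similarity ranking of this kind is cheap enough to run over very large libraries, but it can only surface compounds that resemble what is already known, which is one reason it is often combined with structure-based methods.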

Molecular docking

Molecular docking is the traditional workhorse of structure-based virtual screening. Docking programs predict how a small molecule fits into the binding site of a protein and estimate the strength of the interaction through a scoring function. However, traditional scoring functions often fail to accurately model the full complexity of protein-ligand interactions, leading to high false-positive rates.

AI-enhanced docking methods address these limitations. Machine learning scoring functions trained on large datasets of experimental binding data can better predict actual binding affinities. For instance, the Deep Docking (DD) platform enables up to 100-fold acceleration of structure-based virtual screening by docking only a subset of a chemical library and iteratively predicting the scores for remaining compounds using a ligand-based model. RosettaVS, published in Nature Communications, pairs physics-based docking that models receptor flexibility with AI-accelerated triage of the library, and outperforms traditional methods on multiple benchmarks. ^[27]
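
The dock-a-subset-then-predict loop can be sketched in a few lines. This is a toy illustration of the general idea, not the Deep Docking implementation: each compound is reduced to a single made-up numeric descriptor, `expensive_dock` is a stand-in for a real docking run, and a 3-nearest-neighbour average replaces the trained ligand-based model that a real platform would use.

```python
import random
from typing import Dict, List, Tuple


def expensive_dock(descriptor: float) -> float:
    """Toy stand-in for a docking run; lower scores mean better predicted binding."""
    return (descriptor - 0.7) ** 2  # hypothetical landscape with its optimum at 0.7


def surrogate_predict(x: float, docked: Dict[float, float]) -> float:
    """Cheap surrogate: mean score of the 3 nearest already-docked compounds."""
    nearest = sorted(docked, key=lambda d: abs(d - x))[:3]
    return sum(docked[d] for d in nearest) / len(nearest)


def iterative_screen(library: List[float], batch_size: int = 20, rounds: int = 3,
                     keep_fraction: float = 0.5,
                     seed: int = 0) -> Tuple[Dict[float, float], List[float]]:
    rng = random.Random(seed)
    remaining = list(library)
    docked: Dict[float, float] = {}  # descriptor -> docking score
    for _ in range(rounds):
        # 1. Dock a small random batch of the compounds still in play.
        batch = rng.sample(remaining, min(batch_size, len(remaining)))
        for x in batch:
            docked[x] = expensive_dock(x)
        in_batch = set(batch)
        remaining = [x for x in remaining if x not in in_batch]
        # 2. Rank undocked compounds with the surrogate and keep the best fraction.
        remaining.sort(key=lambda x: surrogate_predict(x, docked))
        remaining = remaining[: int(len(remaining) * keep_fraction)]
    return docked, remaining


library = [i / 999 for i in range(1000)]  # 1,000 hypothetical compounds
docked, shortlist = iterative_screen(library)
print(f"docked {len(docked)} of {len(library)}; shortlist size {len(shortlist)}")
```

With these settings only 60 of the 1,000 compounds are ever docked, and the surrogate discards most of the rest. The speed-up comes from that ratio, at the cost of occasionally discarding a true binder that the surrogate misjudges.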

Ultra-large library screening

Recent advances have made it possible to screen virtual libraries containing billions of compounds. Atomwise's AtomNet platform uses deep convolutional neural networks to analyze protein structures and predict drug-target interactions across a proprietary library of more than three trillion synthesizable compounds. By predicting docking scores with machine learning rather than performing full docking simulations, these platforms can evaluate chemical spaces that would be computationally prohibitive with traditional methods.

ADMET prediction

ADMET stands for Absorption, Distribution, Metabolism, Excretion, and Toxicity. These pharmacokinetic and safety properties determine whether a drug candidate can reach its target in the body at an effective concentration without causing unacceptable side effects. Poor drug-like (pharmacokinetic) properties account for roughly 10% to 15% of clinical trial failures and unmanageable toxicity for about another 30%, making ADMET liabilities a major cause of late-stage attrition. ^[19]

Machine learning models for ADMET prediction have become increasingly sophisticated. Common approaches include:

ADMET property	What it measures	Common ML approaches
Aqueous solubility	Whether the drug dissolves in body fluids	Random forest, graph neural networks
Intestinal permeability	Ability to cross the gut wall and enter the bloodstream	Support vector machines, gradient boosting
CYP inhibition	Interaction with cytochrome P450 liver enzymes	XGBoost, multitask neural networks
Plasma protein binding	Fraction of drug bound to blood proteins (unavailable for action)	Ensemble methods, deep learning
hERG channel inhibition	Cardiac toxicity risk	Random forest, attention-based models
Hepatotoxicity	Liver damage potential	Graph neural networks, transformer models

Modern ADMET prediction frameworks increasingly use graph neural networks that operate directly on molecular graphs, capturing structural features that are relevant to pharmacokinetic behavior. Multitask learning, where a single model predicts multiple ADMET endpoints simultaneously, has shown improvements over single-task models because ADMET properties are often correlated. AutoML methods that automatically search for optimal model architectures and hyperparameters have further improved the reproducibility and performance of ADMET predictions. ^[28]

Despite these advances, challenges remain. Training data for ADMET models is often limited, noisy, and biased toward certain chemical series. Predictions may not generalize well to novel chemical scaffolds, and the gap between in vitro measurements and in vivo outcomes remains significant. ^[28]

Clinical trial optimization

AI is increasingly being applied beyond drug design to the clinical trial process itself. Clinical trials account for approximately 60% of total drug development costs and are the stage most prone to delays and failures.

Patient recruitment

Nearly 80% of clinical trials experience delays, with patient recruitment cited as one of the most common challenges. AI platforms that process electronic health records, genetic profiles, and demographic information have been reported to identify protocol-eligible patients up to three times faster, with approximately 93% accuracy. Natural language processing models parse unstructured medical records to match patients to eligibility criteria, while predictive models identify patients most likely to benefit from treatment and to remain compliant throughout the trial.

Adaptive trial design

Adaptive clinical trials use accumulating data to modify aspects of the study without undermining its validity. Machine learning algorithms analyze interim results using reinforcement learning, decision trees, and neural networks to inform real-time protocol updates. These updates can include dosage adjustments, treatment arm modifications, and patient reallocation. AI-driven adaptive designs can reduce sample sizes, shorten trial durations, and increase the probability of detecting true treatment effects.
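
One simple way to picture patient reallocation is response-adaptive randomization with Thompson sampling, a bandit method closely related to reinforcement learning. The sketch below is an illustration under stated assumptions, not a trial-ready design: outcomes are binary and observed immediately, the true response rates are invented, and each arm starts from a uniform Beta(1, 1) prior.

```python
import random
from typing import List


def thompson_allocate(true_rates: List[float], n_patients: int, seed: int = 0) -> List[int]:
    """Simulate response-adaptive allocation; return patients assigned per arm."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_patients):
        # Draw a plausible response rate for each arm from its Beta posterior.
        draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        arm = draws.index(max(draws))  # assign the patient to the best-looking arm
        # Simulate the patient's outcome (hypothetical true rates, unknown in practice).
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


allocation = thompson_allocate(true_rates=[0.2, 0.8], n_patients=200)
print(allocation)
```

As evidence accumulates, more patients are steered toward the arm that appears to work better. Real adaptive trials pre-specify such rules in the protocol and must account for delayed outcomes and statistical error control, which this sketch ignores.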

Digital twins and synthetic control arms

Digital twins, virtual representations of individual patients built from their clinical and molecular data, allow researchers to simulate patient-specific responses to treatment. Synthetic control arms use historical patient data modeled by AI to serve as comparators, potentially reducing the number of patients needed in control groups. Both approaches are being explored as ways to make trials more efficient and ethical, particularly for rare diseases where patient populations are small.

Which companies lead AI drug discovery?

Isomorphic Labs

Isomorphic Labs was founded in 2021 as a spin-off from Google DeepMind and is led by DeepMind co-founder and CEO Demis Hassabis. The company's mission is to apply AlphaFold and related AI technologies to drug discovery at scale. In January 2024, Isomorphic announced major partnerships with Eli Lilly and Novartis worth up to nearly $3 billion in combined milestone payments. ^[8] The Novartis partnership was expanded in February 2025 to include up to three additional research programs.

In March 2025, Isomorphic raised $600 million in its first external financing round, led by Thrive Capital with participation from GV and follow-on capital from existing investor Alphabet. ^[9] The company stated the funding would "accelerate Isomorphic's frontier AI research and rapidly advance the company's next-generation AI drug design engine," with first programs heading toward the clinic. ^[9] In February 2026, Isomorphic unveiled its Drug Design Engine (IsoDDE), which more than doubled the accuracy of AlphaFold 3 on the most difficult protein-ligand structure prediction cases. ^[7]^[29]

Insilico Medicine

Insilico Medicine was founded in 2014 by Alex Zhavoronkov. The company's end-to-end Pharma.AI platform includes PandaOmics for target identification, Chemistry42 for generative drug design, and InClinico for clinical trial outcome prediction. Since 2021, Insilico has nominated over 20 preclinical candidates across a portfolio of more than 30 assets and received IND clearance for 10 molecules. ^[3]

Insilico's lead program, rentosertib (ISM001-055), is a TNIK inhibitor for idiopathic pulmonary fibrosis that was the first drug fully discovered and designed by generative AI to enter Phase II clinical trials. Phase IIa results published in Nature Medicine on June 3, 2025 (a randomized, double-blind, placebo-controlled trial of 71 patients across 21 sites in China) showed that patients receiving 60 mg once daily experienced a mean improvement in lung function (FVC) of +98.4 mL over 12 weeks, compared with a mean decline of 20.3 mL in the placebo group. ^[4] The most common drug-related adverse events were diarrhea (14.8%) and abnormal liver function (14.8%). A separate Phase IIa trial in the United States is also ongoing.

Beyond rentosertib, Insilico's pipeline includes ISM3091, a USP1 inhibitor for solid tumors licensed to Exelixis, and ISM5939, an oral ENPP1 inhibitor that received FDA IND clearance in November 2024 for a first-in-human Phase 1a/b study in solid tumors. ^[26]

Recursion Pharmaceuticals

Recursion Pharmaceuticals was founded in 2013 in Salt Lake City, Utah. The company operates highly automated laboratories that run up to 2.2 million experiments per week, generating biological imaging data that feeds into the Recursion Operating System (Recursion OS). Over the past decade, Recursion has accumulated over 50 petabytes of proprietary biological and chemical data spanning phenomics, transcriptomics, proteomics, and ADME measurements. ^[10]

In partnership with NVIDIA, Recursion built BioHive-2, an NVIDIA DGX SuperPOD comprising 63 DGX H100 systems with 504 NVIDIA H100 Tensor Core GPUs that delivers 2 exaflops of AI performance. Completed in May 2024, it ranked #35 on the TOP500 list and is the fastest supercomputer wholly owned by a pharmaceutical company. ^[25] In November 2024, Recursion completed its acquisition of Exscientia in a deal valued at roughly $688 million, creating a vertically integrated AI drug discovery platform combining Recursion's phenomic screening with Exscientia's automated precision chemistry. ^[25]

Key clinical programs include REC-4881 for familial adenomatous polyposis (FAP), which showed a median 43% reduction in polyp burden in Phase 1b/2 data, and REC-617, a CDK7 inhibitor for advanced solid tumors currently in Phase 1. ^[11]^[12]

Exscientia (now part of Recursion)

Exscientia, founded in the UK, pioneered the use of AI to design small-molecule drug candidates. Before its acquisition by Recursion, six molecules designed using Exscientia's AI entered clinical trials. ^[13] Notable programs included EXS21546, the first AI-designed oncology immunotherapy molecule to enter human trials; DSP-0038, a dual 5-HT1a/5-HT2a modulator for Alzheimer's disease psychosis developed with Sumitomo Dainippon Pharma; and GTAEXS617, a CDK7 inhibitor in Phase 1/2 trials for advanced solid tumors.

Relay Therapeutics

Relay Therapeutics is a clinical-stage precision medicine company founded in Cambridge, Massachusetts. Its Dynamo platform integrates molecular dynamics simulations, cryo-electron microscopy, and machine learning to design drugs that target specific protein conformations. ^[14] This approach enables the creation of mutant-selective inhibitors that target only the pathological form of a protein while minimizing off-target effects.

Relay's lead asset, RLY-2608, is a pan-mutant PI3K-alpha inhibitor in Phase 3 trials for HR+/HER2- metastatic breast cancer. Updated data from Q2 2025 showed a 10.3-month median progression-free survival and a 39% overall response rate in patients with PI3K-alpha-mutated tumors. ^[15] The company also licensed RLY-4008 to Elevar Therapeutics in late 2024 in a deal worth up to $500 million and has a collaboration with Pfizer to explore combination therapies.

Schrodinger

Schrodinger, Inc. was founded in 1990 in New York City and is a pioneer in physics-based computational chemistry for drug discovery. ^[18] Rather than relying solely on data-driven machine learning, Schrodinger's platform uses first-principles physics simulations to predict how molecules interact. This approach, combined with machine learning, enables the company to evaluate billions of virtual compounds with high accuracy.

Schrodinger's most advanced clinical asset is zasocitinib (TAK-279), a highly selective allosteric TYK2 inhibitor developed in partnership with Takeda. Zasocitinib demonstrates more than one-million-fold selectivity for TYK2 over JAK1, JAK2, and JAK3. ^[16] In December 2025, Takeda reported positive Phase 3 topline results from its LATITUDE PsO program in plaque psoriasis, with more than half of patients achieving clear or almost clear skin (PASI 90) and about 30% achieving completely clear skin (PASI 100) at week 16; the studies met both co-primary endpoints and all 44 ranked secondary endpoints. ^[16]^[17] Zasocitinib is also being evaluated in Phase 3 studies for psoriatic arthritis and Phase 2 studies for inflammatory bowel disease.

What is AlphaFold's overall impact on drug discovery?

AlphaFold's impact on drug discovery extends beyond protein structure prediction. Before AlphaFold, experimental determination of protein structures was a bottleneck for structure-based drug design. Only a fraction of human proteins had experimentally resolved structures, and many disease-relevant targets lacked structural information entirely. AlphaFold's database of 200 million predicted structures has made structural information available for virtually every known protein, opening up previously intractable targets for drug development. ^[5]

However, there are important limitations. AlphaFold2 predicts static structures and cannot fully capture the dynamic conformational changes that proteins undergo in biological environments. The predicted structure may not represent the form that binds a drug molecule. AlphaFold also does not directly predict binding affinities or druggability, meaning that additional computational and experimental work is still needed to translate structural predictions into drug candidates. ^[23]

AlphaFold 3 addresses some of these limitations by modeling interactions between proteins and other biomolecules, including small-molecule ligands. Isomorphic Labs has demonstrated that AlphaFold 3 outperforms traditional docking tools for predicting protein-ligand structures, and pharmaceutical companies including Novartis and Eli Lilly are collaborating with Isomorphic to apply these capabilities to real-world drug design programs. ^[7]^[8]

Do AI-designed drugs actually have higher success rates?

The number of AI-designed drug candidates entering clinical trials has grown rapidly, from 3 programs in 2016 to 67 in 2023 and more than 173 as of early 2026. ^[22] The first systematic analysis of this pipeline, by Jayatunga and colleagues in Drug Discovery Today (2024), examined the 24 AI-discovered molecules that had completed Phase I as of December 2023 and found that 21 succeeded, an 80% to 90% success rate that is "substantially higher than historic industry averages" of roughly 40% to 55%. ^[1] The authors concluded this "suggests that AI is highly capable of designing or identifying molecules with drug-like properties." ^[1]
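
Because the Phase I figure rests on only 24 molecules, it helps to see how much statistical uncertainty surrounds it. The following is a minimal sketch that computes a 95% Wilson score interval for 21 successes out of 24. The choice of the Wilson interval is an assumption made here for illustration and is not taken from the cited analysis.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


point_estimate = 21 / 24
low, high = wilson_interval(21, 24)
print(f"point estimate {point_estimate:.1%}, 95% interval {low:.1%} to {high:.1%}")
```

The point estimate is 87.5%, and the interval spans roughly 69% to 96%. Even the lower end sits above the 40% to 55% historical range quoted above, but the width of the interval shows why a sample of 24 supports only a cautious conclusion.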

The same analysis introduced an important caveat that drives the ongoing debate: in Phase II, AI-discovered drugs succeeded at only about 40%, in line with the traditional industry average, although on a small sample. ^[1] In other words, the strongest evidence to date shows AI improving the odds of clearing the earliest safety stage rather than guaranteeing efficacy in patients. As of mid-2026, no AI-discovered drug has received FDA approval, but the first approval is widely projected for 2026 or 2027.

The table below highlights notable AI-designed drugs that have reached clinical trials:

Drug	Company	Target	Indication	Phase	AI role	Key data
Rentosertib (ISM001-055)	Insilico Medicine	TNIK	Idiopathic pulmonary fibrosis	Phase IIa	Target ID + molecule design	+98.4 mL mean FVC change at 60 mg once daily vs. -20.3 mL on placebo
Zasocitinib (TAK-279)	Schrodinger / Takeda	TYK2 (allosteric)	Plaque psoriasis, psoriatic arthritis	Phase III	Physics-based compound design	>50% PASI 90 at week 16
RLY-2608	Relay Therapeutics	PI3K-alpha (pan-mutant)	HR+/HER2- breast cancer	Phase III	Molecular dynamics + ML design	10.3-month median PFS, 39% ORR
ISM3091	Insilico Medicine	USP1	Solid tumors (BRCA-mutated)	Phase I	Generative design	Licensed to Exelixis
ISM5939	Insilico Medicine	ENPP1	Solid tumors	Phase I	Generative design	IND cleared November 2024
REC-4881	Recursion	MEK1/2	Familial adenomatous polyposis	Phase 1b/2	Phenomic screening + AI	43% median polyp reduction
REC-617	Recursion	CDK7	Advanced solid tumors	Phase I	AI-driven discovery	1 confirmed partial response
GTAEXS617	Exscientia (Recursion)	CDK7	Advanced solid tumors	Phase 1/2	Generative AI design	Improved potency and selectivity
DSP-0038	Exscientia / Sumitomo	5-HT1a / 5-HT2a	Alzheimer's disease psychosis	Phase I	Dual-target AI design	Designed in collaboration
EXS21546	Exscientia	A2A receptor	Solid tumors (immuno-oncology)	Phase 1/2	AI design in 8 months	Discontinued after peer data

Challenges and limitations

Data quality and availability

AI models are only as good as the data they are trained on. In drug discovery, high-quality labeled data is scarce. Many biological assays are noisy, and results can vary across laboratories and experimental conditions. A survey of technology executives found that 68% cite poor data quality and governance as the primary reason AI initiatives fail. Data preparation alone consumes an estimated 80% of an AI project's time. Additionally, much of the most valuable data (clinical outcomes, proprietary compound activity) is held by pharmaceutical companies and is not publicly available, limiting the ability to train broadly generalizable models.

Biological complexity

Biology is far more complex than current AI models can fully capture. Diseases involve intricate networks of interacting proteins, genes, and environmental factors. A molecule that looks promising in silico may fail in living systems due to off-target effects, unexpected metabolic pathways, or immune responses that no model predicted. The gap between computational predictions and clinical reality remains the fundamental challenge in AI-driven drug discovery, and it is the most plausible explanation for why AI's Phase II success rates have so far matched, rather than exceeded, traditional approaches. ^[1]

Regulatory landscape

On January 7, 2025, the FDA published its first comprehensive draft guidance on the use of AI in drug development, titled "Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products." The guidance introduces a seven-step credibility assessment framework based on "context of use," requiring lifecycle maintenance plans and transparency about model architectures and training data. ^[21] Different levels of documentation are required depending on risk level, which the framework derives from model influence and decision consequence. The European Medicines Agency (EMA) is expected to publish parallel guidance by mid-2026.

The regulatory framework remains a work in progress. While the FDA has stated that AI-discovered drugs do not require a different approval pathway than traditionally discovered drugs, the use of AI in regulatory submissions raises questions about model interpretability, reproducibility, and accountability. More than 25% of warning letters issued by the FDA since 2019 have cited data accuracy issues, underscoring the importance of data governance in AI applications.

Interpretability and reproducibility

Many AI models, particularly deep neural networks, function as "black boxes" that provide predictions without transparent explanations of their reasoning. In a field where understanding the mechanism of action is critical for both scientific progress and regulatory approval, the lack of interpretability is a significant barrier. Efforts to develop explainable AI methods for drug discovery are ongoing but have not yet produced solutions that satisfy both computational performance and human interpretability requirements.

How big is the AI drug discovery market?

The AI in drug discovery market is growing rapidly, though estimates vary across research firms depending on scope and methodology:

Source	Market size (2025)	Projected size	CAGR	Projection year
Grand View Research	$2.35 billion	$13.77 billion	24.8%	2033
Precedence Research	$4.6 billion	$49.5 billion	30.1%	2034
BioSpace	Not specified	$16.52 billion	Not specified	2034
SNS Insider	Not specified	$15.50 billion	29.89%	2032
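
The growth rates in the table can be sanity-checked against the start and end values with the compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below assumes 2025 is the base year and that the number of compounding periods equals the projection year minus 2025; research firms may use a different base year, which would shift the result slightly.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end / start) ** (1 / years) - 1


# Values in billions of US dollars, taken from the table above.
grand_view = cagr(2.35, 13.77, 2033 - 2025)
precedence = cagr(4.6, 49.5, 2034 - 2025)

print(f"Grand View Research implied CAGR: {grand_view:.1%}")
print(f"Precedence Research implied CAGR: {precedence:.1%}")
```

Both implied rates land within a fraction of a percentage point of the published figures (24.8% and 30.1%), so the two rows with complete data are internally consistent even though the firms disagree on the size of the market.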

North America dominated the market in 2025 with approximately 53% revenue share. ^[1] By application, the drug optimization and repurposing segment led with roughly 52% share. Key drivers include the rising demand for cost-effective drug development, the increasing number of clinical trials, and growing chronic disease prevalence. AI is estimated to reduce pharmaceutical R&D costs by 25% to 40% and to shorten clinical trial timelines. ^[2]

Major technology companies have made substantial investments in the space. Alphabet invested in Isomorphic Labs, which raised $600 million in March 2025. ^[9] NVIDIA has partnered with Recursion Pharmaceuticals on the BioHive-2 supercomputer. ^[25] AMD invested $20 million in Absci in January 2025 to support AI-driven biologics discovery. These investments reflect the growing conviction that AI will become a foundational technology in pharmaceutical R&D.

Future outlook

The field of AI in drug discovery is at an inflection point. Multiple AI-designed drugs are in late-stage clinical trials, and the first FDA approval of an AI-discovered drug is widely anticipated in 2026 or 2027. The continued improvement of protein structure prediction models, generative chemistry platforms, and clinical trial optimization tools suggests that AI's role in drug development will only deepen.

Key trends to watch include the development of foundation models for biology that can integrate diverse data types into unified representations, the expansion of federated learning approaches that allow pharmaceutical companies to train models on pooled data without sharing proprietary information, and the emergence of autonomous AI agents that can design, execute, and interpret experiments with minimal human oversight.
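
To make the federated learning idea concrete, the sketch below shows federated averaging on a toy one-parameter model. It is a minimal illustration under stated assumptions, not a description of any company's system: the three "sites" and their data are hypothetical, the model is a single weight fitted by gradient descent, and the function names are invented for this example. Each site trains on its own data and shares only the updated weight, which a central server averages.

```python
# Toy federated averaging for a one-parameter linear model y = w * x.
# Hypothetical data: each "site" holds private (x, y) pairs that are only
# ever read inside local_update; the server sees weights, not records.

def local_update(w, data, lr=0.01, steps=20):
    """Run a few gradient-descent steps on one site's data; return the new weight."""
    for _ in range(steps):
        # Gradient of mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(global_w, sites, rounds=10):
    """Each round, sites train locally and the server averages weights by dataset size."""
    for _ in range(rounds):
        updates = [(local_update(global_w, data), len(data)) for data in sites]
        total = sum(n for _, n in updates)
        global_w = sum(w * n for w, n in updates) / total
    return global_w

# Three hypothetical sites whose data roughly follow y = 3x.
sites = [
    [(1.0, 3.1), (2.0, 5.9), (3.0, 9.2)],
    [(1.5, 4.4), (2.5, 7.6)],
    [(0.5, 1.4), (4.0, 12.1), (3.5, 10.4), (2.0, 6.1)],
]

w = federated_average(0.0, sites)
print(f"shared model weight after training: {w:.3f}")
```

Sharing only weights is the simplest form of the technique. Shared weights can still leak information about the underlying data, so production systems usually layer additional protections such as secure aggregation or differential privacy on top of this basic loop.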

However, the industry's fundamental challenge remains the gap between computational prediction and biological reality. No amount of algorithmic sophistication can fully substitute for experimental validation, and the ultimate test of AI-designed drugs will be their performance in the clinic.

References

1. Jayatunga, M. K. P. et al. "How successful are AI-discovered drugs in clinical trials? A first analysis and emerging lessons." Drug Discovery Today 29, no. 6 (2024): 103999.
2. Grand View Research. "Artificial Intelligence In Drug Discovery Market Report, 2033." grandviewresearch.com.
3. Insilico Medicine. "First Generative AI Drug Begins Phase II Trials with Patients." insilico.com, 2023.
4. Insilico Medicine. "Insilico Announces Nature Medicine Publication of Phase IIa Results of Rentosertib." insilico.com / Nature Medicine, June 2025.
5. Abramson, J. et al. "Accurate structure prediction of biomolecular interactions with AlphaFold 3." Nature 630 (2024): 493-500.
6. The Nobel Foundation. "The Nobel Prize in Chemistry 2024." nobelprize.org, October 2024.
7. Isomorphic Labs. "The Isomorphic Labs Drug Design Engine unlocks a new frontier beyond AlphaFold." isomorphiclabs.com, February 2026.
8. CNBC. "Inside Isomorphic Labs, the secretive AI life sciences startup spun off from Google DeepMind." cnbc.com, April 2025.
9. Isomorphic Labs. "Isomorphic Labs announces $600m external investment round." isomorphiclabs.com, March 2025.
10. Recursion Pharmaceuticals. "Pioneering AI Drug Discovery." recursion.com.
11. Recursion. "Positive Phase 1b/2 Results from Ongoing REC-4881 TUPELO Trial." ir.recursion.com, 2025.
12. Recursion. "Recursion Reports Interim Phase 1 Clinical Data for REC-617." ir.recursion.com, 2025.
13. Exscientia / UKRI. "Exscientia: a clinical pipeline for AI-designed drug candidates." ukri.org.
14. Relay Therapeutics. "Dynamo Platform." relaytx.com.
15. AInvest. "Relay Therapeutics (RLAY): Is H2 2025 the Inflection Point for AI-Driven Drug Discovery?" ainvest.com, 2025.
16. Takeda. "Takeda's Zasocitinib Landmark Phase 3 Plaque Psoriasis Data." takeda.com, December 2025.
17. AJMC. "Zasocitinib Hits Phase 3 End Points in Plaque Psoriasis." ajmc.com, 2025.
18. Schrodinger, Inc. "About us." schrodinger.com.
19. Sun, D. et al. "Why 90% of clinical drug development fails and how to improve it?" Acta Pharmaceutica Sinica B 12, no. 7 (2022): 3049-3062.
20. DiMasi, J. A. et al. "Innovation in the pharmaceutical industry: New estimates of R&D costs." Journal of Health Economics 47 (2016): 20-33.
21. FDA. "Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products" (draft guidance). fda.gov, January 2025.
22. Axis Intelligence. "AI Drug Discovery 2026: 173 Programs, FDA Framework & Market." axis-intelligence.com, 2026.
23. Callaway, E. "Major AlphaFold upgrade offers boost for drug discovery." Nature (2024).
24. Pun, F. W. et al. "A survey of generative AI for de novo drug design: new frontiers in molecule and protein generation." Briefings in Bioinformatics 25, no. 4 (2024).
25. NVIDIA Blog. "Drug Discovery, STAT! NVIDIA, Recursion Speed Pharma R&D With AI Supercomputer." blogs.nvidia.com, May 2024.
26. Exelixis. "Exelixis and Insilico Medicine Enter into Exclusive Global License Agreement for ISM3091." ir.exelixis.com.
27. Nature Communications. "An artificial intelligence accelerated virtual screening platform for drug discovery." (2024).
28. "Leveraging machine learning models in evaluating ADMET properties for drug discovery and development." PMC (2025).
29. Isomorphic Labs. "Accurate Predictions of Novel Biomolecular Interactions" (IsoDDE technical report). isomorphiclabs.com, February 10, 2026.
