AI in drug discovery refers to the application of artificial intelligence, machine learning, and deep learning techniques to accelerate the process of identifying, designing, and developing new pharmaceutical drugs. By leveraging computational methods at every stage of the drug development pipeline, AI has the potential to reduce the time, cost, and failure rate associated with bringing new therapies to market. As of 2026, over 173 AI-discovered drug programs are in clinical development, and the global AI in drug discovery market is projected to grow from roughly $2.9 billion in 2025 to over $10 billion by the end of the decade.
Traditional drug discovery is a long, expensive, and failure-prone process. Developing a single new drug from initial concept to regulatory approval typically takes 10 to 15 years and costs an estimated $2.6 billion on average, including the cost of failed programs. Approximately 90% of drug candidates that enter clinical trials fail before reaching approval, with the most common reasons being lack of clinical efficacy (40% to 50%), unmanageable toxicity (30%), poor drug-like properties (10% to 15%), and lack of commercial viability (10%).
The drug development pipeline consists of several major stages:
| Stage | Description | Typical timeline |
|---|---|---|
| Target identification | Finding a biological molecule (often a protein) involved in a disease that can be modulated by a drug | 1 to 2 years |
| Target validation | Confirming that modulating the target produces a therapeutic benefit | 1 to 2 years |
| Hit identification | Screening large compound libraries to find molecules that interact with the target | 6 months to 1 year |
| Lead optimization | Refining hit compounds to improve potency, selectivity, and drug-like properties | 1 to 3 years |
| Preclinical development | Laboratory and animal studies to assess safety and pharmacology | 1 to 2 years |
| Phase I clinical trials | Testing safety and dosing in healthy volunteers (20 to 100 participants) | 1 to 2 years |
| Phase II clinical trials | Evaluating efficacy and side effects in patients (100 to 300 participants) | 1 to 3 years |
| Phase III clinical trials | Confirming efficacy in large patient populations (1,000 to 5,000 participants) | 2 to 4 years |
| Regulatory review | FDA or EMA assessment of trial data for market approval | 0.5 to 2 years |
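
As a rough sanity check on the table, the stage ranges can be summed. The sketch below uses only the figures from the table and assumes, for illustration, that the stages run strictly one after another, which real programs often do not.

```python
# (stage, min_years, max_years) copied from the table above.
STAGES = [
    ("Target identification", 1.0, 2.0),
    ("Target validation", 1.0, 2.0),
    ("Hit identification", 0.5, 1.0),
    ("Lead optimization", 1.0, 3.0),
    ("Preclinical development", 1.0, 2.0),
    ("Phase I clinical trials", 1.0, 2.0),
    ("Phase II clinical trials", 1.0, 3.0),
    ("Phase III clinical trials", 2.0, 4.0),
    ("Regulatory review", 0.5, 2.0),
]


def total_range(stages):
    """Sum lower and upper bounds, assuming strictly sequential stages."""
    low = sum(lo for _, lo, _ in stages)
    high = sum(hi for _, _, hi in stages)
    return low, high


low, high = total_range(STAGES)
print(f"Sequential total: {low:g} to {high:g} years")
```

Under that assumption the bounds come out at 9 to 21 years, which brackets the 10 to 15 year figure quoted earlier.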
AI and machine learning can contribute to nearly every stage of this pipeline. In target identification, AI analyzes genomic and proteomic data to pinpoint disease-relevant proteins. In molecule generation and lead optimization, generative models design novel compounds with desired properties. In preclinical development, AI predicts absorption, distribution, metabolism, excretion, and toxicity (ADMET) profiles. In clinical trials, AI helps with patient recruitment, adaptive trial design, and endpoint prediction. The cumulative effect is a potential 25% to 40% reduction in R&D costs and development timelines that are several years shorter.
Target identification is the first and arguably the most consequential step in drug discovery. Choosing the wrong target leads to years of wasted effort and resources. AI helps by mining vast biological datasets to uncover novel therapeutic targets that would be difficult or impossible to identify through manual analysis alone.
Modern target discovery relies on the integration of multiple types of biological data, including genomics, transcriptomics, proteomics, and metabolomics. Machine learning models can analyze these multi-omics datasets to identify genes and proteins that play causal roles in disease processes. For example, large language models trained on biomedical literature can systematically analyze disease-associated biological pathways and surface potential targets by connecting genes, diseases, and compounds in knowledge graphs.
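The idea of connecting genes, diseases, and compounds in a knowledge graph can be illustrated with a toy example. This is a minimal sketch, not any vendor's method: every entity name and relation below is a hypothetical placeholder, and the scoring rule (count the disease-implicated pathways a gene participates in) is an assumption chosen for simplicity.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples; not real biology.
TRIPLES = [
    ("GENE_A", "participates_in", "PATHWAY_1"),
    ("GENE_B", "participates_in", "PATHWAY_1"),
    ("GENE_B", "participates_in", "PATHWAY_2"),
    ("GENE_C", "participates_in", "PATHWAY_3"),
    ("PATHWAY_1", "implicated_in", "DISEASE_X"),
    ("PATHWAY_2", "implicated_in", "DISEASE_X"),
    ("COMPOUND_1", "inhibits", "GENE_B"),
]


def build_graph(triples):
    """Map each subject to the set of objects it points to."""
    graph = defaultdict(set)
    for subj, _relation, obj in triples:
        graph[subj].add(obj)
    return graph


def rank_targets(triples, disease):
    """Score each gene by how many of its pathways are implicated in the disease."""
    graph = build_graph(triples)
    scores = {}
    for node, neighbours in graph.items():
        if not node.startswith("GENE_"):
            continue
        hits = sum(1 for p in neighbours if disease in graph.get(p, ()))
        if hits:
            scores[node] = hits
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(rank_targets(TRIPLES, "DISEASE_X"))
```

Production systems use far richer evidence (omics signals, literature mining, causal inference), but the underlying structure of traversing typed edges between entities is similar.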
Insilico Medicine developed a target identification platform called PandaOmics, which integrates 22 multi-modal data sources to build disease-specific models. Using PandaOmics, Insilico identified TNIK (Traf2- and Nck-interacting kinase) as a novel regulator of lung fibrosis pathways, a target that had not been previously explored for idiopathic pulmonary fibrosis (IPF). This AI-identified target became the basis for rentosertib, the first AI-designed drug to reach Phase II clinical trials.
AlphaFold, developed by Google DeepMind, represents one of the most significant breakthroughs in computational biology. AlphaFold2, unveiled in 2020 by a team led by Demis Hassabis and John Jumper, solved the long-standing "protein folding problem" by accurately predicting three-dimensional protein structures from amino acid sequences alone. The AlphaFold Protein Structure Database, launched in July 2021, has grown to contain predicted structures for over 200 million proteins from more than one million species. Previously, obtaining a single protein structure through experimental methods such as X-ray crystallography could take months or years; AlphaFold can produce a prediction in minutes.
In October 2024, Hassabis and Jumper were awarded the Nobel Prize in Chemistry for their work on protein structure prediction. David Baker shared the prize for his contributions to computational protein design.
AlphaFold 3, released in May 2024, extended the capabilities of its predecessor by predicting the joint structure of complexes that include proteins, DNA, RNA, small molecules, ions, and modified residues. The model achieves at least a 50% improvement over existing methods for predicting protein interactions with other molecule types, and for some categories of interaction, accuracy has doubled compared to previous state-of-the-art approaches. For drug discovery specifically, AlphaFold 3 can predict where and how a ligand binds to its protein target, which supports both target identification and the design of molecules that bind effectively; it does not directly output binding affinities.
Isomorphic Labs, a spin-off from Google DeepMind focused on applying AlphaFold to drug discovery, announced in February 2026 a Drug Design Engine that doubled the performance of AlphaFold 3 on a protein-ligand structure prediction benchmark.
Once a target has been identified and validated, the next challenge is to find or create molecules that modulate it. Traditional high-throughput screening involves testing millions of compounds from physical libraries, a process that is expensive and limited to existing chemical space. AI-driven de novo drug design instead generates entirely new molecules from scratch, exploring a vastly larger chemical space.
Several generative model architectures have been adapted for molecular design, including variational autoencoders (VAEs), generative adversarial networks (GANs), normalizing flows, and diffusion models. These models learn the statistical distribution of known drug-like molecules and can then sample from that distribution to generate novel compounds with specified properties.
A typical generative chemistry workflow involves encoding molecular structures as SMILES strings or molecular graphs, training a generative model on datasets of known bioactive compounds, and then sampling new molecules while optimizing for properties such as binding affinity, selectivity, solubility, and synthetic accessibility. Active learning cycles can iteratively refine the generated molecules using feedback from docking simulations, QSAR models, or even experimental assays.
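The first step of that workflow, turning SMILES strings into model-ready tokens, can be sketched in a few lines. This is a simplified illustration using a regular expression that covers common SMILES syntax; it is an assumption for demonstration purposes and not a complete SMILES grammar. The function names `tokenize` and `encode` are hypothetical.

```python
import re

# Simplified SMILES token pattern (an assumption; not a full grammar).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"            # bracket atoms such as [Na+] or [C@H]
    r"|Br|Cl"                # two-letter atoms in the organic subset
    r"|%\d{2}"               # two-digit ring-closure labels
    r"|[A-Za-z]"             # single-letter atoms (aromatic ones are lowercase)
    r"|\d"                   # single-digit ring-closure labels
    r"|[()=#\-+\\/.:@*~$]"   # branches, bonds, and other symbols
)


def tokenize(smiles):
    """Split a SMILES string into tokens, rejecting unrecognized characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


def encode(smiles_list):
    """Build a vocabulary and map each molecule to a list of integer ids."""
    tokenized = [tokenize(s) for s in smiles_list]
    vocab = {tok: i + 1 for i, tok in enumerate(sorted({t for ts in tokenized for t in ts}))}
    # Id 0 is left unused so it can serve as a padding value.
    return vocab, [[vocab[t] for t in ts] for ts in tokenized]


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
print(tokenize(aspirin))
vocab, ids = encode([aspirin, "CCl", "[Na+].[Cl-]"])
print(len(vocab), ids[1])
```

In practice a cheminformatics toolkit would validate and canonicalize the structures before tokenization; graph-based models skip this step and operate on atoms and bonds directly.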
Insilico Medicine's Chemistry42 platform uses generative models to design novel small molecules for a given target. Using this platform, Insilico designed rentosertib (ISM001-055) in a fraction of the time that traditional medicinal chemistry would require. The compound went from target identification to preclinical candidate nomination in approximately 18 months, compared to the industry average of 4 to 5 years.
Exscientia, a UK-based company that merged with Recursion Pharmaceuticals in November 2024, developed the Centaur AI platform, which combines generative design with automated experimentation. Exscientia reported that its platform reduced the drug design timeline from the industry average of 4.5 years to just 12 to 15 months while decreasing capital costs by up to 80%. Six molecules designed using Exscientia's AI entered clinical trials, including EXS21546, an A2A receptor antagonist for oncology discovered in just eight months.
A key challenge in drug design is that molecules must satisfy many conflicting requirements simultaneously. They need high potency against the target, selectivity over related proteins, favorable pharmacokinetic properties, low toxicity, and practical synthetic routes. AI methods excel at multi-objective optimization, using techniques such as Pareto optimization and reinforcement learning to navigate these trade-offs and identify molecules that represent the best compromises across all criteria.
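Pareto optimization can be made concrete with a small example: a candidate is kept only if no other candidate is at least as good on every objective and strictly better on one. The molecule names and property values below are hypothetical, and the three objectives are an illustrative subset of those listed above.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    potency: float       # higher is better (for example, pIC50)
    selectivity: float   # higher is better (fold selectivity)
    toxicity: float      # lower is better (predicted risk, 0 to 1)


def objectives(c):
    """Express every objective as 'higher is better'."""
    return (c.potency, c.selectivity, -c.toxicity)


def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    oa, ob = objectives(a), objectives(b)
    return all(x >= y for x, y in zip(oa, ob)) and any(x > y for x, y in zip(oa, ob))


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical values for illustration only.
CANDIDATES = [
    Candidate("mol_1", potency=8.2, selectivity=50, toxicity=0.30),
    Candidate("mol_2", potency=7.5, selectivity=120, toxicity=0.10),
    Candidate("mol_3", potency=7.4, selectivity=100, toxicity=0.20),
    Candidate("mol_4", potency=6.0, selectivity=30, toxicity=0.05),
]

print([c.name for c in pareto_front(CANDIDATES)])
```

Here `mol_3` drops out because `mol_2` beats it on all three objectives, while the remaining molecules each represent a different trade-off that a chemist or a downstream scoring step would have to choose between.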
Virtual screening uses computational methods to evaluate large libraries of compounds and identify those most likely to bind to a target protein. There are two main approaches: structure-based virtual screening, which uses the three-dimensional structure of the target, and ligand-based virtual screening, which uses information about known active molecules.
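Ligand-based screening is often built on fingerprint similarity. The sketch below ranks a small library against a known active using the Tanimoto coefficient, assuming each fingerprint is represented as a set of "on" bit indices. The fingerprints, compound names, and the 0.5 threshold are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def screen(query, library, threshold=0.5):
    """Return (name, similarity) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, s) for name, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: -pair[1])


# Hypothetical fingerprints for illustration only.
QUERY = {1, 4, 7, 9}
LIBRARY = {
    "lig_a": {1, 4, 7, 9, 12},
    "lig_b": {2, 3, 5},
    "lig_c": {1, 4, 8},
}

print(screen(QUERY, LIBRARY))
```

Real pipelines would derive the fingerprints from molecular structures with a cheminformatics toolkit rather than writing them by hand.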
Molecular docking is the traditional workhorse of structure-based virtual screening. Docking programs predict how a small molecule fits into the binding site of a protein and estimate the strength of the interaction through a scoring function. However, traditional scoring functions often fail to accurately model the full complexity of protein-ligand interactions, leading to high false-positive rates.
AI-enhanced docking methods address these limitations. Machine learning scoring functions trained on large datasets of experimental binding data can better predict actual binding affinities. For instance, the Deep Docking (DD) platform enables up to 100-fold acceleration of structure-based virtual screening by docking only a subset of a chemical library and iteratively predicting the scores for remaining compounds using a ligand-based model. RosettaVS, described in a 2024 Nature Communications paper, combines physics-based docking with AI-guided compound selection to model receptor flexibility and outperforms traditional methods on multiple benchmarks.
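The "dock a subset, predict the rest" idea can be sketched with a toy loop. This is not the Deep Docking implementation: the `dock` function is a stand-in for an expensive docking run, each compound is reduced to a single hypothetical descriptor value, and a 1-nearest-neighbour lookup replaces the ligand-based neural network used in practice.

```python
import random


def dock(x):
    """Stand-in for an expensive docking run; lower scores are better."""
    return 10.0 * (x - 0.7) ** 2 - 9.0


def predict(x, docked):
    """1-nearest-neighbour surrogate: reuse the score of the closest docked compound."""
    nearest = min(docked, key=lambda d: abs(d - x))
    return docked[nearest]


def iterative_screen(library, rounds=3, sample_size=50, keep_fraction=0.5, seed=0):
    rng = random.Random(seed)
    docked = {}                  # descriptor -> true docking score
    candidates = list(library)
    for _ in range(rounds):
        # Dock a random sample of the candidates that have no true score yet.
        undocked = [x for x in candidates if x not in docked]
        for x in rng.sample(undocked, min(sample_size, len(undocked))):
            docked[x] = dock(x)
        # Rank all candidates by predicted score and keep the best fraction.
        candidates.sort(key=lambda x: predict(x, docked))
        candidates = candidates[: max(1, int(len(candidates) * keep_fraction))]
    # Dock the survivors for real.
    for x in candidates:
        if x not in docked:
            docked[x] = dock(x)
    best = min(candidates, key=docked.get)
    return best, docked[best], len(docked)


LIBRARY = [i / 999 for i in range(1000)]
best_x, best_score, n_docked = iterative_screen(LIBRARY)
print(f"best descriptor={best_x:.3f}, score={best_score:.2f}, "
      f"docked {n_docked} of {len(LIBRARY)}")
```

With these settings at most 275 of the 1,000 compounds are ever docked, which is where the speed-up comes from; the quality of the final hits depends entirely on how well the surrogate ranks the undocked compounds.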
Recent advances have made it possible to screen virtual libraries containing billions of compounds. Atomwise's AtomNet platform uses deep convolutional neural networks to analyze protein structures and predict drug-target interactions across a proprietary library of more than three trillion synthesizable compounds. By predicting docking scores with machine learning rather than performing full docking simulations, these platforms can evaluate chemical spaces that would be computationally prohibitive with traditional methods.
ADMET stands for Absorption, Distribution, Metabolism, Excretion, and Toxicity. These pharmacokinetic and safety properties determine whether a drug candidate can reach its target in the body at an effective concentration without causing unacceptable side effects. Poor drug-like properties are responsible for roughly 10% to 15% of clinical trial failures and unmanageable toxicity for about another 30%, and both have historically been major causes of late-stage attrition.
Machine learning models for ADMET prediction have become increasingly sophisticated. Common approaches include:
| ADMET property | What it measures | Common ML approaches |
|---|---|---|
| Aqueous solubility | Whether the drug dissolves in body fluids | Random forest, graph neural networks |
| Intestinal permeability | Ability to cross the gut wall and enter the bloodstream | Support vector machines, gradient boosting |
| CYP inhibition | Interaction with cytochrome P450 liver enzymes | XGBoost, multitask neural networks |
| Plasma protein binding | Fraction of drug bound to blood proteins (unavailable for action) | Ensemble methods, deep learning |
| hERG channel inhibition | Cardiac toxicity risk | Random forest, attention-based models |
| Hepatotoxicity | Liver damage potential | Graph neural networks, transformer models |
Modern ADMET prediction frameworks increasingly use graph neural networks that operate directly on molecular graphs, capturing structural features that are relevant to pharmacokinetic behavior. Multitask learning, where a single model predicts multiple ADMET endpoints simultaneously, has shown improvements over single-task models because ADMET properties are often correlated. AutoML methods that automatically search for optimal model architectures and hyperparameters have further improved the reproducibility and performance of ADMET predictions.
Despite these advances, challenges remain. Training data for ADMET models is often limited, noisy, and biased toward certain chemical series. Predictions may not generalize well to novel chemical scaffolds, and the gap between in vitro measurements and in vivo outcomes remains significant.
AI is increasingly being applied beyond drug design to the clinical trial process itself. Clinical trials account for approximately 60% of total drug development costs and are the stage most prone to delays and failures.
Nearly 80% of clinical trials experience delays, with patient recruitment cited as one of the most common challenges. By processing electronic health records, genetic profiles, and demographic information, AI platforms can identify protocol-eligible patients up to three times faster, with approximately 93% accuracy. Natural language processing models parse unstructured medical records to match patients to eligibility criteria, while predictive models identify patients most likely to benefit from treatment and to remain compliant throughout the trial.
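Once records have been structured, the matching step itself is essentially a set of inclusion and exclusion checks. The following is a minimal sketch of that final step only; the `Patient` and `Criteria` fields, the patient records, and the drug names are hypothetical, and the NLP stage that would populate them from free-text records is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    patient_id: str
    age: int
    diagnoses: set = field(default_factory=set)
    medications: set = field(default_factory=set)


@dataclass
class Criteria:
    min_age: int
    max_age: int
    required_diagnosis: str
    excluded_medications: set = field(default_factory=set)


def is_eligible(patient, criteria):
    """Apply inclusion (age, diagnosis) and exclusion (medication) rules."""
    return (
        criteria.min_age <= patient.age <= criteria.max_age
        and criteria.required_diagnosis in patient.diagnoses
        and not (patient.medications & criteria.excluded_medications)
    )


def screen_patients(patients, criteria):
    return [p.patient_id for p in patients if is_eligible(p, criteria)]


# Hypothetical records for illustration only.
CRITERIA = Criteria(min_age=40, max_age=80, required_diagnosis="IPF",
                    excluded_medications={"drug_x"})
PATIENTS = [
    Patient("P001", 67, {"IPF"}, {"drug_y"}),
    Patient("P002", 35, {"IPF"}),
    Patient("P003", 72, {"IPF", "hypertension"}, {"drug_x"}),
    Patient("P004", 58, {"COPD"}),
]

print(screen_patients(PATIENTS, CRITERIA))
```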
Adaptive clinical trials use accumulating data to modify aspects of the study without undermining its validity. Machine learning methods such as reinforcement learning, decision trees, and neural networks analyze interim results to inform protocol updates while the trial is running. These updates can include dosage adjustments, treatment arm modifications, and patient reallocation. AI-driven adaptive designs can reduce sample sizes, shorten trial durations, and increase the probability of detecting true treatment effects.
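Patient reallocation can be illustrated with response-adaptive randomization based on Thompson sampling, a simple bandit method related to the reinforcement learning approaches mentioned above. This is a simulation sketch under stated assumptions: outcomes are binary and observed immediately, the true response rates are hypothetical, and none of the statistical safeguards a real trial would require are included.

```python
import random


def thompson_allocate(successes, failures, rng):
    """Pick the arm whose sampled response rate (Beta posterior) is highest."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return draws.index(max(draws))


def simulate_trial(true_rates, n_patients=200, seed=0):
    """Allocate patients one at a time, updating arm statistics after each outcome."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for _ in range(n_patients):
        arm = thompson_allocate(successes, failures, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


# Hypothetical response rates: arm 0 is the control, arm 2 the most effective dose.
successes, failures = simulate_trial([0.20, 0.35, 0.50])
for arm, (s, f) in enumerate(zip(successes, failures)):
    print(f"arm {arm}: {s + f} patients, {s} responders")
```

As evidence accumulates, the allocation tends to drift toward the better-performing arms, which is the mechanism by which such designs aim to expose fewer patients to inferior treatments.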
Digital twins, virtual representations of individual patients built from their clinical and molecular data, allow researchers to simulate patient-specific responses to treatment. Synthetic control arms use historical patient data modeled by AI to serve as comparators, potentially reducing the number of patients needed in control groups. Both approaches are being explored as ways to make trials more efficient and ethical, particularly for rare diseases where patient populations are small.
Isomorphic Labs was founded in 2021 as a spin-off from Google DeepMind, led by DeepMind co-founder Demis Hassabis. The company's mission is to apply AlphaFold and related AI technologies to drug discovery at scale. In January 2024, Isomorphic announced major partnerships with Eli Lilly and Novartis worth a combined total of nearly $3 billion. The Novartis partnership was expanded in February 2025 to include up to three additional research programs.
In March 2025, Isomorphic raised $600 million in a financing round led by Thrive Capital. As of mid-2025, the company was preparing to dose the first patients in clinical trials for AI-designed oncology drugs. In February 2026, Isomorphic unveiled its Drug Design Engine, which doubled the performance of AlphaFold 3 on protein-ligand structure prediction benchmarks.
Insilico Medicine was founded in 2014 by Alex Zhavoronkov at Johns Hopkins University. The company's end-to-end Pharma.AI platform includes PandaOmics for target identification, Chemistry42 for generative drug design, and InClinico for clinical trial outcome prediction. Since 2021, Insilico has nominated over 20 preclinical candidates across a portfolio of more than 30 assets and received IND clearance for 10 molecules.
Insilico's lead program, rentosertib (ISM001-055), is a TNIK inhibitor for idiopathic pulmonary fibrosis that was the first drug fully discovered and designed by generative AI to enter Phase II clinical trials. Phase IIa results published in Nature Medicine in June 2025 showed that patients receiving 60 mg once daily experienced a mean improvement in lung function (FVC) of 98.4 mL, compared to a mean decline of 20.3 mL in the placebo group. A Phase IIa trial in the U.S. is also ongoing.
Beyond rentosertib, Insilico's pipeline includes ISM3091, a USP1 inhibitor for solid tumors licensed to Exelixis, and ISM5939, an oral ENPP1 inhibitor that received FDA IND clearance in November 2024 for a first-in-human Phase 1a/b study in solid tumors.
Recursion was founded in 2013 in Salt Lake City, Utah. The company operates highly automated laboratories that run up to 2.2 million experiments per week, generating biological imaging data that feeds into the Recursion Operating System (Recursion OS). Over the past decade, Recursion has accumulated over 50 petabytes of proprietary biological and chemical data spanning phenomics, transcriptomics, proteomics, and ADME measurements.
In partnership with NVIDIA, Recursion built BioHive-2, a supercomputer packing 504 NVIDIA H100 Tensor Core GPUs that delivers 2 exaflops of AI performance. In November 2024, Recursion acquired Exscientia to create a vertically integrated AI drug discovery platform combining Recursion's phenomic screening with Exscientia's automated precision chemistry.
Key clinical programs include REC-4881 for familial adenomatous polyposis (FAP), which showed a median 43% reduction in polyp burden in Phase 1b/2 data, and REC-617, a CDK7 inhibitor for advanced solid tumors currently in Phase 1.
Exscientia, founded in the UK, pioneered the use of AI to design small-molecule drug candidates. Before its acquisition by Recursion, six molecules designed using Exscientia's AI entered clinical trials. Notable programs included EXS21546, the first AI-designed oncology immunotherapy molecule to enter human trials; DSP-0038, a dual 5-HT1a/5-HT2a modulator for Alzheimer's disease psychosis developed with Sumitomo Dainippon Pharma; and GTAEXS617, a CDK7 inhibitor in Phase 1/2 trials for advanced solid tumors.
Relay Therapeutics is a clinical-stage precision medicine company founded in Cambridge, Massachusetts. Its Dynamo platform integrates molecular dynamics simulations, cryo-electron microscopy, and machine learning to design drugs that target specific protein conformations. This approach enables the creation of mutant-selective inhibitors that target only the pathological form of a protein while minimizing off-target effects.
Relay's lead asset, RLY-2608, is a pan-mutant PI3K-alpha inhibitor in Phase 3 trials for HR+/HER2- metastatic breast cancer. Updated data from Q2 2025 showed a 10.3-month median progression-free survival and a 39% overall response rate in patients with PI3K-alpha-mutated tumors. The company also licensed RLY-4008 to Elevar Therapeutics in late 2024 in a deal worth up to $500 million and has a collaboration with Pfizer to explore combination therapies.
Schrödinger, Inc. was founded in 1990 in New York City and is a pioneer in physics-based computational chemistry for drug discovery. Rather than relying solely on data-driven machine learning, Schrödinger's platform uses first-principles physics simulations to predict how molecules interact. This approach, combined with machine learning, enables the company to evaluate billions of virtual compounds with high accuracy.
Schrödinger's most advanced clinical asset is zasocitinib (TAK-279), a highly selective allosteric TYK2 inhibitor developed in partnership with Takeda. Zasocitinib demonstrates more than one-million-fold selectivity for TYK2 over JAK1, JAK2, and JAK3. In 2025, Takeda announced positive Phase 3 results in plaque psoriasis, with more than half of patients achieving clear or almost clear skin (PASI 90) at week 16. Zasocitinib is also being evaluated in Phase 3 studies for psoriatic arthritis and Phase 2 studies for Crohn's disease and ulcerative colitis.
AlphaFold's impact on drug discovery extends beyond protein structure prediction. Before AlphaFold, experimental determination of protein structures was a bottleneck for structure-based drug design. Only a fraction of human proteins had experimentally resolved structures, and many disease-relevant targets lacked structural information entirely. AlphaFold's database of 200 million predicted structures has made structural information available for virtually every known protein, opening up previously intractable targets for drug development.
However, there are important limitations. AlphaFold2 predicts static structures and cannot fully capture the dynamic conformational changes that proteins undergo in biological environments. The predicted structure may not represent the form that binds a drug molecule. AlphaFold also does not directly predict binding affinities or druggability, meaning that additional computational and experimental work is still needed to translate structural predictions into drug candidates.
AlphaFold 3 addresses some of these limitations by modeling interactions between proteins and other biomolecules, including small-molecule ligands. Isomorphic Labs has demonstrated that AlphaFold 3 outperforms traditional docking tools for predicting protein-ligand structures, and pharmaceutical companies including Novartis and Eli Lilly are collaborating with Isomorphic to apply these capabilities to real-world drug design programs.
The number of AI-designed drug candidates entering clinical trials has grown rapidly, from 3 programs in 2016 to 67 in 2023 and more than 173 as of early 2026. AI-discovered molecules have shown an 80% to 90% success rate in Phase I trials, significantly higher than the historical average of approximately 52% for traditionally discovered drugs. As of December 2025, no AI-discovered drug had received FDA approval, but the first approval is projected for 2026 or 2027.
The table below highlights notable AI-designed drugs that have reached clinical trials:
| Drug | Company | Target | Indication | Phase | AI role | Key data |
|---|---|---|---|---|---|---|
| Rentosertib (ISM001-055) | Insilico Medicine | TNIK | Idiopathic pulmonary fibrosis | Phase IIa | Target ID + molecule design | +98.4 mL FVC improvement vs. placebo |
| Zasocitinib (TAK-279) | Schrödinger / Takeda | TYK2 (allosteric) | Plaque psoriasis, psoriatic arthritis | Phase III | Physics-based compound design | >50% PASI 90 at week 16 |
| RLY-2608 | Relay Therapeutics | PI3K-alpha (pan-mutant) | HR+/HER2- breast cancer | Phase III | Molecular dynamics + ML design | 10.3-month median PFS, 39% ORR |
| ISM3091 | Insilico Medicine | USP1 | Solid tumors (BRCA-mutated) | Phase I | Generative design | Licensed to Exelixis |
| ISM5939 | Insilico Medicine | ENPP1 | Solid tumors | Phase I | Generative design | IND cleared November 2024 |
| REC-4881 | Recursion | MEK1/2 | Familial adenomatous polyposis | Phase 1b/2 | Phenomic screening + AI | 43% median polyp reduction |
| REC-617 | Recursion | CDK7 | Advanced solid tumors | Phase I | AI-driven discovery | 1 confirmed partial response |
| GTAEXS617 | Exscientia (Recursion) | CDK7 | Advanced solid tumors | Phase 1/2 | Generative AI design | Improved potency and selectivity |
| DSP-0038 | Exscientia / Sumitomo | 5-HT1a / 5-HT2a | Alzheimer's disease psychosis | Phase I | Dual-target AI design | Designed in collaboration |
| EXS21546 | Exscientia | A2A receptor | Solid tumors (immuno-oncology) | Phase 1/2 | AI design in 8 months | Discontinued after peer data |
AI models are only as good as the data they are trained on. In drug discovery, high-quality labeled data is scarce. Many biological assays are noisy, and results can vary across laboratories and experimental conditions. A survey of technology executives found that 68% cite poor data quality and governance as the primary reason AI initiatives fail. Data preparation alone consumes an estimated 80% of an AI project's time. Additionally, much of the most valuable data (clinical outcomes, proprietary compound activity) is held by pharmaceutical companies and is not publicly available, limiting the ability to train broadly generalizable models.
Biology is far more complex than current AI models can fully capture. Diseases involve intricate networks of interacting proteins, genes, and environmental factors. A molecule that looks promising in silico may fail in living systems due to off-target effects, unexpected metabolic pathways, or immune responses that no model predicted. The gap between computational predictions and clinical reality remains the fundamental challenge in AI-driven drug discovery.
In January 2025, the FDA published its first comprehensive draft guidance on the use of AI in drug development, titled "Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products." The guidance introduces a seven-step credibility assessment framework based on "context of use," requiring lifecycle maintenance plans and transparency about model architectures and training data. Different levels of documentation are required depending on risk level. The European Medicines Agency (EMA) is expected to publish parallel guidance by mid-2026.
The regulatory framework remains a work in progress. While the FDA has stated that AI-discovered drugs do not require a different approval pathway than traditionally discovered drugs, the use of AI in regulatory submissions raises questions about model interpretability, reproducibility, and accountability. More than 25% of warning letters issued by the FDA since 2019 have cited data accuracy issues, underscoring the importance of data governance in AI applications.
Many AI models, particularly deep neural networks, function as "black boxes" that provide predictions without transparent explanations of their reasoning. In a field where understanding the mechanism of action is critical for both scientific progress and regulatory approval, the lack of interpretability is a significant barrier. Efforts to develop explainable AI methods for drug discovery are ongoing but have not yet produced solutions that satisfy both computational performance and human interpretability requirements.
The AI in drug discovery market is growing rapidly, though estimates vary across research firms depending on scope and methodology:
| Source | Market size (2025) | Projected size | CAGR | Projection year |
|---|---|---|---|---|
| Grand View Research | $2.35 billion | $13.77 billion | 24.8% | 2033 |
| Precedence Research | $4.6 billion | $49.5 billion | 30.1% | 2034 |
| BioSpace | Not specified | $16.52 billion | Not specified | 2034 |
| SNS Insider | Not specified | $15.50 billion | 29.89% | 2032 |
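
The compound annual growth rate (CAGR) column can be cross-checked against the start and end values where both are given. The sketch below applies the standard CAGR formula and assumes 2025 is the base year for each projection, which may not match each firm's own methodology.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.25 means 25% per year)."""
    return (end_value / start_value) ** (1 / years) - 1


# (source, 2025 size in $bn, projected size in $bn, projection year, reported CAGR)
ROWS = [
    ("Grand View Research", 2.35, 13.77, 2033, 0.248),
    ("Precedence Research", 4.6, 49.5, 2034, 0.301),
]

for source, start, end, year, reported in ROWS:
    implied = cagr(start, end, year - 2025)
    print(f"{source}: implied {implied:.1%}, reported {reported:.1%}")
```

The implied rates come out at about 24.7% and 30.2%, within a few tenths of a percentage point of the reported figures; small gaps like these usually reflect rounding or a different base year.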
North America dominated the market in 2025 with approximately 53% revenue share. By application, drug optimization and repurposing was the largest segment, with roughly 52% share. Key drivers include the rising demand for cost-effective drug development, the increasing number of clinical trials, and growing chronic disease prevalence. AI is estimated to reduce pharmaceutical R&D costs by 25% to 40% and shorten clinical trial timelines significantly.
Major technology companies have made substantial investments in the space. Alphabet invested in Isomorphic Labs, which raised $600 million in March 2025. NVIDIA has partnered with Recursion Pharmaceuticals on the BioHive-2 supercomputer. AMD invested $20 million in Absci in January 2025 to support AI-driven biologics discovery. These investments reflect the growing conviction that AI will become a foundational technology in pharmaceutical R&D.
The field of AI in drug discovery is at an inflection point. Multiple AI-designed drugs are in late-stage clinical trials, and the first FDA approval of an AI-discovered drug is widely anticipated in 2026 or 2027. The continued improvement of protein structure prediction models, generative chemistry platforms, and clinical trial optimization tools suggests that AI's role in drug development will only deepen.
Key trends to watch include the development of foundation models for biology that can integrate diverse data types into unified representations, the expansion of federated learning approaches that allow pharmaceutical companies to train models on pooled data without sharing proprietary information, and the emergence of autonomous AI agents that can design, execute, and interpret experiments with minimal human oversight.
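The federated learning idea can be sketched with federated averaging, one common aggregation scheme; the text above does not name a specific algorithm, so this choice is an assumption. Each participant trains locally and shares only model weights, which a coordinator combines in proportion to local dataset size. The weights and dataset sizes below are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client weight vectors, weighted by each client's dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical locally trained weights from two companies; raw data never leaves each site.
company_a = [1.0, 2.0]   # trained on 100 compounds
company_b = [3.0, 4.0]   # trained on 300 compounds

global_weights = federated_average([company_a, company_b], [100, 300])
print(global_weights)  # [2.5, 3.5]
```

In a real deployment this averaging step would be repeated over many rounds, and shared weights can still leak information, so additional protections such as secure aggregation are typically layered on top.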
However, the industry's fundamental challenge remains the gap between computational prediction and biological reality. No amount of algorithmic sophistication can fully substitute for experimental validation, and the ultimate test of AI-designed drugs will be their performance in the clinic.