rBio (CZI)
Last reviewed
Jun 7, 2026
Sources
15 citations
Review status
Source-backed
Revision
v1 · 1,711 words
rBio (styled rbio1 in the accompanying preprint) is a biological reasoning large language model released by the Chan Zuckerberg Initiative (CZI) on August 21, 2025. Its distinguishing feature is the training method: rBio is post-trained with reinforcement learning in which the reward signal comes from "virtual cell" simulators and other biological prior knowledge rather than from new laboratory experiments. CZI describes it as the first reasoning model trained on predictions from virtual cell models, and the team frames the underlying technique as "soft verification," a way to distill the knowledge embedded in a simulator into a conversational model that can reason step by step about cell biology. The work sits inside CZI's broader virtual biology initiative and its Virtual Cells Platform, and is positioned as a proof of concept rather than a finished tool for biologists.
CZI, the philanthropy founded by Mark Zuckerberg and Priscilla Chan, has organized much of its science program around a long-term goal of building "virtual cell" models: AI systems that can predict and explain how cells behave, with the stated aim of helping researchers cure, prevent, or manage disease. The effort has two complementary parts. One is data and model generation, including the Billion Cells Project announced in February 2025 (a collaboration with 10x Genomics and Ultima Genomics to map genetic perturbations across roughly one billion cells) and an expanded GPU-infrastructure collaboration with NVIDIA announced in October 2025. The other is the Virtual Cells Platform (VCP), an open hub that distributes models, datasets, and benchmarks.
A central model in that program is TranscriptFormer, a generative single-cell transcriptomics model CZI introduced in April 2025. Its most evolutionarily diverse variant, TF-Metazoa, was trained on 112 million cells spanning 12 species across about 1.5 billion years of evolution, and it can perform zero-shot tasks such as cell-type classification and prediction of gene-gene regulatory relationships. rBio is the first CZI system to use a model like TranscriptFormer not as an endpoint but as a training signal for a separate reasoning model. The Arc Institute's State model is another example of the broader virtual cell model category that such a framework could draw on.
rBio is a conversational reasoning model for cell biology. A scientist can pose a natural-language question such as "Would suppressing the actions of gene A result in an increase in activity of gene B?" and the model returns a prediction with step-by-step reasoning, for example describing whether a perturbation is likely to shift a cell from a healthy toward a diseased state. This places it among the generative AI systems built for scientific and healthcare applications.
The conceptual core is what the preprint calls reasoning from simulators, or "soft verification." Modern reasoning models are usually trained against exact verifiers: a math answer or a unit of code is either right or wrong, so a binary reward is available at scale. Biology has no such exact oracle, and the usual fallback, running real experiments, is slow, costly, and does not scale with computation. The authors argue that a trained model of biology can instead act as an approximate oracle. Because such a model outputs probabilities rather than certainties, the reward is made continuous: the model receives credit in proportion to how likely its answer is to be correct according to the simulator, rather than a hard yes or no. CZI's stated motivation is that questions like "2 + 2 = ?" have unambiguous answers, while questions like whether a drug will cure a particular cancer carry irreducible uncertainty that a graded reward can represent.
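The contrast between a hard and a soft verifier can be made concrete with a minimal sketch. It assumes a yes/no question and a simulator that returns a probability that the answer is "yes"; the function names and the exact reward form are illustrative, not taken from the rBio code.

```python
def hard_reward(answer: bool, truth: bool) -> float:
    """Binary reward from an exact verifier: 1.0 if correct, else 0.0."""
    return 1.0 if answer == truth else 0.0


def soft_reward(answer: bool, p_yes: float) -> float:
    """Graded reward from an approximate oracle (illustrative assumption).

    p_yes is the simulator's probability that the true answer is "yes".
    The reward equals the probability the simulator assigns to the
    answer the model actually gave.
    """
    if not 0.0 <= p_yes <= 1.0:
        raise ValueError("p_yes must be a probability in [0, 1]")
    return p_yes if answer else 1.0 - p_yes
```

With a simulator that puts 80% probability on "yes", answering "yes" earns a reward of 0.8 and answering "no" earns roughly 0.2, so the model is still steered toward the likelier answer without the simulator being treated as ground truth.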
rBio is post-trained from an off-the-shelf base model, Qwen2.5-3B-Instruct (a 3-billion-parameter decoder-only transformer), using reinforcement learning with the GRPO algorithm. The preprint defines two training paradigms. RLEMF (reinforcement learning with experimental model feedback) uses a learned model trained on experimental data as the verifier. RLPK (reinforcement learning from prior knowledge) uses a knowledge source such as an ontology. Across released variants, CZI uses several verifier types: a task-specific MLP, signals from TranscriptFormer (specifically pointwise mutual information, or PMI, scores), the Gene Ontology knowledge base, and, for comparison, experimental data used as a conventional "hard" verifier. These correspond to model variants documented on the platform and GitHub, including rbio1-MLP, rbio1-TF, rbio1-GO, rbio1-EXP, and combinations such as rbio1-TF+GO+MLP+EXP.
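GRPO (Group Relative Policy Optimization) scores each sampled answer relative to the other answers sampled for the same prompt. The sketch below shows only that normalization step, assuming the commonly described form (reward minus the group mean, divided by the group standard deviation). The function name, the epsilon, and the choice of population standard deviation are illustrative assumptions rather than details from the rBio code, and the policy update, clipping, and KL terms are omitted.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def group_relative_advantages(
    rewards: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Normalize rewards within one group of sampled answers.

    Each reward is centered on the group mean and scaled by the group
    standard deviation; eps avoids division by zero when every answer
    in the group receives the same reward.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Because the advantages are relative, continuous rewards from a soft verifier fit this step without modification: an answer the verifier rates at 0.9 is reinforced over a sibling answer rated at 0.2, and a group in which every answer receives the same reward contributes no learning signal.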
Training first targets perturbation prediction, drawing gene co-expression and gene regulatory information out of TranscriptFormer. According to the model card on CZI's Virtual Cells Platform, each variant was trained for roughly 100,000 steps over about 10 days on 8 NVIDIA H100 GPUs. The released code applies the Qwen Research License to the base model and an MIT license to CZI's own contributions; ten model variants are distributed via AWS S3, alongside a quick-start guide, a tutorial, and the bioRxiv preprint.
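To give a sense of what a co-expression signal such as a PMI score looks like, here is a minimal sketch using the standard definition of pointwise mutual information, PMI(a, b) = log[p(a, b) / (p(a) p(b))]. How TranscriptFormer actually derives its scores is not described here, so the sketch substitutes simple cell counts as a stand-in, and the sigmoid mapping from a PMI score to a reward in (0, 1) is purely an assumption for illustration.

```python
import math


def pmi(n_a: int, n_b: int, n_ab: int, n_total: int) -> float:
    """Pointwise mutual information of two genes from cell counts.

    n_a, n_b: cells expressing gene A / gene B
    n_ab:     cells expressing both genes
    n_total:  all cells observed
    Counts stand in for model-derived probabilities (assumption).
    """
    if min(n_a, n_b, n_ab, n_total) <= 0:
        raise ValueError("all counts must be positive")
    p_a = n_a / n_total
    p_b = n_b / n_total
    p_ab = n_ab / n_total
    return math.log(p_ab / (p_a * p_b))


def pmi_to_reward(score: float, scale: float = 1.0) -> float:
    """Squash a PMI score into (0, 1) with a sigmoid (hypothetical mapping)."""
    return 1.0 / (1.0 + math.exp(-scale * score))
```

A PMI of zero means the two genes co-occur exactly as often as independence would predict, positive values indicate co-expression beyond chance, and negative values indicate that the genes tend to exclude each other.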
rBio is evaluated on PerturbQA, a benchmark introduced in the ICLR 2025 paper "Contextualizing biological perturbation experiments through language," which frames perturbation modeling as question answering over tasks such as differential-expression prediction and direction-of-change prediction (building on the perturbation-prediction task originally posed by the GEARS method). The held-out evaluation uses test splits from the K562, RPE1, HEPG2, and Jurkat cell lines.
CZI reports that rBio achieves state-of-the-art performance on PerturbQA. In CZI's own framing, the first version "outperforms previously published models on the PerturbQA benchmark like SUMMER (ICLR 2025), baseline LLMs like Qwen2.5, and matches a strongly performant rBio ablation directly trained on experimental data when using chain-of-thought." That last comparison is the headline claim: a model taught by a simulator can match a counterpart taught on real experimental data, which supports the argument that simulators can substitute for some wet-lab training. All of these are performance claims made by the authors in a preprint that, as of early 2026, had not completed peer review, so they should be read as reported rather than independently confirmed. CZI itself presents rBio as a proof of concept.
rBio's contribution lies more in its method than in any single benchmark number. It proposes a route to scale scientific reasoning models in domains that lack formal verifiers, by treating an existing predictive model as a graded reward source and distilling its competence into a language model that can be queried conversationally. If the approach generalizes beyond perturbation prediction, it would let researchers improve reasoning models using computation rather than additional experiments, and let teams iterate on hypotheses in silico before committing to costly assays. CZI senior research scientist Ana-Maria Istrate has said publicly that such virtual simulations could save large sums of money and shorten discovery timelines, a forward-looking claim about potential impact rather than a measured result.
The limitations are inherent to the design. A soft verifier can only be as good as the simulator behind it, so any biases or gaps in TranscriptFormer or the chosen knowledge base propagate into rBio. The base model is small (3B parameters), the evaluation centers on one benchmark and a handful of cell lines, and the work has not yet been peer reviewed. Within those bounds, rBio is best understood as an early, openly released demonstration of "reasoning from simulators" inside CZI's virtual cell program.
| Item | Detail |
|---|---|
| Name | rBio (preprint name rbio1), version v1.0 |
| Developer | Chan Zuckerberg Initiative (CZI), AI/science group |
| Announced | August 21, 2025 (model v1.0 and preprint posted August 20, 2025) |
| Project leads | Theofanis Karaletsos (Senior Director of AI); Ana-Maria Istrate (Senior Research Scientist) |
| Preprint authors | A. Istrate, F. Milletari, F. Castrotorres, J. Tomczak, M. Torkar, D. Li, T. Karaletsos |
| Base model | Qwen2.5-3B-Instruct (3B parameters, decoder-only transformer) |
| Training | Reinforcement learning with GRPO; "soft verification" with proportional rewards |
| RL paradigms | RLEMF (experimental model feedback) and RLPK (reinforcement learning from prior knowledge) |
| Verifiers used | TranscriptFormer (PMI scores), Gene Ontology, task-specific MLP, experimental data (hard verifier) |
| Compute | ~100,000 steps, ~10 days on 8 NVIDIA H100 GPUs per variant |
| Benchmark | PerturbQA (perturbation prediction); test splits from K562, RPE1, HEPG2, Jurkat |
| Reported result | State-of-the-art on PerturbQA; beats SUMMER (ICLR 2025) and Qwen2.5 baselines; matches an experimental-data ablation with chain-of-thought |
| Licensing | Qwen Research License (base model); MIT License (CZI contributions); 10 variants released |
| Venue | bioRxiv preprint (DOI 10.1101/2025.08.18.670981); presented at a NeurIPS 2025 workshop |