GAIA-3 (Wayve)
Last reviewed: May 16, 2026
Sources: 15 citations
Review status: Source-backed
Revision: v1 · 3,270 words
GAIA-3 is a generative world model for autonomous driving developed by Wayve, launched on 2 December 2025 as the third generation in the company's GAIA family. The model has fifteen billion parameters, was trained with roughly five times the compute and ten times the data of its predecessor GAIA-2, and is built on a latent diffusion architecture extended for spatio-temporally coherent multi camera video. Wayve positions GAIA-3 not primarily as a video synthesis system but as the foundation for an end to end offline evaluation pipeline for self driving software: one that generates counterfactual driving scenarios, runs a driving policy inside them and scores the results, with those scores reported by Wayve to correlate closely with on road outcomes.
Where the earlier GAIA-1 and GAIA-2 models advanced the visual realism, controllability and geographic coverage of synthetic driving footage, GAIA-3 is framed by Wayve as a transition from world models as creative tools to world models as instruments of measurement. Chief Scientist Jamie Shotton described the release as moving world modelling from visual synthesis to true autonomy evaluation and validation. Wayve reports that internal studies have shown a fivefold reduction in the rate at which synthetic test sequences are rejected as unrealistic by downstream evaluation pipelines, and that policy rankings produced inside GAIA-3 strongly correlate with rankings produced from physical road testing.
The launch is closely connected to Wayve's broader UK strategy. The company is a partner on the UK government funded DriveSafeSim project, led with the Warwick Manufacturing Group at the University of Warwick, which is examining how generative world models such as GAIA-3 can underpin formal safety evaluation of automated driving systems. The release also sits inside an expanding policy and commercial relationship between Wayve and the British state, formalised in a memorandum of understanding between Wayve and the UK Department for Business and Trade signed on 12 May 2026.
Wayve has used the GAIA name since 2023 for its family of generative driving world models. The series is one of the most visible attempts in industry to apply diffusion models and related video generation techniques to the problem of producing controllable, sensor accurate driving footage that can be used both as synthetic data and as an interactive simulator. GAIA-3 is the third public generation and the first explicitly engineered around an evaluation use case rather than around data augmentation or pure visualisation.
The lineage of the GAIA series spans roughly two and a half years and shows steady increases in scale, geographic coverage and conditioning capability.
| Generation | Public release | Scale | Training data | Primary positioning |
|---|---|---|---|---|
| GAIA-1 | June 2023 preview, September 2023 technical report | Roughly nine billion parameters in the report, scaled from a one billion parameter June prototype | About 4,700 hours of proprietary London driving data collected between 2019 and 2023 | Proof of concept generative world model conditioned on video, text and action |
| GAIA-2 | 26 March 2025 | Multi billion parameter latent diffusion model with extensive domain conditioning | UK, United States and Germany driving data, multiple vehicles | Controllable multi view generative world model for assisted and automated driving |
| GAIA-3 | 2 December 2025 | Fifteen billion parameter latent diffusion model with a video tokenizer twice the size of GAIA-2 | Roughly ten times more data than GAIA-2, spanning nine countries across three continents | Foundation for safety focused offline evaluation, from simulation to evaluation |
The GAIA family is also tied to Wayve's other foundation models, in particular LINGO-2, the vision language action model used to narrate and interrogate driving behaviour. Wayve has publicly positioned GAIA and LINGO as complementary components inside a multi model architecture, where GAIA acts as a simulation and evaluation engine and LINGO supplies natural language reasoning over driving decisions. A core driving policy network sits alongside both and is the system actually under test inside GAIA-3 generated worlds.
GAIA-3 was announced on 2 December 2025 through a press release on the Wayve website and a longer engineering write up published the same day under the title "GAIA-3, Scaling World Models to Power Safety and Evaluation". The announcement framed GAIA-3 as a generative world model designed to accelerate the evaluation and validation of autonomous driving AI, and described the underlying engineering as a scale up of GAIA-2 along three axes: model size, training compute and training data.
Wayve's own materials state that GAIA-3 has fifteen billion parameters, is built on latent diffusion with a video tokenizer twice the size of the one used in GAIA-2, and was trained with about five times the compute and about ten times the data of GAIA-2. The training corpus is described as spanning nine countries across three continents, with explicit emphasis on safety critical elements such as pedestrians, cyclists, signs and traffic infrastructure.
In the launch materials, Jamie Shotton, Chief Scientist at Wayve, summarised the strategic shift as the system learning to recreate the dynamics of real world environments, from everyday traffic to rare events, and as advancing world modelling from visual synthesis to true autonomy evaluation and validation. Aniruddha Kembhavi, Director of Science Strategy, framed GAIA-3 as evidence that the same world modelling approach could in time be used to evaluate broader embodied AI systems including warehouse robotics, household humanoids and manufacturing automation, by surfacing rare and risky situations early and characterising failure modes.
GAIA-3 is described by Wayve as a latent diffusion based generative world model. The pipeline encodes driving video into a learned latent space using a video tokenizer, runs a diffusion process inside that latent space conditioned on a set of structured driving inputs, and decodes the result back into multi camera video that is intended to be coherent in space and in time. The architecture is a scale up of the GAIA-2 design rather than a clean break from it.
The most directly reported architectural deltas between GAIA-2 and GAIA-3 are summarised in the table below.
| Property | GAIA-2 | GAIA-3 | Change |
|---|---|---|---|
| Total parameters | About half of GAIA-3 | Fifteen billion | Roughly doubled |
| Video tokenizer scale | Baseline | Twice the size of GAIA-2 | Doubled |
| Training compute | Baseline | About five times GAIA-2 | Five times |
| Training data volume | Baseline | About ten times GAIA-2 | Ten times |
| Geographic coverage | UK, United States, Germany | Nine countries across three continents | Expanded |
| Multi view consistency | Multi camera output | Spatio temporally coherent multi camera output over longer trajectories | Extended duration and stability |
| Conditioning inputs | Video, text and action conditioning, scene parameters | Adds world on rails counterfactual conditioning and embodiment transfer | New conditioning modes |
A larger video tokenizer allows the model to represent fine grained spatio temporal structures more faithfully, which Wayve frames as a precondition for using generated footage as a substrate for measurement rather than illustration. In practice, this is reflected in sharper visual output, more consistent lighting across frames, richer texture detail and notably better rendering of road signage with readable text and of landmark architecture.
GAIA-3 retains the multi view generation capability that was central to GAIA-2 and extends it. Generated sequences are spatio temporally coherent across multiple cameras, agents maintain plausible behaviour through temporary occlusions, brake lights and indicators behave consistently with the underlying scene state, and incidental dynamics such as foliage movement, shadows and pedestrian motion are described as more realistic.
A central engineering claim of GAIA-3 is that the model accepts a richer set of structured conditioning inputs than previous Wayve world models, which is what allows it to function as an evaluation tool rather than only as a generator. The conditioning surface combines GAIA-2 style controls with two new families of conditioning intended specifically for counterfactual analysis and cross platform validation.
| Conditioning family | Description | Inherited or new |
|---|---|---|
| Video conditioning | Generation seeded or extended from real recorded driving footage | Inherited from GAIA-1 and GAIA-2 |
| Text and scene description | Natural language and structured scene parameters such as weather, time of day and location | Inherited from GAIA-1 and GAIA-2 |
| Action conditioning | Ego vehicle trajectory and control inputs that drive the evolution of the generated scene | Inherited from GAIA-1 and GAIA-2 |
| World on rails | Ego trajectory is varied while other agents, lighting, weather and overall world state remain consistent with the seed recording | New in GAIA-3 |
| Embodiment transfer | Scenes are re rendered from new sensor configurations using small unpaired samples from a target vehicle rig | New in GAIA-3 |
| Visual diversity controls | Appearance attributes such as lighting, weather and textures are varied while preserving underlying geometry and motion | Extended in GAIA-3 |
The world on rails conditioning mode is central to how Wayve frames the model as an evaluation system. By holding the rest of the world fixed and only varying the ego vehicle's trajectory, GAIA-3 can produce what if variants of a real recorded scene in which all other elements of the world behave identically. This is the property that makes the generated counterfactuals usable for measuring policy behaviour, because differences in outcome can be attributed to differences in policy decisions rather than to changes in the surrounding world.
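The idea of holding the world fixed while varying only the ego trajectory can be illustrated with a small data structure sketch. Everything in it is hypothetical: `SceneSpec`, its fields and `on_rails_variants` are names invented for this example and do not reflect any published Wayve interface. The sketch shows only that each variant differs from the seed in exactly one field.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple

Waypoint = Tuple[float, float]  # (forward_m, lateral_m); an assumed convention


@dataclass(frozen=True)
class SceneSpec:
    """Hypothetical conditioning request; not a published Wayve API."""
    seed_recording: str                  # identifier of the real source recording
    weather: str
    time_of_day: str
    ego_trajectory: Tuple[Waypoint, ...]


def shift_laterally(trajectory: Sequence[Waypoint], offset_m: float) -> Tuple[Waypoint, ...]:
    """Return a copy of the trajectory moved sideways by offset_m metres."""
    return tuple((forward, lateral + offset_m) for forward, lateral in trajectory)


def on_rails_variants(seed: SceneSpec, offsets_m: Sequence[float]) -> List[SceneSpec]:
    """Build counterfactual specs that change only the ego trajectory."""
    return [
        replace(seed, ego_trajectory=shift_laterally(seed.ego_trajectory, offset))
        for offset in offsets_m
    ]


seed = SceneSpec(
    seed_recording="recording_0001",     # made-up identifier
    weather="rain",
    time_of_day="dusk",
    ego_trajectory=((0.0, 0.0), (5.0, 0.0), (10.0, 0.0)),
)
variants = on_rails_variants(seed, offsets_m=[-0.5, 0.5, 1.0])
```

Because every field other than `ego_trajectory` is copied from the seed, a difference in score between two variants can be traced to the trajectory change alone, which is the attribution property described above.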
Embodiment transfer addresses a different problem, namely that the sensor configuration of the vehicle used to record source data is rarely the configuration of the vehicle whose policy is being evaluated. Wayve describes embodiment transfer as the ability to take generated scenes and re render them from the camera layout of a different rig using only small unpaired samples from that target rig, removing the need for paired captures across vehicles.
The slogan attached to the launch is "from simulation to evaluation". In practice this refers to a pipeline in which a single recorded driving sequence can be expanded into a structured test family by combining the conditioning inputs above, and in which a candidate driving policy can then be exercised inside the resulting scenes and scored against well defined criteria.
Wayve describes four primary evaluation primitives produced by GAIA-3.
| Primitive | Purpose | How it uses GAIA-3 |
|---|---|---|
| Safety critical scenarios | Generate counterfactual collision and near miss situations | Modify ego trajectory under world on rails conditioning to produce dangerous variants of real scenes |
| Offline evaluation suites | Convert single recordings into structured, repeatable test families | Combine action, scene and appearance conditioning to span a defined coverage space |
| Robustness testing | Probe sensitivity to nuisance variation | Hold geometry and motion fixed while varying lighting, weather and textures via visual diversity controls |
| Enrichment and debugging | Amplify rare failure modes into focused test sets | Generate labelled variations of known failures for targeted regression analysis |
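How a single recording fans out into a structured test family can be sketched as a Cartesian product over conditioning axes. The axis names and values below are illustrative assumptions rather than Wayve's actual coverage space, and `build_test_family` and `robustness_group` are hypothetical helpers.

```python
from itertools import product
from typing import Dict, List, Sequence


def build_test_family(
    recording_id: str,
    ego_variants: Sequence[str],
    weathers: Sequence[str],
    times_of_day: Sequence[str],
) -> List[Dict[str, str]]:
    """Enumerate every combination of the conditioning axes for one recording."""
    return [
        {
            "recording": recording_id,
            "ego_variant": ego,
            "weather": weather,
            "time_of_day": time_of_day,
        }
        for ego, weather, time_of_day in product(ego_variants, weathers, times_of_day)
    ]


def robustness_group(family: List[Dict[str, str]], ego_variant: str) -> List[Dict[str, str]]:
    """Select cases that share one ego variant and differ only in appearance."""
    return [case for case in family if case["ego_variant"] == ego_variant]


family = build_test_family(
    recording_id="recording_0001",       # made-up identifier
    ego_variants=["as_recorded", "late_brake", "drift_left"],
    weathers=["clear", "rain"],
    times_of_day=["day", "night"],
)
print(len(family))  # 3 * 2 * 2 = 12 cases
```

Grouping by `ego_variant` gives the robustness view in the table (same motion, varied appearance), while comparing across ego variants within one appearance setting gives the safety critical view.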
The shift in emphasis is significant inside the wider world models field. Earlier Wayve world models were largely judged on the realism and diversity of the videos they produced, with synthetic data augmentation as the canonical downstream use. GAIA-3 is built around the assumption that the consumer of its output is another model rather than a human reviewer, and that the value of the output is determined by whether driving decisions made inside generated worlds predict the decisions a system would make on real roads.
Wayve has reported two top level claims about how that consumer model behaves. The first is that the rate at which generated test sequences are rejected by downstream evaluation pipelines, on the grounds of being unrealistic or implausible, has been reduced by a factor of five compared with prior generations. The second is that policy outcomes measured inside GAIA-3 generated scenes closely correlate with policy outcomes observed during on road testing. Both claims are reported by Wayve and have not, at time of writing, been independently reproduced by external researchers.
A recurring concern with generative world models is that consistency between generated frames does not by itself imply spatial realism. To address this, Wayve describes a validation methodology built around real world LiDAR data. When the ego trajectory is modified using world on rails conditioning, the corresponding real LiDAR point clouds are aligned with the generated frames to verify that the underlying spatial structure of the scene has been preserved across the counterfactual.
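Wayve has not published the details of this alignment procedure. As a rough illustration of the geometric step it implies, the sketch below projects world frame LiDAR points into a camera carried by the ego vehicle at a counterfactual pose, using a plain pinhole model. The frame conventions, the intrinsics and all function names are assumptions made for the example.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def world_to_camera(point: Point3D, ego_x: float, ego_y: float,
                    ego_yaw: float, camera_height: float) -> Point3D:
    """Express a world point in an assumed forward facing camera frame.

    Assumed conventions: world and vehicle frames are x forward, y left, z up;
    the camera frame is x right, y down, z forward.
    """
    dx, dy = point[0] - ego_x, point[1] - ego_y
    cos_yaw, sin_yaw = math.cos(ego_yaw), math.sin(ego_yaw)
    vehicle_x = cos_yaw * dx + sin_yaw * dy      # distance ahead of the ego
    vehicle_y = -sin_yaw * dx + cos_yaw * dy     # distance to the left of the ego
    vehicle_z = point[2] - camera_height         # height relative to the camera
    return (-vehicle_y, -vehicle_z, vehicle_x)


def project(point_cam: Point3D, fx: float, fy: float,
            cx: float, cy: float) -> Optional[Pixel]:
    """Pinhole projection to pixel coordinates; None if behind the camera."""
    x, y, z = point_cam
    if z <= 0.0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def project_cloud(points: Sequence[Point3D], ego_pose: Tuple[float, float, float],
                  camera_height: float,
                  intrinsics: Tuple[float, float, float, float]) -> List[Pixel]:
    """Project every visible point for one ego pose (x, y, yaw)."""
    ego_x, ego_y, ego_yaw = ego_pose
    pixels = []
    for point in points:
        pixel = project(world_to_camera(point, ego_x, ego_y, ego_yaw, camera_height),
                        *intrinsics)
        if pixel is not None:
            pixels.append(pixel)
    return pixels


intrinsics = (1000.0, 1000.0, 640.0, 360.0)      # made-up fx, fy, cx, cy
cloud = [(10.0, 0.0, 1.5), (-5.0, 0.0, 0.0)]     # second point is behind the ego
recorded = project_cloud(cloud, (0.0, 0.0, 0.0), 1.5, intrinsics)
shifted = project_cloud(cloud, (0.0, 1.0, 0.0), 1.5, intrinsics)  # ego 1 m to the left
```

In a check of this kind, the pixels projected for the shifted pose would be overlaid on the generated frame, and structure such as kerbs or building edges in the image would be expected to land where the real points fall.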
Wayve also describes correlation studies between synthetic interventions inside GAIA-3 and on road experiments, which it reports as evidence that the system is a reliable predictor of relative policy performance. The published material does not detail specific benchmark scores or formal statistical results from these studies, and external benchmark numbers should not be inferred beyond what Wayve has published.
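One common way to quantify whether two evaluation methods agree on relative policy performance is Spearman rank correlation. Wayve has not stated which statistic its studies use, so the choice here is an assumption, and the scores in the usage lines are made-up numbers rather than published results.

```python
import math
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Made-up scores for four hypothetical policies, one entry per policy.
simulated_scores = [0.91, 0.78, 0.85, 0.60]
on_road_scores = [0.88, 0.80, 0.70, 0.65]
rho = spearman(simulated_scores, on_road_scores)
```

A value near 1 would mean the simulated and on road evaluations order the policies almost identically. The statistic says nothing about whether the absolute scores match, which fits the article's framing of GAIA-3 as a predictor of relative policy performance.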
GAIA-3 is closely connected to Wayve's UK based safety research programme. The company is a partner on DriveSafeSim, a UK government funded project led with the Warwick Manufacturing Group at the University of Warwick. DriveSafeSim's stated objective is to validate whether generative world models such as GAIA-3 can be used as part of formal safety evaluation for automated driving systems, and to develop methodology for using them in that role.
The broader relationship between Wayve and the UK government was deepened on 12 May 2026 with the signing of a memorandum of understanding between Wayve and the UK Department for Business and Trade. The agreement covers research collaboration on next generation self driving technology, safety assurance and simulation at scale, sharing of trial insights with government and regulators, and strengthening of UK automotive manufacturing and supply chains in AI, systems integration and advanced hardware. Business Secretary Peter Kyle described the agreement as accelerating self driving technology while anchoring jobs in the UK, and Science and Technology Secretary Liz Kendall framed it as securing high skilled technology and advanced manufacturing employment. Wayve Chief Executive Alex Kendall stated that strengthening domestic capabilities would anchor high value manufacturing in the UK and create thousands of skilled jobs.
GAIA-3 itself was not the subject of the MoU, but the partnership treats simulation at scale as one of the named work areas, which places GAIA-3 inside the policy conversation about how automated driving systems should be evaluated and licensed in the United Kingdom.
GAIA-3 is one of several generative models that Wayve has positioned as part of a multi model architecture for embodied AI. The GAIA family supplies a simulation and evaluation engine, the LINGO family supplies a vision language action layer for explaining and interrogating driving behaviour, and a separate core driving model produces the actual control outputs used on the road.
| Model family | Role | Most recent public version | Notes |
|---|---|---|---|
| GAIA | Generative world model for simulation and evaluation | GAIA-3, 2 December 2025 | Used to generate counterfactual driving scenes for offline evaluation |
| LINGO | Vision language action model | LINGO-2 | Connects natural language reasoning with driving actions |
| Core driving model | Production driving policy | Not separately versioned in public material | The model under test inside GAIA-3 generated scenes |
Wayve has framed this division of labour as a deliberate alternative to a single monolithic foundation model for self driving, on the basis that each function benefits from specialised conditioning and that the components can evolve at different rates.
GAIA-3 was reported by automotive trade publications including Automotive World, MarkLines, S&P Global Mobility's Automotive Technology Insight, Self Drive News and EV Mag, as well as by enterprise technology outlets including diginomica. Coverage focused on three themes: the framing of GAIA-3 as an evaluation platform rather than a data augmentation system, the scale jump from GAIA-2, and the connection to UK government backed safety validation work through DriveSafeSim.
Most third party reporting closely mirrors Wayve's own technical claims, including the fifteen billion parameter count, the fivefold reduction in synthetic test rejection rate and the partnership with the Warwick Manufacturing Group. Independent benchmarks of GAIA-3 against competing driving world models have not been published, and external coverage at the time of launch did not include quantitative comparisons with other generative simulation systems used inside the autonomous driving industry.
In parallel with the GAIA-3 release, Wayve secured an additional sixty million dollar investment from AMD, Arm and Qualcomm as an extension of its Series D round, which several outlets described as reinforcing the company's strategy of partnering closely with silicon vendors as it scales its training and inference infrastructure.
The public information available about GAIA-3 is largely sourced from Wayve itself and from press coverage of its launch. Several aspects of the system are not described in detail in public material. Wayve has not published a full technical report at the level of detail it released for GAIA-1 and GAIA-2, and there is no public information on items such as detailed model architecture beyond the latent diffusion description, exact training set composition, evaluation metrics with numerical scores, or independently reproduced correlation studies between GAIA-3 outcomes and on road outcomes.
The central methodological claim, that policy performance measured inside GAIA-3 reliably predicts policy performance on real roads, depends on the validity of the underlying world model and on the breadth of scenarios in which the correlation holds. Independent evaluation of this claim, including across geographies and weather conditions not heavily represented in Wayve's own training data, is a likely subject of future research and of the DriveSafeSim programme.