AI weather forecasting refers to the use of machine learning and deep learning models to predict atmospheric conditions, replacing or augmenting the physics-based numerical weather prediction (NWP) systems that have been the standard approach since the mid-20th century. Between 2022 and 2025, a rapid succession of AI models demonstrated that data-driven approaches trained on decades of historical weather observations can match or outperform the world's best operational forecasting systems while running thousands of times faster. This shift represents one of the most significant transformations in the history of meteorological science.
Modern weather forecasting has traditionally relied on numerical weather prediction, a method that simulates the atmosphere by solving the fundamental equations of fluid dynamics, thermodynamics, and radiative transfer on high-resolution grids. The European Centre for Medium-Range Weather Forecasts (ECMWF) operates the Integrated Forecasting System (IFS), widely regarded as the most accurate global NWP system. The IFS high-resolution deterministic model (HRES) produces 10-day forecasts at approximately 9 km horizontal resolution with 137 vertical levels, while the ensemble forecast system (ENS) generates 51 ensemble members to capture forecast uncertainty.
Running these physics-based simulations requires enormous computational resources. The ECMWF operates a supercomputing facility consisting of four Atos BullSequana XH2000 clusters with a combined performance of approximately 30 petaflops. Producing a single 10-day global forecast demands billions of calculations across millions of grid points, with the full operational forecast suite consuming significant fractions of this capacity. National weather services around the world collectively spend hundreds of millions of dollars annually on supercomputing infrastructure for NWP alone.
Despite steady improvements over many decades, NWP systems face fundamental constraints. Increasing resolution is extremely costly: halving the grid spacing refines both horizontal dimensions and forces a shorter time step, multiplying the computation roughly tenfold. Meanwhile, the chaotic nature of the atmosphere means that small errors in initial conditions grow rapidly over time, limiting the practical forecast horizon. These limitations motivated researchers to explore whether data-driven approaches could learn atmospheric dynamics directly from historical observations.
Nearly all major AI weather forecasting models are trained on ERA5, the fifth-generation global atmospheric reanalysis dataset produced by the Copernicus Climate Change Service at ECMWF. ERA5 provides hourly estimates of a large number of atmospheric, land, and oceanic climate variables on a 31 km (0.25 degree) grid, resolving the atmosphere using 137 levels from the surface up to 80 km in altitude. The dataset covers the period from January 1940 to the present, though most AI models use data from 1979 onward, when satellite observations became available and reanalysis quality improved substantially.
ERA5 was created through data assimilation, a process that combines physics-based model simulations with millions of weather observations from surface stations, radiosondes, aircraft, and satellites to produce a physically consistent estimate of the state of the atmosphere at every time step. This means that AI weather models are not trained on raw observations but on a carefully curated, gridded product that already encodes significant physical knowledge.
The dataset includes variables at multiple pressure levels (such as temperature, wind components, geopotential height, and specific humidity) as well as surface variables (such as mean sea level pressure, 2-meter temperature, and 10-meter wind speed). A typical AI weather model uses between 5 and 7 atmospheric variables at 13 to 37 pressure levels, plus 4 to 7 surface-level variables, resulting in hundreds of input and output channels per forecast time step.
AI weather models approach forecasting as a supervised learning problem. Given the current state of the atmosphere (and often the state six hours earlier), the model learns to predict the atmospheric state at a future time, typically six or twelve hours ahead. Longer forecasts are produced autoregressively, meaning the model feeds its own prediction back as input to generate the next time step, repeating this process to extend forecasts out to 10 or 15 days.
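The autoregressive rollout described above can be sketched in a few lines. This is a schematic with a toy stand-in for the trained network; `step_fn` is a hypothetical placeholder for any model that maps one atmospheric state to the next:

```python
import numpy as np

def rollout(step_fn, state, n_steps):
    """Autoregressively extend a one-step forecast.

    step_fn: maps the current atmospheric state to the state one
             model time step (e.g. 6 hours) ahead.
    state:   initial condition, e.g. an array of shape (channels, lat, lon).
    Returns the list of predicted states, one per step.
    """
    trajectory = []
    for _ in range(n_steps):
        state = step_fn(state)  # the model's own output becomes the next input
        trajectory.append(state)
    return trajectory

# Toy stand-in for a trained model: damp the state slightly each step.
toy_model = lambda x: 0.99 * x

initial = np.ones((3, 4, 8))                        # (channels, lat, lon) toy grid
forecast = rollout(toy_model, initial, n_steps=40)  # 40 x 6 h = a 10-day forecast
print(len(forecast))
```

With a 6-hour base step, a 10-day forecast is 40 applications of the model; forecast errors compound through this loop, which is why long-lead skill is hard to maintain.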
Several different neural network architectures have proven effective for this task:
Vision Transformers (ViTs) treat the atmospheric state as an image, dividing the global grid into patches that are processed as tokens through self-attention layers. FourCastNet pioneered this approach using Adaptive Fourier Neural Operators (AFNOs), which replace standard attention with operations in the Fourier domain, allowing the model to efficiently capture global spatial dependencies. Pangu-Weather extended this idea with a 3D Earth-Specific Transformer that processes atmospheric data across both horizontal and vertical dimensions.
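The core idea of Fourier-domain token mixing can be illustrated schematically. This is not the AFNO layer itself (which uses learned block-diagonal complex weights, soft thresholding, and an MLP); it only shows why a multiplication in the Fourier domain gives every token a global receptive field in one operation. The random `weights` stand in for learned per-mode parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_mix(x, weights):
    """Schematic Fourier-domain token mixing in the spirit of AFNO.

    x:       real field of shape (lat, lon).
    weights: complex multipliers of shape (lat, lon // 2 + 1),
             standing in for learned per-mode parameters.
    """
    spec = np.fft.rfft2(x)                 # to the Fourier domain
    spec = spec * weights                  # mix all spatial modes at once:
                                           # every output point depends on every input point
    return np.fft.irfft2(spec, s=x.shape)  # back to physical space

x = rng.standard_normal((16, 32))
w = rng.standard_normal((16, 17)) + 1j * rng.standard_normal((16, 17))
y = fourier_mix(x, w)
print(y.shape)  # same spatial shape as the input
```

Because the mixing happens per Fourier mode, the cost scales as O(N log N) in the number of grid points rather than the O(N^2) of full self-attention, and the periodicity of the transform matches the periodic longitude dimension of global data.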
Graph neural networks (GNNs) represent the atmosphere as a graph, where nodes correspond to geographic locations and edges encode spatial relationships. GraphCast uses a multi-mesh approach based on a refined icosahedron, which provides roughly uniform coverage of the globe (unlike latitude-longitude grids, which oversample the poles). The encoder-processor-decoder architecture maps from the regular input grid to the icosahedral mesh, processes information through multiple message-passing layers, and maps back to the output grid.
Diffusion models generate probabilistic forecasts by learning to iteratively denoise random samples. GenCast adapts this approach to spherical geometry, generating ensemble members by conditioning the denoising process on the current atmospheric state. Each ensemble member represents a plausible future weather trajectory, allowing the model to capture forecast uncertainty natively.
NeuralGCM takes a different path by combining a differentiable physics-based dynamical core with learned neural network parameterizations. Rather than replacing physics entirely, it uses neural networks to handle the processes that traditional models parameterize crudely (such as convection and cloud formation), while retaining the equations of motion for large-scale atmospheric dynamics.
The following sections describe the most influential AI weather forecasting models developed between 2022 and 2025, listed in approximate chronological order of their initial publication.
FourCastNet (Fourier Forecasting Neural Network) was developed by researchers at NVIDIA, Lawrence Berkeley National Laboratory, the California Institute of Technology, the University of Michigan, and Rice University. The initial preprint appeared on arXiv in February 2022, making it one of the earliest AI models to demonstrate competitive performance with operational NWP at global scale.
The model employs a vision transformer backbone with Adaptive Fourier Neural Operators (AFNOs), which apply token mixing in the Fourier domain rather than through standard attention mechanisms. Input variables on the 720 x 1440 latitude-longitude grid are divided into patches of size 8 x 8, with each patch embedded as a high-dimensional token. The AFNO architecture was specifically chosen for its ability to handle the global, periodic nature of atmospheric data.
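The patching arithmetic follows directly from the figures above; a quick check (the per-token value count assumes all 20 variables are stacked into each patch before embedding):

```python
lat, lon = 720, 1440  # 0.25-degree global grid used by FourCastNet
patch = 8             # 8 x 8 patch size
n_vars = 20           # ERA5 variables per grid point

tokens_lat, tokens_lon = lat // patch, lon // patch
n_tokens = tokens_lat * tokens_lon
values_per_token = patch * patch * n_vars

print(n_tokens)          # 90 * 180 = 16,200 tokens per time step
print(values_per_token)  # 8 * 8 * 20 = 1,280 raw values embedded per token
```

A sequence of roughly 16,000 tokens is far beyond what dense self-attention handles comfortably at this feature width, which is part of the motivation for the Fourier-domain mixing described above.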
FourCastNet takes 20 ERA5 variables as input, defined on the surface (single levels) and on pressure levels, and predicts these same variables at a 6-hour time step with 0.25-degree resolution. The model was trained on ERA5 data from 1979 to 2015. In terms of speed, FourCastNet generates a week-long forecast in less than 2 seconds, compared to hours for traditional NWP systems. Notably, the model was also trained on 3,072 GPUs of the JUWELS Booster supercomputer in approximately 67 minutes.
While FourCastNet did not fully match ECMWF HRES on all metrics at the time of release, it demonstrated that data-driven global weather prediction was feasible at operational resolution and paved the way for subsequent models.
Pangu-Weather, developed by researchers at Huawei Cloud, was published in Nature on July 5, 2023, under the title "Accurate medium-range global weather forecasting with 3D neural networks." It was the first AI model to comprehensively outperform traditional NWP on standard evaluation metrics.
The architecture introduces a 3D Earth-Specific Transformer (3DEST) that processes atmospheric data across both horizontal and vertical dimensions simultaneously. Built on a hierarchical encoder-decoder framework derived from the Swin Transformer, Pangu-Weather includes an Earth-Specific positional Bias (ESB) term that encodes latitude and altitude-dependent patterns. This positional bias accounts for the fact that weather dynamics are strongly influenced by absolute position on Earth (for example, the Coriolis effect varies with latitude).
The system trains four separate models for different lead times: 1 hour, 6 hours, 12 hours, and 24 hours. Forecasts at intermediate or longer lead times are composed by chaining the appropriate models. Each model has approximately 64 million trainable parameters, with the total across all four models reaching about 256 million parameters. Training required approximately 73,000 GPU-hours on NVIDIA V100s per lead-time model.
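The chaining strategy can be sketched as a greedy composition: always take the largest available lead-time model that does not overshoot the target, so that a forecast is built from as few autoregressive steps as possible (fewer steps means less error accumulation). The helper below is an illustrative sketch, not Pangu-Weather's actual scheduling code:

```python
def chain(lead_time_hours, steps=(24, 12, 6, 1)):
    """Greedily compose fixed-lead-time models (24 h, 12 h, 6 h, 1 h)
    to reach an arbitrary lead time in as few model calls as possible."""
    remaining, plan = lead_time_hours, []
    while remaining > 0:
        step = next(s for s in steps if s <= remaining)  # largest step that fits
        plan.append(step)
        remaining -= step
    return plan

print(chain(31))   # a 31-hour forecast: one 24 h step, one 6 h step, one 1 h step
print(chain(168))  # a 7-day forecast: seven 24 h steps
```

A 7-day forecast thus needs only seven model calls instead of the 28 that chaining the 6-hour model alone would require.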
Pangu-Weather operates at 0.25-degree resolution (approximately 25 km at the equator) and predicts 5 upper-air variables at 13 pressure levels plus 4 surface variables. In testing, it outperformed ECMWF HRES on all variables across all lead times from 1 hour to 7 days. It can produce a 10-day forecast in approximately 10 seconds on a single GPU, compared to the hours a traditional NWP run takes on a supercomputer with thousands of processors.
Pangu-Weather was made publicly available on the ECMWF website and demonstrated practical value in tracking tropical cyclones, including successfully predicting the track of Typhoon Mawar in May 2023.
GraphCast was developed by Google DeepMind and published in Science on November 14, 2023, under the title "Learning skillful medium-range global weather forecasting." It quickly became one of the most widely cited AI weather models.
GraphCast uses an encoder-processor-decoder graph neural network architecture with 36.7 million parameters. The encoder maps input data from the 0.25-degree latitude-longitude grid onto a multi-mesh icosahedral graph, which is constructed by iteratively refining a regular icosahedron six times, dividing each triangle into four smaller ones at each refinement. This mesh provides roughly uniform spatial coverage of the globe. The processor consists of a deep GNN with 16 message-passing layers that captures long-range dependencies across the mesh. The decoder maps the processed mesh representation back to the output grid.
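The mesh sizes implied by six refinements follow from standard triangle-subdivision counting: each split adds one vertex per edge, doubles-plus-interconnects the edges, and quadruples the faces. A small sketch to verify the numbers:

```python
def refine(v, e, f, times):
    """Count vertices/edges/faces after repeatedly splitting every
    triangle of a mesh into four smaller triangles."""
    for _ in range(times):
        v, e, f = v + e, 2 * e + 3 * f, 4 * f
    return v, e, f

# Regular icosahedron: 12 vertices, 30 edges, 20 triangular faces.
v, e, f = refine(12, 30, 20, times=6)
print(v)          # 40,962 mesh nodes after six refinements
print(v - e + f)  # Euler characteristic of a sphere: 2
```

The resulting 40,962 mesh nodes are far fewer than the roughly one million points of the 0.25-degree grid, which is why message passing on the mesh is tractable, and GraphCast's multi-mesh retains the edges of all coarser refinement levels so that information can also travel long distances in few hops.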
The model takes as input the two most recent atmospheric states (current and six hours prior) and predicts the state six hours ahead. It operates on 5 surface variables and 6 atmospheric variables at 37 pressure levels, totaling 227 input and output variables per grid point. GraphCast was trained on 39 years of ERA5 data (1979 to 2017) over approximately four weeks using 32 Google Cloud TPU v4 devices.
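The 227-variable figure is straightforward to check from the counts above:

```python
atmos_vars, levels, surface_vars = 6, 37, 5
channels = atmos_vars * levels + surface_vars
print(channels)  # 6 * 37 + 5 = 227 variables per grid point
```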
In a comprehensive evaluation, GraphCast outperformed ECMWF HRES on more than 90% of 1,380 verification targets (combinations of variables, levels, and lead times). When restricted to the troposphere (the lowest 6 to 20 km of the atmosphere, where most weather occurs), this figure rose to 99.7%. GraphCast produces a 10-day global forecast in under one minute on a single Google Cloud TPU, representing a speed advantage of several orders of magnitude.
GraphCast also demonstrated skill in predicting severe weather events. In September 2023, a live deployment of GraphCast on the ECMWF website predicted approximately nine days in advance that Hurricane Lee would make landfall in Nova Scotia, while conventional forecasts only converged on Nova Scotia about six days before landfall.
NeuralGCM was developed by Google Research in partnership with ECMWF and published in Nature on July 22, 2024, under the title "Neural general circulation models for weather and climate." Unlike the purely data-driven models described above, NeuralGCM is a hybrid system that integrates machine learning with traditional atmospheric physics.
The model combines two core components: a differentiable dynamical core and a learned physics module. The dynamical core solves the hydrostatic primitive equations (the fundamental equations governing large-scale atmospheric flow) using a pseudo-spectral discretization, implemented entirely in JAX to enable automatic differentiation. The neural network component replaces traditional parameterizations of sub-grid-scale processes such as convection, radiation, and turbulent mixing.
Both components produce tendencies (rates of change) that are integrated forward in time by an implicit-explicit ordinary differential equation solver. Because the entire system is differentiable, it can be trained end-to-end through backpropagation across multiple simulation time steps, with the rollout length gradually increasing from 6 hours to 5 days during training.
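The curriculum-style rollout training can be sketched as follows. This is a schematic with toy dynamics in place of the differentiable solver, and the schedule lengths are illustrative; the point is that the loss is accumulated across an entire autoregressive rollout, whose length grows as training proceeds:

```python
import numpy as np

def rollout_loss(step_fn, initial, targets):
    """Mean squared error accumulated over an autoregressive rollout.
    In NeuralGCM this loss is differentiated end-to-end through the
    ODE solver; here step_fn is a toy stand-in."""
    state, loss = initial, 0.0
    for target in targets:
        state = step_fn(state)
        loss += float(np.mean((state - target) ** 2))
    return loss / len(targets)

toy = lambda x: 0.9 * x  # toy "dynamics" in place of the hybrid model
x0 = np.ones(8)

# Curriculum: lengthen the rollout as training progresses
# (the paper describes going from 6 hours up to 5 days).
for n in [1, 4, 10, 20]:
    # Build "ground truth" with the same toy dynamics, so the loss is zero here.
    truth, targets = x0, []
    for _ in range(n):
        truth = toy(truth)
        targets.append(truth)
    print(n, rollout_loss(toy, x0, targets))
```

Training through long rollouts penalizes the slow drift and instability that single-step training cannot see, at the price of backpropagating through many solver steps.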
NeuralGCM operates at multiple resolutions, with its deterministic model achieving competitive performance at 0.7 degrees and its stochastic (ensemble) version running at 1.4 degrees. At 0.7-degree resolution, NeuralGCM matches the forecast accuracy of state-of-the-art models for lead times up to 5 days and performs comparably to ECMWF forecasts up to 15 days. At 1.4-degree resolution (approximately 140 km), the model generates realistic tropical cyclone statistics and captures emergent large-scale phenomena.
A key advantage of NeuralGCM is its ability to produce physically consistent simulations over extended time periods, making it suitable not only for weather forecasting but also for climate projections spanning decades. The model runs entirely on GPUs and TPUs, delivering results roughly 100,000 times faster than comparable traditional climate models.
GenCast was developed by Google DeepMind and published in Nature on December 4, 2024, under the title "Probabilistic weather forecasting with machine learning." It represents a major advance in probabilistic (ensemble) AI weather prediction.
GenCast is built on a conditional diffusion model adapted to the spherical geometry of Earth. The architecture comprises an encoder that maps atmospheric data from the latitude-longitude grid onto an icosahedral mesh, a processor based on a sparse transformer with sliding-window attention on that mesh, and a decoder that maps back to grid-based variables. The processor consists of 16 transformer blocks with a feature dimension of 512 and 4-head self-attention, where each mesh node attends to all other nodes within its 16-hop neighborhood.
Starting from an initial noise sample, GenCast iteratively refines its predictions through the diffusion denoising process, conditioned on the two most recent atmospheric states. Each run of this process produces one ensemble member representing a plausible future weather trajectory. A full GenCast forecast comprises 50 or more ensemble members, each covering 15 days at 12-hour time steps.
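The ensemble-generation loop can be sketched at a high level. This is only a caricature of conditional sampling: the toy "denoiser" simply contracts toward the conditioning state, whereas a real diffusion sampler follows a noise schedule that keeps members genuinely distinct. What it does show is the structure: every member starts from its own noise draw and is refined under the same conditioning:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_member(denoise_fn, condition, shape, n_iters=20):
    """Draw one ensemble member by iteratively refining a noise sample,
    conditioned on the current atmospheric state (schematic only)."""
    x = rng.standard_normal(shape)  # each member gets an independent noise start
    for _ in range(n_iters):
        x = denoise_fn(x, condition)
    return x

# Toy "denoiser": pull the sample toward the conditioning state.
toy_denoise = lambda x, c: x + 0.3 * (c - x)

current_state = np.full((4, 8), 5.0)  # toy stand-in for the latest analysis
ensemble = [sample_member(toy_denoise, current_state, (4, 8))
            for _ in range(8)]
print(len(ensemble))
```

Because members share no state with one another, all 50+ trajectories of a real GenCast forecast can be sampled in parallel on separate accelerators.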
GenCast was trained on 40 years of ERA5 data (1979 to 2018) at 0.25-degree resolution with 13 pressure levels using a 6-times-refined icosahedral mesh. Training took five days on 32 TPUs. At inference time, a single Google Cloud TPU v5 produces one 15-day forecast trajectory in approximately 8 minutes, and ensemble members can be generated in parallel.
In evaluation, GenCast outperformed ECMWF's ENS (the gold-standard operational ensemble system) on 97.4% of 1,320 verification targets. At lead times beyond 36 hours, this figure rose to 99.8%. GenCast also showed superior performance in predicting extreme weather events, including tropical cyclone tracks and extreme temperature and wind events, up to 15 days ahead.
A known limitation is that GenCast underpredicts the intensity of tropical cyclones, likely because the 0.25-degree training resolution does not fully resolve the inner core structure of these storms.
Aurora was developed by Microsoft Research and published in Nature in 2025 under the title "A foundation model for the Earth system." It distinguishes itself as the first large-scale foundation model for atmospheric prediction, applying the pre-train-then-fine-tune paradigm common in large language models to weather and climate.
The architecture is a flexible 3D Swin Transformer with Perceiver-based encoders and decoders. The encoder converts heterogeneous inputs (which may come from different data sources, resolutions, and variable sets) into a standard 3D representation of the atmosphere. The processor, the 3D Swin Transformer, evolves this representation forward in time. The decoder translates the processed representation back into specific predictions. The model has 1.3 billion parameters, making it by far the largest AI weather model to date.
Aurora's foundation model approach involves two phases. In the pre-training phase, the model trains on more than one million hours of diverse geophysical data, including reanalysis products, forecast simulations, and climate model outputs. This gives Aurora a general understanding of atmospheric dynamics. In the fine-tuning phase, the model is adapted to specific tasks such as 10-day global weather forecasting at 0.25-degree resolution, 5-day air pollution prediction, or ocean wave forecasting.
When fine-tuned for medium-range weather forecasting, Aurora outperforms existing numerical and AI models across 91% of forecasting targets at 0.25-degree resolution. The model runs approximately 5,000 times faster than the ECMWF IFS, producing 10-day high-resolution weather forecasts in under a minute.
Microsoft has made Aurora's source code and model weights publicly available to the research community.
The following table summarizes the key characteristics of the major AI weather forecasting models.
| Model | Developer | Year | Publication | Architecture | Parameters | Resolution | Lead Time | Training Data | Inference Speed |
|---|---|---|---|---|---|---|---|---|---|
| FourCastNet | NVIDIA | 2022 | arXiv preprint | Vision Transformer with AFNO | Not disclosed | 0.25° | 7 days (6h steps) | ERA5 (1979-2015) | < 2 seconds for 7-day forecast (single GPU) |
| Pangu-Weather | Huawei | 2023 | Nature | 3D Earth-Specific Transformer | ~256M total (4 models) | 0.25° | 7 days | ERA5 (1979-2021) | ~10 seconds for 10-day forecast (single GPU) |
| GraphCast | DeepMind | 2023 | Science | Graph Neural Network (encoder-processor-decoder) | 36.7M | 0.25° | 10 days (6h steps) | ERA5 (1979-2017) | < 1 minute for 10-day forecast (single TPU) |
| NeuralGCM | Google Research | 2024 | Nature | Hybrid differentiable physics + neural network | Not disclosed | 0.7° / 1.4° | 10-15 days | ERA5 | ~100,000x faster than traditional climate models |
| GenCast | DeepMind | 2024 | Nature | Conditional diffusion model with sparse transformer | Not disclosed | 0.25° | 15 days (12h steps) | ERA5 (1979-2018) | ~8 minutes per ensemble member (single TPU v5) |
| Aurora | Microsoft | 2025 | Nature | 3D Swin Transformer with Perceiver encoders/decoders | 1.3B | 0.25° | 10 days | ERA5 + diverse geophysical data (1M+ hours) | ~5,000x faster than IFS |
The performance gains of AI weather models over traditional NWP have been documented across multiple independent evaluations. Several patterns emerge from these comparisons.
For standard verification metrics such as root mean square error (RMSE) and anomaly correlation coefficient (ACC), the leading AI models consistently outperform ECMWF HRES on the majority of variables and lead times. GraphCast exceeded HRES on over 90% of targets, while GenCast surpassed the full ECMWF ensemble (ENS) on 97.4% of targets. These results have been validated by ECMWF itself, which began hosting AI model forecasts on its website for real-time comparison.
Tropical cyclone track prediction has been a particular strength of AI models. GraphCast demonstrated earlier and more accurate landfall predictions for Hurricane Lee in 2023. ECMWF's own AIFS system improved tropical cyclone track forecasts by up to 20% compared to the physics-based IFS. Several AI models have shown skill in predicting other severe weather phenomena, including atmospheric rivers, extreme temperature events, and high wind episodes.
The speed advantage is perhaps the most striking difference. While a traditional NWP system requires hours of computation on a supercomputer with thousands of processors to produce a 10-day forecast, AI models can generate equivalent forecasts in seconds to minutes on a single GPU or TPU. This enables applications that were previously impractical, such as generating very large ensembles (hundreds or thousands of members) to better characterize forecast uncertainty, or running rapid forecast updates as new observations become available.
The most significant institutional validation of AI weather forecasting came from ECMWF itself. Recognizing the capabilities of data-driven models, ECMWF developed the Artificial Intelligence Forecasting System (AIFS), whose name deliberately echoes its traditional IFS.
AIFS uses a graph neural network encoder and decoder combined with a sliding-window transformer processor. It is trained on a combination of ERA5 reanalysis and ECMWF's operational NWP analyses. The deterministic AIFS became operational on February 25, 2025, running alongside the physics-based IFS. The ensemble version of AIFS followed, becoming operational on July 1, 2025.
AIFS outperforms the physics-based IFS on many verification metrics, with notable improvements in tropical cyclone track forecasting (gains of up to 20%). It generates forecasts over 10 times faster than the physics-based system while consuming approximately 1,000 times less energy. An updated version, AIFS 1.1.0, was released on August 27, 2025, to address a precipitation forecast issue identified in the initial operational version.
The decision by ECMWF to make AI forecasts operational represents a watershed moment. For the first time, the world's leading weather prediction center is producing official forecasts using AI alongside its traditional physics-based system. This signals a fundamental shift in how the meteorological community views data-driven approaches.
Despite their impressive performance on standard benchmarks, AI weather models face several important limitations that the research community continues to address.
Multiple studies have found that AI models struggle with extreme weather events, particularly record-breaking conditions that are absent or underrepresented in the training data. A 2025 study demonstrated that numerical models consistently outperform AI models like GraphCast and Pangu-Weather for record-breaking heat, cold, and wind events. AI models tend to underestimate both the frequency and intensity of record-breaking events, with larger forecast errors for greater record exceedances. This is a fundamental challenge for data-driven approaches: by training on historical data, the models learn the statistical distribution of past weather, but they may not extrapolate well to unprecedented conditions.
Purely data-driven models do not explicitly enforce the physical laws that govern the atmosphere, such as conservation of mass, energy, and momentum. This can lead to forecasts that are statistically accurate on average but physically inconsistent in individual cases. For tropical cyclones, for example, the wind fields produced by AI models may not have the organized structure needed for accurate downstream predictions of storm surge, rainfall distribution, and wind damage. Hybrid approaches like NeuralGCM partially address this by retaining physics-based dynamics, but fully data-driven models remain vulnerable to this limitation.
Most AI weather models operate at 0.25-degree resolution (approximately 25 km), which is coarser than the 9 km resolution of ECMWF HRES. This limits their ability to resolve small-scale phenomena such as thunderstorms, sea breezes, and mountain-valley circulations. While the models may capture the large-scale environment conducive to these events, they cannot predict their precise location and timing in the way higher-resolution NWP models can.
AI weather models are trained on historical data from a climate that is continuously changing. As global temperatures rise and weather patterns shift, the statistical relationships learned from the past may become less reliable. This concern is particularly acute for "gray swan" events: weather extremes that are rare enough to be absent from the training dataset but may become more common under future climate conditions. Traditional NWP models, because they are grounded in physical principles, are in theory better equipped to handle novel atmospheric states.
Forecasting precipitation remains challenging for AI models. Rainfall is inherently more chaotic and localized than temperature or pressure fields, and its distribution depends on complex interactions between dynamics, thermodynamics, and microphysics. Several AI models show a tendency to produce overly smooth precipitation fields, underpredicting extreme rainfall while overestimating light precipitation. ECMWF's AIFS 1.1.0 update was specifically released to address a precipitation forecast issue in the initial version.
Operational meteorologists rely on their understanding of atmospheric physics to interpret and communicate forecasts. AI models operate as black boxes, making it difficult for forecasters to understand why a particular prediction was made or to identify when the model might be unreliable. Building trust in AI forecasts within the operational weather community requires not just demonstrated skill but also tools for interpretability, uncertainty communication, and failure mode analysis.
The development of AI weather forecasting has progressed at a remarkable pace:
| Year | Milestone |
|---|---|
| 2022 | NVIDIA publishes FourCastNet, demonstrating global AI weather prediction at 0.25° resolution using Adaptive Fourier Neural Operators |
| July 2023 | Huawei publishes Pangu-Weather in Nature, the first AI model to comprehensively outperform ECMWF HRES |
| November 2023 | DeepMind publishes GraphCast in Science, outperforming HRES on 90%+ of verification targets |
| 2023 | ECMWF begins hosting AI weather forecasts (including GraphCast and Pangu-Weather) on its website for public access |
| July 2024 | Google Research publishes NeuralGCM in Nature, demonstrating hybrid physics-AI models for both weather and climate |
| December 2024 | DeepMind publishes GenCast in Nature, the first AI model to outperform ECMWF ENS for probabilistic forecasting |
| February 2025 | ECMWF makes deterministic AIFS operational, running alongside the traditional IFS |
| 2025 | Microsoft publishes Aurora in Nature, introducing the foundation model paradigm to weather forecasting with 1.3 billion parameters |
| July 2025 | ECMWF makes ensemble AIFS operational |
Several active research directions are shaping the next generation of AI weather forecasting:
Higher resolution and regional models. Researchers are working on AI models that operate at resolutions of 5 km or finer, enabling prediction of convective storms and other mesoscale phenomena. Regional AI models tailored to specific geographic areas are also under development.
Extended-range and seasonal prediction. While current AI models focus primarily on the 1-to-15-day range, there is growing interest in applying similar techniques to sub-seasonal (2 to 6 weeks) and seasonal forecasting, where traditional models have limited skill.
Foundation models. Following Aurora's example, the trend toward large, pre-trained foundation models that can be fine-tuned for diverse tasks is expected to continue. These models could unify weather forecasting, air quality prediction, ocean modeling, and climate projection within a single framework.
Coupled Earth system modeling. Future AI models may jointly predict the atmosphere, ocean, land surface, and cryosphere, capturing the interactions between these components that are critical for extended-range prediction and climate applications.
Operational integration. As weather agencies worldwide adopt AI forecasts, significant work is needed on calibration, post-processing, uncertainty quantification, and forecaster tools to integrate AI predictions into existing operational workflows.
Physics-informed architectures. Approaches like NeuralGCM suggest that combining the strengths of physics-based modeling with data-driven learning may ultimately produce the most reliable forecasts, particularly for extreme events and climate projections.