Weather
Last reviewed
May 13, 2026
Sources
35 citations
Review status
Source-backed
Revision
v2 · 4,773 words
See also: Weather ChatGPT Plugins
AI in weather forecasting refers to the use of machine learning, and especially deep learning, to predict the state of the atmosphere. Between 2022 and 2024 a wave of neural network models trained on the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis matched or beat the best physics-based forecasts on standard skill scores while running thousands of times faster on a single GPU. Major releases in that period include GraphCast from Google DeepMind, Pangu-Weather from Huawei Cloud, FourCastNet from NVIDIA, MetNet-3 from Google, and the operational Artificial Intelligence Forecasting System (AIFS) at ECMWF. The shift has been compared to the move from manual to numerical weather prediction in the 1950s, although the new models still depend on physics-based reanalyses for training data and on traditional numerical systems for initial conditions.
A neural weather model takes a recent snapshot of the atmosphere, encoded as gridded fields of temperature, wind, humidity, geopotential, and surface variables, and predicts those same fields some hours later. Successive predictions are chained, with each output fed back in as the next input, to produce forecasts that run for days. The dominant architectures are transformers, graph neural networks, and diffusion models, sometimes used in combination. The training data is almost always ERA5, a 0.25 degree reanalysis maintained by ECMWF that fuses observations from satellites, radiosondes, aircraft, ships, and weather stations into a physically consistent record running back to 1940.
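The chaining of predictions can be sketched as a simple autoregressive loop. This is a minimal illustration rather than any particular model's code: `toy_step` is a hypothetical stand-in for a trained network, the variable names are invented, and the six-hour step length is an assumption chosen for the example.

```python
from typing import Callable, Dict, List

State = Dict[str, List[float]]  # variable name -> flattened gridded field


def toy_step(state: State) -> State:
    """Hypothetical stand-in for a trained network: damps each field by 1%."""
    return {name: [0.99 * v for v in field] for name, field in state.items()}


def rollout(step: Callable[[State], State], initial: State,
            lead_hours: int, step_hours: int = 6) -> List[State]:
    """Chain single-step predictions until the requested lead time is reached."""
    if lead_hours % step_hours != 0:
        raise ValueError("lead time must be a multiple of the step length")
    states = []
    current = initial
    for _ in range(lead_hours // step_hours):
        current = step(current)  # the model's output becomes its next input
        states.append(current)
    return states


# Illustrative initial state with two variables on a two-point "grid".
analysis = {"t2m": [288.0, 290.5], "u10": [3.2, -1.1]}
forecast = rollout(toy_step, analysis, lead_hours=240)  # ten days
print(len(forecast))  # 40 six-hour steps
```

Because every step consumes the previous step's output, any error made early in the rollout is carried into all later lead times.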
The public benchmark for global medium-range models is WeatherBench 2, maintained by Google Research and ECMWF. On that benchmark, several deep learning systems now outperform the gold standard operational physics model, ECMWF's High Resolution Forecast (HRES), on most surface and upper-air variables out to ten days. GenCast, the ensemble follow-up to GraphCast, also outperforms the ECMWF ensemble forecast (ENS) on the great majority of measured fields.
The payoffs are speed, energy, and cost. GraphCast produces a ten-day global forecast in roughly one minute on a single Tensor Processing Unit, compared with multiple hours on a 1,000-node supercomputer for HRES. ECMWF estimates that its AIFS uses around one thousand times less energy per forecast than the IFS. The caveat is that the field is still working out how to handle extremes, how to keep forecasts physically self-consistent, and how to reduce its dependence on ERA5.
Numerical weather prediction (NWP) began as an idea by the Norwegian physicist Vilhelm Bjerknes in 1904 and a first attempt by Lewis Fry Richardson in 1922 using hand calculation. Richardson's forecast was wildly off, largely because imbalances in the initial data generated spurious fast-moving waves, but the underlying equations were sound. The first practical forecast came in 1950, when Jule Charney, Ragnar Fjørtoft, and John von Neumann used the ENIAC computer at the Aberdeen Proving Ground to produce a one-day prediction with a simplified barotropic model. In the United States, the Joint Numerical Weather Prediction Unit was established in 1954 and began issuing routine forecasts the following year. From there the field grew through ever larger general circulation models running on increasingly powerful supercomputers, with ECMWF emerging as the leader in medium-range skill from the late 1970s onward.
From the 1970s, weather centres added statistical post-processing layers, most famously Model Output Statistics (MOS), to correct biases in NWP outputs against local observations. These methods are still widely used in operational forecasting. They were the first sustained application of data-driven techniques in weather, but they did not replace the physics core.
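In its simplest form, this kind of post-processing is a regression of past observations on past model output. The sketch below fits a one-predictor linear correction by ordinary least squares. Operational MOS uses many predictors, and the function names and sample numbers here are invented for illustration.

```python
from typing import List, Tuple


def fit_linear_correction(forecasts: List[float],
                          observations: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of observation = slope * forecast + intercept."""
    n = len(forecasts)
    if n != len(observations) or n < 2:
        raise ValueError("need at least two matched forecast/observation pairs")
    mean_f = sum(forecasts) / n
    mean_o = sum(observations) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecasts, observations))
    var = sum((f - mean_f) ** 2 for f in forecasts)
    if var == 0:
        raise ValueError("forecasts must not all be identical")
    slope = cov / var
    intercept = mean_o - slope * mean_f
    return slope, intercept


def apply_correction(raw: float, slope: float, intercept: float) -> float:
    """Apply the fitted correction to a new raw model value."""
    return slope * raw + intercept


# Illustrative numbers: raw model temperatures run 2 degrees too warm at one station.
past_forecasts = [10.0, 12.0, 15.0, 18.0]
past_observations = [8.0, 10.0, 13.0, 16.0]
slope, intercept = fit_linear_correction(past_forecasts, past_observations)
corrected = apply_correction(20.0, slope, intercept)
print(slope, intercept, corrected)  # 1.0 -2.0 18.0
```

The physics model is left untouched; only its output is adjusted, which is why such methods sit alongside the physics core rather than replacing it.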
From about 2018, research groups began using convolutional neural networks and recurrent networks to forecast specific weather variables. Many of these systems focused on short-range precipitation prediction from radar data, a task known as nowcasting. The original WeatherBench dataset and benchmark, released by Stephan Rasp and colleagues in 2020, gave researchers a common testbed for global medium-range data-driven forecasts on regridded ERA5.
The inflection point came in 2022 and 2023. FourCastNet from NVIDIA, released in February 2022, was the first global neural model to match the ECMWF IFS at 0.25 degree resolution for many large-scale variables. Pangu-Weather from Huawei Cloud, published in Nature in July 2023, was the first system to beat the IFS on most upper-air variables. GraphCast from Google DeepMind, published in Science in November 2023, beat HRES on more than 90 percent of tested variables and lead times. By the end of 2024, ECMWF had committed to bringing its own AIFS into operations, and DeepMind had released GenCast, a generative ensemble model. Foundation-style models such as ClimaX, AtmoRep, and Microsoft's Aurora began to extend the same architectures to climate and earth-system tasks.
| Model | Lab | Year | Paper / venue | Architecture | Key feature |
|---|---|---|---|---|---|
| MetNet | Google | 2020 | arXiv 2003.12140 (Sønderby et al.) | Axial self-attention CNN | 8-hour precipitation nowcasting at 1 km |
| MetNet-2 | Google | 2021 | Nature Communications 2022 (Espeholt et al.) | Dilated CNN | 12-hour precipitation forecasts beating HREF |
| DGMR | Google DeepMind, Met Office | 2021 | Nature (Ravuri et al., September 2021) | Conditional GAN | 90-minute radar nowcasting preferred by 89 percent of meteorologists |
| FourCastNet | NVIDIA | 2022 | arXiv 2202.11214 (Pathak et al.) | Adaptive Fourier Neural Operator on a Vision Transformer | First global ML model matching IFS at 0.25 degrees |
| Pangu-Weather | Huawei Cloud | 2023 | Nature, July 2023 (Bi et al.) | 3D Earth-Specific Transformer | First model to beat IFS on most upper-air fields |
| GraphCast | Google DeepMind | 2023 | Science, November 2023 (Lam et al.) | Graph neural network on icosahedral mesh | 10-day forecast in under a minute, beat HRES on 90+ percent of variables |
| FuXi | Fudan University | 2023 | npj Climate and Atmospheric Science (Chen et al.) | Cascade of U-Transformers | Three submodels tuned for 0-5, 5-10, and 10-15 day ranges |
| NowcastNet | Tsinghua University | 2023 | Nature, July 2023 (Zhang et al.) | Physics-conditioned generative network | Skilful 3-hour nowcasting of extreme precipitation |
| MetNet-3 | Google Research, Google DeepMind | 2023 | arXiv 2306.06079 (Andrychowicz et al.) | U-Net transformer with sparse-observation conditioning | 24-hour forecasts at 1 to 4 km resolution, deployed in Google products |
| ClimaX | Microsoft Research, UCLA | 2023 | ICML 2023 (Nguyen et al.) | Vision Transformer | Foundation model pre-trained on CMIP6, fine-tuned across many weather and climate tasks |
| AtmoRep | Otto-von-Guericke University Magdeburg, Jülich | 2023 | arXiv 2308.13280 (Lessig et al.) | Stack of cross-attended transformers | Trained on 40 years of ERA5 as a stochastic atmospheric foundation model |
| SwinRDM | OPPO Research | 2023 | AAAI 2023 (Chen et al.) | SwinRNN coarse model with diffusion super-resolution | Pushed data-driven 0.25 degree forecasts past HRES |
| NeuralGCM | Google Research, ECMWF | 2024 | Nature, July 2024 (Kochkov et al.) | Differentiable JAX dynamical core plus neural physics | Hybrid model usable for weather and climate simulation |
| Aurora | Microsoft Research | 2024 | arXiv 2405.13063 (Bodnar et al.); Nature, May 2025 | 3D Swin Transformer encoder-decoder | Earth-system foundation model spanning weather, air quality, ocean waves, and cyclone tracks |
| Stormer | Argonne, UCLA | 2024 | ICLR 2024 workshop (Nguyen et al.) | Plain transformer with randomized lead times | Competitive on WeatherBench 2 with much less compute |
| GenCast | Google DeepMind | 2024 | Nature, December 2024 (Price et al.) | Diffusion-based ensemble | Beat ECMWF ENS on 97 percent of evaluated fields out to 15 days |
| AIFS Single | ECMWF | 2024 | ECMWF Newsletter 178 | Graph transformer | First fully operational ML system at a major weather centre, live February 2025 |
| AIFS ENS | ECMWF | 2025 | ECMWF Newsletter 185 | Diffusion-based ensemble of 51 members | Operational ensemble companion to AIFS Single, live July 2025 |
| WeatherNext 2 | Google DeepMind, Google Research | 2025 | Google blog, November 2025 | Hierarchical model | Successor to GraphCast, deployed across Google Search, Pixel Weather, and Maps |
GraphCast, published as Lam et al. in Science on 14 November 2023, represents the atmosphere on a multi-mesh icosahedral graph at 0.25 degree resolution. It predicts hundreds of weather variables ten days ahead at six-hour steps, producing the full forecast in under one minute on a single Cloud TPU v4 device. In a comparison against HRES, GraphCast was more accurate on 90.3 percent of 1,380 tested variable and lead-time combinations. The code and trained weights are released on GitHub under a permissive licence. ECMWF and several national weather services have integrated GraphCast experimentally into their forecasting workflows, and NOAA's experimental AI ensemble system uses GraphCast as a base.
Pangu-Weather (Bi et al., Nature, 5 July 2023) introduced the 3D Earth-Specific Transformer, a vision-transformer variant that encodes longitude, latitude, and pressure level together with an Earth-specific position bias. Four separate models predict at one-hour, three-hour, six-hour, and 24-hour increments, and the system uses a hierarchical temporal aggregation scheme to chain them with reduced error growth. The reference implementation completes a 24-hour global forecast in 1.4 seconds on an NVIDIA V100 GPU. Pangu was the first AI model invited to run on ECMWF's experimental dashboard for AI-based forecasts, and it has been particularly noted for accurate tropical cyclone track forecasts.
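One way to read the temporal aggregation scheme is as a greedy decomposition of the target lead time into the largest available steps, so that as few model calls as possible are chained. The sketch below illustrates that idea only; the function name is hypothetical and this is not the released Pangu-Weather code.

```python
from typing import Dict, Sequence


def greedy_schedule(lead_hours: int,
                    available_steps: Sequence[int] = (24, 6, 3, 1)) -> Dict[int, int]:
    """Return how many times each model is called, largest step first."""
    if lead_hours < 0:
        raise ValueError("lead time must be non-negative")
    remaining = lead_hours
    calls = {}
    for step in sorted(available_steps, reverse=True):
        calls[step], remaining = divmod(remaining, step)
    if remaining:
        raise ValueError("lead time cannot be reached with the available steps")
    return calls


schedule = greedy_schedule(56)
print(schedule)                # {24: 2, 6: 1, 3: 0, 1: 2}
print(sum(schedule.values()))  # 5 model calls instead of 56 one-hour calls
```

Fewer chained calls means fewer opportunities for errors to compound, which is the motivation for training separate models at several step lengths.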
FourCastNet, by Jaideep Pathak and colleagues at NVIDIA, Lawrence Berkeley National Laboratory, Caltech, Rice, and Purdue, uses an Adaptive Fourier Neural Operator inside a Vision Transformer backbone. In results released in February 2022, it matched the IFS at short lead times for large-scale variables and outperformed it on variables with complex fine-scale structure such as precipitation and surface wind. A scaled implementation generates a one-week global forecast in less than two seconds on a single GPU. FourCastNet is the basis for the Earth-2 platform NVIDIA released in 2024.
GenCast (Price et al., Nature, 4 December 2024) is a diffusion-based ensemble that produces 50 or more equally likely weather trajectories given an initial state. Across 1,320 evaluated combinations of variables and lead times, it was more skilful than ECMWF's ENS on 97 percent of them, and on tropical cyclone tracks it extended useful skill by about a day. GenCast runs each ensemble member in about eight minutes on a TPU v5, compared with several hours per ensemble member for ENS on a supercomputer. The model is available on GitHub and through Google's Vertex AI early-access programme.
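Because the members are treated as equally likely, the probability of an event can be estimated as the fraction of members in which it occurs. The sketch below shows that calculation on a small invented ensemble; the wind speeds, the threshold, and the function name are illustrative assumptions, not GenCast output.

```python
from statistics import mean, pstdev
from typing import Sequence


def exceedance_probability(members: Sequence[float], threshold: float) -> float:
    """Fraction of equally weighted members strictly above the threshold."""
    if not members:
        raise ValueError("ensemble is empty")
    return sum(1 for m in members if m > threshold) / len(members)


# Illustrative 10 m wind speeds (m/s) at one grid point from a 10-member ensemble.
wind = [12.0, 15.5, 9.8, 18.2, 14.1, 21.0, 11.3, 16.7, 13.9, 19.4]
print(exceedance_probability(wind, 17.0))  # 0.3
print(mean(wind), pstdev(wind))            # ensemble mean and spread
```

The ensemble spread gives a rough measure of forecast uncertainty, which a single deterministic run cannot provide.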
NeuralGCM, by Dmitrii Kochkov and colleagues at Google Research and ECMWF, published in Nature on 22 July 2024, takes a hybrid approach. The dynamical core, written in JAX using spectral methods, integrates the primitive equations of atmospheric motion. A learned neural physics module replaces the hand-coded parameterizations that handle clouds, convection, radiation, and boundary-layer turbulence. Training optimises the coupled system end to end through many time steps; keeping such a coupled system stable over long rollouts has historically been the failure point of hybrid approaches. NeuralGCM produces deterministic and ensemble weather forecasts comparable to the best ML and physics methods at 2 to 15 days, and it can run multi-decade climate simulations at much lower compute cost than a conventional general circulation model.
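The hybrid structure can be pictured as a time step driven by the sum of two tendencies, one from the dynamical core and one from the learned module. The sketch below is a toy illustration under strong assumptions: the state is a single number, both tendency functions are hypothetical stand-ins, and forward Euler stepping is substituted for NeuralGCM's actual time integrator.

```python
from typing import Callable, List

Tendency = Callable[[float], float]


def dynamics_tendency(x: float) -> float:
    """Stand-in for the resolved dynamics: relaxation toward zero."""
    return -0.1 * x


def learned_physics_tendency(x: float) -> float:
    """Stand-in for a trained network: a constant heating term."""
    return 0.5


def hybrid_step(x: float, dt: float, core: Tendency, physics: Tendency) -> float:
    """One forward-Euler step using the sum of both tendencies."""
    return x + dt * (core(x) + physics(x))


def integrate(x0: float, dt: float, n_steps: int) -> List[float]:
    """Roll the hybrid step forward and keep the whole trajectory."""
    trajectory = [x0]
    for _ in range(n_steps):
        trajectory.append(hybrid_step(trajectory[-1], dt,
                                      dynamics_tendency, learned_physics_tendency))
    return trajectory


trajectory = integrate(x0=0.0, dt=1.0, n_steps=200)
# The toy system settles where the two tendencies cancel: -0.1 * x + 0.5 = 0, so x = 5.
print(round(trajectory[-1], 6))
```

In an end-to-end trained system, the loss would be computed on the whole trajectory, so the learned term is optimised for its effect after many steps rather than one.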
Aurora, introduced in a May 2024 arXiv preprint by Cristian Bodnar and colleagues at Microsoft Research and published in Nature in May 2025, is a 1.3-billion-parameter foundation model trained on more than one million hours of geophysical data. The base architecture is a 3D Swin Transformer encoder-decoder. Fine-tuned versions handle high-resolution weather at 0.1 degrees, global atmospheric composition and air pollution, ten-day ocean wave forecasts, and tropical cyclone tracks. Microsoft reported a roughly 5,000-fold speed-up over the IFS for comparable forecasts.
ECMWF's Artificial Intelligence Forecasting System (AIFS) was first described in ECMWF Newsletter 178 in early 2024. AIFS Single, the deterministic version, became operational on 25 February 2025, the first fully open machine-learning model running in production at a major weather centre. AIFS ENS, an ensemble version with 51 members based on a diffusion architecture, followed on 1 July 2025. AIFS outperforms IFS by up to 20 percent on tropical cyclone tracks and uses roughly one thousand times less energy per forecast. AIFS is built on ECMWF's open-source Anemoi framework, which has been adopted by partners including the UK Met Office, Météo France, and DWD.
Nowcasting, the prediction of weather over the next zero to six hours, is one of the oldest applications of machine learning in meteorology. The task is dominated by precipitation, particularly convective storms that physics-based models struggle to spin up in time. Modern systems are usually trained on radar mosaics and satellite imagery rather than on reanalyses.
DGMR (Deep Generative Model of Rainfall), released by Google DeepMind in collaboration with the UK Met Office and University of Exeter in 2021, was a milestone. The Nature paper by Ravuri et al. described a conditional generative adversarial network producing realistic radar nowcasts out to 90 minutes over regions of about 1,500 by 1,300 km. In a head-to-head expert evaluation, professional meteorologists at the Met Office ranked DGMR ahead of alternative methods including PySTEPS in 89 percent of cases.
NowcastNet, published in Nature in July 2023 by Zhang et al. at Tsinghua University, combined a physics-informed evolution network with a generative refinement module. It produced sharp, physically plausible precipitation maps over 2,048 by 2,048 km regions up to three hours ahead, and a panel of 62 meteorologists rated it first in 71 percent of cases. The model is the first published deep learning system to consistently outperform leading physics-based nowcasting in extreme precipitation, which had been considered a hard limit for data-driven methods.
Google's MetNet line drove nowcasting into consumer products. MetNet, described by Sønderby et al. in 2020, used axial self-attention to forecast precipitation at 1 km resolution in two-minute increments for up to eight hours. MetNet-2, by Espeholt et al. in Nature Communications in 2022, extended the range to 12 hours and beat the operational HREF ensemble. MetNet-3 (Andrychowicz et al., 2023) widened the variable set to include surface temperature, wind, and dew point at lead times up to 24 hours, with 1 to 4 km spatial resolution and two-minute time steps. MetNet-3 powers the 12-hour precipitation Nowcast feature in the Pixel Weather app and the rain card in Google Search across the continental United States and parts of Europe.
Commercial nowcasting services include Tomorrow.io, which combines proprietary satellite data with deep learning, and Climavision, which augments coverage with privately operated radars. RainViewer offers a consumer radar app with AI-assisted short-term forecasts.
Google's Flood Hub, introduced in 2018 in India and expanded steadily since, uses a hydrologic model to forecast streamflow and an inundation model to translate that into flood maps. A May 2024 update extended Flood Hub to 80 countries with seven-day forecasts, up from a previous two-day horizon, covering more than 460 million people. Subsequent expansions pushed coverage past 100 countries and an estimated 700 million people. The system was the subject of a Nature paper by Nearing et al. in 2024 that documented skilful flood forecasts in ungauged basins worldwide.
Wildfire detection has shifted from sparse satellite snapshots to AI-fused multi-sensor pipelines. NOAA's GOES-R Advanced Baseline Imager already provides near-real-time detection of large fires across the Americas; the U.S. Forest Service uses these data, augmented with machine-learning post-processing, in active-fire dashboards. Google's FireSat, a satellite constellation built by Muon Space in partnership with Google Research and the Earth Fire Alliance, launched its first prototype satellite on the SpaceX Transporter 13 mission in March 2025 and targets 20-minute global revisit at high enough resolution to spot fires the size of a classroom. Google's wildfire boundary detection in Search and Maps, available across the United States, Australia, Canada, and Mexico, uses a similar convolutional model running on GOES-R imagery.
AI weather models have made some of their most visible gains on cyclones. Pangu-Weather was widely covered after it produced a more accurate track for Typhoon Doksuri in July 2023 than the operational physics models. GraphCast correctly predicted the Texas landfall of Hurricane Beryl in July 2024 around three days earlier than a top European model that pointed it toward Mexico. Aurora's tropical cyclone fine-tune outperformed the GFS and HWRF on track skill. DeepMind launched the Weather Lab portal in 2025 with an experimental cyclone model that the U.S. National Hurricane Center evaluated during the 2025 Atlantic season, the first time the NHC used an external AI model in its operational workflow.
Intensity prediction has lagged behind track prediction because the existing global models are tuned at 0.25 degree resolution, too coarse to resolve eyewall structure. Work to fix this includes specialised regional fine-tunes, hybrid models that hand off to high-resolution physics cores, and convolutional approaches such as the DeepTC model from a 2024 BAMS-affiliated paper, which reduced mean absolute error on 24-hour intensity forecasts by 8.9 to 10.2 percent against operational baselines.
Machine learning is also being applied to tornado detection from Doppler radar (NOAA's Warn-on-Forecast project), severe thunderstorm nowcasting (NVIDIA's StormCast, 2024), and heat-wave probability forecasting (Salient Predictions, and ACE2 from the Allen Institute for AI and the University of California, Berkeley).
ECMWF has been the most aggressive operational adopter. Its AI dashboard, online since 2023, displays live forecasts from external models including GraphCast, Pangu-Weather, FourCastNet, FuXi, and the home-built AIFS. AIFS Single went into 24/7 operations on 25 February 2025 alongside the physics-based IFS. AIFS ENS followed in July 2025. ECMWF's Anemoi open-source training framework has become a shared substrate for several European national weather services building their own AI models.
The UK Met Office launched the AI for Numerical Weather Prediction (AI4NWP) programme in 2023 and announced a strategic partnership with The Alan Turing Institute in October 2023. The main joint project is FastNet, an experimental graph neural network targeting both global and high-resolution UK regional forecasts. The Met Office is also one of the partners on the EU-funded WeatherGenerator project to build a European foundation model of the Earth system.
NOAA's broader strategy is described in the NOAA AI Strategy and a series of associated implementation plans. The operational push came in 2024 and 2025 through Project EAGLE (Experimental AI Global and Limited-area Ensemble) at the Environmental Modeling Center, which produced the AIGFS (AI Global Forecast System) and AIGEFS (AI Global Ensemble Forecast System), both based on GraphCast and fine-tuned on NOAA's data. NOAA also runs HGEFS, a hybrid grand ensemble that combines AIGEFS with the existing physics-based GEFS, and which has consistently outperformed both components in testing. NOAA announced a 50 million dollar five-year investment in AI weather research in 2024.
The Korea Meteorological Administration runs Pangu-Weather operationally. The Japan Meteorological Agency, the Australian Bureau of Meteorology, and Germany's DWD have all integrated AI models into their development pipelines. China's national operational suite includes FuXi.
A growing ecosystem of private forecasters layers AI on top of public model output. Tomorrow.io launched Gale, an AI weather product, in 2024. Atmospheric G2 focuses on energy and infrastructure markets with high-resolution forecasts and derived products such as wind and solar generation estimates and natural gas demand. Salient Predictions, founded out of MIT and Woods Hole Oceanographic Institution, runs subseasonal-to-seasonal forecasts of two to 52 weeks ahead and released its GEM-2 framework in 2024 with decision-aligned diagnostics calibrated for tail risk. Excarta reports 13 percent lower 48-hour precipitation error and 7.5 percent lower wind error than the IFS on its core forecasts. IBM's Global High-Resolution Atmospheric Forecasting System (GRAF), launched in 2019 on a Power9 supercomputer, combined a 3 km variable-resolution physics core with ML-based refinements before the foundation-model era; it is run through The Weather Company.
Deep learning is no longer confined to weather forecasting; it is being applied across climate modelling. NeuralGCM is the most prominent example, with a Nature paper showing climate-scale simulations that track historical global mean temperature for multiple decades, and a follow-up in Science Advances in 2025 (Kochkov et al.) showing improved precipitation when the system is trained on satellite-based precipitation observations rather than ERA5 alone.
ClimaX, by Tung Nguyen and colleagues at Microsoft Research and UCLA (ICML 2023), pre-trains on heterogeneous CMIP6 climate simulations and is fine-tuned for downstream tasks ranging from global medium-range weather to regional downscaling. ACE2, from the Allen Institute for AI together with the University of California, Berkeley, focuses on subseasonal-to-decadal variability and forced climate response, with results published in npj Climate and Atmospheric Science in 2025. AtmoRep aims to be a general-purpose foundation model trained on ERA5 from 1979 to 2017, with applications including downscaling, missing-data infilling, and ensemble generation.
The Climate Change AI community organises annual workshops at NeurIPS, ICML, and ICLR titled "Tackling Climate Change with Machine Learning". The workshops bring together climate scientists and ML researchers, with topics covering hybrid physics-ML climate models, dynamical downscaling, energy systems, and natural disasters. Stormer was first published at the ICLR 2024 edition.
WeatherBench 2 by Rasp et al., published in the Journal of Advances in Modeling Earth Systems in 2024, is the standard benchmark for global medium-range models. It provides ERA5 as ground truth, evaluation metrics following World Meteorological Organization standards, and reference scores for HRES, ENS, GraphCast, Pangu, FuXi, and other systems. Most major releases since 2023 report WeatherBench 2 results.
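Global scores on a regular latitude-longitude grid are usually area-weighted, because grid cells shrink toward the poles. The sketch below computes a root mean squared error with cosine-of-latitude weights. That weighting is a common convention assumed here for illustration; the exact weights and metric definitions used by WeatherBench 2 should be taken from its own documentation, and the tiny grid and function name are invented.

```python
import math
from typing import Sequence


def lat_weighted_rmse(forecast: Sequence[Sequence[float]],
                      truth: Sequence[Sequence[float]],
                      latitudes_deg: Sequence[float]) -> float:
    """RMSE over a [lat][lon] grid with cos(latitude) weights normalised to mean 1."""
    weights = [math.cos(math.radians(lat)) for lat in latitudes_deg]
    mean_weight = sum(weights) / len(weights)
    total = 0.0
    count = 0
    for w, f_row, t_row in zip(weights, forecast, truth):
        for f, t in zip(f_row, t_row):
            total += (w / mean_weight) * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)


# Illustrative 3 x 2 grid: the same error size placed everywhere, then only near the poles.
lats = [-60.0, 0.0, 60.0]
truth = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
uniform_error = [[2.0, 2.0], [2.0, 2.0], [2.0, 2.0]]
polar_error = [[2.0, 2.0], [0.0, 0.0], [2.0, 2.0]]
print(lat_weighted_rmse(uniform_error, truth, lats))  # approximately 2.0
print(lat_weighted_rmse(polar_error, truth, lats))    # approximately 1.414
```

In the second case an unweighted RMSE would be about 1.63, so the weighting reduces the influence of high-latitude errors that cover less of the Earth's surface.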
Nearly every current global ML weather model is trained on ERA5. That gives them a consistent, high-quality target but ties their accuracy ceiling to ECMWF's reanalysis. ERA5 is known to underestimate extreme precipitation and to drift in regions of sparse observation. Several papers note that models inherit ERA5's wet biases over the tropics and its smoothing of localized convective rainfall. Recent work, including the precipitation version of NeuralGCM, has begun retraining on IMERG and other observational datasets to break the dependence.
Neural models do not enforce conservation of mass, energy, or momentum. In rollouts beyond a few days, they can produce fields that are individually plausible but jointly inconsistent. Reported failure modes include unphysical smoothing of fronts at longer lead times, underprediction of cyclone intensity, and lack of cross-variable balance that would matter for downstream hydrologic or chemistry models. Hybrid systems like NeuralGCM and Aurora try to address this by retaining a physics core or applying physics-aware loss terms.
AI models trained with mean squared error losses on ERA5 tend to hedge toward a smooth average of the plausible outcomes, and at long lead times toward the climatological mean, which is fine for averages but bad for tails. GenCast and AIFS ENS were designed in part to address this through generative ensembles. NowcastNet and StormCast use physics-informed generative approaches for extreme precipitation. Whether deep learning models can match physics-based models on the most intense and least frequent events remains an open question.
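The pull toward the mean follows directly from the loss function. When two outcomes are equally plausible, the single value that minimises expected squared error is their average, which may match neither outcome. The sketch below shows this with invented rainfall numbers.

```python
from typing import Sequence


def expected_squared_error(prediction: float, outcomes: Sequence[float]) -> float:
    """Mean squared error of one deterministic prediction over equally likely outcomes."""
    return sum((prediction - o) ** 2 for o in outcomes) / len(outcomes)


# Illustrative case: a storm either misses (0 mm of rain) or hits (40 mm), equally likely.
outcomes = [0.0, 40.0]
candidates = [float(c) for c in range(0, 41, 5)]
best = min(candidates, key=lambda c: expected_squared_error(c, outcomes))
print(best)                                    # 20.0, a rainfall total that never occurs
print(expected_squared_error(best, outcomes))  # 400.0
print(expected_squared_error(40.0, outcomes))  # 800.0, so forecasting the extreme is penalised
```

A generative ensemble avoids this by sampling individual outcomes, some members with 0 mm and some with 40 mm, instead of reporting one averaged field.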
All major operational AI systems still rely on physics-based data assimilation to produce their initial conditions. AIFS Single is initialised from the ECMWF IFS analysis. AIGFS at NOAA is initialised from the GFS analysis. Without continued investment in observation networks and 4D-Var or ensemble Kalman assimilation, the supply of high-quality initial states would dry up. Work on end-to-end learned data assimilation, such as FuXi-En4DVar and a 2025 Nature Communications paper on a data-to-forecast ML system, is active but not yet operational.
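The role of assimilation can be illustrated with the scalar form of a Kalman-style analysis update, in which a model background value and an observation are blended according to their error variances. This is a textbook simplification, not the 4D-Var or ensemble systems run operationally, and the numbers and function name are invented for illustration.

```python
from typing import Tuple


def analysis_update(background: float, background_var: float,
                    observation: float, observation_var: float) -> Tuple[float, float]:
    """Blend a background value and an observation, weighting by error variance."""
    if background_var < 0 or observation_var < 0:
        raise ValueError("variances must be non-negative")
    if background_var + observation_var == 0:
        raise ValueError("at least one variance must be positive")
    gain = background_var / (background_var + observation_var)
    analysis = background + gain * (observation - background)
    analysis_var = (1.0 - gain) * background_var
    return analysis, analysis_var


# Illustrative: a 285 K background (error variance 4) and a 287 K observation (variance 1).
analysis, variance = analysis_update(285.0, 4.0, 287.0, 1.0)
print(analysis, variance)  # about 286.6 K, with a variance of about 0.8
```

The analysis lands closer to the more trusted source and is less uncertain than either input, which is why the quality of the initial state depends on a steady supply of observations.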
WeatherBench 2 normalises the global comparison but it cannot capture every aspect of forecast quality. Important variables for users, such as freezing rain, microbursts, or coastal storm surge, are not in the benchmark. Skill scores against ERA5 reward fitting the same dataset the models were trained on. ECMWF and the U.S. National Weather Service both publish their own validation protocols, but standards for ML-specific evaluation, especially for extremes and for ensemble calibration, are still being defined.
The World Meteorological Organization and national weather services are working through how to incorporate ML forecasts into official warning processes. Issues include who owns liability when an AI miss leads to an unwarned event, how to disclose model uncertainty to the public, and how to maintain forecaster expertise when interpreting models whose inner workings are not transparent. ECMWF's decision to keep AIFS open-source and Hugging Face-hosted is partly motivated by these concerns.