A Graph Neural Network (GNN) is a class of neural networks designed to operate directly on data that is structured as a graph, meaning data made up of nodes (entities) connected by edges (relationships). Unlike standard feedforward networks that assume inputs are vectors, or convolutional networks that assume inputs sit on a regular grid such as an image, GNNs respect the irregular topology of graphs. They produce representations for individual nodes, for entire graphs, or for pairs of nodes by repeatedly exchanging information between connected vertices.
GNNs sit alongside convolutional and recurrent networks as one of the major families of deep learning architectures, and they have become the standard tool for problems where relationships matter as much as features: predicting properties of molecules, ranking pins on Pinterest, estimating drive times on Google Maps, scoring candidate antibiotics, classifying jets at the Large Hadron Collider, and reasoning over knowledge bases. The field grew slowly through the 2000s, then exploded in 2017 after the Graph Convolutional Network (GCN) of Kipf and Welling demonstrated that a single, simple layer could outperform much more elaborate methods on standard benchmarks. Since then, dozens of architectures have been proposed, several have been deployed at web scale, and graph methods have begun to fuse with the transformer architecture that came to dominate language and vision.
Most interesting datasets are not flat tables. A social network is a graph of users connected by friendships. A road network is a graph of intersections connected by road segments. A molecule is a graph of atoms connected by bonds. The web is a graph of pages connected by links. Code is a graph of functions, classes, and call sites. Customer behavior on an e-commerce site is a bipartite graph of users and items. Even a sentence has a parse tree that is naturally a graph.
Classical machine learning struggles with this kind of input. A logistic regression on "the friends of a user" requires choosing some fixed number of friends, ordering them, and turning that ordering into features. A convolutional network needs a regular pixel grid. A recurrent network needs a linear sequence. None of these match the variable-size, permutation-invariant nature of graph data: the friends of a user have no natural order, and the model should give the same answer no matter how those friends are listed.
A GNN is built so the order of neighbors does not matter and the model can absorb a different number of them for every node. Each node starts with an initial feature vector, then exchanges messages with its neighbors over one or more rounds. After a few rounds, every node's vector contains information about its local neighborhood, and the network can use that vector for prediction. The same trick works for whole graphs: pool together every node vector to get a single graph-level vector, then feed it to a regression or classification head.
This simple idea covers a remarkable range of tasks. Node classification asks what label to assign each node, used for fraud detection or content moderation. Link prediction asks which pairs of nodes should be connected, used for recommendations and knowledge graph completion. Graph classification asks what label to assign a whole graph, used for predicting whether a molecule binds a protein. Graph regression asks for a numeric output, used for estimating molecular energies or solubility. Edge prediction asks for properties of relationships, used for traffic forecasting on road networks.
The core operation in essentially every modern GNN is message passing. Imagine each node is a small computer that holds a state vector and can send and receive messages along its edges. In one round, every node sends a message based on its current state to each of its neighbors. Every node then collects the incoming messages, aggregates them into a single summary vector using a permutation-invariant function such as a sum or a mean, and combines that summary with its own previous state to produce its new state. After a few rounds, a node's state reflects information drawn from a wider and wider neighborhood.
The formal name for this template is the Message Passing Neural Network framework, introduced by Justin Gilmer and colleagues in 2017 as a unifying view of many earlier graph models. Each layer in an MPNN can be written as three small functions: a message function that turns the sender's features, the receiver's features, and the edge features into a message; an aggregation function (most often sum, mean, or max) that combines all incoming messages; and an update function that mixes the aggregated message with the receiver's prior state. Different choices for these three functions recover GCN, GraphSAGE, GAT, GIN, and most other named architectures.
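In code, one round of this template is short. The sketch below is a generic illustration in PyTorch rather than any particular published layer: the class name, the MLP message function, and the GRU update are all illustrative choices.

```python
import torch
import torch.nn as nn

class SimpleMPNNLayer(nn.Module):
    """One round of message passing: message -> sum-aggregate -> update."""
    def __init__(self, node_dim, edge_dim, hidden_dim):
        super().__init__()
        # message function: sender state, receiver state, edge features -> message
        self.msg = nn.Sequential(nn.Linear(2 * node_dim + edge_dim, hidden_dim), nn.ReLU())
        # update function: aggregated messages + previous state -> new state
        self.upd = nn.GRUCell(hidden_dim, node_dim)

    def forward(self, x, edge_index, edge_attr):
        # x: [num_nodes, node_dim]; edge_index: [2, num_edges] as (src, dst) rows;
        # edge_attr: [num_edges, edge_dim]
        src, dst = edge_index
        messages = self.msg(torch.cat([x[src], x[dst], edge_attr], dim=-1))
        # permutation-invariant aggregation: sum the messages arriving at each node
        agg = torch.zeros(x.size(0), messages.size(-1), device=x.device)
        agg.index_add_(0, dst, messages)
        return self.upd(agg, x)
```

Stacking k such layers gives every node a view of its k-hop neighborhood, which is the depth and receptive-field trade-off discussed later.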
Three properties make message passing the right inductive bias for graph data. First, it is permutation-equivariant: relabel the nodes and the outputs are relabeled in the same way, but no information changes. Second, it handles variable-size inputs: a node with 3 neighbors and a node with 30 neighbors run through the same code. Third, it is local: each layer only mixes information across one hop of the graph, so a model with k layers can only see k hops out, which makes it efficient and roughly translation-invariant on the graph.
The term "graph neural network" was coined by Marco Gori, Gabriele Monfardini, and Franco Scarselli in 2005, with the canonical paper by Scarselli, Gori, Tsoi, Hagenbuchner, and Monfardini appearing in IEEE Transactions on Neural Networks in 2009. That model treated each node as the state of a recurrent network and ran the recurrence to a fixed point. The math worked out, but training was awkward and the method did not scale. Around the same time Alessandro Sperduti and others were applying recursive neural networks to tree-structured data, an idea that anticipated the message passing view.
A second strand came from spectral graph theory. In 2013, Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun published "Spectral Networks and Locally Connected Networks on Graphs," which generalized convolution to graphs through the eigendecomposition of the graph Laplacian. The spectral construction was elegant but expensive: every forward pass required an eigendecomposition, and the learned filters did not transfer between graphs of different sizes. Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst later showed in 2016 that polynomial filters in the Laplacian (ChebNet) could be evaluated cheaply and were localized in space, which set the stage for the breakthrough that came next.
In September 2016, Thomas Kipf and Max Welling posted "Semi-Supervised Classification with Graph Convolutional Networks" on arXiv, later published at ICLR 2017. They simplified ChebNet by truncating the polynomial to first order and applying a normalization trick, leaving a single tidy layer that could be written as a sparse matrix product. On the standard citation network benchmarks (Cora, Citeseer, Pubmed) it beat every prior method by a wide margin while training in seconds. The paper was easy to read, the code was a few hundred lines of TensorFlow, and a permanent shift in the field followed. Within a year, GCN was the default baseline for any graph task, and a flood of follow-up papers tried to do better.
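The layer itself is compact enough to write out. Below is a dense-adjacency sketch of the update H' = ReLU(Â H W), where Â is the renormalized adjacency D̃^{-1/2}(A + I)D̃^{-1/2}; the original release uses sparse operations, and this toy version is meant only to show the formula.

```python
import torch
import torch.nn as nn

def normalize_adjacency(adj):
    """Kipf-Welling renormalization: D^{-1/2} (A + I) D^{-1/2}."""
    adj_tilde = adj + torch.eye(adj.size(0))
    d_inv_sqrt = adj_tilde.sum(dim=1).pow(-0.5)
    return d_inv_sqrt[:, None] * adj_tilde * d_inv_sqrt[None, :]

class GCNLayer(nn.Module):
    """One GCN layer: H' = ReLU(A_hat @ H @ W), dense version for readability."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, adj_norm):
        return torch.relu(adj_norm @ self.linear(x))
```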
Three of those follow-ups defined the next generation of architectures. GraphSAGE, by William Hamilton, Rex Ying, and Jure Leskovec at NeurIPS 2017, fixed the main weakness of GCN: vanilla GCN is transductive, meaning it has to see the whole graph at training time. GraphSAGE instead learns an aggregator function that can be applied to any new node, which is essential for systems where the graph keeps growing. Graph Attention Networks, by Petar Veličković and collaborators at ICLR 2018, replaced the fixed averaging in GCN with learned attention weights, drawing on the same self-attention mechanism used in transformers. Message Passing Neural Networks, by Justin Gilmer and colleagues at ICML 2017, gave the field a clean unifying framework and showed that a single MPNN could match expert-tuned models on quantum chemistry benchmarks.
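The attention in GAT is a per-edge score: both endpoints are projected with a shared weight matrix, the pair is scored with a small learned vector, and the scores are softmax-normalized over each node's neighbors. Below is a single-head, dense-adjacency sketch; the published model uses multiple heads and a sparse softmax over actual edges, and the adjacency here is assumed to include self-loops so every node has at least one neighbor to attend to.

```python
import torch
import torch.nn as nn

class GATLayerDense(nn.Module):
    """Single-head graph attention: e_ij = LeakyReLU(a^T [W h_i || W h_j]),
    alpha_ij = softmax over the neighbors j of node i."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a_src = nn.Linear(out_dim, 1, bias=False)   # the two halves of the vector a
        self.a_dst = nn.Linear(out_dim, 1, bias=False)
        self.leaky = nn.LeakyReLU(0.2)

    def forward(self, x, adj):
        h = self.W(x)                                          # [n, out_dim]
        scores = self.leaky(self.a_src(h) + self.a_dst(h).T)   # [n, n] pairwise logits
        scores = scores.masked_fill(adj == 0, float("-inf"))   # attend only along edges
        alpha = torch.softmax(scores, dim=-1)                  # normalize per node
        return alpha @ h
```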
By 2018, GNNs worked. The next question was: how powerful are they, exactly? Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka answered in "How Powerful are Graph Neural Networks?" at ICLR 2019. They showed that any message passing GNN is at most as expressive as the Weisfeiler-Lehman graph isomorphism test, a classical heuristic from the 1960s for distinguishing non-isomorphic graphs. They also designed an architecture, the Graph Isomorphism Network (GIN), that provably reaches this expressive limit. GIN became the standard test bed for research on expressive power and for benchmarks where structural information matters more than node features.
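GIN's layer is deliberately plain: sum the neighbors, add a scaled copy of the node's own state, and push the result through an MLP. The sum (rather than mean or max) is what makes the aggregation injective on multisets of neighbor features, which is the key step in the expressiveness argument. A minimal sketch:

```python
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    """h_v' = MLP((1 + eps) * h_v + sum over neighbors u of h_u)."""
    def __init__(self, dim):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))      # learnable epsilon
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x, edge_index):
        # x: [n, dim]; edge_index: [2, num_edges] as (src, dst) rows
        src, dst = edge_index
        agg = torch.zeros_like(x)
        agg.index_add_(0, dst, x[src])               # sum aggregation
        return self.mlp((1 + self.eps) * x + agg)
```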
Alongside the theoretical work, surveys appeared that helped make sense of the explosion of architectures. The most-cited is Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip Yu's "A Comprehensive Survey on Graph Neural Networks," published in IEEE Transactions on Neural Networks and Learning Systems in 2021. It groups GNNs into recurrent, convolutional, autoencoder, and spatial-temporal families and remains a useful map of the territory.
The first widely publicized industrial deployment came from Pinterest. PinSAGE, by Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William Hamilton, and Jure Leskovec at KDD 2018, ran on a graph of 3 billion nodes (pins and boards) and 18 billion edges, training on 7.5 billion examples. It used random walks to sample a manageable neighborhood for each node and pushed offline retrieval quality and online A/B test metrics ahead of the existing system. Pinterest reported a 25% lift in impressions for its "Shop the Look" product after PinSAGE went live. Uber, TikTok, Alibaba, and several other companies followed with their own graph-based recommenders.
At about the same time, the recommender system community found a more general lesson: when the input has heterogeneous entities and relationships (users, items, sessions, queries, content categories), a graph view tends to outperform flat embedding methods because it lets the model reuse information across paths.
Google Maps and DeepMind announced in 2020 that Google Maps' estimated time of arrival (ETA) predictions were now being produced by a GNN. The model represents a road network as a graph of "Supersegments" (chains of road segments that tend to flow together), conditions on real-time traffic, and predicts travel time for each chunk of a route. DeepMind reported up to a 50% reduction in ETA error in cities such as Berlin, Jakarta, Sao Paulo, Sydney, Tokyo, and Washington DC. The full model, including a description of the MetaGradients training trick used to stabilize it, was later published at CIKM 2021.
By 2021 it was clear that vanilla message passing has hard limits. Stacking many layers tends to make every node's representation collapse toward the same vector (the over-smoothing problem), and tasks that require combining information from many distant nodes fail because that information must be squeezed through fixed-size vectors at intermediate hops (the over-squashing problem). Two responses became dominant.
The first is to drop strict locality and use a transformer directly on the graph, treating the structure as a kind of positional encoding. Graphormer, from Microsoft Research Asia, won the OGB-LSC quantum chemistry challenge at NeurIPS 2021 by feeding atom features into a standard transformer with three structural encodings: a centrality encoding (degree), a spatial encoding (shortest-path distance), and an edge encoding (bond features along the path). It also took the Direct Track of the Open Catalyst Challenge, beating handcrafted GNNs by a wide margin. Graphormer turned the conventional wisdom on its head: where GCN had said "locality is the right inductive bias," Graphormer showed that for many graph-level tasks a global attention model with the right encodings is better. Other graph transformers (GraphGPS, GROVER, SAN, EGT) followed.
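A minimal way to see how structure enters a graph transformer: precompute shortest-path distances between all pairs of nodes, then add a learned scalar bias per distance bucket to the attention logits. The sketch below shows only that spatial encoding; Graphormer's centrality and edge encodings, multi-head attention, and virtual node are omitted, and the class and argument names are illustrative.

```python
import torch
import torch.nn as nn

class SpatialBiasAttention(nn.Module):
    """Single-head attention with a learned bias per shortest-path distance."""
    def __init__(self, dim, max_dist=8):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.dist_bias = nn.Embedding(max_dist + 1, 1)   # one scalar bias per distance bucket
        self.scale = dim ** -0.5

    def forward(self, x, spd):
        # x: [n, dim] node features; spd: [n, n] integer shortest-path distances, clamped to max_dist
        attn = (self.q(x) @ self.k(x).T) * self.scale    # global attention: every node sees every node
        attn = attn + self.dist_bias(spd).squeeze(-1)    # graph structure enters only through the bias
        return torch.softmax(attn, dim=-1) @ self.v(x)
```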
The second response is to build in the symmetries that physical problems require. Molecules and crystals look the same after rotation or translation, so a model that predicts their energy should give the same answer no matter how the input coordinates are oriented. E(n)-Equivariant Graph Neural Networks, by Victor Garcia Satorras, Emiel Hoogeboom, and Max Welling at ICML 2021, build this into the architecture by treating coordinates and features separately and only mixing coordinates through equivariant operations. Later models, especially EquiformerV2 by Yi-Lun Liao and colleagues at ICLR 2024, combine the equivariance idea with attention and have set the state of the art on Open Catalyst leaderboards for predicting properties of catalytic surfaces.
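The trick in EGNN is that coordinates are updated only by adding weighted sums of difference vectors x_i − x_j, and messages see only the invariant squared distances, never raw coordinates, so rotating or translating the input rotates or translates the output in lockstep. Below is a simplified dense sketch of the update equations from Satorras et al.; edge attributes and the normalization constant are omitted, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class EGNNLayer(nn.Module):
    """Simplified E(n)-equivariant layer: invariant messages, equivariant coordinate updates."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * dim + 1, hidden), nn.SiLU())  # message function
        self.phi_x = nn.Linear(hidden, 1)                                      # coordinate weight
        self.phi_h = nn.Linear(dim + hidden, dim)                              # feature update

    def forward(self, h, x):
        # h: [n, dim] invariant features; x: [n, 3] coordinates
        n = h.size(0)
        diff = x.unsqueeze(1) - x.unsqueeze(0)                    # [n, n, 3] relative positions
        dist2 = (diff ** 2).sum(-1, keepdim=True)                 # [n, n, 1] squared distances (invariant)
        pair = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                          h.unsqueeze(0).expand(n, n, -1), dist2], dim=-1)
        m = self.phi_e(pair)                                      # [n, n, hidden] messages
        x_new = x + (diff * self.phi_x(m)).mean(dim=1)            # equivariant coordinate update
        h_new = self.phi_h(torch.cat([h, m.sum(dim=1)], dim=-1))  # invariant feature update
        return h_new, x_new
```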
A third trend is the early appearance of graph foundation models. The hope is that a single model pretrained on a large mixture of graphs (citation networks, social networks, molecular graphs, knowledge graphs) can be fine-tuned or prompted to solve new graph tasks. GraphGPT (SIGIR 2024), OpenGraph (EMNLP 2024), and GFT (NeurIPS 2024) are early examples, but compared to the runaway success of foundation models in language, the graph version is still in its early stages. A core difficulty is that there is no "natural language of graphs" that is shared across domains; molecules, citations, and friendships do not look alike, and what counts as a useful pretraining objective for one rarely transfers to another.
| Architecture | Year | Authors | Key idea |
|---|---|---|---|
| Original GNN | 2005 / 2009 | Scarselli, Gori, Tsoi, Hagenbuchner, Monfardini | Recurrent updates run to a fixed point at each node |
| Spectral CNN | 2013 | Bruna, Zaremba, Szlam, LeCun | Convolution defined through the graph Laplacian eigenbasis |
| ChebNet | 2016 | Defferrard, Bresson, Vandergheynst | Localized polynomial filters in the Laplacian |
| GCN | 2017 | Kipf, Welling | First-order spectral approximation, simple sparse matrix multiply |
| MPNN | 2017 | Gilmer, Schoenholz, Riley, Vinyals, Dahl | Unifying message-aggregate-update framework |
| GraphSAGE | 2017 | Hamilton, Ying, Leskovec | Inductive aggregator that generalizes to unseen nodes |
| GAT | 2018 | Veličković, Cucurull, Casanova, Romero, Liò, Bengio | Self-attention to weight neighbor messages |
| PinSAGE | 2018 | Ying, He, Chen, Eksombatchai, Hamilton, Leskovec | Random-walk sampling for web-scale graphs |
| GIN | 2019 | Xu, Hu, Leskovec, Jegelka | Sum aggregator that matches the Weisfeiler-Lehman test |
| EGNN | 2021 | Satorras, Hoogeboom, Welling | E(n) equivariance through coordinate-aware updates |
| Graphormer | 2021 | Ying, Cai, Luo, Zheng, Ke, He, Shen, Liu | Transformer on graphs with structural encodings |
| EquiformerV2 | 2024 | Liao, Wood, Das, Smidt | Equivariant transformer with SO(2) convolutions |

| Domain | Example system | What the GNN does |
|---|---|---|
| Materials discovery | DeepMind's GNoME | Predicts crystal stability; flagged 2.2 million candidate materials in Nature 2023 |
| Catalyst design | Open Catalyst Project | Estimates relaxation energies for catalyst surfaces using EquiformerV2 |
| Drug discovery | MIT antibiotic search (halicin, abaucin) | Screens millions of molecules for activity against resistant bacteria |
| Protein structure | AlphaFold (uses graph features) | Models residue interactions through equivariant attention |
| Recommender systems | Pinterest PinSAGE | Generates pin embeddings for retrieval and ranking |
| Recommender systems | Uber Eats | Suggests restaurants and dishes through a user-item graph |
| Mapping | Google Maps ETA | Predicts travel time on road graphs; reduced ETA error by up to 50% |
| Knowledge graphs | Google, Amazon, Microsoft | Answers entity queries and completes missing facts |
| Social networks | Twitter (X), TikTok | Ranks recommendations; detects fake or coordinated accounts |
| Particle physics | ATLAS at CERN | Tags jets by flavor (b, c, light); identifies tracks at LHC |
| Code analysis | DeepMind, Microsoft | Reasons over abstract syntax trees and call graphs |
| Traffic, weather | DeepMind, Google | Predicts spatiotemporal patterns on sensor networks |
Molecules are the showpiece domain for GNNs. Atoms are nodes, bonds are edges, and the property to predict (energy, solubility, binding affinity, toxicity) is a function of the whole graph. The first MPNN paper itself targeted the QM9 dataset of 134,000 small molecules and matched or beat hand-tuned chemistry baselines.
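Turning a molecule into GNN input is mechanical: each atom becomes a row of node features, each bond becomes two directed edges. Below is a hedged sketch using RDKit and the PyTorch Geometric `Data` container; the feature choices (atomic number and bond order only) are deliberately minimal, and real pipelines use far richer featurizations.

```python
import torch
from rdkit import Chem
from torch_geometric.data import Data

def smiles_to_graph(smiles: str) -> Data:
    """Convert a SMILES string into a minimal molecular graph."""
    mol = Chem.MolFromSmiles(smiles)
    # node features: atomic number per atom (a real featurizer adds charge, degree, aromaticity, ...)
    x = torch.tensor([[a.GetAtomicNum()] for a in mol.GetAtoms()], dtype=torch.float)
    src, dst, bond_feats = [], [], []
    for b in mol.GetBonds():
        i, j = b.GetBeginAtomIdx(), b.GetEndAtomIdx()
        src += [i, j]                        # one undirected bond -> two directed edges
        dst += [j, i]
        bond_feats += [[b.GetBondTypeAsDouble()]] * 2
    return Data(x=x,
                edge_index=torch.tensor([src, dst], dtype=torch.long),
                edge_attr=torch.tensor(bond_feats, dtype=torch.float))

graph = smiles_to_graph("CCO")               # ethanol: 3 heavy atoms, 2 bonds
```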
The most prominent recent result is DeepMind's Graph Networks for Materials Exploration (GNoME), published in Nature in November 2023. GNoME is an active learning loop: a GNN proposes candidate crystal structures, density functional theory simulations score the most promising ones, and the labeled results feed back into the next round of training. After scaling to hundreds of thousands of training examples, the system flagged 2.2 million candidate stable materials, of which 380,000 are estimated to be the most stable and have been added to the Materials Project database. External labs experimentally synthesized 736 of these materials in concurrent work, validating a non-trivial fraction of the predictions. The result is contested: independent reviewers from UC Berkeley argued in 2024 that many of the proposed structures are minor variations of known materials, and the practical chemistry value of the discoveries is still being worked out. But the demonstration that a GNN can guide computational screening at this scale is genuine.
On catalyst design, Meta AI's Open Catalyst Project (OC20, OC22, ODAC23) provides datasets of millions of catalyst surfaces with computed energies, and the leaderboard is now dominated by equivariant graph transformers, with EquiformerV2 the current top model.
In drug discovery, the most discussed GNN result is the antibiotic screening pipeline from MIT, which trained a GNN on a few thousand molecules with known antibacterial activity, then screened a library of more than 100 million compounds. The screen surfaced halicin (announced in Cell, 2020) and abaucin (announced in Nature Chemical Biology, 2023), two structurally novel antibiotics with activity against drug-resistant strains. These are early-stage compounds, not approved drugs, but they were genuinely picked by a GNN out of a much larger search space than humans could examine.
Industrial pipelines that use GNNs include those at Recursion (lead optimization, target identification), Insitro (phenotype prediction from chemical structure), Atomwise (binding affinity), and most large pharma companies' AI groups. The 2025 "Graph Neural Networks in Modern AI-Aided Drug Discovery" review in Chemical Reviews surveys hundreds of recent applications.
Recommendation is the largest commercial deployment of GNNs by user count. The basic graph is a user-item bipartite graph, often enriched with category nodes, social edges, session edges, and content features. PinSAGE at Pinterest pioneered the web-scale style: rather than convolve over the whole graph, sample a small random-walk-based neighborhood for each query and convolve over that. Uber's UberEats system, TikTok's recommendation graph, Alibaba's e-commerce graph, and Snap's content graph all use variants of this approach. The pattern is consistent: a GNN-based candidate generator feeds a downstream ranking model, with the GNN responsible for finding a few hundred good candidates from a catalog of billions.
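The sampling idea can be sketched without any framework: from each query node, run short random walks and keep the most frequently visited nodes as an importance-weighted neighborhood. The helper below is purely illustrative; the adjacency-list input format and the parameter values are assumptions, not anyone's production code.

```python
import random
from collections import Counter

def random_walk_neighborhood(adj, start, num_walks=200, walk_len=3, top_k=20):
    """Approximate the important neighborhood of `start` with short random walks.
    adj: dict mapping each node to a list of its neighbors."""
    visits = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(walk_len):
            neighbors = adj.get(node)
            if not neighbors:
                break
            node = random.choice(neighbors)
            visits[node] += 1
    visits.pop(start, None)                  # do not count the query node itself
    total = sum(visits.values()) or 1
    # the top-k most visited nodes, with normalized visit counts as importance weights
    return [(n, c / total) for n, c in visits.most_common(top_k)]
```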
A knowledge graph is a structured database of entities and the relationships between them. The classic task is knowledge graph completion: predicting whether a missing edge between two entities should exist ("does this drug interact with this protein?"). Three families of models compete here. Translational models such as TransE (Bordes et al. 2013) treat each relation as a translation between entity vectors. Bilinear models such as DistMult (Yang et al. 2015) and ComplEx (Trouillon et al. 2016) score triples through a tensor decomposition. GNN-based models such as R-GCN (Schlichtkrull et al. 2018) and CompGCN (Vashishth et al. 2020) extend message passing to handle different relation types. Most production knowledge graphs at Google, Amazon, and Microsoft now combine all three families.
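The scoring functions of the first two families are one line each, which is part of why they scale so well; the GNN-based family replaces the entity vectors below with embeddings produced by relation-aware message passing. A sketch (embedding training, negative sampling, and the margin loss are omitted):

```python
import torch

def transe_score(h, r, t):
    """TransE: a triple (head, relation, tail) is plausible when h + r is close to t."""
    return -torch.norm(h + r - t, p=2, dim=-1)

def distmult_score(h, r, t):
    """DistMult: bilinear score with a diagonal relation matrix."""
    return (h * r * t).sum(dim=-1)

# toy usage: random embeddings for a batch of 5 triples in dimension 64
h, r, t = (torch.randn(5, 64) for _ in range(3))
print(transe_score(h, r, t).shape, distmult_score(h, r, t).shape)   # torch.Size([5]) twice
```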
The Large Hadron Collider at CERN produces collision data that is naturally a graph: each event has a few hundred reconstructed particles, and the geometric and physical relationships between them carry the signal. Graph networks have replaced earlier hand-engineered classifiers for several tasks, especially jet tagging (deciding what kind of particle started a jet) and track reconstruction. ATLAS's GN1 tagger, built on a graph network operating over a jet's tracks, sets the state of the art for identifying b-quark and c-quark jets, an important step in searches for the Higgs boson and new physics. Because the LHC trigger system has microsecond latency budgets, several groups have built FPGA implementations of small GNNs that can run in real time.
Code is graph-shaped data: an abstract syntax tree, a control flow graph, a call graph, or a data dependency graph. Early work by Allamanis and colleagues at Microsoft (2018) used a gated graph network to predict variable misuse and find subtle bugs. More recent systems combine graph features with transformer language models trained on code, such as CodeBERT and CodeT5. The graph view is most useful where pure sequence models miss long-range structural facts, for example when reasoning about whether a function is called in any reachable code path.
A road network is a graph. So is a sensor grid. So is a forecast region tiled into cells with edges between neighbors. Spatial-temporal GNNs combine graph layers with sequence models to predict things that vary across both space and time, including traffic flow (Google Maps, Baidu Maps, Uber routing), weather forecasting (DeepMind's GraphCast, which beat traditional numerical weather prediction baselines on a 10-day forecast horizon in 2023), and air quality (used by several Chinese provincial environment agencies).
Three libraries dominate practical GNN work in 2026.
PyTorch Geometric (PyG), maintained at TU Dortmund and now by a broader community, is the most popular general-purpose framework. It is built on top of PyTorch, exposes most published architectures as ready-to-use layers, and ships with hundreds of standard datasets and benchmarks. As of 2026 it has more than 22,000 GitHub stars and is the default tool taught in most graduate courses on graph learning.
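A typical PyG workflow for node classification fits in a screenful. The sketch below follows the library's standard Planetoid / GCNConv pattern; the dataset path and hyperparameters are placeholders rather than recommendations.

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

dataset = Planetoid(root="/tmp/Cora", name="Cora")    # downloads the citation graph
data = dataset[0]                                     # one graph with train/val/test masks

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, x, edge_index):
        return self.conv2(F.relu(self.conv1(x, edge_index)), edge_index)

model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
```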
Deep Graph Library (DGL), which originated at NYU and Amazon, takes a slightly different approach: it is backend-agnostic and can run on PyTorch, TensorFlow, or MXNet, though the PyTorch backend is now the most heavily used. DGL was the framework used to train the SE(3)-Transformer that influenced AlphaFold's architecture, and it powers Amazon's recommendation graphs.
JAX has produced a smaller but growing ecosystem. Jraph, a JAX-based GNN library released by DeepMind in 2020, is the spiritual successor to the GraphNets library used by DeepMind for traffic prediction at Google Maps and for several internal models. The appeal of JAX-based libraries is that they parallelize cleanly across TPUs, which matters for foundation-model-scale training.
Other tools fill specialized niches. NVIDIA's cuGraph and DGL-NV support multi-GPU training on graphs with billions of edges. The Open Graph Benchmark (OGB), maintained by the Stanford SNAP group, provides standardized large-scale datasets and a public leaderboard. PyTorch Geometric Temporal extends PyG to spatiotemporal graphs.
If you stack too many message passing layers, every node's representation tends to converge to the same vector. The classical analysis treats one round of GCN as a kind of graph diffusion, and after many rounds diffusion smears every initial signal into a single average. Empirically, performance on node classification tends to peak around 2 to 4 layers and decline beyond that, which makes it hard to capture long-range structure. Many proposed fixes exist (DropEdge, PairNorm, residual connections, jumping knowledge connections, GCNII), and they help, but no method has fully closed the gap with depth.
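The collapse is easy to see numerically: repeatedly multiplying node features by the normalized adjacency matrix shrinks the spread of node representations, which is exactly the diffusion view described above. A small self-contained demonstration on a random graph (no learning involved; the graph size and edge probability are arbitrary):

```python
import torch

torch.manual_seed(0)
n = 50
adj = (torch.rand(n, n) < 0.1).float()
adj = ((adj + adj.T) > 0).float()
adj.fill_diagonal_(1.0)                                  # add self-loops
d_inv_sqrt = adj.sum(1).pow(-0.5)
a_hat = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]  # GCN-style normalized adjacency

x = torch.randn(n, 16)
for rounds in [1, 2, 4, 8, 16, 32]:
    h = x.clone()
    for _ in range(rounds):
        h = a_hat @ h                                    # one round of feature diffusion
    spread = h.std(dim=0).mean().item()                  # how different the nodes still are
    print(f"{rounds:2d} rounds: mean per-feature std across nodes = {spread:.4f}")
```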
A related but distinct problem is over-squashing: when a node needs to gather information from many distant nodes, all of that information has to be compressed into a single fixed-size vector at each intermediate hop, which limits how much the receiver can actually learn. Alon and Yahav (2021) gave a clear analysis of the bottleneck and suggested graph rewiring as a fix; later work has linked over-squashing to the spectral gap of the graph Laplacian and shown that there is an inherent trade-off with over-smoothing. Recent papers in 2023 and 2024 connect both problems to vanishing gradients and recurrent learning, suggesting they are different faces of the same depth issue.
Message passing GNNs are no more powerful than the 1-Weisfeiler-Lehman test, which means they cannot distinguish certain pairs of graphs that look very similar locally but differ globally. This shows up in practice on tasks such as counting cycles and detecting motifs. Several extensions go beyond this limit (k-WL GNNs, subgraph GNNs, position-aware GNNs, equivariant GNNs in higher dimensions), at the cost of more compute. There is no consensus on which extension is the right default.
Most research GNN benchmarks are static graphs with one type of node and one type of edge. Real systems are not. Knowledge graphs have hundreds of relation types. Social networks change every minute. E-commerce graphs add new items every day. Heterogeneous GNNs (R-GCN, HAN, HGT) and dynamic GNNs (TGN, JODIE, ROLAND, DyREP) try to handle these settings, but they involve many more design choices than static GNNs and the benchmarks are less mature.
A graph with a billion nodes does not fit on a single GPU. Sampling methods (GraphSAGE, PinSAGE, ClusterGCN, GraphSAINT) make large-scale training possible by training on small subgraphs, but each sampling strategy biases the gradients in some way. Truly distributed graph training, where the graph is partitioned across a cluster, is a research area in 2026 but not yet a solved problem.
Four broad lines of work are active in 2026.
Foundation models for graphs. Can we pretrain on a mixture of graph datasets and transfer to new graphs, the way LLMs transfer between text tasks? Early signs are mixed. Cross-domain transfer (citation networks to social networks) is hard because what counts as a useful node feature differs. Within a single domain, especially molecules, transfer works: pretraining on millions of unlabeled molecules and fine-tuning on a small labeled set is now standard practice.
Geometric and equivariant deep learning. A growing community treats GNNs as one slice of a larger "geometric deep learning" framework that also covers spheres, manifolds, and gauge fields. The textbook by Bronstein, Bruna, Cohen, and Veličković (2021) lays out this view. The practical payoff so far is in physics and chemistry, where equivariant models such as EquiformerV2 are clearly better than non-equivariant ones, but the framework also gives a unified language for understanding graph attention, transformer positional encoding, and convolutional networks.
Graph-language models. What if the graph is just one input modality alongside text? GraphGPT, InstructGraph, OpenGraph, and several other systems try to align graph encoders with the embedding space of an LLM, so the LLM can answer queries that depend on graph structure ("what are this molecule's likely side effects?" or "summarize the citation network around this paper"). The early systems are promising but small.
Reasoning and algorithmic generalization. Petar Veličković and collaborators have been pushing GNNs to learn classical algorithms (Dijkstra, Bellman-Ford, BFS) so that the same architecture can both reason over a graph and execute a graph algorithm at inference time. Their CLRS benchmark, released in 2022, tests 30 algorithms and has become a standard test bed for algorithmic reasoning research.
Most interesting structure in the world is relational. Drug action depends on how atoms are bonded, not just which atoms are present. A user's interests depend on what their friends look at, not just what they themselves clicked. Traffic at one intersection depends on traffic at upstream intersections. Particle interactions depend on which particles came from a common parent. A general-purpose machine learning toolkit needs an answer to "what do I do when my data is a graph," and the answer that has emerged over the past decade is: use a graph neural network.
The field is no longer in its breakthrough phase. The breakthroughs (GCN, GAT, MPNN, GIN, PinSAGE, Graphormer, GNoME) are mostly in the past, and the open problems (over-smoothing, scaling, foundation models, heterogeneous and dynamic graphs) are clear. What is happening now is consolidation: GNNs are being absorbed into mainstream pipelines for chemistry, search, recommendation, traffic, and science. They are no longer exotic; they are infrastructure.