Graph Machine Learning Models

Updated May 31, 2026

Graph machine learning models are neural networks designed to operate on data structured as graphs, where the input is a set of nodes connected by edges rather than a grid like an image or a sequence like text. The defining property of these models is permutation equivariance: relabeling the nodes of the graph should permute the per-node outputs in the same way and leave each node's output otherwise unchanged. This rules out treating a graph as a flat vector and motivates the message passing computation used by most modern graph neural networks (GNNs).¹ Graph machine learning is a branch of machine learning and deep learning specialized to relational data. Since the introduction of the Graph Convolutional Network (GCN) by Thomas Kipf and Max Welling in 2017, the field has expanded into hundreds of architectures, gained several large open-source libraries, and come to power production systems at Google, Pinterest, Amazon, Google DeepMind, and Microsoft.

Unlike convolutional or recurrent networks, GNNs do not assume a fixed neighborhood or a fixed input length. Each node updates its representation by aggregating signals from its neighbors and then applying a learnable transformation. The same parameters are shared across every node and every edge, which makes the model size independent of graph size and lets a model trained on small graphs be applied to larger ones without modification. The graphs themselves can be homogeneous (one node type, one edge type), heterogeneous (multiple types, as in knowledge graphs), directed or undirected, weighted or unweighted, and static or evolving in time.
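The two properties described above, shared parameters and permutation equivariance, can be checked numerically. The following is a minimal NumPy sketch, not any library's implementation: `message_passing_layer`, the weight names, and the toy graphs are all illustrative assumptions.

```python
import numpy as np

def message_passing_layer(adj, feats, w_self, w_neigh):
    """One sum-aggregation layer; every node uses the same two weight matrices."""
    neighbor_sum = adj @ feats                      # aggregate neighbor features
    return np.tanh(feats @ w_self + neighbor_sum @ w_neigh)

rng = np.random.default_rng(0)
w_self = rng.normal(size=(3, 4))
w_neigh = rng.normal(size=(3, 4))

# A 4-node path graph 0-1-2-3 with 3 features per node.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = rng.normal(size=(4, 3))
out = message_passing_layer(adj, feats, w_self, w_neigh)

# Relabel the nodes with a permutation matrix P.
perm = np.array([2, 0, 3, 1])
P = np.eye(4)[perm]
out_relabelled = message_passing_layer(P @ adj @ P.T, P @ feats, w_self, w_neigh)

# Equivariance: relabelling the input permutes the output rows the same way.
equivariant = np.allclose(out_relabelled, P @ out)

# The same weights apply unchanged to a larger graph (a 6-node cycle).
big_adj = np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)
big_out = message_passing_layer(big_adj, rng.normal(size=(6, 3)), w_self, w_neigh)
```

The parameter count here is fixed by the feature dimensions (3 and 4), not by the number of nodes, which is why the layer runs on both the 4-node and the 6-node graph.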

Overview

Graph machine learning addresses five canonical tasks. The choice of task determines the loss function, the readout, and the evaluation metric.

Task	Goal	Example	Typical readout
Node classification	Predict a label for each node	Citation network paper category	Per-node softmax
Link prediction	Predict whether two nodes are connected	Recommending a friend on a social network	Dot product or bilinear score
Graph classification	Predict a label for the whole graph	Molecule toxicity	Sum or mean pooling of node embeddings
Graph regression	Predict a continuous value for a graph	Molecular property such as the HOMO-LUMO gap	Pooled embedding plus MLP
Graph generation	Sample new graphs from a distribution	De novo drug design	Autoregressive or diffusion decoder
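The readouts in the table are small functions applied to the node embeddings a GNN produces. The sketch below shows three of them in NumPy under illustrative assumptions: `node_emb` stands in for the output of some trained GNN, and `w_cls` and `link_score` are hypothetical names.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
node_emb = rng.normal(size=(5, 8))   # stand-in for GNN output: 5 nodes, 8 dims

# Node classification: linear head plus per-node softmax over 3 classes.
w_cls = rng.normal(size=(8, 3))
node_probs = softmax(node_emb @ w_cls)              # shape (5, 3)

# Link prediction: dot-product score for a candidate edge (u, v), squashed to (0, 1).
def link_score(u, v):
    return 1.0 / (1.0 + np.exp(-(node_emb[u] @ node_emb[v])))

# Graph classification or regression: pool node embeddings into one vector.
graph_emb_sum = node_emb.sum(axis=0)                # grows with graph size
graph_emb_mean = node_emb.mean(axis=0)              # normalized by graph size
```

Sum pooling keeps information about graph size, while mean pooling discards it; which one is preferable depends on whether the target scales with the number of nodes.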

Several other settings build on these primitives. Community detection partitions nodes into clusters using embeddings from an unsupervised GNN. Subgraph matching identifies whether a small motif occurs inside a larger graph. Combinatorial optimization (traveling salesman, maximum independent set) has been attacked with GNN policies trained via reinforcement learning.

Inputs can carry features on nodes, edges, or both. A molecule has atom features (element, charge) on nodes and bond features (single, double, aromatic) on edges. A road network has road type and length on edges and intersection coordinates on nodes. Modern GNN libraries treat these features uniformly through a small set of message and update functions.

History

The idea of computing on graphs with neural networks predates the deep learning era. Marco Gori and Franco Scarselli proposed the original "graph neural network" in 2005 and 2009 as a recurrent fixed point system. It was hard to train and never became widely used. The current wave of GNNs grew from two lines of work that converged around 2016 and 2017.

The first line was shallow node embedding, which treats each node as a token and learns a low-dimensional vector for it without parametric message passing. DeepWalk by Bryan Perozzi, Rami Al-Rfou, and Steven Skiena, presented at KDD 2014, generated random walks on a graph and ran the word2vec skip-gram model on the resulting sequences. LINE by Jian Tang and collaborators at WWW 2015 preserved first-order and second-order proximity through an explicit objective. node2vec, by Aditya Grover and Jure Leskovec at KDD 2016, extended DeepWalk with biased random walks controlled by two parameters that interpolate between breadth-first and depth-first exploration. These models scaled to millions of nodes but produced a fixed embedding table that could not handle new nodes or edge features.

The second line was spectral graph convolution. Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun proposed in 2014 to define convolution on a graph through the eigendecomposition of the graph Laplacian. The cost was cubic in the number of nodes, and filters were not localized. Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst introduced ChebNet at NeurIPS 2016, which approximated spectral filters with Chebyshev polynomials of the Laplacian and reduced cost to linear in the number of edges. ChebNet was the immediate precursor to GCN.

The Graph Convolutional Network (GCN), introduced by Kipf and Welling at ICLR 2017, simplified ChebNet to a single-hop filter with a renormalization trick and showed strong results on the Cora, CiteSeer, and PubMed citation benchmarks. Within months the field exploded. GraphSAGE by Will Hamilton, Rex Ying, and Jure Leskovec at NeurIPS 2017 introduced inductive learning through neighbor sampling. Graph Attention Network (GAT) by Petar Veličković and collaborators at ICLR 2018 added learnable attention weights. Justin Gilmer and colleagues at ICML 2017 unified existing models under the Message Passing Neural Network (MPNN) framework.

Core GNN architectures

Most convolutional GNNs follow the same two-step recipe. For each node, aggregate a function of the neighbor features and the connecting edge features, then update the node feature using the aggregated message and the previous state. Models differ in the aggregator (sum, mean, max, attention) and the update (linear, MLP, gated recurrent unit). The table below lists the most cited architectures.

Architecture	Year	Authors	Aggregator	Notable property
GCN	2017	Kipf, Welling	Normalized sum with degree	Simplest spectral approximation, transductive
GraphSAGE	2017	Hamilton, Ying, Leskovec	Mean, max, or LSTM over sampled neighbors	First inductive GNN at scale
GAT	2018	Veličković et al.	Multi-head attention	Learnable neighbor weighting
MPNN	2017	Gilmer et al.	General message function plus update	Unifying framework for chemistry GNNs
GIN	2019	Xu, Hu, Leskovec, Jegelka	Sum followed by MLP	Provably as expressive as the 1-Weisfeiler-Lehman test
R-GCN	2017	Schlichtkrull et al.	Per-relation linear sum	First strong GNN for knowledge graphs
HAN	2019	Wang et al.	Meta-path attention	Heterogeneous node and edge types
Cluster-GCN	2019	Chiang et al.	Mini-batch within graph partitions	Scaling GCN to 100M edges
GraphSAINT	2020	Zeng et al.	Subgraph sampling with normalization	Unbiased mini-batches for large graphs
SIGN	2020	Frasca et al.	Precomputed multi-hop diffusion	Plain mini-batch SGD over a feature MLP with no sampling at training time, very fast
PNA	2020	Corso et al.	Multiple aggregators with degree scalers	Strong on regular graphs
DiffPool	2018	Ying et al.	Learnable hierarchical pooling	First differentiable graph pooling

GCN. The forward pass for a single layer computes a normalized adjacency multiplication: each node averages the features of its neighbors and itself, weighted by the inverse square root of the product of their degrees. A learnable linear map and a nonlinearity follow. Two layers of GCN cover the two-hop neighborhood of each node and reached state of the art on small citation graphs at the time of publication. GCN is transductive in its original formulation: the renormalized adjacency matrix is computed on the whole graph at training time, so adding new nodes later requires recomputing the matrix.
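The propagation rule just described can be written in a few lines. This is a dense NumPy sketch for illustration only (real implementations use sparse operations); the function names and the toy path graph are assumptions, and the weights are made positive so the ReLU stays inactive in the demonstration.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), with degrees taken from A + I."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # inverse sqrt of degrees
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)

def two_layer_gcn(adj, feats, w1, w2):
    return gcn_layer(adj, gcn_layer(adj, feats, w1), w2)

# Path graph 0-1-2-3: node 3 is two hops from node 1 and three hops from node 0.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
w1 = np.abs(rng.normal(size=(2, 4)))
w2 = np.abs(rng.normal(size=(4, 3)))

x = np.ones((4, 2))
x_perturbed = x.copy()
x_perturbed[3] += 1.0                              # change only node 3

base = two_layer_gcn(path, x, w1, w2)
moved = two_layer_gcn(path, x_perturbed, w1, w2)
```

Comparing `base` and `moved` row by row shows the two-hop receptive field: the output for node 1 changes, while the output for node 0, three hops away, does not.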

GraphSAGE. GraphSAGE solves the inductive problem by sampling a fixed number of neighbors for each node and applying an aggregator (mean, max-pooling, or LSTM) over the sample. The mean and pooling aggregators are permutation invariant; the LSTM is not, so it is applied to a random ordering of the sampled neighbors. Because no full adjacency multiplication is required, GraphSAGE can produce embeddings for nodes that did not exist at training time, which is what made it practical for production recommender systems.
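A minimal sketch of the sampling-plus-mean-aggregation idea follows. It is not the reference implementation: the function names are hypothetical, the self and neighbor weights are kept as two separate matrices (equivalent to one matrix applied to their concatenation), and every node is assumed to have at least one neighbor.

```python
import numpy as np

def sample_neighbors(neighbors, node, k, rng):
    """Draw exactly k neighbors, with replacement if the node has fewer than k."""
    nbrs = neighbors[node]
    return rng.choice(nbrs, size=k, replace=len(nbrs) < k)

def sage_mean_layer(neighbors, feats, nodes, w_self, w_neigh, k, rng):
    out = []
    for v in nodes:
        sampled = sample_neighbors(neighbors, v, k, rng)
        agg = feats[sampled].mean(axis=0)                 # mean aggregator
        h = np.maximum(feats[v] @ w_self + agg @ w_neigh, 0.0)
        out.append(h / (np.linalg.norm(h) + 1e-12))       # L2-normalize
    return np.stack(out)

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 3))          # row 4 belongs to a node that joins later
neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
w_self = rng.normal(size=(3, 4))
w_neigh = rng.normal(size=(3, 4))

train_emb = sage_mean_layer(neighbors, feats, [0, 1, 2, 3], w_self, w_neigh, 2, rng)

# A new node 4 arrives with edges to nodes 0 and 3; the same weights are reused.
neighbors[4] = [0, 3]
neighbors[0].append(4)
neighbors[3].append(4)
new_emb = sage_mean_layer(neighbors, feats, [4], w_self, w_neigh, 2, rng)
```

Nothing in the layer depends on a global adjacency matrix, so embedding the new node only requires its features and a sample of its neighbors.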

GAT. GAT replaces the fixed adjacency weights with learned attention coefficients.² For each edge, a shared linear map followed by a learned scoring vector produces a score, and the scores are normalized over the neighborhood by a softmax. The model uses multiple attention heads in parallel, similar to the Transformer block, and concatenates or averages their outputs. The attention weights are interpretable in some cases and can make the model more robust to noisy edges. Shaked Brody, Uri Alon, and Eran Yahav showed at ICLR 2022 that the original GAT computes only static attention, in which the ranking of neighbors is the same for every query node. Their GATv2 applies the scoring vector after the nonlinearity instead of before it, which recovers dynamic attention as a drop-in replacement with the same parameter count, and it is available alongside the original layer in the major libraries.³
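The difference between static and dynamic attention comes down to where the nonlinearity sits in the scoring function. The sketch below computes unnormalized scores for all node pairs in NumPy; it ignores the adjacency mask, the softmax, and multiple heads, and all names are illustrative assumptions.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_scores(h, w, a):
    """GAT: e[i, j] = LeakyReLU(a^T [W h_i || W h_j])."""
    z = h @ w
    d = z.shape[1]
    src = z @ a[:d]                    # term that depends only on query i
    dst = z @ a[d:]                    # term that depends only on neighbor j
    return leaky_relu(src[:, None] + dst[None, :])

def gatv2_scores(h, w2, a2):
    """GATv2: e[i, j] = a^T LeakyReLU(W [h_i || h_j])."""
    n = h.shape[0]
    pairs = np.concatenate([np.repeat(h, n, axis=0), np.tile(h, (n, 1))], axis=1)
    return (leaky_relu(pairs @ w2) @ a2).reshape(n, n)

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 3))
w, a = rng.normal(size=(3, 4)), rng.normal(size=8)
w2, a2 = rng.normal(size=(6, 4)), rng.normal(size=4)

# Rank candidate neighbors j for each query i, highest score first.
gat_ranking = np.argsort(-gat_scores(h, w, a), axis=1)
gatv2_ranking = np.argsort(-gatv2_scores(h, w2, a2), axis=1)
```

Because LeakyReLU is monotonic and the GAT score is a sum of a query-only term and a neighbor-only term, every row of `gat_ranking` is identical. The GATv2 score has no such decomposition, so its rows are free to differ.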

MPNN. Gilmer and colleagues proposed MPNN as a unifying notation: each layer computes a message for each edge using the source feature, target feature, and edge feature, sums the messages into each target node, and then updates the target node with a recurrent or feed-forward function. Almost every later architecture can be written in MPNN form, and most graph libraries expose a message passing base class as their core abstraction.
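The message, aggregate, and update steps map directly onto array operations over an edge list. The following NumPy sketch assumes single linear maps with a tanh in place of the learned message and update networks; `mpnn_layer` and the weight names are hypothetical.

```python
import numpy as np

def mpnn_layer(node_feats, edge_index, edge_feats, w_msg, w_upd):
    src, dst = edge_index                       # edges point from src to dst
    # Message per edge from [source, target, edge] features.
    msg_in = np.concatenate([node_feats[src], node_feats[dst], edge_feats], axis=1)
    messages = np.tanh(msg_in @ w_msg)
    # Sum messages into their target nodes (scatter-add).
    agg = np.zeros((node_feats.shape[0], messages.shape[1]))
    np.add.at(agg, dst, messages)
    # Feed-forward update from [previous state, aggregated message].
    return np.tanh(np.concatenate([node_feats, agg], axis=1) @ w_upd)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 2))                     # 3 nodes, 2 features
edge_index = np.array([[0, 2, 1],               # sources
                       [1, 1, 2]])              # targets
edge_feats = rng.normal(size=(3, 1))            # 1 feature per edge
w_msg = rng.normal(size=(2 + 2 + 1, 4))
w_upd = rng.normal(size=(2 + 4, 2))

out = mpnn_layer(x, edge_index, edge_feats, w_msg, w_upd)
```

Node 0 has no incoming edges in this toy graph, so its aggregated message is zero and its update depends only on its own previous state.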

GIN. Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka analyzed GNN expressivity at ICLR 2019. They proved that any GNN based on neighborhood aggregation is at most as powerful as the 1-Weisfeiler-Lehman (1-WL) graph isomorphism test, that mean and max aggregators are strictly weaker than sum, and that an injective sum aggregator followed by an MLP (the Graph Isomorphism Network, or GIN) achieves the 1-WL bound. GIN became the standard benchmark architecture for graph classification because its theoretical properties match its empirical performance on TU datasets.⁴
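A small worked example makes the gap between sum, mean, and max concrete. The sketch below aggregates one-hot neighbor labels in plain Python; the labels and the helper names are illustrative assumptions, and one-hot features are used because a sum over them simply counts each label.

```python
def one_hot(label, vocab):
    return [1.0 if label == v else 0.0 for v in vocab]

def aggregate(labels, how, vocab=("A", "B")):
    """Aggregate a multiset of neighbor labels, given as one-hot vectors."""
    columns = list(zip(*[one_hot(label, vocab) for label in labels]))
    if how == "sum":
        return tuple(sum(c) for c in columns)
    if how == "mean":
        return tuple(sum(c) / len(c) for c in columns)
    if how == "max":
        return tuple(max(c) for c in columns)
    raise ValueError(how)

# Three neighborhoods that differ only in how often each label occurs.
n1 = ["A", "B"]
n2 = ["A", "A", "B", "B"]      # same proportions as n1, twice the size
n3 = ["A", "B", "B"]           # same label set as n1, different proportions

results = {how: [aggregate(n, how) for n in (n1, n2, n3)]
           for how in ("sum", "mean", "max")}
```

Sum yields three distinct vectors, mean cannot tell `n1` from `n2`, and max cannot tell any of the three apart, which mirrors the ordering of expressive power in the analysis.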

R-GCN and HAN. Real graphs often have typed nodes and edges. R-GCN, by Michael Schlichtkrull and collaborators in 2017, assigns a separate learnable weight matrix to each relation type and sums the per-relation messages. Basis decomposition or block diagonal decomposition controls the parameter count when the number of relations is large. HAN (Heterogeneous Graph Attention Network) by Xiao Wang and colleagues at WWW 2019 generalizes attention along meta-paths, fixed sequences of relations that capture semantic patterns in a heterogeneous graph.

Scalability variants. Standard message passing requires the full neighbor set of each node at every layer, which blows up memory when the graph has hundreds of millions of edges. GraphSAINT samples a connected subgraph at each iteration and applies normalization to correct sampling bias. Cluster-GCN partitions the graph with a graph clustering algorithm (METIS) and performs mini-batch training within each partition. SIGN precomputes multi-hop diffused features once and trains a feed-forward MLP on the result, trading expressivity for raw throughput.

Transformer-based and hybrid graph models

Graph transformers apply the self-attention mechanism to all node pairs, not only to neighbors. They lose the inductive bias of locality but gain a global receptive field, which helps on tasks where long-range information matters (predicting properties of polymers, reading long chains in a parse tree). The challenge is how to inject the graph structure: vanilla self-attention without positional information is permutation equivariant and would treat the graph as a bag of nodes.

Vijay Prakash Dwivedi and Xavier Bresson proposed the Graph Transformer in 2020, which adds Laplacian eigenvectors as positional encodings so the attention layer can distinguish nodes by their structural role. The Spectral Attention Network (SAN) by Devin Kreuzer and colleagues at NeurIPS 2021 extends this with learned positional encodings derived from the full eigendecomposition.
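Laplacian positional encodings can be computed with a single eigendecomposition. The sketch below is a dense NumPy illustration that assumes a connected, undirected graph and uses the symmetric normalized Laplacian; `laplacian_pe` is a hypothetical helper name, not a library function.

```python
import numpy as np

def laplacian_pe(adj, k):
    """Return the k smallest non-trivial eigenpairs of I - D^-1/2 A D^-1/2."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)          # eigenvalues in ascending order
    # Skip index 0: for a connected graph it is the trivial eigenvector.
    return eigvals[1:k + 1], eigvecs[:, 1:k + 1]

# 6-node cycle: each node receives a 2-dimensional positional encoding.
cycle = np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)
pe_vals, pe = laplacian_pe(cycle, k=2)

# The encoding is concatenated with (or added to) the node features.
node_feats = np.ones((6, 3))
transformer_input = np.concatenate([node_feats, pe], axis=1)   # shape (6, 5)
```

Eigenvectors are only defined up to sign (and up to rotation within repeated eigenvalues, as on this cycle), so implementations commonly randomize the sign during training to keep the model from depending on an arbitrary choice.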

Graphormer, introduced by Chengxuan Ying, Tianle Cai, and collaborators at NeurIPS 2021, encodes graph structure through three encodings: a centrality encoding added to each node's input features based on its degree, a spatial encoding based on the shortest path distance between nodes, and an edge encoding aggregated along the shortest path. The spatial and edge encodings enter as biases added to the attention logits.⁵ A Microsoft Research Asia team built on Graphormer to win the graph-prediction track of the OGB Large Scale Challenge (OGB-LSC) on the PCQM4M quantum chemistry dataset at KDD Cup 2021, beating every message passing baseline by a clear margin.⁶
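The spatial encoding amounts to a lookup table indexed by shortest path distance and added to the attention logits. The sketch below illustrates that single idea in NumPy; it omits the centrality and edge encodings, uses a hand-set bias table where the real model learns one, and all names are assumptions.

```python
import numpy as np
from collections import deque

def shortest_path_matrix(neighbors):
    """All-pairs hop distances by BFS; unreachable pairs get the value n."""
    n = len(neighbors)
    dist = np.full((n, n), n, dtype=int)
    for s in range(n):
        dist[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[s, v] == n:              # not visited yet
                    dist[s, v] = dist[s, u] + 1
                    queue.append(v)
    return dist

def biased_attention(q, k, dist, spatial_bias):
    """Softmax over q k^T / sqrt(d) plus a bias looked up by hop distance."""
    logits = q @ k.T / np.sqrt(q.shape[1]) + spatial_bias[dist]
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

# Path graph 0-1-2-3 given as adjacency lists.
neighbors = [[1], [0, 2], [1, 3], [2]]
dist = shortest_path_matrix(neighbors)

# One scalar per distance 0..n; learned in practice, fixed here for illustration.
spatial_bias = -np.arange(len(neighbors) + 1, dtype=float)

# With zero queries and keys, the attention pattern comes from the bias alone.
q = k = np.zeros((4, 2))
attn = biased_attention(q, k, dist, spatial_bias)
```

With this decreasing bias table, node 0 attends most to itself and least to node 3; a learned table can shape the pattern freely, including favoring distant nodes.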

TokenGT (Tokenized Graph Transformer), proposed by Jinwoo Kim, Tien Dat Nguyen, and collaborators at NeurIPS 2022, took the opposite design stance: it feeds every node and every edge into a standard, unmodified Transformer as independent tokens, augmented only with orthonormal node identifiers and type embeddings. The authors proved that with these token embeddings a plain Transformer is at least as expressive as a second-order invariant graph network (2-IGN), and therefore strictly more expressive than any message passing GNN, while reaching competitive accuracy on PCQM4Mv2. TokenGT is often cited as evidence that graph-specific architecture is not strictly necessary if structure is encoded in the input tokens.⁷

GraphGPS by Ladislav Rampášek, Mikhail Galkin, and Dominique Beaini at NeurIPS 2022 proposed a recipe for hybrid models. Each block contains a local message passing layer in parallel with a global attention layer, with both fed by learned positional and structural encodings.⁸ GraphGPS gave consistent gains across the Long Range Graph Benchmark and inspired a wave of hybrid architectures.

Exphormer by Hamed Shirzad and collaborators at ICML 2023 reduced the quadratic attention cost using expander graphs as a sparse global connectivity pattern, making global attention tractable on graphs with tens of thousands of nodes. NAGphormer by Jinsong Chen and colleagues reformulated graph attention as a sequence problem over hop counts, with one token per hop of aggregated neighborhood features, allowing the use of standard transformer libraries.

The most recent line of work applies state space models to graphs. Graph-Mamba by Chloe Wang and colleagues in 2024 adapts the Mamba selective state space layer to graph data by ordering nodes into sequences and selecting those with structural relevance to the target node. Initial results suggest sub-quadratic global mixing with performance competitive with GraphGPS on long-range tasks. Related models, including Graph Mamba Networks (GMN), have explored alternative scan orders such as breadth-first traversal and random walks.

Inspired by large language models, a parallel effort aims at graph foundation models: a single model pretrained on many graphs and transferred to new datasets and tasks by fine-tuning, in-context learning, or zero-shot inference. Surveys distinguish universal, domain-specific, and task-specific foundation models, and approaches such as GIT (Graph Generality Identifier on Task-Trees, 2024) report transfer across dozens of graphs in several domains. Whether a single architecture can generalize across graphs whose nodes and edges represent very different objects remains an open research question as of 2026.⁹

Equivariant networks for chemistry and materials

Molecules and crystals carry geometry: each atom has a 3D position, and the physical properties of the system are invariant under translation, rotation, and reflection of the whole structure. Plain GNNs that use only the graph topology lose this information. Equivariant graph neural networks preserve it by ensuring that if the input coordinates rotate, the output rotates in the same way.

The pioneer was SchNet by Kristof Schütt, Pieter-Jan Kindermans, and collaborators at NeurIPS 2017. SchNet uses continuous-filter convolutions parameterized by the interatomic distance, which makes the energy prediction translation, rotation, and permutation invariant. SchNet trained on the QM9 dataset reached chemical accuracy on several molecular properties for the first time with a neural network.

DimeNet by Johannes Gasteiger, Janek Groß, and Stephan Günnemann at ICLR 2020 added directional information through messages that depend on bond angles, not only on distances. DimeNet++ improved the speed by replacing the costly bilinear interaction layer with a cheaper element-wise product and by using smaller intermediate embeddings. GemNet by Gasteiger and colleagues at NeurIPS 2021 incorporated dihedral angles, capturing four-body interactions.

EGNN by Victor Garcia Satorras, Emiel Hoogeboom, and Max Welling at ICML 2021 introduced a simpler equivariant scheme that operates directly on coordinates without spherical harmonics, treating positions as features updated jointly with scalar features at each layer. NequIP by Simon Batzner and collaborators at Nature Communications 2022 used full SO(3) equivariant tensor products on top of e3nn and matched force-field accuracy with one or two orders of magnitude less training data. PaiNN by Schütt, Oliver Unke, and Michael Gastegger at ICML 2021 used a polarizable atom interaction scheme with vector-valued node features. MACE by Ilyes Batatia and collaborators at NeurIPS 2022 generalized NequIP using high-body-order equivariant features in a single layer.¹⁰ Allegro by Albert Musaelian and collaborators (Nature Communications 2023) introduced a strictly local equivariant model that scales to millions of atoms.
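The coordinate-based scheme used by EGNN is simple enough to verify numerically. The NumPy sketch below follows the structure of an EGNN layer on a fully connected graph, but replaces each learned MLP with a single linear map and tanh; the function name, weight names, and sizes are illustrative assumptions.

```python
import numpy as np

def egnn_layer(h, x, w_e, w_x, w_h):
    """Update scalar features h and coordinates x on a fully connected graph."""
    n, f = h.shape
    diff = x[:, None, :] - x[None, :, :]                 # x_i - x_j, shape (n, n, 3)
    sq_dist = (diff ** 2).sum(axis=-1, keepdims=True)    # rotation-invariant input
    hi = np.broadcast_to(h[:, None, :], (n, n, f))
    hj = np.broadcast_to(h[None, :, :], (n, n, f))
    m = np.tanh(np.concatenate([hi, hj, sq_dist], axis=-1) @ w_e)
    m = m * (1.0 - np.eye(n))[:, :, None]                # drop self-messages
    # Coordinates move along relative position vectors, scaled by a scalar gate.
    x_new = x + (diff * (m @ w_x)).sum(axis=1) / (n - 1)
    h_new = np.tanh(np.concatenate([h, m.sum(axis=1)], axis=-1) @ w_h)
    return h_new, x_new

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 4))
x = rng.normal(size=(5, 3))
w_e = rng.normal(size=(2 * 4 + 1, 6))
w_x = rng.normal(size=(6, 1))
w_h = rng.normal(size=(4 + 6, 4))

# Random orthogonal matrix (rotation or reflection) and translation.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t = rng.normal(size=3)

h1, x1 = egnn_layer(h, x, w_e, w_x, w_h)
h2, x2 = egnn_layer(h, x @ Q.T + t, w_e, w_x, w_h)
```

Transforming the input coordinates leaves the scalar features unchanged (`h2` matches `h1`) and transforms the output coordinates identically (`x2` matches `x1 @ Q.T + t`), because the layer only ever sees squared distances and relative position vectors.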

Since late 2023 these architectures have been scaled into universal machine-learned interatomic potentials (also called foundation models for atomistic simulation) that are trained once on broad chemistry and applied off the shelf. MACE-MP-0, released by the Csányi group in 2023, trains the MACE architecture on Materials Project relaxation trajectories spanning 89 elements and can run molecular dynamics on inorganic crystals, molten salts, and other systems without system-specific refitting; related efforts include CHGNet, M3GNet, and Meta's Open Materials 2024 (OMat24) potentials. Such models typically still need light fine-tuning to reach task-specific accuracy.¹¹

In November 2023, Google DeepMind released GNoME (Graph Networks for Materials Exploration), a GNN-based pipeline that predicted 2.2 million new crystal structures, of which about 381,000 (often rounded to 380,000) were judged stable and lie on the updated convex hull of stability. The work, published in Nature, used a GNN ensemble for energy prediction combined with density functional theory verification, and the stable predictions were contributed to the Materials Project.¹² In January 2025, Microsoft Research published MatterGen in Nature, a diffusion model that generates crystal structures (elements, atomic positions, and lattice) conditioned on target properties such as magnetic density or mechanical strength, with code released under an open (MIT) license. The two systems are complementary: GNoME for high-throughput stability screening, MatterGen for property-conditioned generation of new candidates.¹³

In drug discovery the equivariant GNN community has also explored conformer prediction (GeoMol, GeoDiff), docking (DiffDock by Gabriele Corso and collaborators at ICLR 2023), and protein folding, where the Evoformer block at the heart of AlphaFold 2 uses triangle attention over a pair representation that is essentially a complete graph over residues. AlphaFold 3, announced in May 2024, replaces the structure module with a diffusion decoder and extends the same graph-style architecture to nucleic acids and small molecule ligands.

Knowledge graph embedding

A knowledge graph stores facts as triples of the form (head entity, relation, tail entity). Examples include Freebase, Wikidata, and large industrial knowledge graphs at Google and Amazon. The two main tasks are link prediction, also called knowledge graph completion, and entity classification. Knowledge graph embedding models map each entity and relation to a low-dimensional vector and score plausibility of a triple using a per-model formula.

Model	Year	Scoring	Notable property
TransE	2013	Negative L1 or L2 norm of h plus r minus t	Translation in embedding space, cannot model symmetric relations
TransH	2014	Translation on relation hyperplane	Handles 1-to-N and N-to-1
DistMult	2015	Bilinear with diagonal relation matrix	Symmetric relations only
ComplEx	2016	Bilinear in complex space	Models antisymmetric relations
RotatE	2019	Rotation in complex plane	Captures symmetry, antisymmetry, and inversion
ConvE	2018	2D convolution over reshaped embedding	Strong with limited parameters
R-GCN	2017	GNN per-relation message passing	Combines structure and embeddings
CompGCN	2020	Joint entity and relation embedding in GNN	Generalizes earlier knowledge graph methods
Query2Box	2020	Box embedding for complex queries	Supports first-order logic queries

TransE, introduced by Antoine Bordes and colleagues at NeurIPS 2013, treats each relation as a translation vector and scores a triple by the distance between the translated head and the tail. DistMult by Bishan Yang and collaborators at ICLR 2015 replaced translation with a diagonal bilinear form. ComplEx by Théo Trouillon and collaborators at ICML 2016 lifted DistMult to complex numbers so antisymmetric relations (parent of, supervises) could be represented. RotatE by Zhiqing Sun and collaborators at ICLR 2019 modeled each relation as a rotation in complex space, which jointly captures symmetry, antisymmetry, inversion, and composition. ConvE applied a small 2D CNN to reshaped entity and relation embeddings and remains a popular baseline due to its parameter efficiency.
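The scoring functions described above are each a line or two of array code. The NumPy sketch below is for illustration only: the function names are hypothetical, the embeddings are random rather than trained, and higher scores mean more plausible triples.

```python
import numpy as np

def transe(h, r, t, p=1):
    """Negative distance between the translated head and the tail."""
    return -np.linalg.norm(h + r - t, ord=p)

def distmult(h, r, t):
    """Bilinear form with a diagonal relation matrix."""
    return float(np.sum(h * r * t))

def complex_score(h, r, t):
    """Re(<h, r, conj(t)>) with complex-valued embeddings."""
    return float(np.real(np.sum(h * r * np.conj(t))))

def rotate(h, phase, t):
    """Relation as element-wise rotation by exp(i * phase) in the complex plane."""
    return -np.linalg.norm(h * np.exp(1j * phase) - t, ord=1)

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 4))
hc = rng.normal(size=4) + 1j * rng.normal(size=4)
rc = rng.normal(size=4) + 1j * rng.normal(size=4)
tc = rng.normal(size=4) + 1j * rng.normal(size=4)
phase = rng.uniform(-np.pi, np.pi, size=4)

symmetric_gap = distmult(h, r, t) - distmult(t, r, h)               # always 0
asymmetric_gap = complex_score(hc, rc, tc) - complex_score(tc, rc, hc)
perfect_translation = transe(h, r, h + r)                           # best score, 0
rotated_tail = hc * np.exp(1j * phase)
inverse_relation = rotate(rotated_tail, -phase, hc)                 # close to 0
```

Swapping head and tail never changes the DistMult score, which is why it can only represent symmetric relations, whereas the ComplEx score generally does change. In RotatE, negating the phase gives the inverse relation.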

R-GCN combines a knowledge graph embedding objective with GNN message passing, treating each relation type as a separate channel. CompGCN by Shikhar Vashishth and collaborators at ICLR 2020 generalized this further by jointly updating entity and relation embeddings through a single GNN. Query embedding methods extend the framework to multi-hop queries: Query2Box by Hongyu Ren and collaborators at ICLR 2020 embeds each query as a box in vector space, with conjunction implemented as box intersection and union handled by rewriting the query into disjunctive normal form. BetaE and other follow-ups expand the operator set to include negation.

Applications

Graph machine learning is in production at several large companies. The applications listed below have public technical write-ups or peer-reviewed papers.

Drug discovery and structural biology

The most visible application is protein structure prediction. AlphaFold 2 by DeepMind, published in Nature in July 2021, predicts the 3D structure of a protein from its amino acid sequence by treating the residues and their pairwise relations as a graph and using a custom attention mechanism called the Evoformer over both a multiple sequence alignment and a pair representation. AlphaFold 2 reached accuracy competitive with experimental methods on the CASP14 benchmark and triggered the release of the AlphaFold Protein Structure Database, which now contains predictions for more than 200 million proteins. RoseTTAFold by David Baker's lab at the University of Washington achieved similar results through a three-track architecture. AlphaFold 3, published in May 2024 in a Nature paper, extends the model to predict structures of complexes including DNA, RNA, ligands, and post-translational modifications, using a diffusion module conditioned on a graph encoder.¹⁴ Demis Hassabis and John Jumper were awarded a share of the 2024 Nobel Prize in Chemistry for the AlphaFold work, and in November 2024 DeepMind released the AlphaFold 3 inference code under a non-commercial license (CC-BY-NC-SA 4.0), with model weights available to academic researchers on request.¹⁵ Independent open reproductions followed quickly, including MIT's Boltz-1 and Boltz-2, Chai-1, Protenix, and HelixFold3, several of which report accuracy comparable to AlphaFold 3 on public benchmarks.¹⁶

In small molecule drug discovery, GNNs power molecular property prediction, molecular generation, and docking. Chemprop, a directed MPNN developed at the Massachusetts Institute of Technology, was used by Jonathan Stokes, Kevin Yang, Kyle Swanson, Wengong Jin, and colleagues in the laboratory of James Collins to discover the antibiotic halicin in 2020; the model, trained on chemical libraries to predict antibacterial activity, flagged a molecule structurally distant from known antibiotics that proved active against a broad range of pathogens.¹⁷ DiffDock, presented at ICLR 2023, scores ligand poses by a diffusion model over translations, rotations, and torsions, with an equivariant GNN as the score network. GNoME for materials and MatterGen for generative materials design are described above.

Recommendation

PinSAGE by Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William Hamilton, and Jure Leskovec at KDD 2018 was one of the first GNNs deployed at web scale. PinSAGE ran on a bipartite graph of 3 billion nodes (pins and boards) and 18 billion edges at Pinterest, generating embeddings for related pin recommendations through random walk sampling and importance pooling. The system replaced an existing collaborative filtering pipeline and was reported to improve recommendation quality by double-digit percentages.

UberEats described in 2019 a graph-based system for dish and restaurant recommendation. LinkedIn published several papers on heterogeneous GNNs for member-job matching. Alibaba's M2GRL uses multi-view GNNs over the Taobao product graph.

Traffic and routing

Google Maps switched its estimated time of arrival model to a GNN in 2020. Austin Derrow-Pinion, Jennifer She, David Wong, Oliver Lange, and collaborators described the system in a paper at CIKM 2021 and in Nature Machine Intelligence. The model groups adjacent road segments into larger units, represents each segment as a node with edges between connected segments, and predicts travel time with a GNN that aggregates spatial context. Google reported reductions of up to 50 percent in inaccurate ETAs in several cities. Uber and DiDi have published similar systems.

Fraud and risk

Visa described in 2022 a transactional graph GNN that flags fraud rings by jointly considering accounts, devices, merchants, and IP addresses. PayPal, Stripe, Ant Group, and Tencent have published GNN-based anti-money-laundering pipelines. The advantage is the ability to detect coordinated patterns invisible from any single transaction.

Other domains

Fake news detection uses heterogeneous GNNs over user-article-comment graphs (Monti et al. 2019).
Social network analysis uses GNNs for community detection and influence prediction (DeepGL, Rossi et al.).
Computational physics uses GNNs for particle simulation (Sanchez-Gonzalez et al. 2020) and weather forecasting: GraphCast by DeepMind (Science, December 2023) produces deterministic ten-day forecasts on a multi-mesh icosahedral graph,¹⁸ and its diffusion-based successor GenCast (Nature, 2024) generates 15-day probabilistic ensembles that outperformed the ECMWF ENS ensemble on 97.4 percent of the 1,320 targets evaluated.¹⁹ Climate emulation uses related models such as NeuralGCM (2024).
Power grid forecasting uses GNNs for short-term load and stability prediction (Donon et al. 2020).
Combinatorial optimization is attacked through GNN-based heuristics for routing, scheduling, and chip placement; the reinforcement-learning floorplanning method described by Mirhoseini and colleagues in Nature in 2021, which uses a GNN policy to place macros on chip floor plans, was named AlphaChip in a September 2024 Nature addendum and has been used across several generations of Google tensor processing units and by external chipmakers.²⁰

Libraries

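Three major libraries dominate the field: PyTorch Geometric, Deep Graph Library, and jraph, with several smaller libraries alongside them. All three implement the message passing abstraction and integrate with PyTorch, JAX, or TensorFlow, and the two largest ship extensive collections of layers and datasets.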

Library	Backend	First release	Maintainer	Notable feature
PyTorch Geometric (PyG)	PyTorch	2019	Matthias Fey, Jan Eric Lenssen (TU Dortmund, Kumo AI)	Largest model zoo, used in most academic papers
Deep Graph Library (DGL)	PyTorch, MXNet, TensorFlow	2018	Amazon AWS AI, NYU Shanghai, NYU	Distributed multi-machine training
jraph	JAX	2020	Google DeepMind	Functional, fast on TPU
Spektral	Keras, TensorFlow	2019	Daniele Grattarola	Easy Keras-style API
StellarGraph	TensorFlow	2018	CSIRO Data61	End-to-end pipelines
TensorFlow GNN	TensorFlow	2021	Google	Production deployment in Google Cloud
TorchDrug	PyTorch	2021	MILA	Drug discovery focus

PyTorch Geometric (PyG), introduced by Matthias Fey and Jan Eric Lenssen in 2019 at the ICLR Representation Learning on Graphs workshop, is the de facto standard for academic graph learning. PyG implements over 100 layers, ships standard datasets (OGB, TU, PPI, Reddit), and supports heterogeneous graphs, temporal graphs, and explainability tools. The library is built on a sparse tensor backend and integrates with the rest of the PyTorch ecosystem.

Deep Graph Library (DGL), released by AWS AI Lab and NYU in 2018, offers a similar feature set with stronger support for distributed training on multi-machine clusters. DGL exposes a relational message passing API convenient for heterogeneous graphs and supports several backends. The DGL team also maintains DGL-KE for large-scale knowledge graph embedding.

jraph by DeepMind, released in 2020, is a minimal functional library for graph nets in JAX. It is used by the AlphaFold team and other DeepMind researchers. Spektral by Daniele Grattarola is a Keras-based library aimed at fast prototyping. StellarGraph by CSIRO Data61 offers end-to-end pipelines. TensorFlow GNN (TF-GNN), released by Google in 2021, exposes a heterogeneous graph schema and integrates with TensorFlow Extended.

Benchmarks

Graph learning benchmarks have evolved rapidly because early datasets (Cora, CiteSeer, PubMed) saturated quickly and were criticized for being too small and too easy. Several modern benchmark suites address these issues.

Benchmark	Released	Scope	Notable property
TUDatasets	2014 to 2020	120+ molecule, social, and biological graphs	Standard for graph classification
Cora, CiteSeer, PubMed	Pre-2010	Citation networks	Classic transductive node classification
MoleculeNet	2018	Quantum, physiology, biophysics tasks	Pioneering chemistry benchmark
QM9	2014	134k small molecules, 12 quantum properties	Workhorse for equivariant networks
OGB	2020	Node, link, and graph tasks of various scales	Standardized splits, leaderboards
OGB-LSC	2021	KDD Cup 2021 large-scale tasks	PCQM4M, MAG240M, WikiKG90M
LRGB	2022	Long Range Graph Benchmark	Tests global mixing capability
GNN Benchmark suite	2020	Six tasks for fair comparison	Curated by Dwivedi and Bresson
Open Catalyst	2020	Catalyst materials simulation	130 million DFT calculations
MalNet	2021	1.2M function call graphs	Malware classification

Open Graph Benchmark (OGB), introduced by Weihua Hu, Matthias Fey, Marinka Zitnik, Jure Leskovec, and collaborators at NeurIPS 2020, defined standardized splits and metrics across graph sizes from small molecules to 100-million-edge citation graphs. OGB-LSC, released in 2021, scaled up to MAG240M (240 million nodes), WikiKG90M (90 million entities), and PCQM4M (4 million molecules). The graph-prediction track of the OGB Large Scale Challenge at KDD Cup 2021, run on PCQM4M, was won by Microsoft's Graphormer team.

MoleculeNet by Zhenqin Wu, Bharath Ramsundar, and Vijay Pande in 2018 packaged 17 chemistry datasets covering quantum mechanics, physical chemistry, biophysics, and physiology, with scaffold splits that approximate generalization to new chemical scaffolds. LRGB (Long Range Graph Benchmark) by Dwivedi and colleagues in 2022 selected five datasets where information must propagate across many hops. LRGB is widely used to evaluate graph transformer architectures.

Limitations

Despite the rapid progress, several fundamental issues constrain GNN performance and have driven much of the research agenda in the last five years.

Over-smoothing. Qimai Li, Zhichao Han, and Xiao-Ming Wu observed in 2018 that stacking many GCN layers causes node representations to converge to indistinguishable vectors, since repeated averaging acts like a low-pass filter on the Laplacian. The practical result is that most GCNs use only two or three layers, which limits the receptive field. Mitigations include residual connections (DeepGCN by Guohao Li and colleagues in 2019), PairNorm normalization, and the use of attention or gating to control diffusion.
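The low-pass effect can be observed by applying the GCN propagation matrix repeatedly without any weights or nonlinearities. The NumPy sketch below is an illustrative assumption rather than the experiment from the paper: it tracks how far apart the node representations remain after each step on a 6-node cycle, rescaling by the square root of degree because that is the direction the symmetric normalization converges to.

```python
import numpy as np

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 and the self-loop-augmented degrees."""
    a_hat = adj + np.eye(len(adj))
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :], deg

def row_spread(h, deg):
    """Largest distance between any two degree-rescaled node representations."""
    scaled = h / np.sqrt(deg)[:, None]
    n = len(scaled)
    return max(np.linalg.norm(scaled[i] - scaled[j])
               for i in range(n) for j in range(n))

def smoothing_curve(adj, feats, steps):
    a_norm, deg = normalized_adjacency(adj)
    h = feats
    spreads = [row_spread(h, deg)]
    for _ in range(steps):
        h = a_norm @ h                 # one round of neighborhood averaging
        spreads.append(row_spread(h, deg))
    return spreads

cycle = np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)
rng = np.random.default_rng(0)
spreads = smoothing_curve(cycle, rng.normal(size=(6, 4)), steps=32)
```

On this graph the second-largest eigenvalue of the propagation matrix is 2/3, so the differences between nodes shrink by roughly that factor per step, and after 32 steps the representations are indistinguishable for practical purposes.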

Over-squashing. Uri Alon and Eran Yahav showed at ICLR 2021 that the exponential growth of the receptive field across layers, combined with a fixed-size node representation, forces the network to compress information from many distant nodes into a single vector, losing information about long-range dependencies. They demonstrated that architectures which weigh all incoming edges equally, such as GCN and GIN, are more susceptible than GAT and GGNN on synthetic tasks that need to combine information across many hops. Jake Topping and colleagues at ICLR 2022 connected over-squashing to negative curvature in the underlying graph and proposed curvature-based rewiring to alleviate it.
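
The growth of the receptive field is easy to count directly. The sketch below assumes a complete binary tree as the graph (an illustrative choice) and uses breadth-first search to count how many nodes lie within k hops of the root; the helper names are hypothetical.

```python
from collections import deque

def receptive_field_sizes(adj, root, num_layers):
    """Number of nodes within k hops of `root`, for k = 0..num_layers (BFS)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == num_layers:
            continue  # do not expand beyond the last layer
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return [sum(1 for d in dist.values() if d <= k) for k in range(num_layers + 1)]

def binary_tree(depth):
    """Adjacency lists of a complete binary tree (node i has children 2i+1 and 2i+2)."""
    n = 2 ** (depth + 1) - 1
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for c in (2 * i + 1, 2 * i + 2):
            if c < n:
                adj[i].append(c)
                adj[c].append(i)
    return adj

sizes = receptive_field_sizes(binary_tree(depth=6), root=0, num_layers=6)
# sizes == [1, 3, 7, 15, 31, 63, 127]
```

After k layers the root's vector has to summarize 2^(k+1) - 1 nodes, while its dimensionality stays the same. That mismatch is the bottleneck the paragraph describes.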

Expressivity bounds. The Xu et al. analysis at ICLR 2019 proved that any standard message passing GNN is at most as powerful as the 1-Weisfeiler-Lehman (1-WL) graph isomorphism test, meaning there exist non-isomorphic graphs that no GCN, GraphSAGE, or GIN can distinguish. Several stronger frameworks have been proposed, including k-GNN (which simulates k-WL at cost exponential in k), Provably Powerful Graph Networks by Maron and colleagues, and identity-aware GNNs (ID-GNN) by You and colleagues at AAAI 2021. Subgraph-based approaches such as ESAN by Bevilacqua and collaborators at ICLR 2022 also push past the 1-WL barrier.
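
A standard failure case for 1-WL is a pair of disjoint triangles versus a single six-node cycle: the graphs are not isomorphic, yet every node has degree two, so color refinement never separates them. The sketch below is a minimal color-refinement implementation under the assumption of featureless nodes; the function name `wl_histogram` and the shared `palette` dictionary are illustrative choices.

```python
from collections import Counter

def wl_histogram(adj, palette, num_rounds=3):
    """1-WL color refinement; returns the histogram of final node colors.

    `palette` maps signatures to integer colors and must be shared across
    the graphs being compared so that their colors are comparable.
    """
    colors = {v: 0 for v in adj}  # uniform initial color (no node features)
    for _ in range(num_rounds):
        new_colors = {}
        for v in adj:
            # Signature = own color plus the sorted multiset of neighbor colors.
            sig = (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            new_colors[v] = palette.setdefault(sig, len(palette) + 1)
        colors = new_colors
    return Counter(colors.values())

two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

palette = {}
h_triangles = wl_histogram(two_triangles, palette)
h_hexagon = wl_histogram(hexagon, palette)
h_path = wl_histogram(path, palette)

same_as_hexagon = h_triangles == h_hexagon   # True: 1-WL cannot tell them apart
path_differs = h_path != h_hexagon           # True: endpoints get their own color
```

Since a standard message passing GNN is bounded by this test, it would assign the two triangles and the hexagon the same graph-level representation when node features are identical.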

Scalability. Even with sampling, graphs over 100 million nodes remain difficult to train on. Industrial systems at Amazon and Alibaba rely on heavy engineering: custom sampling kernels, multi-GPU partitioning, and offline neighbor precomputation. Distributed training is harder than for images or text because of irregular memory access and neighborhood dependency structure.
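
The sampling mentioned above usually means capping how many neighbors are expanded per node at each hop. The sketch below is a simplified, single-machine illustration of layer-wise uniform neighbor sampling in the spirit of GraphSAGE; the function name `sample_neighborhood`, the fanouts, and the toy graph are assumptions, and production systems replace this loop with custom kernels.

```python
import random

def sample_neighborhood(adj, seeds, fanouts, rng):
    """Layer-wise uniform neighbor sampling around `seeds`.

    fanouts[k] caps how many neighbors are kept per node at hop k + 1.
    Returns, per hop, the sampled edges as (source, target) pairs.
    """
    frontier = list(seeds)
    layers = []
    for fanout in fanouts:
        edges = []
        next_frontier = set()
        for v in frontier:
            nbrs = adj[v]
            picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            for u in picked:
                edges.append((u, v))  # message flows from u into v
                next_frontier.add(u)
        layers.append(edges)
        frontier = sorted(next_frontier)
    return layers

# Toy complete graph on 20 nodes; real graphs would be read from storage.
adj = {i: [j for j in range(20) if j != i] for i in range(20)}
layers = sample_neighborhood(adj, seeds=[0], fanouts=[3, 2], rng=random.Random(0))
```

The number of sampled edges per seed is bounded by the product of the fanouts regardless of graph size, which keeps each minibatch affordable but does not remove the irregular memory access the paragraph mentions.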

Heterophily. Most early benchmarks were homophilous: connected nodes tend to share the same label. On heterophilous graphs (where neighbors usually disagree), a basic GCN can underperform a simple MLP that ignores the graph entirely. Several architectures, including H2GCN by Zhu and colleagues at NeurIPS 2020 and GPR-GNN by Chien and colleagues, address heterophily by learning signed or per-hop coefficients.
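
One common way to quantify where a graph sits on this spectrum is the edge homophily ratio, the fraction of edges that connect nodes with the same label. The sketch below computes it on two hypothetical toy graphs; the function name `edge_homophily` is illustrative.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints carry the same label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

labels = [0, 0, 0, 1, 1, 1]

# Two label-pure triangles joined by a single bridge edge (homophilous).
community_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
# Every edge crosses between the two label groups (heterophilous).
bipartite_edges = [(0, 3), (0, 4), (1, 4), (1, 5), (2, 5), (2, 3)]

h_community = edge_homophily(community_edges, labels)  # 6 of 7 edges agree
h_bipartite = edge_homophily(bipartite_edges, labels)  # no edge agrees
```

A ratio near 1 matches the setting in which neighborhood averaging helps; a ratio near 0 is the regime where averaging mixes features from the opposite class.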

Robustness. GNNs are sensitive to graph structure perturbations. A handful of strategic edge additions can change a node's predicted class. Daniel Zügner, Amir Akbarnejad, and Stephan Günnemann at KDD 2018 introduced Nettack, one of the first targeted adversarial attacks on GNNs.
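
The sensitivity to edge additions can be illustrated with a toy model. The sketch below is not Nettack: it assumes a hypothetical one-layer classifier that predicts the sign of the mean feature over a node and its neighbors, and a greedy attacker that connects the target to the nodes whose features oppose its current prediction most strongly. All names and data are illustrative.

```python
def predict(adj, x, node):
    """Toy classifier: sign of the mean feature over the node and its neighbors."""
    vals = [x[node]] + [x[u] for u in adj[node]]
    return 1 if sum(vals) / len(vals) >= 0 else -1

def greedy_edge_attack(adj, x, target, budget):
    """Add up to `budget` edges at `target`, stopping once its prediction flips."""
    adj = {v: list(nbrs) for v, nbrs in adj.items()}  # work on a copy
    original = predict(adj, x, target)
    candidates = sorted(
        (v for v in adj if v != target and v not in adj[target]),
        key=lambda v: original * x[v],  # most opposing feature first
    )
    added = []
    for v in candidates[:budget]:
        if predict(adj, x, target) != original:
            break
        adj[target].append(v)
        adj[v].append(target)
        added.append(v)
    return adj, added

# Two triangles with opposite-signed features (toy data).
x = {0: 0.2, 1: 0.3, 2: 0.1, 3: -1.0, 4: -0.9, 5: -0.8}
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}

before = predict(adj, x, 0)
attacked, added = greedy_edge_attack(adj, x, target=0, budget=2)
after = predict(attacked, x, 0)
```

In this toy setting a single added edge is enough to flip the prediction for node 0, even though none of the node features changed.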

References

Key papers

Kipf, T. N., and Welling, M. "Semi-Supervised Classification with Graph Convolutional Networks." ICLR 2017. arXiv:1609.02907.
Hamilton, W., Ying, R., and Leskovec, J. "Inductive Representation Learning on Large Graphs" (GraphSAGE). NeurIPS 2017. arXiv:1706.02216.
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. "Neural Message Passing for Quantum Chemistry" (MPNN). ICML 2017. arXiv:1704.01212.
Schlichtkrull, M., Kipf, T. N., Bloem, P., et al. "Modeling Relational Data with Graph Convolutional Networks" (R-GCN). ESWC 2018. arXiv:1703.06103.
Defferrard, M., Bresson, X., and Vandergheynst, P. "Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering" (ChebNet). NeurIPS 2016. arXiv:1606.09375.
Schütt, K. T., Kindermans, P. J., Sauceda, H. E., et al. "SchNet: A Continuous-Filter Convolutional Neural Network for Modeling Quantum Interactions." NeurIPS 2017. arXiv:1706.08566.
Satorras, V. G., Hoogeboom, E., and Welling, M. "E(n) Equivariant Graph Neural Networks" (EGNN). ICML 2021. arXiv:2102.09844.
Batzner, S., Musaelian, A., Sun, L., et al. "E(3)-Equivariant Graph Neural Networks for Data-Efficient and Accurate Interatomic Potentials" (NequIP). Nature Communications, 2022.
Sun, Z., Deng, Z. H., Nie, J. Y., and Tang, J. "RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space." ICLR 2019. arXiv:1902.10197.
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., and Yakhnenko, O. "Translating Embeddings for Modeling Multi-Relational Data" (TransE). NeurIPS 2013.
Jumper, J., Evans, R., Pritzel, A., et al. "Highly Accurate Protein Structure Prediction with AlphaFold" (AlphaFold 2). Nature, July 2021.
Ying, R., He, R., Chen, K., et al. "Graph Convolutional Neural Networks for Web-Scale Recommender Systems" (PinSAGE). KDD 2018.
Derrow-Pinion, A., She, J., Wong, D., et al. "ETA Prediction with Graph Neural Networks in Google Maps." CIKM 2021. arXiv:2108.11482.
Hu, W., Fey, M., Zitnik, M., et al. "Open Graph Benchmark: Datasets for Machine Learning on Graphs." NeurIPS 2020. arXiv:2005.00687.
Dwivedi, V. P., Rampášek, L., Galkin, M., et al. "Long Range Graph Benchmark." NeurIPS 2022 Datasets and Benchmarks. arXiv:2206.08164.
Fey, M., and Lenssen, J. E. "Fast Graph Representation Learning with PyTorch Geometric." ICLR 2019 RLGM workshop. arXiv:1903.02428.
Wang, M., Zheng, D., Ye, Z., et al. "Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks." arXiv:1909.01315.
Alon, U., and Yahav, E. "On the Bottleneck of Graph Neural Networks and its Practical Implications" (over-squashing). ICLR 2021. arXiv:2006.05205.
Li, Q., Han, Z., and Wu, X. M. "Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning" (over-smoothing). AAAI 2018. arXiv:1801.07606.
Zügner, D., Akbarnejad, A., and Günnemann, S. "Adversarial Attacks on Neural Networks for Graph Data" (Nettack). KDD 2018. arXiv:1805.07984.

Footnotes

1. Sanchez-Lengeling, B., Reif, E., Pearce, A., and Wiltschko, A. B. "A Gentle Introduction to Graph Neural Networks." Distill, 2021. https://distill.pub/2021/gnn-intro/ Accessed 2026-05-31.
2. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. "Graph Attention Networks." ICLR 2018. arXiv:1710.10903. https://arxiv.org/abs/1710.10903 Accessed 2026-05-31.
3. Brody, S., Alon, U., and Yahav, E. "How Attentive are Graph Attention Networks?" ICLR 2022. arXiv:2105.14491. https://arxiv.org/abs/2105.14491 Accessed 2026-05-31.
4. Xu, K., Hu, W., Leskovec, J., and Jegelka, S. "How Powerful are Graph Neural Networks?" ICLR 2019. arXiv:1810.00826. https://arxiv.org/abs/1810.00826 Accessed 2026-05-31.
5. Ying, C., Cai, T., Luo, S., Zheng, S., Ke, G., He, D., Shen, Y., and Liu, T. Y. "Do Transformers Really Perform Badly for Graph Representation?" NeurIPS 2021. arXiv:2106.05234. https://arxiv.org/abs/2106.05234 Accessed 2026-05-31.
6. Open Graph Benchmark. "OGB-LSC @ KDD Cup 2021." Stanford, 2021. https://ogb.stanford.edu/kddcup2021/ Accessed 2026-05-31.
7. Kim, J., Nguyen, T. D., Min, S., Cho, S., Lee, M., Lee, H., and Hong, S. "Pure Transformers are Powerful Graph Learners." NeurIPS 2022. arXiv:2207.02505. https://arxiv.org/abs/2207.02505 Accessed 2026-05-31.
8. Rampášek, L., Galkin, M., Dwivedi, V. P., Luu, A. T., Wolf, G., and Beaini, D. "Recipe for a General, Powerful, Scalable Graph Transformer." NeurIPS 2022. arXiv:2205.12454. https://arxiv.org/abs/2205.12454 Accessed 2026-05-31.
9. Mao, H., Chen, Z., Tang, W., et al. "Graph Foundation Models: A Comprehensive Survey." arXiv:2505.15116, 2025. https://arxiv.org/abs/2505.15116 Accessed 2026-05-31.
10. Batatia, I., Kovács, D. P., Simm, G. N. C., Ortner, C., and Csányi, G. "MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields." NeurIPS 2022. arXiv:2206.07697. https://arxiv.org/abs/2206.07697 Accessed 2026-05-31.
11. Batatia, I., et al. "A Foundation Model for Atomistic Materials Chemistry" (MACE-MP-0). arXiv:2401.00096, 2023. https://arxiv.org/abs/2401.00096 Accessed 2026-05-31.
12. Merchant, A., Batzner, S., Schoenholz, S. S., Aykol, M., Cheon, G., and Cubuk, E. D. "Scaling Deep Learning for Materials Discovery" (GNoME). Nature, 2023. https://www.nature.com/articles/s41586-023-06735-9 Accessed 2026-05-31.
13. Zeni, C., Pinsler, R., Zügner, D., et al. "A Generative Model for Inorganic Materials Design" (MatterGen). Nature, January 2025. https://www.nature.com/articles/s41586-025-08628-5 Accessed 2026-05-31.
14. Abramson, J., Adler, J., Dunger, J., et al. "Accurate Structure Prediction of Biomolecular Interactions with AlphaFold 3." Nature, May 2024. https://www.nature.com/articles/s41586-024-07487-w Accessed 2026-05-31.
15. Callaway, E. "Major AlphaFold Upgrade Offers Boost for Drug Discovery" / "AI Protein-Prediction Tool AlphaFold3 is Now More Open." Nature news, November 2024. https://www.nature.com/articles/d41586-024-03708-4 Accessed 2026-05-31.
16. Wohlwend, J., et al. "Boltz-1: Democratizing Biomolecular Interaction Modeling." bioRxiv, 2024 (and Boltz-2, Chai-1, Protenix reproductions). https://jclinic.mit.edu/boltz-1/ Accessed 2026-05-31.
17. Stokes, J. M., Yang, K., Swanson, K., Jin, W., et al. "A Deep Learning Approach to Antibiotic Discovery." Cell, February 2020. https://www.cell.com/cell/fulltext/S0092-8674(20)30102-1 Accessed 2026-05-31.
18. Lam, R., Sanchez-Gonzalez, A., Willson, M., et al. "Learning Skillful Medium-Range Global Weather Forecasting" (GraphCast). Science, December 2023. https://www.science.org/doi/10.1126/science.adi2336 Accessed 2026-05-31.
19. Price, I., Sanchez-Gonzalez, A., Alet, F., et al. "Probabilistic Weather Forecasting with Machine Learning" (GenCast). Nature, 2024. arXiv:2312.15796. https://arxiv.org/abs/2312.15796 Accessed 2026-05-31.
20. Mirhoseini, A., Goldie, A., Yazgan, M., et al. "A Graph Placement Methodology for Fast Chip Design" (named AlphaChip; addendum September 2024). Nature, 2021. https://www.nature.com/articles/s41586-021-03544-w Accessed 2026-05-31.
