Graph Machine Learning Models
Last reviewed
May 31, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v3 · 6,253 words
See also: Graph Neural Network, Deep Learning, and Multimodal Models.
Graph machine learning models are neural networks designed to operate on data structured as graphs, where the input is a set of nodes connected by edges rather than a grid like an image or a sequence like text. The defining property of these models is permutation equivariance: the output for a node should not change if the graph is relabeled, which rules out treating a graph as a flat vector and motivates the message passing computation used by most modern graph neural networks (GNNs).1 Graph machine learning is a branch of machine learning and deep learning specialized to relational data. Since Thomas Kipf and Max Welling introduced the Graph Convolutional Network (GCN) in 2017, the field has expanded into hundreds of architectures, produced several large open-source libraries, and come to power production systems at Google, Pinterest, Amazon, Google DeepMind, and Microsoft.
Unlike convolutional or recurrent networks, GNNs do not assume a fixed neighborhood or a fixed input length. Each node updates its representation by aggregating signals from its neighbors and then applying a learnable transformation. The same parameters are shared across every node and every edge, which makes the model size independent of graph size and lets a model trained on small graphs generalize to larger ones. The graphs themselves can be homogeneous (one node type, one edge type), heterogeneous (multiple types, as in knowledge graphs), directed or undirected, weighted or unweighted, and static or evolving in time.
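The aggregate-then-transform step and its independence from node labeling can be illustrated with a minimal sketch. The toy graph, the scalar features, and the names `mean_aggregate_layer`, `w_self`, and `w_neigh` are all assumptions made for illustration; real layers use vector features and learned weight matrices.

```python
def mean_aggregate_layer(features, neighbors, w_self, w_neigh):
    """One message passing step with scalar features and two shared weights."""
    updated = {}
    for node, x in features.items():
        nbrs = neighbors[node]
        # Aggregate: mean of neighbor features (0.0 for isolated nodes).
        agg = sum(features[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
        # Update: the same linear map for every node, followed by ReLU.
        updated[node] = max(0.0, w_self * x + w_neigh * agg)
    return updated


# Hypothetical toy graph: node "a" is connected to "b" and "c".
features = {"a": 1.0, "b": 2.0, "c": 3.0}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
out = mean_aggregate_layer(features, neighbors, w_self=0.5, w_neigh=1.0)

# Relabel the nodes (a->x, b->y, c->z): same graph, different names.
relabel = {"a": "x", "b": "y", "c": "z"}
features_r = {relabel[k]: v for k, v in features.items()}
neighbors_r = {relabel[k]: [relabel[n] for n in v] for k, v in neighbors.items()}
out_r = mean_aggregate_layer(features_r, neighbors_r, w_self=0.5, w_neigh=1.0)
```

Each node's output follows it through the relabeling, and the two weights are the only parameters regardless of how many nodes the graph has.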
Graph machine learning addresses five canonical tasks. The choice of task determines the loss function, the readout, and the evaluation metric.
| Task | Goal | Example | Typical readout |
|---|---|---|---|
| Node classification | Predict a label for each node | Citation network paper category | Per-node softmax |
| Link prediction | Predict whether two nodes are connected | Recommending a friend on a social network | Dot product or bilinear score |
| Graph classification | Predict a label for the whole graph | Molecule toxicity | Sum or mean pooling of node embeddings |
| Graph regression | Predict a continuous value for a graph | Molecular property such as the HOMO-LUMO gap | Pooled embedding plus MLP |
| Graph generation | Sample new graphs from a distribution | De novo drug design | Autoregressive or diffusion decoder |
Several other settings build on these primitives. Community detection partitions nodes into clusters using embeddings from an unsupervised GNN. Subgraph matching identifies whether a small motif occurs inside a larger graph. Combinatorial optimization (traveling salesman, maximum independent set) has been attacked with GNN policies trained via reinforcement learning.
Inputs can carry features on nodes, edges, or both. A molecule has atom features (element, charge) on nodes and bond features (single, double, aromatic) on edges. A road network has road type and length on edges and intersection coordinates on nodes. Modern GNN libraries treat these features uniformly through a small set of message and update functions.
The idea of computing on graphs with neural networks predates the deep learning era. Marco Gori and Franco Scarselli proposed the original "graph neural network" in 2005 and 2009 as a recurrent fixed point system. It was hard to train and never became widely used. The current wave of GNNs grew from two lines of work that converged around 2016 and 2017.
The first line was shallow node embedding, which treats each node as a token and learns a low-dimensional vector for it without parametric message passing. DeepWalk by Bryan Perozzi, Rami Al-Rfou, and Steven Skiena, presented at KDD 2014, generated random walks on a graph and ran the word2vec skip-gram model on the resulting sequences. LINE by Jian Tang and collaborators at WWW 2015 preserved first-order and second-order proximity through an explicit objective. node2vec, by Aditya Grover and Jure Leskovec at KDD 2016, extended DeepWalk with biased random walks controlled by two parameters that interpolate between breadth-first and depth-first exploration. These models scaled to millions of nodes but produced a fixed embedding table that could not handle new nodes or edge features.
The second line was spectral graph convolution. Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun proposed in 2014 to define convolution on a graph through the eigendecomposition of the graph Laplacian. The cost was cubic in the number of nodes, and filters were not localized. Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst introduced ChebNet at NeurIPS 2016, which approximated spectral filters with Chebyshev polynomials of the Laplacian and reduced cost to linear in the number of edges. ChebNet was the immediate precursor to GCN.
The Graph Convolutional Network (GCN), introduced by Kipf and Welling at ICLR 2017, simplified ChebNet to a single-hop filter with a renormalization trick and showed strong results on the Cora, CiteSeer, and PubMed citation benchmarks. Within months the field exploded. GraphSAGE by Will Hamilton, Rex Ying, and Jure Leskovec at NeurIPS 2017 introduced inductive learning through neighbor sampling. Graph Attention Network (GAT) by Petar Veličković and collaborators at ICLR 2018 added learnable attention weights. Justin Gilmer and colleagues at ICML 2017 unified existing models under the Message Passing Neural Network (MPNN) framework.
Most convolutional GNNs follow the same two-step recipe. For each node, aggregate a function of the neighbor features and the connecting edge features, then update the node feature using the aggregated message and the previous state. Models differ in the aggregator (sum, mean, max, attention) and the update (linear, MLP, gated recurrent unit). The table below lists the most cited architectures.
| Architecture | Year | Authors | Aggregator | Notable property |
|---|---|---|---|---|
| GCN | 2017 | Kipf, Welling | Normalized sum with degree | Simplest spectral approximation, transductive |
| GraphSAGE | 2017 | Hamilton, Ying, Leskovec | Mean, max, or LSTM over sampled neighbors | First inductive GNN at scale |
| GAT | 2018 | Veličković et al. | Multi-head attention | Learnable neighbor weighting |
| MPNN | 2017 | Gilmer et al. | General message function plus update | Unifying framework for chemistry GNNs |
| GIN | 2019 | Xu, Hu, Leskovec, Jegelka | Sum followed by MLP | Provably as expressive as the 1-Weisfeiler-Lehman test |
| R-GCN | 2017 | Schlichtkrull et al. | Per-relation linear sum | First strong GNN for knowledge graphs |
| HAN | 2019 | Wang et al. | Meta-path attention | Heterogeneous node and edge types |
| Cluster-GCN | 2019 | Chiang et al. | Mini-batch within graph partitions | Scaling GCN to 100M edges |
| GraphSAINT | 2020 | Zeng et al. | Subgraph sampling with normalization | Unbiased mini-batches for large graphs |
| SIGN | 2020 | Frasca et al. | Precomputed multi-hop diffusion | No message passing during training, only an MLP over precomputed features; very fast |
| PNA | 2020 | Corso et al. | Multiple aggregators with degree scalers | Strong on regular graphs |
| DiffPool | 2018 | Ying et al. | Learnable hierarchical pooling | First differentiable graph pooling |
GCN. The forward pass for a single layer computes a normalized adjacency multiplication: each node averages the features of its neighbors and itself, weighted by the inverse square root of the product of their degrees. A learnable linear map and a nonlinearity follow. Two layers of GCN cover the two-hop neighborhood of each node and reached state-of-the-art accuracy on small citation graphs when the model was introduced. GCN is transductive in its original formulation: the renormalized adjacency matrix is computed on the whole graph at training time, so adding new nodes later requires recomputing the matrix.
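A minimal sketch of this layer, written with nested lists so it has no dependencies, may make the normalization concrete. The function names and the three-node path graph are assumptions for illustration; a practical implementation would use sparse tensors.

```python
import math


def gcn_normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node also keeps its own feature.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj, x, weight):
    """Compute ReLU(A_norm X W) for feature rows x and a weight matrix."""
    a_norm = gcn_normalized_adjacency(adj)
    n, d_in, d_out = len(x), len(weight), len(weight[0])
    # Propagate: mix each node's features with those of its neighbors.
    h = [[sum(a_norm[i][k] * x[k][f] for k in range(n)) for f in range(d_in)]
         for i in range(n)]
    # Transform: shared linear map followed by ReLU.
    return [[max(0.0, sum(h[i][f] * weight[f][o] for f in range(d_in)))
             for o in range(d_out)] for i in range(n)]


# Hypothetical path graph 0 - 1 - 2 with a signal only on node 0.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1.0], [0.0], [0.0]]
a_norm = gcn_normalized_adjacency(adj)
h1 = gcn_layer(adj, x, [[1.0]])   # one hop: node 2 still sees nothing
h2 = gcn_layer(adj, h1, [[1.0]])  # two hops: node 2 now receives signal
```

After one layer node 2 is still zero, and after the second layer it is positive, which matches the two-hop receptive field described above.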
GraphSAGE. GraphSAGE solves the inductive problem by sampling a fixed number of neighbors for each node and applying an aggregator (mean, max-pooling, or an LSTM) over the sample. The mean and pooling aggregators are permutation invariant; the LSTM is not, so it is applied to a random ordering of the sampled neighbors. Because no full adjacency multiplication is required, GraphSAGE can produce embeddings for nodes that did not exist at training time, which is what made it production-ready for recommender systems.
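The sampling step can be sketched as follows. This is an illustrative simplification with scalar features: the helper names, the choice to sample with replacement when a node has fewer than `k` neighbors, and the toy graph are all assumptions, and the learned linear map that would follow is omitted.

```python
import random


def sample_neighbors(neighbors, node, k, rng):
    """Return k sampled neighbors (with replacement if fewer than k exist)."""
    nbrs = neighbors.get(node, [])
    if not nbrs:
        return []
    if len(nbrs) >= k:
        return rng.sample(nbrs, k)
    return [rng.choice(nbrs) for _ in range(k)]


def sage_mean_input(features, neighbors, node, k, rng):
    """Concatenate a node's own feature with the mean of its sampled neighbors."""
    sampled = sample_neighbors(neighbors, node, k, rng)
    agg = sum(features[n] for n in sampled) / len(sampled) if sampled else 0.0
    return [features[node], agg]


rng = random.Random(0)
features = {"u1": 1.0, "u2": 3.0, "u3": 5.0}
neighbors = {"u1": ["u2", "u3"], "u2": ["u1"], "u3": ["u1"]}

# A node that arrives after training needs only its own features and edges.
features["new"] = 2.0
neighbors["new"] = ["u1", "u2", "u3"]
sampled = sample_neighbors(neighbors, "new", 2, rng)
new_input = sage_mean_input(features, neighbors, "new", 2, rng)
```

Because the computation touches only the new node's local neighborhood, nothing about the rest of the graph has to be recomputed.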
GAT. GAT replaces the fixed adjacency weights with learned attention coefficients.2 For each edge, a shared linear map produces scores that are normalized over the neighborhood by a softmax. The model uses multiple attention heads in parallel, similar to the Transformer block, and concatenates or averages their outputs. The attention weights are interpretable in some cases and can make the model more robust to noisy edges. Shaked Brody, Uri Alon, and Eran Yahav showed at ICLR 2022 that the original GAT computes only static attention, in which the ranking of neighbors is fixed regardless of the query node. Their GATv2 applies the attention vector after the nonlinearity instead of before it, which recovers dynamic attention as a drop-in replacement with the same parameter count, and it is available alongside the original GAT in the major libraries.3
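A single-head version of the original GAT scoring rule can be sketched in a few lines. The weight matrix `W`, the attention vector `a`, and the toy features are placeholder values rather than learned parameters, and including each node in its own neighborhood is an assumption of this sketch.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_attention(h, neighbors, W, a):
    """Single-head GAT coefficients alpha[i][j] over N(i) plus i itself."""
    def linear(v):
        return [sum(W[o][f] * v[f] for f in range(len(v))) for o in range(len(W))]

    z = {n: linear(v) for n, v in h.items()}
    alpha = {}
    for i in h:
        nbrs = [i] + list(neighbors[i])
        # Score each edge: LeakyReLU(a^T [z_i || z_j]).
        scores = [leaky_relu(sum(w * v for w, v in zip(a, z[i] + z[j])))
                  for j in nbrs]
        # Softmax over the neighborhood (shifted by the max for stability).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha[i] = {j: e / total for j, e in zip(nbrs, exps)}
    return alpha


# Hypothetical inputs: identity projection and an arbitrary attention vector.
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [1.0, -1.0, 0.5, 0.5]
alpha = gat_attention(h, neighbors, W, a)
```

The updated feature of node `i` would then be the `alpha`-weighted sum of the projected neighbor features, optionally repeated over several heads.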
MPNN. Gilmer and colleagues proposed MPNN as a unifying notation: each layer computes a message for each edge using the source feature, target feature, and edge feature, sums the messages into each target node, and then updates the target node with a recurrent or feed-forward function. Almost every later architecture can be written in MPNN form, and most graph libraries expose an MPNN class as the base abstraction.
GIN. Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka analyzed GNN expressivity at ICLR 2019. They proved that any neighborhood-aggregation GNN is at most as powerful as the 1-Weisfeiler-Lehman (1-WL) graph isomorphism test, that mean and max aggregators are strictly weaker than sum, and that an injective sum aggregator followed by an MLP (Graph Isomorphism Network or GIN) achieves the 1-WL bound. GIN became the standard benchmark architecture for graph classification because its theoretical properties match its empirical performance on TU datasets.4
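The gap between sum, mean, and max can be seen on a handful of neighbor multisets. The sketch below assumes one-hot node features (here two hypothetical node types, `RED` and `BLUE`), for which the sum is simply a count per type.

```python
def agg_sum(msgs):
    return tuple(sum(col) for col in zip(*msgs))


def agg_mean(msgs):
    return tuple(sum(col) / len(msgs) for col in zip(*msgs))


def agg_max(msgs):
    return tuple(max(col) for col in zip(*msgs))


RED, BLUE = (1, 0), (0, 1)  # one-hot features for two node types

# Four different neighbor multisets.
multisets = {
    "A": [RED, RED],
    "B": [RED],
    "C": [RED, BLUE],
    "D": [RED, RED, BLUE, BLUE],
}


def distinct_outputs(agg):
    """Number of different aggregated values across the four multisets."""
    return len({agg(m) for m in multisets.values()})


n_sum = distinct_outputs(agg_sum)
n_mean = distinct_outputs(agg_mean)
n_max = distinct_outputs(agg_max)
```

Sum separates all four multisets, while mean and max each collapse A with B and C with D, because mean keeps only proportions and max keeps only which types are present.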
R-GCN and HAN. Real graphs often have typed nodes and edges. R-GCN, by Michael Schlichtkrull and collaborators in 2017, assigns a separate learnable weight matrix to each relation type and sums the per-relation messages. Basis decomposition or block diagonal decomposition controls the parameter count when the number of relations is large. HAN (Heterogeneous Attention Network) by Xiao Wang and colleagues at WWW 2019 generalizes attention along meta-paths, fixed sequences of relations that capture semantic patterns in a heterogeneous graph.
Scalability variants. Standard message passing requires the full neighbor set of each node at every layer, which blows up memory when the graph has hundreds of millions of edges. GraphSAINT samples a connected subgraph at each iteration and applies normalization to correct sampling bias. Cluster-GCN partitions the graph with a graph clustering algorithm (METIS) and performs mini-batch training within each partition. SIGN precomputes multi-hop diffused features once and trains a feed-forward MLP on the result, trading expressivity for raw throughput.
Graph transformers apply the self-attention mechanism to all node pairs, not only to neighbors. They lose the inductive bias of locality but gain a global receptive field, which helps on tasks where long-range information matters (predicting properties of polymers, reading long chains in a parse tree). The challenge is how to inject the graph structure, since self-attention without positional information is permutation equivariant and would treat the graph as a bag of nodes.
Vijay Prakash Dwivedi and Xavier Bresson proposed the Graph Transformer in 2020, which adds Laplacian eigenvectors as positional encodings so the attention layer can distinguish nodes by their structural role. The Spectral Attention Network (SAN) by Devin Kreuzer and colleagues at NeurIPS 2021 extends this with learned positional encodings derived from the full eigendecomposition.
Graphormer, introduced by Chengxuan Ying, Tianle Cai, and collaborators at NeurIPS 2021, encodes graph structure through three encodings: a centrality encoding added to each node's input features based on its degree, a spatial encoding that biases the attention logits according to the shortest path distance between nodes, and an edge encoding, also an attention bias, aggregated along the shortest path.5 A Microsoft Research Asia team built on Graphormer to win the graph-prediction track of the OGB Large Scale Challenge (OGB-LSC) on the PCQM4M quantum chemistry dataset at KDD Cup 2021, beating every message passing baseline by a clear margin.6
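The spatial encoding reduces to a lookup keyed by shortest path distance. The sketch below computes those distances by breadth-first search and builds a bias matrix; the bias values in `bias_table` and `far_bias` are made-up stand-ins for scalars that would be learned, and the function names are hypothetical.

```python
from collections import deque


def shortest_path_distances(neighbors):
    """All-pairs hop distances via BFS; unreachable pairs are left out."""
    dist = {}
    for src in neighbors:
        d = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[src] = d
    return dist


def attention_bias(dist, nodes, bias_table, far_bias):
    """Bias matrix b[u][v] to be added to attention logits before softmax."""
    return [[bias_table.get(dist[u].get(v), far_bias) for v in nodes]
            for u in nodes]


# Hypothetical graph: a path 0 - 1 - 2 - 3 plus an isolated node 4.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
nodes = sorted(neighbors)
dist = shortest_path_distances(neighbors)

# Placeholder values; in a trained model one scalar per distance is learned.
bias_table = {0: 0.0, 1: -0.5, 2: -1.0, 3: -1.5}
bias = attention_bias(dist, nodes, bias_table, far_bias=-4.0)
```

Every node still attends to every other node, but the bias lets the model weight pairs differently depending on how far apart they are in the graph.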
TokenGT (Tokenized Graph Transformer), proposed by Jinwoo Kim, Tien Dat Nguyen, and collaborators at NeurIPS 2022, took the opposite design stance: it feeds every node and every edge into a standard, unmodified Transformer as independent tokens, augmented only with orthonormal node identifiers and type embeddings. The authors proved that with these token embeddings a plain Transformer is at least as expressive as a second-order invariant graph network (2-IGN), and therefore strictly more expressive than any message passing GNN, while reaching competitive accuracy on PCQM4Mv2. TokenGT is often cited as evidence that graph-specific architecture is not strictly necessary if structure is encoded in the input tokens.7
GraphGPS by Ladislav Rampášek, Mikhail Galkin, and Dominique Beaini at NeurIPS 2022 proposed a recipe for hybrid models. Each block contains a local message passing layer in parallel with a global attention layer, with both fed by learned positional and structural encodings.8 GraphGPS gave consistent gains across the Long Range Graph Benchmark and inspired a wave of hybrid architectures.
Exphormer by Hamed Shirzad and collaborators at ICML 2023 reduced the quadratic attention cost using expander graphs as a sparse global connectivity pattern, making global attention tractable on graphs with tens of thousands of nodes. NAGphormer by Jinsong Chen and colleagues reformulated graph attention as a sequence problem over hop counts, allowing the use of standard transformer libraries.
The most recent line of work applies state space models to graphs. GraphMamba by Chloe Wang and colleagues in 2024 adapts the Mamba selective state space layer to graph data by selecting node sequences with structural relevance to the target node. Initial results suggest sub-quadratic global mixing with performance competitive with GraphGPS on long range tasks. Several follow-ups including Graph-Mamba and GMN have explored alternative scan orders such as breadth-first traversal and random walks.
Inspired by large language models, a parallel effort aims at graph foundation models: a single model pretrained on many graphs and transferred to new datasets and tasks by fine-tuning, in-context learning, or zero-shot inference. Surveys distinguish universal, domain-specific, and task-specific foundation models, and approaches such as GIT (Graph Generality Identifier on Task-Trees, 2024) report transfer across dozens of graphs in several domains. Whether a single architecture can generalize across graphs whose nodes and edges represent very different objects remains an open research question as of 2026.9
Molecules and crystals carry geometry: each atom has a 3D position, and the physical properties of the system are invariant under translation, rotation, and reflection of the whole structure. Plain GNNs that use only the graph topology lose this information. Equivariant graph neural networks preserve it by ensuring that if the input coordinates rotate, the output rotates in the same way.
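A small numerical check illustrates the difference between invariant and equivariant quantities. The coordinates below are rough, water-like placeholder values, and the helpers `rotate_z`, `translate`, and `displacement` are names chosen for this sketch.

```python
import math


def rotate_z(p, theta):
    """Rotate a 3D point about the z axis by theta radians."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)


def translate(p, shift):
    return tuple(a + b for a, b in zip(p, shift))


def displacement(p, q):
    return tuple(b - a for a, b in zip(p, q))


def pairwise_distances(coords):
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


# Placeholder coordinates for a three-atom molecule.
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
theta, shift = 0.7, (1.0, -2.0, 0.5)
moved = [translate(rotate_z(p, theta), shift) for p in coords]

d_before = pairwise_distances(coords)
d_after = pairwise_distances(moved)

# Invariant: interatomic distances do not change at all.
max_distance_change = max(abs(a - b)
                          for ra, rb in zip(d_before, d_after)
                          for a, b in zip(ra, rb))

# Equivariant: displacement vectors rotate together with the input.
v_then_rotate = rotate_z(displacement(coords[0], coords[1]), theta)
rotate_then_v = displacement(moved[0], moved[1])
```

Models built only on distances are invariant by construction, while models that carry vector features must make sure those features transform like `rotate_then_v`.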
The pioneer was SchNet by Kristof Schütt, Pieter-Jan Kindermans, and collaborators at NeurIPS 2017. SchNet uses continuous-filter convolutions parameterized by the interatomic distance, which makes the energy prediction translation, rotation, and permutation invariant. SchNet trained on the QM9 dataset reached chemical accuracy on several molecular properties for the first time with a neural network.
DimeNet by Johannes Gasteiger, Janek Groß, and Stephan Günnemann at ICLR 2020 added directional information through messages that depend on bond angles, not only on distances. DimeNet++ improved the speed mainly by replacing the costly bilinear interaction layer with cheaper element-wise products and down-projected embeddings. GemNet by Gasteiger and colleagues at NeurIPS 2021 incorporated dihedral angles, capturing four-body interactions.
EGNN by Victor Garcia Satorras, Emiel Hoogeboom, and Max Welling at ICML 2021 introduced a simpler equivariant scheme that operates directly on coordinates without spherical harmonics, treating positions as features updated jointly with scalar features at each layer. NequIP by Simon Batzner and collaborators at Nature Communications 2022 used full SO(3) equivariant tensor products on top of e3nn and matched force-field accuracy with one or two orders of magnitude less training data. PaiNN by Schütt, Oliver Unke, and Michael Gastegger at ICML 2021 used a polarizable atom interaction scheme with vector-valued node features. MACE by Ilyes Batatia and collaborators at NeurIPS 2022 generalized NequIP using high-body-order equivariant features in a single layer.10 Allegro by Albert Musaelian and collaborators (Nature Communications 2023) introduced a strictly local equivariant model that scales to millions of atoms.
Since late 2023 these architectures have been scaled into universal machine-learned interatomic potentials (also called foundation models for atomistic simulation) that are trained once on broad chemistry and applied off the shelf. MACE-MP-0, released by the Csányi group in 2023, trains the MACE architecture on Materials Project relaxation trajectories spanning 89 elements and can run molecular dynamics on inorganic crystals, molten salts, and other systems without system-specific refitting; related efforts include CHGNet, M3GNet, and Meta's Open Materials 2024 (OMat24) potentials. Such models typically still need light fine-tuning to reach task-specific accuracy.11
In November 2023, Google DeepMind released GNoME (Graph Networks for Materials Exploration), a GNN-based pipeline that predicted 2.2 million new crystal structures, of which about 380,000 (381,000 by the paper's count) were judged stable, meaning they lie on the updated convex hull of stability. The work, published in Nature, used a GNN ensemble for energy prediction combined with density functional theory verification, and the stable predictions were contributed to the Materials Project.12 In January 2025, Microsoft Research published MatterGen in Nature, a diffusion model that generates crystal structures (elements, atomic positions, and lattice) conditioned on target properties such as magnetic density or mechanical strength, with code released under an open (MIT) license. The two systems are complementary: GNoME for high-throughput stability screening, MatterGen for property-conditioned generation of new candidates.13
In drug discovery the equivariant GNN community has also explored conformer prediction (GeoMol, GeoDiff), docking (DiffDock by Gabriele Corso and collaborators at ICLR 2023), and protein folding, where the Evoformer block at the heart of AlphaFold 2 uses triangle attention over a pair representation that is essentially a complete graph over residues. AlphaFold 3, announced in May 2024, replaces the structure module with a diffusion decoder and extends the same graph-style architecture to nucleic acids and small molecule ligands.
A knowledge graph stores facts as triples of the form (head entity, relation, tail entity). Examples include Freebase, Wikidata, and large industrial knowledge graphs at Google and Amazon. The two main tasks are link prediction, also called knowledge graph completion, and entity classification. Knowledge graph embedding models map each entity and relation to a low-dimensional vector and score plausibility of a triple using a per-model formula.
| Model | Year | Scoring | Notable property |
|---|---|---|---|
| TransE | 2013 | Negative L1 or L2 norm of h plus r minus t | Translation in embedding space, cannot model symmetric relations |
| TransH | 2014 | Translation on relation hyperplane | Handles 1-to-N and N-to-1 |
| DistMult | 2015 | Bilinear with diagonal relation matrix | Symmetric relations only |
| ComplEx | 2016 | Bilinear in complex space | Models antisymmetric relations |
| RotatE | 2019 | Rotation in complex plane | Captures symmetry, antisymmetry, and inversion |
| ConvE | 2018 | 2D convolution over reshaped embedding | Strong with limited parameters |
| R-GCN | 2017 | GNN per-relation message passing | Combines structure and embeddings |
| CompGCN | 2020 | Joint entity and relation embedding in GNN | Generalizes earlier knowledge graph methods |
| Query2Box | 2020 | Box embedding for complex queries | Supports existential positive first-order (EPFO) queries |
TransE, introduced by Antoine Bordes and colleagues at NeurIPS 2013, treats each relation as a translation vector and scores a triple by the distance between the translated head and the tail. DistMult by Bishan Yang and collaborators at ICLR 2015 replaced translation with a diagonal bilinear form. ComplEx by Théo Trouillon and collaborators at ICML 2016 lifted DistMult to complex numbers so antisymmetric relations (parent of, supervises) could be represented. RotatE by Zhiqing Sun and collaborators at ICLR 2019 modeled each relation as a rotation in complex space, which jointly captures symmetry, antisymmetry, inversion, and composition. ConvE applied a small 2D CNN to reshaped entity and relation embeddings and remains a popular baseline due to its parameter efficiency.
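The scoring formulas are short enough to write out directly. The sketch below uses tiny hand-picked embeddings rather than trained ones, and the function names are chosen for this example; higher scores mean a more plausible triple.

```python
import cmath
import math


def transe_score(h, r, t, norm=1):
    """Negative L1 or L2 distance between h + r and t."""
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return -sum(abs(d) for d in diffs)
    return -math.sqrt(sum(d * d for d in diffs))


def distmult_score(h, r, t):
    """Bilinear form with a diagonal relation matrix."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h, r, t):
    """Real part of the trilinear product with the conjugated tail."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


def rotate_score(h, phases, t):
    """Negative distance after rotating each head coordinate by a phase."""
    return -sum(abs(hi * cmath.exp(1j * p) - ti)
                for hi, p, ti in zip(h, phases, t))


# TransE: a perfect translation gets the maximum score of zero.
perfect = transe_score([1.0, 2.0], [0.5, -1.0], [1.5, 1.0])

# DistMult: swapping head and tail never changes the score.
dm_forward = distmult_score([1.0, 2.0], [0.5, 3.0], [4.0, -1.0])
dm_backward = distmult_score([4.0, -1.0], [0.5, 3.0], [1.0, 2.0])

# ComplEx: a purely imaginary relation flips the sign when the triple is reversed.
cx_forward = complex_score([1 + 1j], [1j], [1 - 1j])
cx_backward = complex_score([1 - 1j], [1j], [1 + 1j])

# RotatE: a quarter-turn relation maps 1 to i.
rot = rotate_score([1 + 0j], [math.pi / 2], [0 + 1j])
```

The DistMult and ComplEx pairs show concretely why the former can only represent symmetric relations while the latter can score a triple and its reverse differently.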
R-GCN combines a knowledge graph embedding objective with GNN message passing, treating each relation type as a separate channel. CompGCN by Shikhar Vashishth and collaborators at ICLR 2020 generalized this further by jointly updating entity and relation embeddings through a single GNN. Query embedding methods extend the framework to multi-hop queries: Query2Box by Hongyu Ren and collaborators at ICLR 2020 embeds each query as a box in vector space, with conjunction implemented as box intersection and disjunction handled by rewriting the query into disjunctive normal form. QueryR2N and other follow-ups expand the operator set to include negation.
Graph machine learning is in production at several large companies. The applications listed below have public technical write-ups or peer-reviewed papers.
The most visible application is protein structure prediction. AlphaFold 2 by DeepMind, published in Nature in July 2021, predicts the 3D structure of a protein from its amino acid sequence by treating the residues and their pairwise relations as a graph and using a custom attention mechanism called the Evoformer over both a multiple sequence alignment and a pair representation. AlphaFold 2 reached experimental accuracy on the CASP14 benchmark and triggered the release of the AlphaFold Protein Structure Database, which now contains predictions for more than 200 million proteins. RoseTTAFold by David Baker's lab at the University of Washington achieved similar results through a three-track architecture. AlphaFold 3, published in May 2024 in a Nature paper, extends the model to predict structures of complexes including DNA, RNA, ligands, and post-translational modifications, using a diffusion module conditioned on a graph encoder.14 Demis Hassabis and John Jumper were awarded a share of the 2024 Nobel Prize in Chemistry for the AlphaFold work, and in November 2024 DeepMind released the AlphaFold 3 inference code under a non-commercial license (CC-BY-NC-SA 4.0), with model weights available to academic researchers on request.15 Independent open reproductions followed quickly, including MIT's Boltz-1 and Boltz-2, Chai-1, Protenix, and HelixFold3, several of which report accuracy comparable to AlphaFold 3 on public benchmarks.16
In small molecule drug discovery, GNNs power molecular property prediction, molecular generation, and docking. Chemprop, a directed MPNN developed at the Massachusetts Institute of Technology, was used by Jonathan Stokes, Kevin Yang, Kyle Swanson, Wengong Jin, and colleagues in the laboratory of James Collins to discover the antibiotic halicin in 2020; the model, trained on chemical libraries to predict antibacterial activity, flagged a molecule structurally distant from known antibiotics that proved active against a broad range of pathogens.17 DiffDock, presented at ICLR 2023, scores ligand poses by a diffusion model over translations, rotations, and torsions, with an equivariant GNN as the score network. GNoME for materials and MatterGen for generative materials design are described above.
PinSAGE by Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William Hamilton, and Jure Leskovec at KDD 2018 was the first GNN deployed at web scale. PinSAGE ran on a bipartite graph of pins and boards with 3 billion nodes and 18 billion edges at Pinterest, generating embeddings for related pin recommendations through random walk sampling and importance pooling. The system replaced an existing collaborative filtering pipeline and was reported to improve recommendation quality by double-digit percentages.
UberEats described in 2019 a graph-based system for dish and restaurant recommendation. LinkedIn published several papers on heterogeneous GNNs for member-job matching. Alibaba's M2GRL uses multi-view GNNs over the Taobao product graph.
Google Maps switched its estimated time of arrival model to a GNN in 2020. Austin Derrow-Pinion, Jennifer She, David Wong, Oliver Lange, and collaborators described the system in a paper at CIKM 2021 and in Nature Machine Intelligence. The model treats road segments as nodes and connects segments that follow one another on a road or meet at an intersection, with per-segment travel time predicted by a GNN that aggregates spatial context. Google reported reductions of up to 50 percent in ETA errors in several cities. Uber and DiDi have published similar systems.
Visa described in 2022 a transactional graph GNN that flags fraud rings by jointly considering accounts, devices, merchants, and IP addresses. PayPal, Stripe, Ant Group, and Tencent have published GNN-based anti-money-laundering pipelines. The advantage is the ability to detect coordinated patterns invisible from any single transaction.
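Three major libraries (PyTorch Geometric, Deep Graph Library, and jraph) dominate the field, and several smaller ones are listed alongside them in the table below. The major libraries implement the MPNN abstraction, ship many layers and datasets, and integrate with PyTorch, JAX, or TensorFlow.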
| Library | Backend | First release | Maintainer | Notable feature |
|---|---|---|---|---|
| PyTorch Geometric (PyG) | PyTorch | 2019 | Matthias Fey, Jan Eric Lenssen (TU Dortmund, Kumo AI) | Largest model zoo, used in most academic papers |
| Deep Graph Library (DGL) | PyTorch, MXNet, TensorFlow | 2018 | Amazon AWS AI, NYU Shanghai, NYU | Distributed multi-machine training |
| jraph | JAX | 2020 | Google DeepMind | Functional, fast on TPU |
| Spektral | Keras, TensorFlow | 2019 | Daniele Grattarola | Easy Keras-style API |
| Stellar Graph | TensorFlow | 2018 | CSIRO Data61 | End-to-end pipelines |
| TensorFlow GNN | TensorFlow | 2021 | Google | Production deployment in Google Cloud |
| TorchDrug | PyTorch | 2021 | MILA | Drug discovery focus |
PyTorch Geometric (PyG), introduced by Matthias Fey and Jan Eric Lenssen in 2019 at the ICLR Representation Learning on Graphs workshop, is the de facto standard for academic graph learning. PyG implements over 100 layers, ships standard datasets (OGB, TU, PPI, Reddit), and supports heterogeneous graphs, temporal graphs, and explainability tools. The library is built on a sparse tensor backend and integrates with the rest of the PyTorch ecosystem.
Deep Graph Library (DGL), released by AWS AI Lab and NYU in 2018, offers a similar feature set with stronger support for distributed training on multi-machine clusters. DGL exposes a relational message passing API convenient for heterogeneous graphs and supports several backends. The DGL team also maintains DGL-KE for large-scale knowledge graph embedding.
jraph by DeepMind, released in 2020, is a minimal functional library for graph nets in JAX. It is used by the AlphaFold team and other DeepMind researchers. Spektral by Daniele Grattarola is a Keras-based library aimed at fast prototyping. Stellar Graph by CSIRO Data61 offers end-to-end pipelines. TensorFlow GNN (TF-GNN), released by Google in 2021, exposes a heterogeneous graph schema and integrates with TensorFlow Extended.
Graph learning benchmarks have evolved rapidly because early datasets (Cora, CiteSeer, PubMed) saturated quickly and were criticized for being too small and too easy. Several modern benchmark suites address these issues.
| Benchmark | Released | Scope | Notable property |
|---|---|---|---|
| TUDatasets | 2014 to 2020 | 120+ molecule, social, and biological graphs | Standard for graph classification |
| Cora, CiteSeer, PubMed | Pre-2010 | Citation networks | Classic transductive node classification |
| MoleculeNet | 2018 | Quantum, physiology, biophysics tasks | Pioneering chemistry benchmark |
| QM9 | 2014 | 134k small molecules, 12 quantum properties | Workhorse for equivariant networks |
| OGB | 2020 | Node, link, and graph tasks of various scales | Standardized splits, leaderboards |
| OGB-LSC | 2021 | KDD Cup 2021 large-scale tasks | PCQM4M, MAG240M, WikiKG90M |
| LRGB | 2022 | Long Range Graph Benchmark | Tests global mixing capability |
| GNN Benchmark suite | 2020 | Six tasks for fair comparison | Curated by Dwivedi and Bresson |
| Open Catalyst | 2020 | Catalyst materials simulation | 130 million DFT calculations |
| MalNet | 2021 | 1.2M function call graphs | Malware classification |
Open Graph Benchmark (OGB), introduced by Weihua Hu, Matthias Fey, Marinka Zitnik, and Jure Leskovec at NeurIPS 2020, defined standardized splits and metrics across graph sizes from small molecules to 100-million-node citation graphs. OGB-LSC, released in 2021, scaled up to MAG240M (240 million nodes), WikiKG90M (90 million entities), and PCQM4M (4 million molecules). The graph-prediction track of the OGB Large Scale Challenge at KDD Cup 2021, run on PCQM4M, was won by Microsoft's Graphormer team.
MoleculeNet by Zhenqin Wu, Bharath Ramsundar, and Vijay Pande in 2018 packaged 17 chemistry datasets covering quantum mechanics, physical chemistry, biophysics, and physiology, with scaffold splits that approximate generalization to new chemical scaffolds. LRGB (Long Range Graph Benchmark) by Dwivedi and colleagues in 2022 selected five datasets where information must propagate across many hops. LRGB is widely used to evaluate graph transformer architectures.
Despite the rapid progress, several fundamental issues constrain GNN performance and have driven much of the research agenda in the last five years.
Over-smoothing. Qimai Li, Zhichao Han, and Xiao-Ming Wu observed in 2018 that stacking many GCN layers causes node representations to converge to indistinguishable vectors, since repeated averaging acts like a low-pass filter on the Laplacian. The practical result is that most GCNs use only two or three layers, which limits the receptive field. Mitigations include residual connections (DeepGCN by Guohao Li and colleagues in 2019), PairNorm normalization, and the use of attention or gating to control diffusion.
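The effect is easy to reproduce on a toy graph. The sketch below repeatedly applies a plain mean over each node and its neighbors, with no weights or nonlinearity, and tracks the gap between the largest and smallest feature. Using the mean instead of GCN's symmetric normalization is a simplifying assumption, and the four-node path graph is hypothetical.

```python
def propagate(features, neighbors):
    """Replace each feature by the mean over the node and its neighbors."""
    return {v: (features[v] + sum(features[u] for u in neighbors[v]))
               / (1 + len(neighbors[v]))
            for v in neighbors}


def spread(features):
    """Gap between the largest and smallest node feature."""
    vals = list(features.values())
    return max(vals) - min(vals)


# Path graph 0 - 1 - 2 - 3 with a signal only on node 3.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0}

spreads = []
for _ in range(50):
    x = propagate(x, neighbors)
    spreads.append(spread(x))
```

After two rounds the nodes are still clearly distinguishable, but after fifty the features agree to within about one part in ten thousand, leaving nothing for a classifier to separate.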
Over-squashing. Uri Alon and Eran Yahav showed at ICLR 2021 that the exponential growth of the receptive field across layers combined with a fixed-size node representation forces the network to compress information from many distant nodes into a single vector, losing information about long-range dependencies. On tasks that need to combine information across many hops, they found that architectures which weight all incoming edges equally, such as GCN and GIN, are more susceptible than GAT and gated GNNs. Jake Topping and colleagues at ICLR 2022 connected over-squashing to negative curvature in the underlying graph and proposed curvature-based rewiring to alleviate it.
Expressivity bounds. The analysis by Xu and colleagues at ICLR 2019 proved that any standard message passing GNN is at most as powerful as the 1-Weisfeiler-Lehman (1-WL) test, meaning there exist non-isomorphic graphs that no GCN, GraphSAGE, or GIN can distinguish. Several stronger frameworks have been proposed, including k-GNN by Morris and colleagues (which simulates k-WL at a cost exponential in k), Provably Powerful Graph Networks by Maron and colleagues, and identity-aware GNNs (ID-GNN) by You and colleagues at AAAI 2021. Subgraph-based approaches such as ESAN by Bevilacqua and collaborators at ICLR 2022 also push past the 1-WL barrier.
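A standard illustration of the 1-WL limit is the pair "two disjoint triangles" versus "one six-node cycle": the graphs are not isomorphic, yet every node has degree two, so color refinement never separates them. The sketch below is a minimal version of 1-WL color refinement; the function name `wl_color_histogram` and the choice of three rounds are assumptions made for this example.

```python
from collections import Counter


def wl_color_histogram(adj, rounds=3):
    """Run 1-WL color refinement and return the multiset of final node colors.

    Colors are kept as nested tuples rather than being relabeled to integers,
    so histograms from different graphs can be compared directly.
    """
    colors = {v: () for v in adj}  # every node starts with the same color
    for _ in range(rounds):
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())


two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

# Non-isomorphic but indistinguishable to 1-WL.
same = wl_color_histogram(two_triangles) == wl_color_histogram(hexagon)
# The path has degree-one endpoints, so 1-WL separates it from the cycle.
different = wl_color_histogram(path) != wl_color_histogram(hexagon)
print(same, different)
```

Because a message passing layer updates a node from the multiset of its neighbors' states, just as the refinement step does, any such GNN assigns the two triangles and the hexagon identical graph-level representations under a permutation-invariant readout.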
Scalability. Even with sampling, graphs over 100 million nodes remain difficult to train on. Industrial systems at Amazon and Alibaba rely on heavy engineering: custom sampling kernels, multi-GPU partitioning, and offline neighbor precomputation. Distributed training is harder than for images or text because of irregular memory access and neighborhood dependency structure.
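As a rough sketch of the sampling idea mentioned above, the following code draws a bounded number of neighbors per node at each hop, in the spirit of GraphSAGE-style neighbor sampling. The function name `sample_neighborhood`, the fan-out values, and the synthetic graph are assumptions for illustration; production systems implement this step in optimized kernels rather than Python loops.

```python
import random


def sample_neighborhood(adj, seeds, fanouts, rng):
    """Sample a multi-hop computation graph for `seeds`.

    fanouts[i] is the maximum number of neighbors kept per node at hop i + 1.
    Returns one list of (source, target) message edges per hop.
    """
    layers = []
    frontier = list(seeds)
    for fanout in fanouts:
        edges = []
        next_frontier = set()
        for node in frontier:
            nbrs = adj[node]
            picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            for nbr in picked:
                edges.append((nbr, node))  # message flows from nbr to node
                next_frontier.add(nbr)
        layers.append(edges)
        frontier = sorted(next_frontier)
    return layers


# Hypothetical graph: 1,000 nodes, each linked to the next 20 nodes on a ring.
n = 1000
adj = {i: [(i + d) % n for d in range(1, 21)] for i in range(n)}

rng = random.Random(0)
layers = sample_neighborhood(adj, seeds=[0], fanouts=[5, 5], rng=rng)
print([len(edges) for edges in layers])
```

With fan-outs of 5 and 5, the two-hop computation graph for one seed has at most 5 + 25 edges instead of the 20 + 400 edges of the full neighborhood, which is what keeps mini-batch memory bounded on large graphs.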
Heterophily. Most early benchmarks were homophilous: connected nodes tend to share the same label. On heterophilous graphs (where neighbors usually have different labels), a basic GCN can underperform a simple MLP that ignores the graph entirely. Several architectures, including H2GCN by Zhu and colleagues at NeurIPS 2020 and GPR-GNN by Chien and colleagues at ICLR 2021, address heterophily by learning signed or per-hop coefficients.
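One common way to quantify this property is the edge homophily ratio: the fraction of edges whose endpoints share a label. The sketch below computes it for two toy graphs; the function name `edge_homophily` and both example graphs are assumptions for illustration.

```python
def edge_homophily(edges, labels):
    """Fraction of edges that connect two nodes with the same label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Two triangles joined by a single bridge edge; labels follow the triangles.
community_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community_labels = [0, 0, 0, 1, 1, 1]

# A four-node cycle whose labels alternate, so every edge crosses classes.
alternating_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
alternating_labels = [0, 1, 0, 1]

h_high = edge_homophily(community_edges, community_labels)
h_low = edge_homophily(alternating_edges, alternating_labels)
print(round(h_high, 3), h_low)
```

A ratio near 1 indicates a homophilous graph where neighbor averaging reinforces the label signal, while a ratio near 0 indicates a heterophilous graph where plain averaging mixes features from different classes.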
Robustness. GNNs are sensitive to perturbations of the graph structure: a handful of strategically chosen edge additions can change a node's predicted class. Daniel Zügner, Amir Akbarnejad, and Stephan Günnemann at KDD 2018 introduced Nettack, one of the first targeted adversarial attacks on GNNs.
Sanchez-Lengeling, B., Reif, E., Pearce, A., and Wiltschko, A. B. "A Gentle Introduction to Graph Neural Networks." Distill, 2021. https://distill.pub/2021/gnn-intro/ Accessed 2026-05-31. ↩
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. "Graph Attention Networks." ICLR 2018. arXiv:1710.10903. https://arxiv.org/abs/1710.10903 Accessed 2026-05-31. ↩
Brody, S., Alon, U., and Yahav, E. "How Attentive are Graph Attention Networks?" ICLR 2022. arXiv:2105.14491. https://arxiv.org/abs/2105.14491 Accessed 2026-05-31. ↩
Xu, K., Hu, W., Leskovec, J., and Jegelka, S. "How Powerful are Graph Neural Networks?" ICLR 2019. arXiv:1810.00826. https://arxiv.org/abs/1810.00826 Accessed 2026-05-31. ↩
Ying, C., Cai, T., Luo, S., Zheng, S., Ke, G., He, D., Shen, Y., and Liu, T. Y. "Do Transformers Really Perform Badly for Graph Representation?" NeurIPS 2021. arXiv:2106.05234. https://arxiv.org/abs/2106.05234 Accessed 2026-05-31. ↩
Open Graph Benchmark. "OGB-LSC @ KDD Cup 2021." Stanford, 2021. https://ogb.stanford.edu/kddcup2021/ Accessed 2026-05-31. ↩
Kim, J., Nguyen, T. D., Min, S., Cho, S., Lee, M., Lee, H., and Hong, S. "Pure Transformers are Powerful Graph Learners." NeurIPS 2022. arXiv:2207.02505. https://arxiv.org/abs/2207.02505 Accessed 2026-05-31. ↩
Rampášek, L., Galkin, M., Dwivedi, V. P., Luu, A. T., Wolf, G., and Beaini, D. "Recipe for a General, Powerful, Scalable Graph Transformer." NeurIPS 2022. arXiv:2205.12454. https://arxiv.org/abs/2205.12454 Accessed 2026-05-31. ↩
Mao, H., Chen, Z., Tang, W., et al. "Graph Foundation Models: A Comprehensive Survey." arXiv:2505.15116, 2025. https://arxiv.org/abs/2505.15116 Accessed 2026-05-31. ↩
Batatia, I., Kovács, D. P., Simm, G. N. C., Ortner, C., and Csányi, G. "MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields." NeurIPS 2022. arXiv:2206.07697. https://arxiv.org/abs/2206.07697 Accessed 2026-05-31. ↩
Batatia, I., et al. "A Foundation Model for Atomistic Materials Chemistry" (MACE-MP-0). arXiv:2401.00096, 2023. https://arxiv.org/abs/2401.00096 Accessed 2026-05-31. ↩
Merchant, A., Batzner, S., Schoenholz, S. S., Aykol, M., Cheon, G., and Cubuk, E. D. "Scaling Deep Learning for Materials Discovery" (GNoME). Nature, 2023. https://www.nature.com/articles/s41586-023-06735-9 Accessed 2026-05-31. ↩
Zeni, C., Pinsler, R., Zügner, D., et al. "A Generative Model for Inorganic Materials Design" (MatterGen). Nature, January 2025. https://www.nature.com/articles/s41586-025-08628-5 Accessed 2026-05-31. ↩
Abramson, J., Adler, J., Dunger, J., et al. "Accurate Structure Prediction of Biomolecular Interactions with AlphaFold 3." Nature, May 2024. https://www.nature.com/articles/s41586-024-07487-w Accessed 2026-05-31. ↩
Callaway, E. "Major AlphaFold Upgrade Offers Boost for Drug Discovery" / "AI Protein-Prediction Tool AlphaFold3 is Now More Open." Nature news, November 2024. https://www.nature.com/articles/d41586-024-03708-4 Accessed 2026-05-31. ↩
Wohlwend, J., et al. "Boltz-1: Democratizing Biomolecular Interaction Modeling." bioRxiv, 2024 (and Boltz-2, Chai-1, Protenix reproductions). https://jclinic.mit.edu/boltz-1/ Accessed 2026-05-31. ↩
Stokes, J. M., Yang, K., Swanson, K., Jin, W., et al. "A Deep Learning Approach to Antibiotic Discovery." Cell, February 2020. https://www.cell.com/cell/fulltext/S0092-8674(20)30102-1 Accessed 2026-05-31. ↩
Lam, R., Sanchez-Gonzalez, A., Willson, M., et al. "Learning Skillful Medium-Range Global Weather Forecasting" (GraphCast). Science, December 2023. https://www.science.org/doi/10.1126/science.adi2336 Accessed 2026-05-31. ↩
Price, I., Sanchez-Gonzalez, A., Alet, F., et al. "Probabilistic Weather Forecasting with Machine Learning" (GenCast). Nature, 2024. arXiv:2312.15796. https://arxiv.org/abs/2312.15796 Accessed 2026-05-31. ↩
Mirhoseini, A., Goldie, A., Yazgan, M., et al. "A Graph Placement Methodology for Fast Chip Design" (named AlphaChip; addendum September 2024). Nature, 2021. https://www.nature.com/articles/s41586-021-03544-w Accessed 2026-05-31. ↩