# Graph

> Source: https://aiwiki.ai/wiki/graph
> Updated: 2026-06-27
> Categories: Machine Learning
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

*See also: [Machine learning terms](/wiki/machine_learning_terms)*

In machine learning the word **graph** has two unrelated meanings. The first is a **graph as a data structure**: a set of nodes (vertices) connected by edges, written formally as *G = (V, E)*, used to represent relational data such as social networks, molecules, knowledge bases, and citation networks, and processed by [graph neural networks](/wiki/graph_neural_network) (GNNs) [1][7]. The second is a **computation graph**: a directed acyclic record of the operations a deep learning framework performs on tensors, which makes [automatic differentiation](/wiki/automatic_differentiation) and [backpropagation](/wiki/backpropagation) practical at scale [2][3]. The two senses share almost nothing beyond the vocabulary of "nodes" and "edges." A computation graph is an internal bookkeeping object inside a framework like [TensorFlow](/wiki/tensorflow), [PyTorch](/wiki/pytorch), or [JAX](/wiki/jax); a graph in the data-structure sense is the *input* to a model [5].

## Introduction

In machine learning the word **graph** carries two distinct meanings. The first is the **graph as a data structure**: a set of nodes connected by edges, used to represent relational data such as social networks, molecules, and citation networks, and processed by [graph neural networks](/wiki/graph_neural_network) (GNNs). The second is the **computation graph**: a directed acyclic record of the operations that a deep learning framework performs on tensors, which makes [automatic differentiation](/wiki/automatic_differentiation) and [backpropagation](/wiki/backpropagation) practical at scale. The two senses share almost nothing beyond the name. A computation graph is an internal bookkeeping object inside a framework like [TensorFlow](/wiki/tensorflow), [PyTorch](/wiki/pytorch), or [JAX](/wiki/jax); a graph in the data-structure sense is the input to a model.

## What is a graph as a data structure?

A graph is formally defined as a pair *G = (V, E)*, where *V* is a set of nodes (vertices) and *E* is a set of edges (links) connecting pairs of nodes. This definition dates to Leonhard Euler's 1736 paper on the Seven Bridges of Königsberg, the result widely regarded as the first theorem of graph theory: Euler reduced the city to a graph of 4 vertices (the two riverbanks and two islands) and 7 edges (the bridges) and proved that no walk could cross every bridge exactly once [8]. The definition carries over almost untouched into machine learning. What machine learning adds is **attributes**: each node, each edge, and sometimes the graph as a whole can carry a feature vector. A protein-protein interaction graph, for instance, might attach an amino acid sequence to every node and a binding affinity score to every edge.

### What is the difference between directed and undirected graphs?

In a **directed graph** the edges have a direction, indicating an asymmetric relationship between the connected nodes. The follower-followee relation on a platform such as X (formerly Twitter) is the canonical example. In an **undirected graph** edges have no direction; a Facebook friendship is the textbook case. Many models, including the original [graph convolutional network](/wiki/graph_convolutional_network), assume the graph is undirected, though directed and heterogeneous variants exist [9].

### What are weighted and unweighted graphs?

In a **weighted graph** every edge carries a scalar weight representing the strength, distance, or significance of the connection. Road networks use weights for travel time; recommendation graphs use them for interaction counts. In an **unweighted graph** every edge is treated as equal. Many real datasets sit between these extremes: edges either exist or do not, but each carries a richer feature vector instead of a single scalar.

### What other graph distinctions matter?

A graph is **homogeneous** when all nodes are the same type and all edges represent the same relation, and **heterogeneous** otherwise. A knowledge graph such as Wikidata is heterogeneous. A graph is **static** if its structure does not change over time, and **dynamic** if nodes or edges appear and disappear, as in trading networks or temporal interaction logs. The table below summarizes the main axes.

| Axis | Type A | Type B | Example |
|---|---|---|---|
| Direction | Directed | Undirected | Follower graph vs. friendship graph |
| Edge weight | Weighted | Unweighted | Road network vs. plain link graph |
| Node/edge types | Heterogeneous | Homogeneous | Knowledge graph vs. citation network |
| Time | Dynamic | Static | Trading network vs. fixed molecule |

## What is a graph neural network?

Graph neural networks are a family of models that consume graph-structured data directly, without flattening it into a fixed-size vector first. The Distill article *A Gentle Introduction to Graph Neural Networks* defines one concisely: "A GNN is an optimizable transformation on all attributes of the graph (nodes, edges, global-context) that preserves graph symmetries (permutation invariances)." [15] The unifying idea is **message passing**, formalized as Message Passing Neural Networks (MPNN) by Justin Gilmer and colleagues at Google Brain in their 2017 paper *Neural Message Passing for Quantum Chemistry* [10]. In an MPNN each node sends a message to its neighbors, collects the messages it receives, then updates its hidden state using an aggregation function followed by a learned update function. Stacking *k* such layers lets information flow across paths of length *k* in the graph.

A core property of message passing is **permutation equivariance**: relabeling the nodes does not change the output, because aggregation uses a permutation-invariant operator such as sum, mean, or max [15]. That property is what makes GNNs sensible models for graphs, which have no canonical node order.

### What is a graph convolutional network (GCN)?

The graph convolutional network was introduced by Thomas Kipf and Max Welling in their 2017 ICLR paper *Semi-Supervised Classification with Graph Convolutional Networks* [9]. The authors motivate the architecture "via a localized first-order approximation of spectral graph convolutions," and report a model that "scales linearly in the number of graph edges" [9]. Its real appeal is how simple the update rule is. Each node averages the features of its neighbors (plus itself), multiplies by a learned weight matrix, and applies a nonlinearity. The normalization uses the symmetrically normalized adjacency matrix, which keeps eigenvalues bounded. GCNs work well on citation networks, where two layers are often enough.

### What is a graph attention network (GAT)?

Graph attention networks, introduced by Petar Veličković and collaborators in their 2018 ICLR paper, replace the fixed normalization weights in a GCN with learned attention coefficients [11]. The paper presents "novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations." [11] For every directed edge from node *u* to node *v* the model computes a score, normalizes the scores across *v*'s neighbors via a softmax, and uses those scores to weight the messages. GATs typically use **multi-head attention**, like the [Transformer](/wiki/transformer) and its underlying [attention mechanism](/wiki/attention_mechanism). A GCN can be viewed as a degenerate GAT with attention scores fixed by the normalized adjacency matrix.

### What are GraphSAGE and other GNN variants?

GraphSAGE (Hamilton, Ying, and Leskovec, NeurIPS 2017) was the first widely adopted **inductive** GNN: instead of memorizing embeddings for a fixed set of nodes, it learns aggregator functions that generalize to unseen nodes [12]. The name abbreviates "SAmple and aggreGatE." That made GNNs practical for very large graphs such as Pinterest's PinSage system, which learns embeddings for nodes numbering in the billions [12][17]. Other variants include the Gated Graph Sequence Neural Network (GGS-NN, 2015) and the Relational GCN (R-GCN) for heterogeneous knowledge graphs.

### What were shallow graph embeddings before GNNs?

Before GNNs took over, **shallow embedding** methods dominated the space. DeepWalk (Perozzi et al., KDD 2014) generates random walks on the graph and feeds them into the skip-gram objective from word2vec, treating each walk as a sentence and each node as a word [13]. Node2Vec (Grover and Leskovec, KDD 2016) generalizes DeepWalk with two parameters *p* and *q* that bias the walks toward either breadth-first or depth-first exploration [14]. These methods are still useful when node features are unavailable, and they sometimes initialize GNNs.

### What are the expressivity limits of GNNs?

A theoretical result that shaped GNN research is that standard message passing GNNs are **at most as powerful as the 1-Weisfeiler-Leman (1-WL) graph isomorphism test**, proven by Xu et al. (Graph Isomorphism Network, ICLR 2019) and Morris et al. (AAAI 2019) [7]. The Xu et al. paper develops "a simple architecture that is provably the most expressive among the class of GNNs and is as powerful as the Weisfeiler-Lehman graph isomorphism test" [7]. Pairs of non-isomorphic graphs exist that no standard GNN can distinguish. Higher-order GNNs and subgraph-based models have been proposed to push past this ceiling.

### How are graphs and GNNs used in AI?

GNN-based methods have produced concrete wins across several domains. [AlphaFold](/wiki/alphafold) 2 (DeepMind, 2020) uses graph-based attention modules (its Evoformer stack) over residue pairs in its protein structure prediction pipeline; at CASP14 it reached a median GDT score of 92.4, a level of accuracy comparable to experimental methods and the system that broke open the benchmark [16]. In drug discovery, MPNNs predict molecular properties from atom-and-bond graphs [10]. In recommendation, PinSage at Pinterest uses GNNs over user-item bipartite graphs to build a web-scale [recommendation system](/wiki/recommendation_system) [17]. Sanchez-Gonzalez et al. (DeepMind, 2020) showed that learned GNN simulators can match classical solvers on fluid and rigid-body dynamics. Graphs also appear in fraud detection, anomaly detection on system provenance logs, and combinatorial optimization heuristics. The same node-and-edge structure underlies the [knowledge graph](/wiki/knowledge_graph), where heterogeneous nodes and typed edges encode facts for search and question answering.

The main software libraries are **PyTorch Geometric (PyG)**, built on PyTorch, and **Deep Graph Library (DGL)**, developed by AWS Shanghai and NYU Shanghai, which supports both PyTorch and TensorFlow [18][19]. Both provide message passing primitives, benchmark datasets, and reference implementations of GCN, GAT, GraphSAGE, and dozens of other architectures.

## What is a computation graph?

A computation graph (also called a **computational graph** or **dataflow graph**) is a directed acyclic graph whose nodes are operations and whose edges carry tensors between them [1][2]. Frameworks use this representation for two reasons. First, it makes [automatic differentiation](/wiki/automatic_differentiation) mechanical: once the forward graph is known, the reverse pass that computes gradients is another walk over the same structure, applying the chain rule at each node [6]. Second, the graph is a compilation target; an optimizing compiler can fuse kernels, fold constants, and dispatch operations to GPUs or TPUs with much less Python overhead [1]. Reverse-mode autodiff, what most frameworks implement, was published by Seppo Linnainmaa in 1976; backpropagation is the special case where the differentiated function is a neural network loss [6].

### What are static graphs (TensorFlow 1.x)?

The first wave of deep learning frameworks used **static graphs**, sometimes called **define-and-run**. In [TensorFlow](/wiki/tensorflow) 1.x the programmer first builds the entire graph in Python using placeholders for inputs and variables for trainable parameters, then opens a `tf.Session` and calls `sess.run` to feed concrete values through the graph. The graph is frozen once constructed; changing the model means rebuilding it.

Static graphs give the runtime a complete picture of the program ahead of time, which enables global optimizations: constant folding, common subexpression elimination, operator fusion, and memory planning [1]. They also serialize cleanly, which is why TensorFlow's SavedModel format has been a durable deployment target. The cost is that the Python code that defines the graph is not the code that runs, which makes debugging painful and data-dependent control flow awkward. Theano, which arguably started the modern era of deep learning around 2008, used a similar model.

### What are dynamic graphs (PyTorch, eager TensorFlow)?

[PyTorch](/wiki/pytorch), released by Facebook AI Research in 2017, popularized the opposite philosophy, called **define-by-run** or **eager execution**. The graph is not declared in advance; it is built implicitly as you run Python code [2]. Every operation on a [tensor](/wiki/tensor) that has `requires_grad=True` creates a node in the graph and records the function needed to compute its gradient. When you call `.backward()` on a scalar loss, PyTorch walks that recorded graph in reverse, accumulates gradients into the `.grad` attributes of the leaves, and then discards the graph [2].

The leaves are the input tensors that require gradients (typically the model parameters). The roots are the tensors on which you call `.backward()`. Each non-leaf tensor carries a `grad_fn` attribute that points to the operation that produced it, and these pointers form the DAG that autograd traverses [2]. Because the graph is rebuilt every iteration, the model can branch, loop, or use Python conditionals that depend on tensor values, and the differentiation still works. Recurrent networks with variable-length inputs and reinforcement learning policies benefit from this flexibility.

The tradeoff is that the runtime sees only one operator at a time, which limits compiler optimizations. Chainer, released by Preferred Networks in 2015, was the first widely used define-by-run framework and directly inspired PyTorch.

### How does TensorFlow 2 use tf.function?

TensorFlow 2.0 (2019) made eager execution the default and exposed the graph machinery through the `tf.function` decorator. Wrapping a Python function in `@tf.function` triggers **tracing**: TensorFlow runs the function once with symbolic placeholders, records the operations, and produces a `ConcreteFunction` backed by a `tf.Graph`. Subsequent calls with the same input signature reuse the cached graph; calls with new shapes or dtypes trigger a fresh trace [1].

Tracing uses **AutoGraph**, a source-level transformer that rewrites Python control flow into graph operations. A Python `if` on a tensor becomes a `tf.cond`; a tensor-conditioned `while` becomes a `tf.while_loop`. This lets users write ordinary-looking Python and still get a static graph [1]. Setting `jit_compile=True` hands the result to **XLA** (Accelerated Linear Algebra), the [XLA](/wiki/xla) compiler that fuses operations and often produces significant speedups on TPUs and GPUs.

### How does JAX use functional transformations?

[JAX](/wiki/jax), released by Google in 2018, exposes graph construction and compilation as user-facing function transformations [5]. The core transformations are:

| Transformation | Purpose |
|---|---|
| `jax.grad` | Returns a function that computes the gradient via reverse-mode autodiff. |
| `jax.jit` | Traces the function into JAX's IR (jaxpr) and compiles it with XLA. |
| `jax.vmap` | Vectorizes the function over a batch axis without writing an explicit loop. |
| `jax.pmap` | Parallelizes the function across multiple devices in a single-program-multiple-data style. |

These transformations compose. A common pattern is `jax.pmap(jax.jit(jax.grad(loss_fn)))`, a per-device JIT-compiled gradient function parallelized across a TPU pod [5]. JAX's tracing model requires the traced function to be a **pure function** of its inputs: as the documentation states, "all the input data is passed through the function parameters, all the results are output through the function results" [20]. That is a real constraint, but in exchange the user gets predictable graph capture and strong compiler optimizations.

### What is PyTorch 2 and torch.compile?

PyTorch 2.0, released in early March 2023, closed much of the gap between eager and graph execution with `torch.compile` [3]. The implementation uses **TorchDynamo**, a JIT compiler that hooks into the CPython frame evaluation API (PEP 523) and rewrites Python bytecode at runtime to extract sequences of PyTorch operations into an FX graph [3][4]. **AOTAutograd** captures the backward pass ahead of time, **PrimTorch** decomposes operations into a smaller primitive set, and **TorchInductor** lowers the result to fast device kernels, often via Triton on GPUs [3]. The PyTorch team validated TorchDynamo on a set of 7,000+ GitHub projects written in PyTorch and reported: "While TorchScript and others struggled to even acquire the graph 50% of the time, often with a big overhead, TorchDynamo acquired the graph 99% of the time, correctly, safely and with negligible overhead." [3] It falls back to eager mode on code it cannot translate. The engineering direction is clear: dynamic frontends with graph-mode backends.

### What is the difference between forward and reverse mode autodiff?

There are two ways to apply the chain rule over a computation graph. **Forward mode** propagates derivatives alongside values, from inputs to outputs, and costs *n* passes for *n* inputs. **Reverse mode** does one forward pass to record values, then walks backward through the graph applying the chain rule to one output at a time, and costs *m* passes for *m* outputs [6]. Deep learning losses are scalar (*m* = 1) with millions of parameters (*n* very large), so reverse mode is dramatically cheaper. That asymmetry is the entire reason every major deep learning framework defaults to reverse-mode autodiff, which is the formal name for backpropagation [6].

## How do the two kinds of graph differ?

The computation graph inside PyTorch and the graph that you feed to a GNN are unrelated objects despite the shared vocabulary of nodes and edges. The computation graph is always a DAG because it records a deterministic forward pass; an input graph in the data-structure sense may be cyclic, undirected, or both [1][7]. A graph neural network has a computation graph too since it runs inside a framework, but that internal graph is distinct from the data-structure graph it consumes.

## ELI5

Think of a graph as dots joined by lines. The dots are "nodes" and the lines are "edges." People in a friend network are dots, friendships are lines; atoms in a molecule are dots, chemical bonds are lines. A graph neural network is a program that learns by letting each dot "whisper" what it knows to the dots it is connected to, over and over, until every dot understands its neighborhood. The other kind of graph, a computation graph, is just a recipe a computer keeps of every math step it took, so it can rewind those steps to figure out how to improve. Same word, two very different ideas.

## See also

- [Graph neural network](/wiki/graph_neural_network)
- [Knowledge graph](/wiki/knowledge_graph)
- [Neural network](/wiki/neural_network)
- [Machine learning](/wiki/machine_learning)
- [Automatic differentiation](/wiki/automatic_differentiation)
- [Backpropagation](/wiki/backpropagation)
- [PyTorch](/wiki/pytorch)
- [TensorFlow](/wiki/tensorflow)
- [JAX](/wiki/jax)

## References

1. TensorFlow. "Introduction to graphs and tf.function." TensorFlow Core Guide. https://www.tensorflow.org/guide/intro_to_graphs
2. PyTorch. "How Computational Graphs are Executed in PyTorch." PyTorch Blog. https://pytorch.org/blog/how-computational-graphs-are-executed-in-pytorch/
3. PyTorch. "PyTorch 2.x." https://pytorch.org/get-started/pytorch-2-x/
4. PyTorch. "Dynamo Overview." https://docs.pytorch.org/docs/stable/torch.compiler_dynamo_overview.html
5. JAX. "Key concepts." JAX documentation. https://docs.jax.dev/en/latest/key-concepts.html
6. Wikipedia. "Automatic differentiation." https://en.wikipedia.org/wiki/Automatic_differentiation
7. Xu, K., Hu, W., Leskovec, J., and Jegelka, S. (2019). "How Powerful are Graph Neural Networks?" ICLR 2019. https://arxiv.org/abs/1810.00826
8. Wikipedia. "Seven Bridges of Königsberg." https://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg
9. Kipf, T. N., and Welling, M. (2017). "Semi-Supervised Classification with Graph Convolutional Networks." ICLR 2017. https://arxiv.org/abs/1609.02907
10. Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. (2017). "Neural Message Passing for Quantum Chemistry." ICML 2017. https://arxiv.org/abs/1704.01212
11. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. (2018). "Graph Attention Networks." ICLR 2018. https://arxiv.org/abs/1710.10903
12. Hamilton, W., Ying, R., and Leskovec, J. (2017). "Inductive Representation Learning on Large Graphs." NeurIPS 2017. https://arxiv.org/abs/1706.02216
13. Perozzi, B., Al-Rfou, R., and Skiena, S. (2014). "DeepWalk: Online Learning of Social Representations." KDD 2014. https://arxiv.org/abs/1403.6652
14. Grover, A., and Leskovec, J. (2016). "node2vec: Scalable Feature Learning for Networks." KDD 2016. https://arxiv.org/abs/1607.00653
15. Sanchez-Lengeling, B., Reif, E., Pearce, A., and Wiltschko, A. B. (2021). "A Gentle Introduction to Graph Neural Networks." Distill. https://distill.pub/2021/gnn-intro/
16. Jumper, J., et al. (2021). "Highly accurate protein structure prediction with AlphaFold." Nature; DeepMind, "AlphaFold: a solution to a 50-year-old grand challenge in biology" (CASP14, median GDT 92.4). https://en.wikipedia.org/wiki/AlphaFold
17. Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W. L., and Leskovec, J. (2018). "Graph Convolutional Neural Networks for Web-Scale Recommender Systems (PinSage)." KDD 2018. https://arxiv.org/abs/1806.01973
18. PyTorch Geometric documentation. https://pytorch-geometric.readthedocs.io/
19. Deep Graph Library. https://www.dgl.ai/
20. JAX. "JAX - The Sharp Bits (pure functions)." JAX documentation. https://docs.jax.dev/en/latest/notebooks/Common_Gotchas_in_JAX.html