Eager execution is an imperative programming paradigm in machine learning frameworks where operations are evaluated immediately as they are called, rather than being added to a deferred computational graph for later execution. In eager mode, each operation returns a concrete value right away, allowing developers to write, test, and debug code using standard programming patterns. This stands in contrast to graph-based (or "define-and-run") execution, where the entire computation must be declared as a static graph before any values are computed.
Eager execution became the default mode of TensorFlow starting with version 2.0 in 2019, and has been the native execution model of PyTorch since its initial release in 2016. The concept is closely associated with the "define-by-run" paradigm, first introduced by the Chainer framework in 2015.
Imagine you have a box of LEGO bricks and you want to build a house. With "graph execution," you would have to draw out every single step of the house on paper first, hand the paper to a builder, and then the builder would put it all together at once. You cannot see any part of the house until the whole plan is finished.
With "eager execution," you just start building. You pick up a brick, stick it on, and you can immediately see what it looks like. If something looks wrong, you can fix it right away. It is slower because you are deciding one brick at a time instead of following an optimized plan, but it is much easier to experiment and catch mistakes early.
Early deep learning frameworks such as Theano (released 2007) and the original TensorFlow (released November 2015) used a "define-and-run" execution model. In this approach, the programmer first constructs a symbolic computation graph that describes all the mathematical operations and data flow, and then executes the entire graph inside a runtime session. TensorFlow 1.x required users to create a tf.Session object and call Session.run() to evaluate any tensor, making interactive debugging difficult.
While static graphs enabled aggressive compiler optimizations (constant folding, operator fusion, memory planning, and cross-device distribution), the programming model was cumbersome. Developers could not simply insert Python print() statements to inspect intermediate values. Error messages often pointed to graph construction code rather than the actual failing operation. Control flow (loops, conditionals) had to be expressed using special graph primitives (tf.while_loop, tf.cond) rather than native Python if and for statements.
The first framework to introduce the define-by-run paradigm was Chainer, released in June 2015 by the Japanese company Preferred Networks. Instead of building a static graph ahead of time, Chainer recorded the history of computation as operations were executed, building the graph on the fly. This made it possible to use standard Python control flow directly in model definitions.
Several other dynamic-graph frameworks followed:
| Framework | Organization | Release | Key contribution |
|---|---|---|---|
| Chainer | Preferred Networks | June 2015 | First define-by-run framework |
| DyNet | Carnegie Mellon University | January 2017 | Optimized for NLP tasks with variable-length inputs |
| PyTorch | Facebook AI Research (now Meta AI) | January 2017 | Combined define-by-run with autograd; became the dominant research framework |
| TensorFlow Eager | Google Brain | October 2017 (preview) | Brought eager execution to TensorFlow as an optional mode |
| TensorFlow 2.0 | Google | September 2019 | Made eager execution the default mode |
The define-by-run approach gained rapid adoption in the research community because it removed the friction between writing Python code and running neural network computations. PyTorch, in particular, became the preferred framework in academic research partly because of its native eager execution model.
Google introduced eager execution for TensorFlow in October 2017, announced by Asim Shankar and Wolff Dobson of the Google Brain team. Initially available as an experimental feature in tf.contrib.eager, it became the default execution mode in TensorFlow 2.0 (released September 2019). The academic paper describing the design, "TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning," was published by Agrawal et al. at the SysML Conference in 2019.
This transition was motivated in large part by the growing popularity of PyTorch, which offered a more intuitive development experience. TensorFlow 2.0 simultaneously introduced tf.function as a bridge, allowing developers to write code in eager mode and selectively convert performance-sensitive functions to optimized graphs.
In eager mode, when a program calls a framework operation (for example, tf.matmul(a, b) or torch.matmul(a, b)), the operation is dispatched to the appropriate hardware (CPU, GPU, or TPU) and executed immediately. The result is returned as a concrete tensor value that can be inspected, printed, or passed to the next operation. There is no intermediate graph representation between the user's code and the execution.
This means that standard Python control flow works naturally:
```python
import torch

def dynamic_network(x, threshold=0.5):
    if x.mean() > threshold:
        return torch.relu(x)
    else:
        return torch.sigmoid(x)
```
In eager mode, the if statement is evaluated using the actual runtime value of x.mean(), so the model's behavior can change from input to input. In a static graph framework, implementing this would require special graph-level conditional operators.
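For contrast, here is a minimal sketch of the same branch written against graph-level primitives, using TensorFlow's tf.cond (the function name dynamic_network_graph is illustrative):

```python
import tensorflow as tf

# Graph-style counterpart of dynamic_network above: the branch is a
# graph-level conditional (tf.cond), not a Python `if` on a concrete value.
def dynamic_network_graph(x, threshold=0.5):
    return tf.cond(tf.reduce_mean(x) > threshold,
                   lambda: tf.nn.relu(x),
                   lambda: tf.sigmoid(x))
```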
Eager execution frameworks compute gradients using a tape-based automatic differentiation system (also called "autograd"). During the forward pass, the framework records every operation onto a data structure called a "tape." During the backward pass, it replays the tape in reverse order, applying the chain rule at each step to accumulate gradients.
In TensorFlow 2.x, this recording is explicit through tf.GradientTape:
```python
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2
grad = tape.gradient(y, x)  # Returns 6.0
```
In PyTorch, the recording is implicit. Any operation on a tensor with requires_grad=True is automatically tracked:
```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
y.backward()
print(x.grad)  # prints tensor(6.)
```
Both approaches use reverse-mode differentiation, which is well suited to deep learning because models typically have many parameters but produce a single scalar loss value.
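As a small illustration of that fit (the names here are illustrative), a single backward pass from one scalar loss yields gradients for every parameter at once:

```python
import torch

w = torch.randn(1000, requires_grad=True)  # many parameters
b = torch.randn(1, requires_grad=True)
x = torch.randn(1000)                      # one input

loss = (w @ x + b).pow(2).sum()  # a single scalar loss
loss.backward()                  # one reverse-mode pass
print(w.grad.shape, b.grad.shape)  # gradients for all 1,001 parameters
```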
Eager execution stores all intermediate activations from the forward pass so that gradients can be computed during the backward pass. This can lead to high memory consumption for large models. Techniques such as gradient checkpointing (also called activation checkpointing or rematerialization) address this by discarding some intermediate activations during the forward pass and recomputing them during the backward pass, trading extra computation for reduced memory usage.
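A minimal PyTorch sketch of the technique, using torch.utils.checkpoint (the block is an arbitrary stand-in for an expensive sub-network):

```python
import torch
from torch.utils.checkpoint import checkpoint

# Activations inside `block` are discarded during the forward pass
# and recomputed during backward, trading compute for memory.
block = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.ReLU(),
    torch.nn.Linear(512, 512), torch.nn.ReLU(),
)

x = torch.randn(8, 512, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```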
The two execution modes represent a fundamental trade-off between developer productivity and runtime performance.
| Aspect | Eager execution | Graph execution |
|---|---|---|
| When operations run | Immediately, one at a time | After full graph is constructed |
| Debugging | Standard Python debuggers (pdb, breakpoints, print) | Requires special tools (TensorBoard, tf.print) |
| Control flow | Native Python if/for/while | Framework-specific ops (tf.cond, tf.while_loop) |
| Error reporting | Immediate, at the line that fails | Deferred; may point to graph construction code |
| Performance | Higher per-operation overhead | Optimized through fusion, folding, and parallelism |
| Memory planning | Operations allocated individually | Global memory planning across entire graph |
| Portability | Requires Python runtime | Graph can be exported and run without Python |
| Dynamic models | Naturally supported | Requires special handling |
Graph execution can be significantly faster than eager execution for repeated computations. A benchmark from the TensorFlow documentation shows that a matrix power function runs approximately 5x faster under tf.function (graph mode) compared to eager mode when executed 1,000 times. The speedup comes from several sources: elimination of per-operation Python dispatch overhead, operator fusion, constant folding, and whole-graph memory planning.
However, for single-execution scenarios (such as running inference once), eager execution can actually be faster because it avoids the upfront cost of graph construction and compilation. The graph execution advantage emerges when the same computation is repeated many times, as is typical during training.
Batch size also affects the relative overhead. A 2022 benchmarking study found that as batch size increases, the per-item execution time in eager mode drops dramatically, because the fixed overhead of dispatching each operation is amortized over more data items.
Modern frameworks have converged on hybrid approaches that let developers write code in eager mode while selectively compiling performance-sensitive parts to optimized graphs.
The tf.function decorator converts a Python function into a callable TensorFlow graph. When the decorated function is first called, TensorFlow "traces" it by executing the Python code with special tracer objects instead of real tensors. The tracer records all TensorFlow operations into a graph (a ConcreteFunction), which is then compiled and cached.
```python
@tf.function
def train_step(images, labels):
    # model, loss_fn, and optimizer are assumed to be defined elsewhere
    with tf.GradientTape() as tape:
        predictions = model(images)
        loss = loss_fn(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
```
Subsequent calls with matching input signatures reuse the cached graph, avoiding re-tracing. If the input signature changes (for example, a different tensor shape or dtype), TensorFlow traces a new specialized graph. This polymorphic behavior means a single tf.function can hold multiple compiled graphs.
A common pitfall is that Python side effects (such as print() calls or list mutations) only execute during tracing, not during subsequent graph executions. Developers must use tf.print() for output that should appear on every call.
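A short sketch of the difference (the output annotations assume a fresh session):

```python
import tensorflow as tf

@tf.function
def f(x):
    print("Tracing!")       # Python side effect: runs only while tracing
    tf.print("Executing")   # graph op: runs on every call
    return x + 1

f(tf.constant(1))    # prints "Tracing!" and "Executing"
f(tf.constant(2))    # prints only "Executing" (cached graph reused)
f(tf.constant(2.0))  # new dtype triggers a retrace: both lines print again
```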
PyTorch 2.0 (released March 2023) introduced torch.compile, which brings graph-based optimizations to PyTorch while preserving the eager-mode development experience. The compilation pipeline has three stages: TorchDynamo captures the graph by analyzing Python bytecode just before it executes; AOTAutograd traces the backward pass ahead of time so it can be optimized together with the forward pass; and a backend compiler (TorchInductor by default) generates fused kernels for the target hardware. Enabling it requires a single call:

```python
model = torch.compile(model)
```
TorchDynamo successfully captures the graph in approximately 99% of cases without requiring code changes. When a graph break does occur, the affected subgraph falls back to eager execution, so correctness is preserved even for complex Python code.
JAX, developed by Google, takes a different approach. By default, JAX operations execute eagerly (similar to NumPy), but the jax.jit transformation compiles a function using XLA (Accelerated Linear Algebra) for high-performance execution on GPUs and TPUs.
JAX's tracing works by replacing actual values with abstract tracer objects that record the shape and dtype (but not the values) of each operation. This means that Python control flow that depends on runtime values cannot be captured by jax.jit; developers must use JAX-specific primitives (jax.lax.cond, jax.lax.fori_loop) for such cases. JAX's functional programming model, where functions must be pure (no side effects), makes this tracing more predictable than TensorFlow's approach.
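A minimal sketch of the restriction: under jax.jit, a Python if on a traced value raises an error, so the value-dependent branch below uses jax.lax.cond instead (the function name is illustrative):

```python
import jax
import jax.numpy as jnp

@jax.jit
def dynamic_jax(x, threshold=0.5):
    # `if x.mean() > threshold:` would fail here, because the tracer
    # records only shape and dtype, never concrete values.
    return jax.lax.cond(x.mean() > threshold,
                        jax.nn.relu,     # branch when the predicate is True
                        jax.nn.sigmoid,  # branch when it is False
                        x)

print(dynamic_jax(jnp.array([1.0, 2.0, 3.0])))
```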
Apache MXNet's Gluon API offered a hybridize() method that converted imperative eager code into a symbolic graph for optimized execution. Users could prototype in eager mode, then call net.hybridize() to switch to graph mode for training and deployment. MXNet's development has largely wound down since 2023, but its hybrid approach influenced the design of other frameworks.
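A rough sketch of the Gluon workflow as it looked in the MXNet 1.x API (since the project is retired, this is untested against current releases):

```python
from mxnet import nd
from mxnet.gluon import nn

net = nn.HybridSequential()
net.add(nn.Dense(64, activation='relu'),
        nn.Dense(10))
net.initialize()

x = nd.random.normal(shape=(1, 32))
net(x)           # imperative (eager) execution; easy to inspect and debug
net.hybridize()  # convert to a symbolic graph
net(x)           # later calls run through the cached, optimized graph
```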
The following table summarizes how major frameworks handle eager and graph execution:
| Framework | Default mode | Graph compilation | Tracing mechanism | Control flow handling |
|---|---|---|---|---|
| TensorFlow 2.x | Eager | tf.function | Python execution with tracer objects | AutoGraph converts Python to graph ops |
| PyTorch 2.x | Eager | torch.compile | TorchDynamo bytecode analysis | Graph break and fallback to eager |
| JAX | Eager | jax.jit | Abstract tracer objects (shape/dtype only) | Requires jax.lax primitives |
| TensorFlow 1.x | Graph | N/A (native graph mode) | Graph built explicitly by the user; run via tf.Session | tf.cond, tf.while_loop |
| Chainer | Eager | Not available | N/A | Native Python |
| DyNet | Eager | Auto-batching optimizer | Dynamic graph per input | Native C++/Python |
| MXNet Gluon | Eager | hybridize() | Symbol-based tracing | Limited after hybridization |
The most frequently cited advantage of eager execution is the ability to use standard debugging tools. Developers can set breakpoints, step through code line by line, and inspect tensor values at any point during execution. In graph mode, tensor values are not available until the graph is run, making interactive debugging impractical.
With eager execution, error messages point directly to the Python line where the problem occurred. In graph mode, errors often appear during session execution and reference internal graph node names, making it difficult to trace them back to the source code.
Some neural network architectures require the computation graph to change from input to input. Examples include recursive neural networks that follow the structure of a parse tree (such as Tree-LSTMs), models that operate on variable-length sequences, and networks whose depth or choice of layers depends on the input itself.
Eager execution handles these cases naturally because the graph is built fresh for each input. Static graph frameworks require workarounds such as bucketing (grouping inputs by length) or padding (extending all inputs to a fixed maximum length).
Because eager execution returns concrete values at each step, it integrates smoothly with the broader Python ecosystem. Developers can mix framework operations with NumPy, SciPy, matplotlib, and other libraries without special adapters. They can also use Python data structures (lists, dictionaries) and third-party libraries within model code.
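For example, a CPU PyTorch tensor converts to and from NumPy without copying, so NumPy and SciPy routines apply directly to concrete values (a small sketch):

```python
import numpy as np
import torch

t = torch.randn(5)
a = t.numpy()             # shares memory with the tensor (CPU tensors only)
b = np.sin(a)             # any NumPy routine works on the concrete values
t2 = torch.from_numpy(b)  # back into the framework, again without a copy
```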
Researchers frequently need to experiment with novel architectures and loss functions. Eager execution reduces the iteration cycle because there is no compilation step between writing code and seeing results. This has contributed to PyTorch's dominance in academic research, where speed of experimentation often matters more than raw training throughput.
Eager execution incurs overhead for each operation because the framework must dispatch the operation to hardware, wait for the result, and return it to Python before proceeding to the next operation. For models with many small operations, this dispatch overhead can dominate the total execution time. Graph execution amortizes this overhead by batching many operations into a single dispatch.
The overhead is most noticeable when operations are individually fast (small matrix multiplies, element-wise operations on small tensors). For large operations (such as multiplying large matrices on a GPU), the kernel execution time dwarfs the dispatch overhead, and the performance difference between eager and graph modes shrinks.
In graph mode, the compiler can analyze the entire computation and distribute operations across multiple devices (GPUs, TPUs) in an optimal way. In eager mode, each operation is dispatched to a single device, and the programmer must manually manage data placement and inter-device communication.
Graph representations (such as TensorFlow SavedModel or ONNX) can be deployed on edge devices, mobile phones, and embedded systems without a Python runtime. Eager mode code depends on the Python interpreter, making it unsuitable for deployment in resource-constrained environments unless it is first compiled to a graph format.
As discussed in the memory management section, eager execution must retain intermediate activations throughout the forward pass to support gradient computation. Without the global view that graph mode provides, the runtime cannot plan memory reuse as efficiently. This can limit the maximum model size or batch size that fits in accelerator memory.
The most common workflow in both TensorFlow and PyTorch is to develop and debug models in eager mode, then wrap the training loop (or at least the forward/backward pass) in tf.function or torch.compile for production training. This provides the best of both worlds: fast iteration during development and optimized throughput during training.
Developers can apply graph compilation to specific functions rather than the entire program. For example, a data preprocessing pipeline might remain in eager mode (because it involves complex Python logic), while the model's forward pass and loss computation are compiled.
Both TensorFlow and PyTorch provide tools to export trained models as static graphs for deployment:
- tf.saved_model.save() exports a SavedModel that can be served with TensorFlow Serving or converted to TensorFlow Lite for mobile.
- torch.export (introduced in PyTorch 2.1) captures a graph for deployment, replacing the older torch.jit.trace and torch.jit.script methods.

When optimizing model performance, developers typically start by profiling in eager mode to identify bottlenecks, then apply compilation to the slowest sections. Tools like the PyTorch Profiler and TensorFlow Profiler work in both eager and compiled modes, though the information they provide differs (eager mode shows individual operation timings, while compiled mode shows fused kernel timings).
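Minimal sketches of the two export paths described above (the tiny models and output paths are illustrative placeholders):

```python
import tensorflow as tf
import torch

# TensorFlow: save a (trivial) tf.Module as a SavedModel directory,
# which TensorFlow Serving can serve or TFLite can convert.
class Scale(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
    def __call__(self, x):
        return 2.0 * x

tf.saved_model.save(Scale(), "/tmp/exported_tf_model")

# PyTorch 2.1+: capture an ExportedProgram from a (trivial) module.
torch_model = torch.nn.Linear(4, 1)
example_input = torch.randn(1, 4)
exported = torch.export.export(torch_model, (example_input,))
```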
The following benchmarks illustrate the trade-off directly by timing the same repeated computation in eager and compiled modes. First, TensorFlow:

```python
import tensorflow as tf
import time

# Eager execution (default in TF 2.x)
def compute_eager(x):
    for _ in range(100):
        x = tf.nn.relu(tf.matmul(x, tf.ones([100, 100])))
    return x

# Graph execution via tf.function
@tf.function
def compute_graph(x):
    for _ in range(100):
        x = tf.nn.relu(tf.matmul(x, tf.ones([100, 100])))
    return x

x = tf.random.normal([100, 100])

# Warm up (the first tf.function call triggers tracing and compilation)
compute_eager(x)
compute_graph(x)

# Benchmark
start = time.time()
for _ in range(100):
    compute_eager(x)
eager_time = time.time() - start

start = time.time()
for _ in range(100):
    compute_graph(x)
graph_time = time.time() - start

print(f"Eager: {eager_time:.3f}s, Graph: {graph_time:.3f}s")
```
The equivalent comparison in PyTorch, using torch.compile:

```python
import torch
import time

def compute(x):
    for _ in range(100):
        x = torch.relu(torch.matmul(x, torch.ones(100, 100, device=x.device)))
    return x

compiled_compute = torch.compile(compute)

x = torch.randn(100, 100)

# Warm up (the first compiled call triggers TorchDynamo/Inductor compilation)
compute(x)
compiled_compute(x)

# Benchmark
start = time.time()
for _ in range(100):
    compute(x)
eager_time = time.time() - start

start = time.time()
for _ in range(100):
    compiled_compute(x)
compiled_time = time.time() - start

print(f"Eager: {eager_time:.3f}s, Compiled: {compiled_time:.3f}s")
```
And in JAX, using jax.jit. Because JAX dispatches work asynchronously, block_until_ready() is required for accurate timings:

```python
import jax
import jax.numpy as jnp
import time

def compute(x):
    for _ in range(100):
        x = jax.nn.relu(jnp.dot(x, jnp.ones((100, 100))))
    return x

jit_compute = jax.jit(compute)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (100, 100))

# Warm up (the first jit call triggers XLA compilation)
compute(x).block_until_ready()
jit_compute(x).block_until_ready()

# Benchmark (block_until_ready accounts for asynchronous dispatch)
start = time.time()
for _ in range(100):
    compute(x).block_until_ready()
eager_time = time.time() - start

start = time.time()
for _ in range(100):
    jit_compute(x).block_until_ready()
jit_time = time.time() - start

print(f"Eager: {eager_time:.3f}s, JIT: {jit_time:.3f}s")
```