Eager execution is an imperative programming paradigm in machine learning frameworks where operations are evaluated immediately as they are called, rather than being added to a deferred computational graph for later execution. In eager mode, each operation returns a concrete value right away, allowing developers to write, test, and debug code using standard programming patterns. This stands in contrast to graph-based (or "define-and-run") execution, where the entire computation must be declared as a static graph before any values are computed.
Eager execution became the default mode of TensorFlow starting with version 2.0 in 2019, and has been the native execution model of PyTorch since its initial release in 2016. The concept is closely associated with the "define-by-run" paradigm, first introduced by the Chainer framework in 2015.
Imagine you have a box of LEGO bricks and you want to build a house. With "graph execution," you would have to draw out every single step of the house on paper first, hand the paper to a builder, and then the builder would put it all together at once. You cannot see any part of the house until the whole plan is finished.
With "eager execution," you just start building. You pick up a brick, stick it on, and you can immediately see what it looks like. If something looks wrong, you can fix it right away. It is slower because you are deciding one brick at a time instead of following an optimized plan, but it is much easier to experiment and catch mistakes early.
Early deep learning frameworks such as Theano (released 2007) and the original TensorFlow (released November 2015) used a "define-and-run" execution model. In this approach, the programmer first constructs a symbolic computation graph that describes all the mathematical operations and data flow, and then executes the entire graph inside a runtime session. TensorFlow 1.x required users to create a tf.Session object and call Session.run() to evaluate any tensor, making interactive debugging difficult.
While static graphs enabled aggressive compiler optimizations (constant folding, operator fusion, memory planning, and cross-device distribution), the programming model was cumbersome. Developers could not simply insert Python print() statements to inspect intermediate values. Error messages often pointed to graph construction code rather than the actual failing operation. Control flow (loops, conditionals) had to be expressed using special graph primitives (tf.while_loop, tf.cond) rather than native Python if and for statements.
The first framework to introduce the define-by-run paradigm was Chainer, released in June 2015 by the Japanese company Preferred Networks. Instead of building a static graph ahead of time, Chainer recorded the history of computation as operations were executed, building the graph on the fly. This made it possible to use standard Python control flow directly in model definitions.
Several other dynamic-graph frameworks followed:
| Framework | Organization | Release | Key contribution |
|---|---|---|---|
| Chainer | Preferred Networks | June 2015 | First define-by-run framework |
| DyNet | Carnegie Mellon University | January 2017 | Optimized for NLP tasks with variable-length inputs |
| PyTorch | Facebook AI Research (now Meta AI) | January 2017 | Combined define-by-run with autograd; became the dominant research framework |
| TensorFlow Eager | Google Brain | October 2017 (preview) | Brought eager execution to TensorFlow as an optional mode |
| TensorFlow 2.0 | Google | September 2019 | Made eager execution the default mode |
The define-by-run approach gained rapid adoption in the research community because it removed the friction between writing Python code and running neural network computations. PyTorch, in particular, became the preferred framework in academic research partly because of its native eager execution model.
Google introduced eager execution for TensorFlow in October 2017, announced by Asim Shankar and Wolff Dobson of the Google Brain team. Initially available as an experimental feature in tf.contrib.eager, it became the default execution mode in TensorFlow 2.0 (released September 2019). The academic paper describing the design, "TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning," was published by Agrawal et al. at the SysML Conference in 2019.
This transition was motivated in large part by the growing popularity of PyTorch, which offered a more intuitive development experience. TensorFlow 2.0 simultaneously introduced tf.function as a bridge, allowing developers to write code in eager mode and selectively convert performance-sensitive functions to optimized graphs.
In eager mode, when a program calls a framework operation (for example, tf.matmul(a, b) or torch.matmul(a, b)), the operation is dispatched to the appropriate hardware (CPU, GPU, or TPU) and executed immediately. The result is returned as a concrete tensor value that can be inspected, printed, or passed to the next operation. There is no intermediate graph representation between the user's code and the execution.
This means that standard Python control flow works naturally:
```python
import torch

def dynamic_network(x, threshold=0.5):
    if x.mean() > threshold:
        return torch.relu(x)
    else:
        return torch.sigmoid(x)
```
In eager mode, the if statement is evaluated using the actual runtime value of x.mean(), so the model's behavior can change from input to input. In a static graph framework, implementing this would require special graph-level conditional operators.
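For contrast, here is a minimal sketch of the same branch written against graph-level primitives, using TensorFlow's tf.cond (the function name dynamic_network_graph is illustrative):

```python
import tensorflow as tf

# Graph-style counterpart of dynamic_network above: the branch is a
# graph-level conditional (tf.cond), not a Python `if` on a concrete value.
def dynamic_network_graph(x, threshold=0.5):
    return tf.cond(tf.reduce_mean(x) > threshold,
                   lambda: tf.nn.relu(x),
                   lambda: tf.sigmoid(x))
```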
Eager execution frameworks compute gradients using a tape-based automatic differentiation system (also called "autograd"). During the forward pass, the framework records every operation onto a data structure called a "tape." During the backward pass, it replays the tape in reverse order, applying the chain rule at each step to accumulate gradients.
In TensorFlow 2.x, this recording is explicit through tf.GradientTape:
```python
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2
grad = tape.gradient(y, x)  # Returns 6.0
```
In PyTorch, the recording is implicit. Any operation on a tensor with requires_grad=True is automatically tracked:
```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
y.backward()
print(x.grad)  # prints tensor(6.)
```
Both approaches use reverse-mode differentiation, which is well suited to deep learning because models typically have many parameters but produce a single scalar loss value.
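As a small illustration of that fit (the names here are illustrative), a single backward pass from one scalar loss yields gradients for every parameter at once:

```python
import torch

w = torch.randn(1000, requires_grad=True)  # many parameters
b = torch.randn(1, requires_grad=True)
x = torch.randn(1000)                      # one input

loss = (w @ x + b).pow(2).sum()  # a single scalar loss
loss.backward()                  # one reverse-mode pass
print(w.grad.shape, b.grad.shape)  # gradients for all 1,001 parameters
```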
Eager execution stores all intermediate activations from the forward pass so that gradients can be computed during the backward pass. This can lead to high memory consumption for large models. Techniques such as gradient checkpointing (also called activation checkpointing or rematerialization) address this by discarding some intermediate activations during the forward pass and recomputing them during the backward pass, trading extra computation for reduced memory usage.
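A minimal PyTorch sketch of the technique, using torch.utils.checkpoint (the block is an arbitrary stand-in for an expensive sub-network):

```python
import torch
from torch.utils.checkpoint import checkpoint

# Activations inside `block` are discarded during the forward pass
# and recomputed during backward, trading compute for memory.
block = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.ReLU(),
    torch.nn.Linear(512, 512), torch.nn.ReLU(),
)

x = torch.randn(8, 512, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```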
The two execution modes represent a fundamental trade-off between developer productivity and runtime performance.
| Aspect | Eager execution | Graph execution |
|---|---|---|
| When operations run | Immediately, one at a time | After full graph is constructed |
| Debugging | Standard Python debuggers (pdb, breakpoints, print) | Requires special tools (TensorBoard, tf.print) |
| Control flow | Native Python if/for/while | Framework-specific ops (tf.cond, tf.while_loop) |
| Error reporting | Immediate, at the line that fails | Deferred; may point to graph construction code |
| Performance | Higher per-operation overhead | Optimized through fusion, folding, and parallelism |
| Memory planning | Operations allocated individually | Global memory planning across entire graph |
| Portability | Requires Python runtime | Graph can be exported and run without Python |
| Dynamic models | Naturally supported | Requires special handling |
Graph execution can be significantly faster than eager execution for repeated computations. A benchmark from the TensorFlow documentation shows that a matrix power function runs approximately 5x faster under tf.function (graph mode) compared to eager mode when executed 1,000 times. The speedup comes from several sources: elimination of per-operation Python dispatch overhead, operator fusion, constant folding, and whole-graph memory planning.
However, for single-execution scenarios (such as running inference once), eager execution can actually be faster because it avoids the upfront cost of graph construction and compilation. The graph execution advantage emerges when the same computation is repeated many times, as is typical during training.
Batch size also affects the relative overhead. A 2022 benchmarking study found that as batch size increases, the per-item execution time in eager mode drops dramatically, because the fixed overhead of dispatching each operation is amortized over more data items.
Modern frameworks have converged on hybrid approaches that let developers write code in eager mode while selectively compiling performance-sensitive parts to optimized graphs.
The tf.function decorator converts a Python function into a callable TensorFlow graph. When the decorated function is first called, TensorFlow "traces" it by executing the Python code with special tracer objects instead of real tensors. The tracer records all TensorFlow operations into a graph (a ConcreteFunction), which is then compiled and cached.
```python
@tf.function
def train_step(images, labels):
    # model, loss_fn, and optimizer are assumed to be defined elsewhere
    with tf.GradientTape() as tape:
        predictions = model(images)
        loss = loss_fn(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
```
Subsequent calls with matching input signatures reuse the cached graph, avoiding re-tracing. If the input signature changes (for example, a different tensor shape or dtype), TensorFlow traces a new specialized graph. This polymorphic behavior means a single tf.function can hold multiple compiled graphs.
A common pitfall is that Python side effects (such as print() calls or list mutations) only execute during tracing, not during subsequent graph executions. Developers must use tf.print() for output that should appear on every call.
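A short sketch of the difference (the output annotations assume a fresh session):

```python
import tensorflow as tf

@tf.function
def f(x):
    print("Tracing!")       # Python side effect: runs only while tracing
    tf.print("Executing")   # graph op: runs on every call
    return x + 1

f(tf.constant(1))    # prints "Tracing!" and "Executing"
f(tf.constant(2))    # prints only "Executing" (cached graph reused)
f(tf.constant(2.0))  # new dtype triggers a retrace: both lines print again
```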
PyTorch 2.0 (released March 2023) introduced torch.compile, which brings graph-based optimizations to PyTorch while preserving the eager-mode development experience. The compilation pipeline has three stages: TorchDynamo captures the graph by analyzing Python bytecode just before it executes; AOTAutograd traces the backward pass ahead of time so it can be optimized together with the forward pass; and a backend compiler (TorchInductor by default) generates fused kernels for the target hardware. Enabling it requires a single call:

```python
model = torch.compile(model)
```
TorchDynamo successfully captures the graph in approximately 99% of cases without requiring code changes. When a graph break does occur, the affected subgraph falls back to eager execution, so correctness is preserved even for complex Python code.
JAX, developed by Google, takes a different approach. By default, JAX operations execute eagerly (similar to NumPy), but the jax.jit transformation compiles a function using XLA (Accelerated Linear Algebra) for high-performance execution on GPUs and TPUs.
JAX's tracing works by replacing actual values with abstract tracer objects that record the shape and dtype (but not the values) of each operation. This means that Python control flow that depends on runtime values cannot be captured by jax.jit; developers must use JAX-specific primitives (jax.lax.cond, jax.lax.fori_loop) for such cases. JAX's functional programming model, where functions must be pure (no side effects), makes this tracing more predictable than TensorFlow's approach.
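A minimal sketch of the restriction: under jax.jit, a Python if on a traced value raises an error, so the value-dependent branch below uses jax.lax.cond instead (the function name is illustrative):

```python
import jax
import jax.numpy as jnp

@jax.jit
def dynamic_jax(x, threshold=0.5):
    # `if x.mean() > threshold:` would fail here, because the tracer
    # records only shape and dtype, never concrete values.
    return jax.lax.cond(x.mean() > threshold,
                        jax.nn.relu,     # branch when the predicate is True
                        jax.nn.sigmoid,  # branch when it is False
                        x)

print(dynamic_jax(jnp.array([1.0, 2.0, 3.0])))
```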
Apache MXNet's Gluon API offered a hybridize() method that converted imperative eager code into a symbolic graph for optimized execution. Users could prototype in eager mode, then call net.hybridize() to switch to graph mode for training and deployment. MXNet's development has largely wound down since 2023, but its hybrid approach influenced the design of other frameworks.
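A rough sketch of the Gluon workflow as it looked in the MXNet 1.x API (since the project is retired, this is untested against current releases):

```python
from mxnet import nd
from mxnet.gluon import nn

net = nn.HybridSequential()
net.add(nn.Dense(64, activation='relu'),
        nn.Dense(10))
net.initialize()

x = nd.random.normal(shape=(1, 32))
net(x)           # imperative (eager) execution; easy to inspect and debug
net.hybridize()  # convert to a symbolic graph
net(x)           # later calls run through the cached, optimized graph
```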
The following table summarizes how major frameworks handle eager and graph execution:
| Framework | Default mode | Graph compilation | Tracing mechanism | Control flow handling |
|---|---|---|---|---|
| TensorFlow 2.x | Eager | tf.function | Python execution with tracer objects | AutoGraph converts Python to graph ops |
| PyTorch 2.x | Eager | torch.compile | TorchDynamo bytecode analysis | Graph break and fallback to eager |
| JAX | Eager | jax.jit | Abstract tracer objects (shape/dtype only) | Requires jax.lax primitives |
| TensorFlow 1.x | Graph | N/A (native graph mode) | Graph built explicitly by the user; run via tf.Session | tf.cond, tf.while_loop |
| Chainer | Eager | Not available | N/A | Native Python |
| DyNet | Eager | Auto-batching optimizer | Dynamic graph per input | Native C++/Python |
| MXNet Gluon | Eager | hybridize() | Symbol-based tracing | Limited after hybridization |
The most frequently cited advantage of eager execution is the ability to use standard debugging tools. Developers can set breakpoints, step through code line by line, and inspect tensor values at any point during execution. In graph mode, tensor values are not available until the graph is run, making interactive debugging impractical.
With eager execution, error messages point directly to the Python line where the problem occurred. In graph mode, errors often appear during session execution and reference internal graph node names, making it difficult to trace them back to the source code.
Some neural network architectures require the computation graph to change from input to input. Examples include recursive neural networks that follow the structure of a parse tree (such as Tree-LSTMs), models that operate on variable-length sequences, and networks whose depth or choice of layers depends on the input itself.
Eager execution handles these cases naturally because the graph is built fresh for each input. Static graph frameworks require workarounds such as bucketing (grouping inputs by length) or padding (extending all inputs to a fixed maximum length).
Because eager execution returns concrete values at each step, it integrates smoothly with the broader Python ecosystem. Developers can mix framework operations with NumPy, SciPy, matplotlib, and other libraries without special adapters. They can also use Python data structures (lists, dictionaries) and third-party libraries within model code.
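For example, a CPU PyTorch tensor converts to and from NumPy without copying, so NumPy and SciPy routines apply directly to concrete values (a small sketch):

```python
import numpy as np
import torch

t = torch.randn(5)
a = t.numpy()             # shares memory with the tensor (CPU tensors only)
b = np.sin(a)             # any NumPy routine works on the concrete values
t2 = torch.from_numpy(b)  # back into the framework, again without a copy
```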
Researchers frequently need to experiment with novel architectures and loss functions. Eager execution reduces the iteration cycle because there is no compilation step between writing code and seeing results. This has contributed to PyTorch's dominance in academic research, where speed of experimentation often matters more than raw training throughput.
Eager execution incurs overhead for each operation because the framework must dispatch the operation to hardware, wait for the result, and return it to Python before proceeding to the next operation. For models with many small operations, this dispatch overhead can dominate the total execution time. Graph execution amortizes this overhead by batching many operations into a single dispatch.
The overhead is most noticeable when operations are individually fast (small matrix multiplies, element-wise operations on small tensors). For large operations (such as multiplying large matrices on a GPU), the kernel execution time dwarfs the dispatch overhead, and the performance difference between eager and graph modes shrinks.
In graph mode, the compiler can analyze the entire computation and distribute operations across multiple devices (GPUs, TPUs) in an optimal way. In eager mode, each operation is dispatched to a single device, and the programmer must manually manage data placement and inter-device communication.
Graph representations (such as TensorFlow SavedModel or ONNX) can be deployed on edge devices, mobile phones, and embedded systems without a Python runtime. Eager mode code depends on the Python interpreter, making it unsuitable for deployment in resource-constrained environments unless it is first compiled to a graph format.
As discussed in the memory management section, eager execution must retain intermediate activations throughout the forward pass to support gradient computation. Without the global view that graph mode provides, the runtime cannot plan memory reuse as efficiently. This can limit the maximum model size or batch size that fits in accelerator memory.
The most common workflow in both TensorFlow and PyTorch is to develop and debug models in eager mode, then wrap the training loop (or at least the forward/backward pass) in tf.function or torch.compile for production training. This provides the best of both worlds: fast iteration during development and optimized throughput during training.
Developers can apply graph compilation to specific functions rather than the entire program. For example, a data preprocessing pipeline might remain in eager mode (because it involves complex Python logic), while the model's forward pass and loss computation are compiled.
Both TensorFlow and PyTorch provide tools to export trained models as static graphs for deployment:
- tf.saved_model.save() exports a SavedModel that can be served with TensorFlow Serving or converted to TensorFlow Lite for mobile.
- torch.export (introduced in PyTorch 2.1) captures a graph for deployment, replacing the older torch.jit.trace and torch.jit.script methods.

When optimizing model performance, developers typically start by profiling in eager mode to identify bottlenecks, then apply compilation to the slowest sections. Tools like the PyTorch Profiler and TensorFlow Profiler work in both eager and compiled modes, though the information they provide differs (eager mode shows individual operation timings, while compiled mode shows fused kernel timings).
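Minimal sketches of the two export paths described above (the tiny models and output paths are illustrative placeholders):

```python
import tensorflow as tf
import torch

# TensorFlow: save a (trivial) tf.Module as a SavedModel directory,
# which TensorFlow Serving can serve or TFLite can convert.
class Scale(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
    def __call__(self, x):
        return 2.0 * x

tf.saved_model.save(Scale(), "/tmp/exported_tf_model")

# PyTorch 2.1+: capture an ExportedProgram from a (trivial) module.
torch_model = torch.nn.Linear(4, 1)
example_input = torch.randn(1, 4)
exported = torch.export.export(torch_model, (example_input,))
```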
The following benchmarks illustrate the trade-off directly by timing the same repeated computation in eager and compiled modes. First, TensorFlow:

```python
import tensorflow as tf
import time

# Eager execution (default in TF 2.x)
def compute_eager(x):
    for _ in range(100):
        x = tf.nn.relu(tf.matmul(x, tf.ones([100, 100])))
    return x

# Graph execution via tf.function
@tf.function
def compute_graph(x):
    for _ in range(100):
        x = tf.nn.relu(tf.matmul(x, tf.ones([100, 100])))
    return x

x = tf.random.normal([100, 100])

# Warm up (the first tf.function call triggers tracing and compilation)
compute_eager(x)
compute_graph(x)

# Benchmark
start = time.time()
for _ in range(100):
    compute_eager(x)
eager_time = time.time() - start

start = time.time()
for _ in range(100):
    compute_graph(x)
graph_time = time.time() - start

print(f"Eager: {eager_time:.3f}s, Graph: {graph_time:.3f}s")
```
The equivalent comparison in PyTorch, using torch.compile:

```python
import torch
import time

def compute(x):
    for _ in range(100):
        x = torch.relu(torch.matmul(x, torch.ones(100, 100, device=x.device)))
    return x

compiled_compute = torch.compile(compute)

x = torch.randn(100, 100)

# Warm up (the first compiled call triggers TorchDynamo/Inductor compilation)
compute(x)
compiled_compute(x)

# Benchmark
start = time.time()
for _ in range(100):
    compute(x)
eager_time = time.time() - start

start = time.time()
for _ in range(100):
    compiled_compute(x)
compiled_time = time.time() - start

print(f"Eager: {eager_time:.3f}s, Compiled: {compiled_time:.3f}s")
```
And in JAX, using jax.jit. Because JAX dispatches work asynchronously, block_until_ready() is required for accurate timings:

```python
import jax
import jax.numpy as jnp
import time

def compute(x):
    for _ in range(100):
        x = jax.nn.relu(jnp.dot(x, jnp.ones((100, 100))))
    return x

jit_compute = jax.jit(compute)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (100, 100))

# Warm up (the first jit call triggers XLA compilation)
compute(x).block_until_ready()
jit_compute(x).block_until_ready()

# Benchmark (block_until_ready accounts for asynchronous dispatch)
start = time.time()
for _ in range(100):
    compute(x).block_until_ready()
eager_time = time.time() - start

start = time.time()
for _ in range(100):
    jit_compute(x).block_until_ready()
jit_time = time.time() - start

print(f"Eager: {eager_time:.3f}s, JIT: {jit_time:.3f}s")
```