# Eager Execution

> Source: https://aiwiki.ai/wiki/eager_execution
> Updated: 2026-06-24
> Categories: Deep Learning, Machine Learning, Software Development
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Eager execution** is an imperative, define-by-run mode of running [machine learning](/wiki/machine_learning) framework operations in which each operation is evaluated immediately as it is called and returns a concrete value, instead of being recorded into a deferred static graph that is run later. Google's TensorFlow team defined it in 2017 as "an imperative, define-by-run interface where operations are executed immediately as they are called from Python." [2] Eager execution lets developers write, test, and debug models with ordinary Python control flow and standard debuggers, which is why it became the dominant style for research; the trade-off is higher per-operation overhead than graph (define-and-run) execution, a gap that modern frameworks close with compilers such as `tf.function`, `torch.compile`, and `jax.jit`.

Eager execution became the default mode of [TensorFlow](/wiki/tensorflow) in version 2.0, released on September 30, 2019 [1][6], and has been the native execution model of [PyTorch](/wiki/pytorch) since its initial public release in January 2017 (development began in 2016) [3]. The paradigm is closely associated with "define-by-run," first introduced by the [Chainer](/wiki/chainer) framework on June 9, 2015 [4][13]. It stands in contrast to graph-based execution, where the entire computation must be declared as a static [computational graph](/wiki/computational_graph) before any values are computed.

## Explain like I'm 5 (ELI5)

Imagine you have a box of LEGO bricks and you want to build a house. With "graph execution," you would have to draw out every single step of the house on paper first, hand the paper to a builder, and then the builder would put it all together at once. You cannot see any part of the house until the whole plan is finished.

With "eager execution," you just start building. You pick up a brick, stick it on, and you can immediately see what it looks like. If something looks wrong, you can fix it right away. It is slower because you are deciding one brick at a time instead of following an optimized plan, but it is much easier to experiment and catch mistakes early.

## What is the difference between define-by-run and define-and-run?

The distinction between the two execution models is the single most important idea in this topic. In a **define-and-run** (static graph) framework, the program first constructs a symbolic [computational graph](/wiki/computational_graph) that describes all operations and data flow, and only then runs that graph inside a runtime. In a **define-by-run** (eager) framework, the graph is built implicitly, on the fly, as each operation executes, so the graph and its execution happen together. Chainer's authors framed define-by-run as the approach in which "the connection in a network is not determined when the training is started" but is instead defined dynamically as the code runs [13].
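The contrast can be made concrete with a small framework-free sketch. The `Node`, `placeholder`, `add`, `mul`, and `run` names below are hypothetical and exist only for this illustration; they do not correspond to any real library's API.

```python
# Toy illustration only: none of these names belong to a real framework.

class Node:
    """A deferred operation in a hypothetical static graph."""
    def __init__(self, op, inputs=(), name=None):
        self.op = op          # callable applied when the graph runs; None for placeholders
        self.inputs = inputs  # upstream Node objects
        self.name = name      # placeholder name, used to look up the fed value

def placeholder(name):
    return Node(op=None, name=name)

def add(a, b):
    return Node(lambda x, y: x + y, (a, b))

def mul(a, b):
    return Node(lambda x, y: x * y, (a, b))

def run(node, feed):
    """Define-and-run: values exist only once the finished graph is executed."""
    if node.op is None:
        return feed[node.name]
    return node.op(*(run(i, feed) for i in node.inputs))

# Define-and-run: build the whole graph first, then execute it.
x = placeholder("x")
graph = add(mul(x, x), x)          # no arithmetic has happened yet
static_result = run(graph, {"x": 3.0})

# Define-by-run: each operation yields a concrete value immediately.
v = 3.0
squared = v * v                    # already a number, inspectable right here
eager_result = squared + v

print(static_result, eager_result)  # 12.0 12.0
```

In the define-and-run half, `graph` is only a description of the computation until `run` is called; in the define-by-run half, `squared` can be printed or inspected in a debugger as soon as the line executes.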

## Historical background

### Static graph era

Early [deep learning](/wiki/deep_learning) frameworks such as Theano (released 2007) and the original TensorFlow (released November 2015) used a "define-and-run" execution model. In this approach, the programmer first constructs a symbolic computation graph that describes all the mathematical operations and data flow, and then executes the entire graph inside a runtime session. TensorFlow 1.x required users to create a `tf.Session` object and call `Session.run()` to evaluate any tensor, making interactive debugging difficult.

While static graphs enabled aggressive compiler optimizations (constant folding, operator fusion, memory planning, and cross-device distribution), the programming model was cumbersome. Developers could not simply insert Python `print()` statements to inspect intermediate values. Error messages often pointed to graph construction code rather than the actual failing operation. Control flow (loops, conditionals) had to be expressed using special graph primitives (`tf.while_loop`, `tf.cond`) rather than native Python `if` and `for` statements.

### The define-by-run revolution

The first framework to introduce the define-by-run paradigm was **Chainer**, released on June 9, 2015 by the Japanese company Preferred Networks [4][13]. Instead of building a static graph ahead of time, Chainer recorded the history of computation as operations were executed, building the graph on the fly. This made it possible to use standard Python control flow directly in model definitions. (Preferred Networks moved Chainer into maintenance mode in December 2019 and shifted its own development to PyTorch [13].)

Several other dynamic-graph frameworks followed:

| Framework | Organization | Release | Key contribution |
|---|---|---|---|
| [Chainer](/wiki/chainer) | Preferred Networks | June 2015 | First define-by-run framework |
| DyNet | Carnegie Mellon University | January 2017 | Optimized for NLP tasks with variable-length inputs |
| [PyTorch](/wiki/pytorch) | Meta AI (Facebook) | January 2017 | Combined define-by-run with autograd; became dominant research framework |
| TensorFlow Eager | Google Brain | October 2017 (preview) | Brought eager execution to TensorFlow as an optional mode |
| [TensorFlow](/wiki/tensorflow) 2.0 | Google | September 2019 | Made eager execution the default mode |

The define-by-run approach gained rapid adoption in the research community because it removed the friction between writing Python code and running neural network computations. PyTorch, in particular, became the preferred framework in academic research partly because of its native eager execution model; by the early 2020s a majority of new deep learning research papers reported using PyTorch.

### When did TensorFlow add eager execution?

Google introduced eager execution for TensorFlow on October 31, 2017, in a blog post by Asim Shankar and Wolff Dobson of the Google Brain team [2]. Initially available as an experimental feature in `tf.contrib.eager`, it became the default execution mode in TensorFlow 2.0, released on September 30, 2019 [1][6]. The academic paper describing the design, "TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning," was published by Agrawal et al. at the SysML Conference in 2019 [1]. The Shankar and Dobson announcement listed the headline benefits as "fast debugging with immediate run-time errors and integration with Python tools." [2]

This transition was motivated in large part by the growing popularity of PyTorch, which offered a more intuitive development experience. TensorFlow 2.0 simultaneously introduced `tf.function` as a bridge, allowing developers to write code in eager mode and selectively convert performance-sensitive functions to optimized graphs.

## How does eager execution work?

### Immediate operation dispatch

In eager mode, when a program calls a framework operation (for example, `tf.matmul(a, b)` or `torch.matmul(a, b)`), the operation is dispatched to the appropriate hardware (CPU, [GPU](/wiki/gpu_computing), or [TPU](/wiki/tpu)) and executed immediately. The result is returned as a concrete tensor value that can be inspected, printed, or passed to the next operation. There is no intermediate graph representation between the user's code and the execution.

This means that standard Python control flow works naturally:

```python
import torch

def dynamic_network(x, threshold=0.5):
    if x.mean() > threshold:
        return torch.relu(x)
    else:
        return torch.sigmoid(x)
```

In eager mode, the `if` statement is evaluated using the actual runtime value of `x.mean()`, so the model's behavior can change from input to input. In a static graph framework, implementing this would require special graph-level conditional operators.

### Automatic differentiation with gradient tapes

Eager execution frameworks compute gradients using a tape-based [automatic differentiation](/wiki/automatic_differentiation) system (also called "autograd"). During the forward pass, the framework records every operation onto a data structure called a "tape." During the backward pass, it replays the tape in reverse order, applying the chain rule at each step to accumulate gradients.

In TensorFlow 2.x, this recording is explicit through `tf.GradientTape`:

```python
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2
grad = tape.gradient(y, x)  # dy/dx = 2x, so grad is a scalar tf.Tensor with value 6.0
```

In PyTorch, the recording is implicit. Any operation on a tensor with `requires_grad=True` is automatically tracked:

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
y.backward()
print(x.grad)  # tensor(6.)
```

Both approaches use reverse-mode differentiation, which is well suited to [deep learning](/wiki/deep_learning) because models typically have many parameters but produce a single scalar loss value.
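The tape mechanism can be sketched in a few lines of plain Python. This is a toy scalar version written for illustration, with hypothetical `Tape` and `Var` classes; real frameworks record tensor operations and handle many more cases.

```python
# Toy reverse-mode autodiff on scalars; an illustration, not any framework's implementation.

class Tape:
    def __init__(self):
        self.records = []  # (output, inputs, local_gradients), in execution order

class Var:
    def __init__(self, value, tape):
        self.value = value
        self.tape = tape
        self.grad = 0.0

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)
        # d(out)/d(self) = other.value, d(out)/d(other) = self.value
        self.tape.records.append((out, (self, other), (other.value, self.value)))
        return out

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)
        self.tape.records.append((out, (self, other), (1.0, 1.0)))
        return out

def backward(tape, output):
    """Replay the tape in reverse, accumulating gradients with the chain rule."""
    output.grad = 1.0
    for out, inputs, local_grads in reversed(tape.records):
        for inp, local in zip(inputs, local_grads):
            inp.grad += out.grad * local

tape = Tape()
x = Var(3.0, tape)
y = x * x + x           # forward pass runs eagerly and records two operations
backward(tape, y)
print(y.value, x.grad)  # 12.0 7.0
```

The gradient of `x * x + x` at `x = 3` is `2 * 3 + 1 = 7`, which the reverse replay recovers by summing the contributions that reach `x` along each recorded path.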

### Memory management

Eager execution keeps the intermediate activations that the backward pass needs alive from the forward pass until their gradients have been computed. This can lead to high memory consumption for large models. Techniques such as gradient checkpointing (also called activation checkpointing or rematerialization) address this by discarding some intermediate activations during the forward pass and recomputing them during the backward pass, trading extra computation for reduced memory usage.
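The trade-off behind gradient checkpointing can be illustrated with a toy chain of scalar layers. The sketch below is an assumption-laden simplification (hypothetical `layer`, `grad_full`, and `grad_checkpointed` functions, a fixed segment length of 4); framework implementations operate on tensors and choose checkpoints differently.

```python
import math

def layer(w, a):
    return math.tanh(w * a)

def layer_grad(w, a):
    """Derivative of layer(w, a) with respect to its input a."""
    return w * (1.0 - math.tanh(w * a) ** 2)

def grad_full(weights, x):
    """Store every layer input during the forward pass."""
    saved = []
    a = x
    for w in weights:
        saved.append(a)
        a = layer(w, a)
    g = 1.0
    for w, a_in in zip(reversed(weights), reversed(saved)):
        g *= layer_grad(w, a_in)
    return g, len(saved)

def grad_checkpointed(weights, x, segment=4):
    """Store only the input of each segment; recompute the rest when needed."""
    checkpoints = {}
    a = x
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = a
        a = layer(w, a)
    g = 1.0
    for start in sorted(checkpoints, reverse=True):
        seg = weights[start:start + segment]
        # Recompute this segment's layer inputs from its checkpoint.
        inputs = []
        a = checkpoints[start]
        for w in seg:
            inputs.append(a)
            a = layer(w, a)
        for w, a_in in zip(reversed(seg), reversed(inputs)):
            g *= layer_grad(w, a_in)
    return g, len(checkpoints)

weights = [0.5 + 0.1 * i for i in range(16)]
g_full, stored_full = grad_full(weights, 0.3)
g_ckpt, stored_ckpt = grad_checkpointed(weights, 0.3)
print(stored_full, stored_ckpt, math.isclose(g_full, g_ckpt))  # 16 4 True
```

The checkpointed version keeps 4 values alive across the forward pass instead of 16, at the cost of running each layer's forward computation a second time during the backward pass. Its peak usage also includes one segment's recomputed inputs, so the saving depends on the segment length.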

## How does eager execution differ from graph execution?

The two execution modes represent a fundamental trade-off between developer productivity and runtime performance. The TensorFlow documentation summarizes the graph advantage directly: "Graph execution enables portability outside Python and tends to offer better performance." [6]

### Key differences

| Aspect | Eager execution | Graph execution |
|---|---|---|
| When operations run | Immediately, one at a time | After full graph is constructed |
| Debugging | Standard Python debuggers (pdb, breakpoints, print) | Requires special tools (TensorBoard, tf.print) |
| Control flow | Native Python if/for/while | Framework-specific ops (tf.cond, tf.while_loop) |
| Error reporting | Immediate, at the line that fails | Deferred; may point to graph construction code |
| Performance | Higher per-operation overhead | Optimized through fusion, folding, and parallelism |
| Memory planning | Operations allocated individually | Global memory planning across entire graph |
| Portability | Requires Python runtime | Graph can be exported and run without Python |
| Dynamic models | Naturally supported | Requires special handling |

### How much faster is graph execution?

Graph execution can be significantly faster than eager execution for repeated computations. A benchmark in the official TensorFlow documentation runs a matrix-power function `power(x, 100)` 1,000 times and reports roughly 4.10 seconds in eager mode versus 0.80 seconds under `tf.function` (graph mode), a speedup of roughly 5x [6]. The speedup comes from several sources:

- **Operator fusion**: The graph compiler can merge adjacent operations (for example, a matrix multiply followed by a bias addition) into a single kernel, reducing memory bandwidth usage and kernel launch overhead.
- **Constant folding**: Values that can be determined at compile time are pre-computed.
- **Common subexpression elimination**: Redundant computations are identified and removed.
- **Parallel scheduling**: Independent operations can be dispatched to different devices or streams simultaneously.
- **Memory optimization**: The compiler can plan memory allocation across the entire computation, reusing buffers where possible.
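As a rough illustration of the first item, the sketch below fuses a chain of elementwise operations over a Python list. It is a toy model under simplifying assumptions (hypothetical helper names, lists instead of tensors); real compilers fuse at the kernel level and also handle non-elementwise operations such as matrix multiplies.

```python
# Toy sketch of operator fusion for elementwise operations.

def run_unfused(ops, data):
    """Each op makes a full pass over the data and materializes an intermediate list."""
    passes = 0
    for op in ops:
        data = [op(v) for v in data]
        passes += 1
    return data, passes

def fuse(ops):
    """Compose several elementwise ops into one function applied per element."""
    def fused(v):
        for op in ops:
            v = op(v)
        return v
    return fused

def run_fused(ops, data):
    """A single pass over the data with no intermediate lists."""
    fused = fuse(ops)
    return [fused(v) for v in data], 1

scale = lambda v: v * 2.0
add_bias = lambda v: v + 1.0
relu = lambda v: max(v, 0.0)
ops = [scale, add_bias, relu]
data = [-2.0, -0.5, 0.0, 1.5]

unfused_out, unfused_passes = run_unfused(ops, data)
fused_out, fused_passes = run_fused(ops, data)
print(unfused_out == fused_out, unfused_passes, fused_passes)  # True 3 1
```

Both versions compute the same result, but the fused version reads and writes the data once instead of three times, which is the memory-bandwidth saving that fusion targets.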

However, for single-execution scenarios (such as running inference once), eager execution can actually be faster because it avoids the upfront cost of graph construction and compilation. The graph execution advantage emerges when the same computation is repeated many times, as is typical during training.

Batch size also affects the relative overhead. Research by Pmusic et al. (2022) found that as batch size increases, the per-item execution time drops dramatically in eager mode, because the fixed overhead of dispatching each operation is amortized over more data items [11].
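The amortization effect can be expressed as a simple cost model. The function and all numbers below are hypothetical, chosen only to show the shape of the curve; they are not taken from the cited study.

```python
def per_item_time(batch_size, num_ops, dispatch_s, compute_per_item_s):
    """Toy model: each op pays a fixed dispatch cost plus compute that scales with batch size."""
    total = num_ops * (dispatch_s + batch_size * compute_per_item_s)
    return total / batch_size

# Hypothetical costs: 20 microseconds of dispatch per op, 1 microsecond of compute per item.
for batch in (1, 8, 64, 512):
    t = per_item_time(batch, num_ops=100, dispatch_s=20e-6, compute_per_item_s=1e-6)
    print(f"batch={batch:4d}  per-item time={t * 1e6:8.1f} us")
```

Under this model the per-item time approaches `num_ops * compute_per_item_s` as the batch grows, because the fixed dispatch term is divided across more items.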

## How do frameworks bridge eager and graph execution?

Modern frameworks have converged on hybrid approaches that let developers write code in eager mode while selectively compiling performance-sensitive parts to optimized graphs.

### tf.function in TensorFlow

The `tf.function` decorator converts a Python function into a callable TensorFlow graph. When the decorated function is first called, TensorFlow "traces" it by executing the Python code with special tracer objects instead of real tensors. The tracer records all TensorFlow operations into a graph (a `ConcreteFunction`), which is then compiled and cached [7].

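```python
import tensorflow as tf

# Assumes `model`, `loss_fn`, and `optimizer` are defined elsewhere, for example
# a tf.keras.Model, a tf.keras.losses.Loss, and a tf.keras.optimizers.Optimizer.
@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        predictions = model(images)
        loss = loss_fn(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
```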

Subsequent calls with matching input signatures reuse the cached graph, avoiding re-tracing. If the input signature changes (for example, a different tensor shape or dtype), TensorFlow traces a new specialized graph. This polymorphic behavior means a single `tf.function` can hold multiple compiled graphs.

A common pitfall is that Python side effects (such as `print()` calls or list mutations) only execute during tracing, not during subsequent graph executions. Developers must use `tf.print()` for output that should appear on every call.
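The caching and side-effect behavior can be mimicked with a small pure-Python decorator. This is a toy model, not TensorFlow code: `toy_function` is a hypothetical name, the "graph" is just a list of recorded callables, and the input signature is approximated by element type and list length.

```python
# Toy model of trace caching; the names below are hypothetical and not TensorFlow APIs.

def toy_function(py_func):
    cache = {}  # input signature -> recorded "graph"

    def wrapper(x):
        signature = (type(x[0]).__name__, len(x))  # stand-in for (dtype, shape)
        if signature not in cache:
            ops = []
            py_func(ops)             # tracing: run the Python body once, recording ops
            cache[signature] = ops
        result = list(x)
        for op in cache[signature]:  # graph execution: replay recorded ops only
            result = [op(v) for v in result]
        return result

    wrapper.cache = cache
    return wrapper

trace_log = []

@toy_function
def double_then_increment(ops):
    trace_log.append("traced")       # Python side effect: runs only while tracing
    ops.append(lambda v: v * 2)
    ops.append(lambda v: v + 1)

print(double_then_increment([1, 2, 3]))   # first call traces: [3, 5, 7]
print(double_then_increment([4, 5, 6]))   # same signature, cached graph: [9, 11, 13]
print(double_then_increment([1.0, 2.0]))  # new dtype and shape, traced again: [3.0, 5.0]
print(len(trace_log), len(double_then_increment.cache))  # 2 2
```

Three calls produce only two traces, and the `trace_log.append` side effect fires once per trace rather than once per call, which mirrors the `print()` pitfall described above.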

### torch.compile in PyTorch

PyTorch 2.0, released on March 15, 2023, introduced `torch.compile`, which brings graph-based optimizations to PyTorch while preserving the eager-mode development experience [9]. The PyTorch team described the release as offering "the same eager-mode development and user experience, while fundamentally changing and supercharging how PyTorch operates at compiler level under the hood." [9] The compilation pipeline has three stages:

1. **Graph acquisition (TorchDynamo)**: TorchDynamo analyzes Python bytecode at runtime and extracts computation subgraphs as FX graphs. When it encounters unsupported Python constructs (such as data-dependent control flow), it "graph breaks," letting the Python interpreter handle that section and then resuming graph capture.
2. **Graph lowering**: PyTorch operations are decomposed into lower-level kernels specific to the compilation backend.
3. **Graph compilation (TorchInductor)**: The lowered graph is compiled into optimized code. For GPUs, TorchInductor generates Triton kernels; for CPUs, it generates C++ code with OpenMP parallelism.

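```python
import torch

# Assumes `model` is an existing torch.nn.Module; the compiled wrapper is called like the original.
model = torch.compile(model)
```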

TorchDynamo captures the graph without requiring code changes far more reliably than earlier tools: PyTorch reported that "while TorchScript and others struggled to even acquire the graph 50% of the time, often with a big overhead, TorchDynamo acquired the graph 99% of the time, correctly, safely and with negligible overhead," validated on more than 7,000 GitHub PyTorch projects [9]. Across 163 open-source models on an NVIDIA A100 GPU, `torch.compile` worked 93% of the time and ran 43% faster in training on average [9]. When a graph break does occur, the affected subgraph falls back to eager execution, so correctness is preserved even for complex Python code.
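The idea of splitting a program at graph breaks can be pictured with a toy partitioner. This sketch is only an analogy: the op names are hypothetical strings, and TorchDynamo works on Python bytecode rather than on a flat list of operations.

```python
# Toy illustration of graph breaks; this is not how TorchDynamo is implemented.

def partition(ops, is_capturable):
    """Split a sequence of ops into runs of capturable ops separated by fallbacks."""
    segments = []
    current = []
    for op in ops:
        if is_capturable(op):
            current.append(op)
        else:
            if current:
                segments.append(("graph", current))
                current = []
            segments.append(("eager", [op]))  # graph break: run this op in the interpreter
    if current:
        segments.append(("graph", current))
    return segments

# Hypothetical op names; "print_value" stands in for an unsupported Python construct.
ops = ["matmul", "relu", "print_value", "matmul", "softmax"]
segments = partition(ops, is_capturable=lambda op: op != "print_value")
for kind, seg in segments:
    print(kind, seg)
```

The unsupported step splits the program into two compiled regions with an interpreted step between them, so the result stays correct while the optimizer sees smaller graphs.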

### JAX's approach

[JAX](/wiki/jax), developed by Google, takes a different approach. By default, JAX operations execute eagerly (similar to NumPy), but the `jax.jit` transformation compiles a function using [XLA](/wiki/xla) (Accelerated Linear Algebra) for high-performance execution on GPUs and TPUs [10].

JAX's tracing works by replacing actual values with abstract tracer objects that carry the shape and dtype (but not the values) of each array. This means that Python control flow that depends on runtime values cannot be captured by `jax.jit`; developers must use JAX-specific primitives (`jax.lax.cond`, `jax.lax.fori_loop`) for such cases. JAX's functional programming model, where functions must be pure (no side effects), makes this tracing more predictable than TensorFlow's approach.
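Why value-dependent branching fails under tracing can be shown with a toy abstract value. The `AbstractValue` class below is a hypothetical simplification written for this illustration and is not one of JAX's tracer classes.

```python
# Toy abstract tracer; a simplified illustration, not JAX's actual tracer classes.

class AbstractValue:
    """Carries shape and dtype but no concrete data."""
    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = dtype

    def __gt__(self, other):
        # Comparisons yield another abstract value: the result is unknown at trace time.
        return AbstractValue(self.shape, "bool")

    def __bool__(self):
        raise TypeError("abstract value has no concrete truth value during tracing")

def branchy(x):
    if x > 0:  # needs a concrete boolean, which tracing cannot supply
        return "positive branch"
    return "other branch"

print(branchy(2.0))  # concrete input: ordinary Python control flow works

trace_error = None
try:
    branchy(AbstractValue(shape=(), dtype="float32"))
except TypeError as err:
    trace_error = err
print("tracing failed:", trace_error)
```

The `if` statement forces Python to convert the comparison result to a boolean, and an object that knows only its shape and dtype cannot answer. A structured primitive avoids the problem by recording both branches in the traced computation and selecting between them at run time.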

### MXNet Gluon hybridize

Apache MXNet's Gluon API offered a `hybridize()` method that converted imperative eager code into a symbolic graph for optimized execution. Users could prototype in eager mode, then call `net.hybridize()` to switch to graph mode for training and deployment. MXNet's development has largely wound down since 2023, but its hybrid approach influenced the design of other frameworks.

## Framework comparison

The following table summarizes how major frameworks handle eager and graph execution:

| Framework | Default mode | Graph compilation | Tracing mechanism | Control flow handling |
|---|---|---|---|---|
| [TensorFlow](/wiki/tensorflow) 2.x | Eager | `tf.function` | Python execution with tracer objects | AutoGraph converts Python to graph ops |
| [PyTorch](/wiki/pytorch) 2.x | Eager | `torch.compile` | TorchDynamo bytecode analysis | Graph break and fallback to eager |
| [JAX](/wiki/jax) | Eager | `jax.jit` | Abstract tracer objects (shape/dtype only) | Requires `jax.lax` primitives |
| TensorFlow 1.x | Graph | N/A (native graph mode) | `tf.Session` | `tf.cond`, `tf.while_loop` |
| [Chainer](/wiki/chainer) | Eager | Not available | N/A | Native Python |
| DyNet | Eager | Auto-batching optimizer | Dynamic graph per input | Native C++/Python |
| MXNet Gluon | Eager | `hybridize()` | Symbol-based tracing | Limited after hybridization |

## What are the advantages of eager execution?

### Debugging and development speed

The most frequently cited advantage of eager execution is the ability to use standard debugging tools. Developers can set breakpoints, step through code line by line, and inspect tensor values at any point during execution. In graph mode, tensor values are not available until the graph is run, making interactive debugging impractical. The TensorFlow team explicitly listed "fast debugging with immediate run-time errors and integration with Python tools" as a primary motivation for adding eager execution [2].

With eager execution, error messages point directly to the Python line where the problem occurred. In graph mode, errors often appear during session execution and reference internal graph node names, making it difficult to trace them back to the source code.

### Dynamic neural network architectures

Some neural network architectures require the computation graph to change from input to input. Examples include:

- **[Recurrent neural networks](/wiki/recurrent_neural_network)** with variable-length sequences, where the number of unrolled time steps depends on the input length.
- **Tree-structured models** (such as Tree-LSTMs for natural language processing), where the network topology mirrors the parse tree of each input sentence.
- **Mixture-of-experts models**, where different subnetworks are activated depending on the input.
- **Neural architecture search**, where the model structure itself is being optimized during training.

Eager execution handles these cases naturally because the graph is built fresh for each input. Static graph frameworks require workarounds such as bucketing (grouping inputs by length) or padding (extending all inputs to a fixed maximum length).
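A tree-structured model shows the point compactly: the sequence of operations follows the shape of each input. In the sketch below, `combine` is a hypothetical stand-in for a learned composition cell, and the inputs are nested tuples rather than real parse trees.

```python
# Toy tree-structured model: the computation follows the shape of each input.

def combine(left, right, weight=0.5):
    """Hypothetical composition function standing in for a learned Tree-LSTM cell."""
    return weight * (left + right)

def encode(tree, op_count):
    """Recursively encode a nested tuple; leaves are numbers."""
    if not isinstance(tree, tuple):
        return float(tree)
    left, right = tree
    op_count[0] += 1  # one composition per internal node
    return combine(encode(left, op_count), encode(right, op_count))

for tree in [(1, 2), ((1, 2), (3, 4)), (((1, 2), 3), 4)]:
    ops = [0]
    value = encode(tree, ops)
    print(tree, "->", value, "using", ops[0], "compositions")
```

The second and third inputs contain the same number of compositions but apply them in a different order, which ordinary Python recursion expresses directly and a fixed static graph cannot.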

### Python ecosystem integration

Because eager execution returns concrete values at each step, it integrates smoothly with the broader Python ecosystem. Developers can mix framework operations with NumPy, SciPy, matplotlib, and other libraries without special adapters. They can also use Python data structures (lists, dictionaries) and third-party libraries within model code.

### Rapid prototyping

Researchers frequently need to experiment with novel architectures and loss functions. Eager execution reduces the iteration cycle because there is no compilation step between writing code and seeing results. This has contributed to PyTorch's dominance in academic research, where speed of experimentation often matters more than raw training throughput.

## What are the limitations of eager execution?

### Performance overhead

Eager execution incurs overhead for each operation because every call passes through the Python interpreter and the framework's dispatcher, launches a kernel on the target hardware, and hands a tensor back to Python before the next operation can be issued. On GPUs the kernel itself may run asynchronously, but the per-call dispatch cost remains. For models with many small operations, this dispatch overhead can dominate the total execution time. Graph execution amortizes this overhead by batching many operations into a single dispatch; the TensorFlow documentation's own benchmark shows graph mode running about 5x faster on a repeated matrix-power computation [6].

The overhead is most noticeable when operations are individually fast (small matrix multiplies, element-wise operations on small tensors). For large operations (such as multiplying large matrices on a GPU), the kernel execution time dwarfs the dispatch overhead, and the performance difference between eager and graph modes shrinks.

### Limited cross-device optimization

In graph mode, the compiler can analyze the entire computation and distribute operations across multiple devices (GPUs, TPUs) in an optimal way. In eager mode, each operation is dispatched to a single device, and the programmer must manually manage data placement and inter-device communication.

### Deployment constraints

Graph representations (such as TensorFlow SavedModel or ONNX) can be deployed on edge devices, mobile phones, and embedded systems without a Python runtime. The TensorFlow documentation notes that "graph execution enables portability outside Python." [6] Eager mode code depends on the Python interpreter, making it unsuitable for deployment in resource-constrained environments unless it is first compiled to a graph format.

### Memory consumption

As discussed in the memory management section, eager execution must retain intermediate activations from the forward pass until the backward pass has consumed them. Without the global view that graph mode provides, the runtime cannot plan memory reuse as efficiently. This can limit the maximum model size or batch size that fits in accelerator memory.

## Practical usage patterns

### Development in eager mode, training with compilation

The most common workflow in both TensorFlow and PyTorch is to develop and debug models in eager mode, then wrap the training loop (or at least the forward/backward pass) in `tf.function` or `torch.compile` for production training. This provides the best of both worlds: fast iteration during development and optimized throughput during training.

### Selective compilation

Developers can apply graph compilation to specific functions rather than the entire program. For example, a data preprocessing pipeline might remain in eager mode (because it involves complex Python logic), while the model's forward pass and loss computation are compiled.

### Export for inference

Both TensorFlow and PyTorch provide tools to export trained models as static graphs for deployment:

- **TensorFlow**: `tf.saved_model.save()` exports a SavedModel that can be served with TensorFlow Serving or converted to TensorFlow Lite for mobile.
- **PyTorch**: `torch.export` (introduced in PyTorch 2.1) captures a full graph for deployment and is intended as the successor to the older `torch.jit.trace` and `torch.jit.script` (TorchScript) workflows.
- **ONNX**: Both frameworks can export models to the Open [Neural Network](/wiki/neural_network) Exchange format, which can then be optimized and run with ONNX Runtime [12].

### Profiling and optimization

When optimizing model performance, developers typically start by profiling in eager mode to identify bottlenecks, then apply compilation to the slowest sections. Tools like the PyTorch Profiler and TensorFlow Profiler work in both eager and compiled modes, though the information they provide differs (eager mode shows individual operation timings, while compiled mode shows fused kernel timings).

## Code examples

### TensorFlow: eager vs. graph execution

```python
import tensorflow as tf
import time

# Eager execution (default in TF 2.x)
def compute_eager(x):
    for _ in range(100):
        x = tf.nn.relu(tf.matmul(x, tf.ones([100, 100])))
    return x

# Graph execution via tf.function
@tf.function
def compute_graph(x):
    for _ in range(100):
        x = tf.nn.relu(tf.matmul(x, tf.ones([100, 100])))
    return x

x = tf.random.normal([100, 100])

# Warm up
compute_eager(x)
compute_graph(x)

# Benchmark
start = time.time()
for _ in range(100):
    compute_eager(x)
eager_time = time.time() - start

start = time.time()
for _ in range(100):
    compute_graph(x)
graph_time = time.time() - start

print(f"Eager: {eager_time:.3f}s, Graph: {graph_time:.3f}s")
```

### PyTorch: eager vs. compiled execution

```python
import torch
import time

def compute(x):
    for _ in range(100):
        x = torch.relu(torch.matmul(x, torch.ones(100, 100, device=x.device)))
    return x

compiled_compute = torch.compile(compute)

x = torch.randn(100, 100)

# Warm up
compute(x)
compiled_compute(x)

# Benchmark
start = time.time()
for _ in range(100):
    compute(x)
eager_time = time.time() - start

start = time.time()
for _ in range(100):
    compiled_compute(x)
compiled_time = time.time() - start

print(f"Eager: {eager_time:.3f}s, Compiled: {compiled_time:.3f}s")
```

### JAX: eager vs. JIT-compiled execution

```python
import jax
import jax.numpy as jnp
import time

def compute(x):
    for _ in range(100):
        x = jax.nn.relu(jnp.dot(x, jnp.ones((100, 100))))
    return x

jit_compute = jax.jit(compute)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (100, 100))

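# Warm up (block_until_ready waits for JAX's asynchronous dispatch to finish)
compute(x).block_until_ready()
jit_compute(x).block_until_ready()

# Benchmark: block on each result so the timings include the actual computation
start = time.time()
for _ in range(100):
    compute(x).block_until_ready()
eager_time = time.time() - start

start = time.time()
for _ in range(100):
    jit_compute(x).block_until_ready()
jit_time = time.time() - start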

print(f"Eager: {eager_time:.3f}s, JIT: {jit_time:.3f}s")
```

## Related concepts

- [Automatic differentiation](/wiki/automatic_differentiation): The mathematical technique underlying gradient computation in eager execution frameworks.
- [Computational graph](/wiki/computational_graph): The data structure that represents mathematical operations and their dependencies.
- [Backpropagation](/wiki/backpropagation): The algorithm for computing gradients by propagating errors backward through the network.
- [TensorFlow](/wiki/tensorflow): Google's machine learning framework that transitioned from graph-only to eager-by-default execution.
- [PyTorch](/wiki/pytorch): Meta's deep learning framework built around eager execution from the start.
- [JAX](/wiki/jax): Google's functional machine learning framework supporting both eager and JIT-compiled execution.
- [GPU computing](/wiki/gpu_computing): Hardware acceleration commonly used with both eager and graph execution.
- [TPU](/wiki/tpu): Google's custom accelerator, which benefits from graph-based optimization through XLA.

## References

1. Agrawal, A., Modi, A. N., Passos, A., Lavoie, A., Agarwal, A., Shankar, A., Ganichev, I., Levenberg, J., Hong, M., Monga, R., & Cai, S. (2019). "TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning." *Proceedings of the 2nd SysML Conference*. https://arxiv.org/abs/1903.01855

2. Shankar, A. & Dobson, W. (2017). "Eager Execution: An imperative, define-by-run interface to TensorFlow." *Google Developers Blog*, October 31, 2017. https://developers.googleblog.com/eager-execution-an-imperative-define-by-run-interface-to-tensorflow/

3. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. (2019). "PyTorch: An Imperative Style, High-Performance Deep Learning Library." *Advances in Neural Information Processing Systems 32*. https://arxiv.org/abs/1912.01703

4. Tokui, S., Oono, K., Hido, S., & Clayton, J. (2015). "Chainer: a Next-Generation Open Source Framework for Deep Learning." *Workshop on Machine Learning Systems at NIPS 2015*. http://learningsys.org/papers/LearningSys_2015_paper_33.pdf

5. Neubig, G., Dyer, C., Goldberg, Y., Matthews, A., Ammar, W., Anastasopoulos, A., et al. (2017). "DyNet: The Dynamic Neural Network Toolkit." *arXiv preprint arXiv:1701.03980*. https://arxiv.org/abs/1701.03980

6. TensorFlow Authors. "Introduction to graphs and tf.function." *TensorFlow Documentation*. https://www.tensorflow.org/guide/intro_to_graphs

7. TensorFlow Authors. "Better performance with tf.function." *TensorFlow Documentation*. https://www.tensorflow.org/guide/function

8. TensorFlow Authors. "Introduction to gradients and automatic differentiation." *TensorFlow Documentation*. https://www.tensorflow.org/guide/autodiff

9. Ansel, J., Yang, E., He, H., Gimelshein, N., Jain, A., Voznesensky, M., et al. (2024). "PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation." *Proceedings of ASPLOS '24*; see also "PyTorch 2.0: Our next generation release," PyTorch Blog, March 15, 2023. https://pytorch.org/get-started/pytorch-2-x/

10. JAX Authors. "Just-in-time compilation." *JAX Documentation*. https://docs.jax.dev/en/latest/jit-compilation.html

11. Pmusic, A. et al. (2022). "Demystifying the TensorFlow Eager Execution of Deep Learning Inference on a CPU-GPU Tandem." *IEEE International Symposium on Workload Characterization*. https://hal-lirmm.ccsd.cnrs.fr/lirmm-03775613

12. ONNX Runtime Authors. "Graph Optimizations in ONNX Runtime." *ONNX Runtime Documentation*. https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html

13. Tokui, S., Okuta, R., Akiba, T., Niitani, Y., Ogawa, T., Saber, S., Suzuki, M., Uenishi, K., Vogel, B., & Vincent, H. Y. (2019). "Chainer: A Deep Learning Framework for Accelerating the Research Cycle." *Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining*. https://arxiv.org/abs/1908.00213

