# Offline

> Source: https://aiwiki.ai/wiki/offline
> Updated: 2026-06-27
> Categories: Machine Learning
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

*See also: [Machine learning terms](/wiki/machine_learning_terms)*

In machine learning, **offline** describes operations that happen ahead of time on a fixed dataset rather than continuously on live data. Google's [Machine Learning Glossary](https://developers.google.com/machine-learning/glossary) lists "offline" as a direct synonym for [static](/wiki/static), and treats both as the counterpart to [online](/wiki/online) (also called [dynamic](/wiki/dynamic)).[1][2] The term shows up in offline [training](/wiki/training), offline [inference](/wiki/inference), and offline [reinforcement learning](/wiki/reinforcement_learning). All three share the same idea: do the heavy work once on a known batch of data, then serve the results, instead of reacting to new inputs as they arrive.

## What does offline mean in machine learning?

Google's developer documentation is the most cited reference for this terminology, and its definition is short. The glossary entry for "offline" simply reads: synonym for static.[1] It then defines offline inference and static inference as the same operation under two names, and offline training and static training as the same operation under two names.[2] "Batch" is sometimes added as a third synonym, especially in the inference setting where "batch inference" is the dominant phrase in industry blogs and cloud documentation.[7]

The practical implication is that an article about "static inference," "offline inference," or "batch inference" describes roughly the same system: a job that runs over a chunk of inputs on a schedule and writes the predictions somewhere for later retrieval. The counterparts, "online," "dynamic," and "real-time," are also used interchangeably, and they describe systems that compute a prediction in response to a single request, usually in milliseconds.

## Offline training

Offline training, sometimes called static training or batch training, is the standard textbook picture of how a model gets built. You collect a dataset, freeze it, run the optimization loop until the loss stops improving, evaluate on a held-out set, and then ship the trained weights. From that moment until you decide to retrain, the model never sees a new training example.

### How it works

A typical offline training pipeline collects raw data, cleans and labels it, splits it into training and evaluation sets, fits the model with an optimizer such as stochastic gradient descent, and produces a snapshot of weights. The snapshot is then handed to a serving system, usually a separate process. The model itself does not change while it serves requests. To update it, you run the pipeline again on a daily, weekly, or quarterly cadence and replace the served weights with the new snapshot.
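
The pipeline steps above can be sketched in a few lines. This is a minimal illustration, not a production recipe: the synthetic data, the one-feature linear model, and the names `make_dataset`, `train_sgd`, and `snapshot` are all assumptions made for the example.

```python
import json
import random

def make_dataset(n, seed=0):
    """Synthetic frozen snapshot: y = 3x + 2 plus small noise (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 3.0 * x + 2.0 + rng.gauss(0.0, 0.05)
        data.append((x, y))
    return data

def split(data, eval_fraction=0.2):
    """Hold out the tail of the frozen snapshot for evaluation."""
    cut = int(len(data) * (1.0 - eval_fraction))
    return data[:cut], data[cut:]

def train_sgd(train, epochs=50, lr=0.1, seed=0):
    """Fit y ~ w*x + b with plain stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    rows = list(train)
    for _ in range(epochs):
        rng.shuffle(rows)
        for x, y in rows:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return {"w": w, "b": b}

def mse(weights, rows):
    """Mean squared error of the fitted line on a set of (x, y) rows."""
    return sum((weights["w"] * x + weights["b"] - y) ** 2 for x, y in rows) / len(rows)

# One offline run: freeze data, split, fit, evaluate, then snapshot the weights.
snapshot_data = make_dataset(500)
train_rows, eval_rows = split(snapshot_data)
weights = train_sgd(train_rows)
eval_mse = mse(weights, eval_rows)
snapshot = json.dumps({"version": "v1", "weights": weights})
```

The serialized `snapshot` is what a separate serving process would load. Retraining means running the same script again on a newer frozen dataset and swapping in the new snapshot.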

### Why teams choose it

Most production ML systems use offline training because it is easier to reason about. The training run is reproducible, evaluation metrics are comparable across versions, and you can roll back to an earlier snapshot if the new one regresses. Google's production ML course puts it bluntly: static training is simpler to develop, test, and maintain than dynamic training.[3] It also requires less monitoring of the training job itself, because the job either finishes or it does not.

Offline training is also usually cheaper. You spin up a cluster, run the job, and tear it down, rather than keeping a streaming pipeline alive at all times and engineering the training loop to be robust to bursts of bad data arriving in real time.

### Where it falls short

The main weakness is staleness. A model trained offline only knows about the data that existed when the snapshot was taken. If user behavior, product mix, or the underlying distribution changes after that, the model has no way to notice. Google's documentation illustrates the problem with a flower purchase model trained only on July and August data: it performs well in late summer but predicts badly around Valentine's Day, when buying patterns change sharply.[3] This kind of failure is what practitioners usually mean by [concept drift](/wiki/concept_drift) and [data drift](/wiki/data_drift) in production ML.

The usual mitigation is to retrain on a schedule and to monitor the input distribution at inference time so that you notice drift before it shows up as user complaints. Static training does not mean training once forever; it just means you batch retraining into discrete jobs rather than letting the model learn continuously.
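
One way to monitor the input distribution is to compare a histogram of a feature at training time against the same histogram at serving time. The sketch below uses the population stability index (PSI), a common heuristic for this comparison. The bin edges, the sample data, and the 0.2 alert threshold are assumptions chosen for illustration; the threshold is a rule of thumb, not a standard.

```python
import math

def histogram(values, edges):
    """Fraction of values in each bin; values beyond the edges fall in the end bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        idx = sum(1 for e in edges if v >= e)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]

def psi(expected, actual, edges, eps=1e-6):
    """Population stability index between a training sample and a serving sample."""
    p = histogram(expected, edges)
    q = histogram(actual, edges)
    # eps keeps the logarithm finite when a bin is empty on one side.
    return sum((qi - pi) * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))

edges = [0.25, 0.5, 0.75]
training_sample = [i / 100 for i in range(100)]          # roughly uniform on [0, 1)
serving_same = list(training_sample)                      # no drift
serving_shifted = [0.5 + i / 200 for i in range(100)]     # mass moved to the upper half

ALERT_THRESHOLD = 0.2  # assumed rule-of-thumb cutoff
drift_detected = psi(training_sample, serving_shifted, edges) > ALERT_THRESHOLD
```

A check like this would typically run alongside the scheduled retraining job, so that a large shift can trigger an early retrain instead of waiting for the next cycle.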

## Offline inference

Offline inference, also called static inference or batch inference, is the prediction-time counterpart to offline training. Instead of computing a prediction at the moment a user asks for it, the system precomputes predictions for many inputs in advance, stores them in a database or cache, and serves them by lookup.

### How it works

A batch inference job typically runs on a schedule, for example every night. It pulls a list of items or users to score, runs the model over all of them, and writes the results into a key-value store, a feature store, or a database table.[7] The serving application then retrieves the cached prediction when it needs one. Because the lookup is a simple database read, response times track the latency of the storage layer, not the model.[4]
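
The split between the scheduled job and the request path can be sketched as follows. The scoring function, the feature names, and the in-memory dictionary standing in for a key-value store are hypothetical placeholders.

```python
def score(features):
    """Stand-in for an expensive model; a hypothetical linear scorer."""
    return 0.7 * features["recent_visits"] + 0.3 * features["cart_items"]

def run_batch_job(users):
    """Scheduled job: score every known user once and return a key-value table."""
    return {user_id: score(features) for user_id, features in users.items()}

def serve(prediction_store, user_id):
    """Request path: a lookup only; the model is never called here."""
    return prediction_store.get(user_id)

known_users = {
    "u1": {"recent_visits": 2, "cart_items": 1},
    "u2": {"recent_visits": 0, "cart_items": 4},
}
prediction_store = run_batch_job(known_users)  # e.g. run nightly
```

Note that `serve` returns `None` for any user the batch job did not score, which is the coverage gap discussed under Disadvantages.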

Netflix-style nightly recommendation refreshes are a familiar example. A large model can spend hours generating a fresh recommendation list for every active user, and the front end serves any one of those lists in milliseconds. Lead scoring, churn prediction, and credit risk scoring for known customers follow the same pattern.[7]

### Advantages

The Google ML crash course summarizes the upside as low inference cost, the ability to verify predictions before serving them, and near-instant response time to end users.[4] Because the predictions are sitting in storage, you can audit them, sanity-check distributions, run business rules over them, and even discard or override a batch before it goes live. You can also use a much heavier model than would be feasible for real-time serving, since latency at request time depends only on the cache, not the model.

### Disadvantages

The tradeoff is freshness and coverage. A batch system can only serve predictions for inputs it has already seen and scored. If a brand-new user arrives, the cache has no prediction for them. If a known user's situation changes between batch runs, the cached prediction is stale.[4] Update latency is usually measured in hours or days, which is fine for nightly product recommendations but unacceptable for fraud detection or pricing decisions that have to react within a transaction.[7]

This is why batch inference is often paired with a simpler fallback model that runs online for unknown inputs, or with a hybrid system where the heavy model runs offline and a lightweight model handles the long tail in real time.
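
A minimal sketch of that pairing, assuming a precomputed store and a cheap heuristic as the online fallback (both are hypothetical stand-ins):

```python
def fallback_model(request_features):
    """Cheap online heuristic (hypothetical): scale the number of items in the cart."""
    return 0.3 * request_features.get("cart_items", 0)

def serve_with_fallback(prediction_store, user_id, request_features):
    """Prefer the precomputed prediction; compute a simple one online on a cache miss."""
    cached = prediction_store.get(user_id)
    if cached is not None:
        return cached, "batch"
    return fallback_model(request_features), "online_fallback"

prediction_store = {"u1": 1.7}  # written earlier by the heavy offline model

hit = serve_with_fallback(prediction_store, "u1", {})
miss = serve_with_fallback(prediction_store, "new_user", {"cart_items": 2})
```

Returning the source label with the prediction makes it possible to track how much traffic the fallback handles, which is a useful signal for whether the batch job's coverage is adequate.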

## How does offline differ from online?

The choice between offline and online operation is usually decided by three questions: how quickly does the input distribution change, how fresh do predictions need to be, and how much engineering complexity can the team afford.

| Dimension | Offline (static) | Online (dynamic) |
| --- | --- | --- |
| Training data | Fixed snapshot | Streamed or frequently refreshed |
| Update cadence | Hours to weeks | Continuous |
| Inference latency | Cache read time | Model execution time |
| Engineering effort | Lower | Higher |
| Handling of new inputs | Limited to scored items | Can score on demand |
| Sensitivity to drift | Higher | Lower if retraining is frequent |
| Cost profile | Periodic batch compute | Steady streaming compute |

In practice many production systems are hybrids. A recommender might train offline weekly, precompute candidate lists nightly, and run a small online model to rerank a handful of candidates per request. A search engine might cache offline embeddings for documents and compute query embeddings online. The real question is rarely offline or online; it is which parts of the pipeline live on which side of the line.

## Offline evaluation

A related use of the word is **offline evaluation**, evaluating a model on a held-out dataset before deployment, as opposed to **online evaluation** through A/B tests on live traffic. Offline evaluation is cheap and fast and is how most candidate models are filtered, but it is not always predictive of live performance, because the evaluation set is itself a snapshot. Serious production teams run both: offline metrics to decide what to ship, online metrics to decide whether shipping it actually helped.
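
As a small illustration of the offline half, the sketch below scores a current model and a candidate on the same frozen held-out set. The two threshold classifiers and the five labeled examples are invented for the example.

```python
def accuracy(model, held_out):
    """Fraction of held-out examples the model labels correctly."""
    correct = sum(1 for x, label in held_out if model(x) == label)
    return correct / len(held_out)

def current_model(x):
    return int(x > 0.5)

def candidate_model(x):
    return int(x > 0.58)

# Frozen (feature, label) pairs that neither model was trained on.
held_out = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.55, 0)]

current_acc = accuracy(current_model, held_out)
candidate_acc = accuracy(candidate_model, held_out)
ship_candidate = candidate_acc > current_acc  # offline gate; an A/B test would follow
```

Because both models are scored on identical data, the comparison is reproducible, but it only says how the candidate behaves on this snapshot, not on live traffic.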

## What is offline reinforcement learning?

Offline reinforcement learning is a subfield of [reinforcement learning](/wiki/reinforcement_learning) in which an agent learns a policy purely from a fixed, previously collected dataset of logged transitions, with no further interaction with the environment during training. It is also called **batch RL**. Because the meaning of "offline" here is technically narrower than the static-versus-dynamic sense used elsewhere on this page, it deserves its own treatment. In standard reinforcement learning, an agent learns by interacting with an environment, taking actions, and observing rewards; offline RL removes that interaction entirely and must extract the best policy it can from the static data alone.[5]

Sergey Levine, Aviral Kumar, George Tucker, and Justin Fu, in their May 2020 tutorial and review on offline reinforcement learning, define the setting as "reinforcement learning algorithms that utilize previously collected data, without additional online data collection."[5] The Berkeley AI Research blog uses similar language: learning skills solely from previously collected datasets, without any active environment interaction.[6]

### Why does offline reinforcement learning matter?

The motivation is that environment interaction is expensive or dangerous in many real domains. You cannot let a learning agent freely experiment with treatment plans for hospital patients, driving policies for self-driving cars, or trading strategies in live markets, but you may have years of logged decisions and outcomes from those domains.[6] Offline RL aims to convert that logged data into a useful policy without ever rolling out an unsafe one. The Levine survey argues that this data-driven framing could do for sequential decision making what large static datasets did for supervised [deep learning](/wiki/deep_learning): make it possible to learn from broad, reusable data rather than collecting fresh experience for every new task.[5]

### What is the core difficulty in offline reinforcement learning?

Offline RL has a specific failure mode known as **distributional shift**, sometimes described as **extrapolation error**. A standard off-policy RL algorithm can assign a high value to actions that were never tried in the dataset, because nothing in the data contradicts that estimate. Scott Fujimoto, David Meger, and Doina Precup, who introduced this framing in their 2019 ICML paper, describe extrapolation error as error "introduced by the mismatch between the dataset and true state-action visitation of the current policy."[9] When the learned policy then acts on those overestimated actions, the results can be much worse than the data would suggest. The Levine tutorial frames distributional shift as the central challenge of the subfield, and most modern offline RL methods are designed specifically to keep the learned policy from drifting too far from the logged behavior.[5][6]
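
A toy tabular example can make the failure concrete. The sketch below is a simplified illustration, not an implementation of any published algorithm: the two-step environment, the logged transitions, and the initial value of 5.0 (standing in for an arbitrary function-approximation error on unseen state-action pairs) are all assumptions.

```python
from collections import defaultdict

GAMMA = 0.9
ACTIONS = [0, 1]

# Logged transitions: (state, action, reward, next_state, done).
# Action 1 is never taken in state "s1", so the data says nothing about it.
dataset = [
    ("s0", 0, 0.0, "s1", False),
    ("s1", 0, 1.0, "end", True),
]

def fitted_q(dataset, init_value, constrain_to_data, sweeps=50):
    """Repeated Q-learning backups over a fixed dataset (deterministic toy case)."""
    q = defaultdict(lambda: init_value)
    seen = defaultdict(set)
    for s, a, _, _, _ in dataset:
        seen[s].add(a)
    for _ in range(sweeps):
        for s, a, r, s2, done in dataset:
            if done:
                target = r
            else:
                # Naive backup maximizes over every action, including unseen ones.
                # The constrained backup only considers actions logged in s2.
                candidates = seen[s2] if constrain_to_data else ACTIONS
                target = r + GAMMA * max(q[(s2, a2)] for a2 in candidates)
            q[(s, a)] = target
    return q

naive_q = fitted_q(dataset, init_value=5.0, constrain_to_data=False)
constrained_q = fitted_q(dataset, init_value=5.0, constrain_to_data=True)
```

In the naive run, the never-updated estimate for `("s1", 1)` leaks into the value of `("s0", 0)` through the max in the backup, inflating it to 4.5 even though the logged trajectory is only worth 0.9. Restricting the max to actions that appear in the data removes the inflation, which is the intuition behind policy-constraint methods.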

### What algorithms are used for offline reinforcement learning?

The field converged on a handful of approaches, most of which constrain the policy toward the data distribution or make value estimates deliberately pessimistic. The table below lists the most influential methods, all benchmarked on shared datasets such as D4RL (Datasets for Deep Data-Driven Reinforcement Learning), the standard offline RL benchmark introduced by Fu et al. in 2020.[8]

| Method | Authors, year | Core idea |
| --- | --- | --- |
| BCQ (Batch-Constrained deep Q-learning) | Fujimoto, Meger, Precup, 2019 | Early deep batch RL method for continuous control; constrains actions to those likely under a learned generative model of the data, addressing extrapolation error.[9] |
| CQL (Conservative Q-Learning) | Kumar, Zhou, Tucker, Levine, 2020 | Learns a conservative Q-function that lower-bounds true value by adding a regularizer to the Bellman error.[10] |
| Decision Transformer | Chen et al., 2021 | Casts offline RL as conditional sequence modeling with a causally masked Transformer, conditioning on desired return.[11] |
| IQL (Implicit Q-Learning) | Kostrikov, Nair, Levine, 2021 | Avoids querying out-of-dataset actions by approximating an upper expectile of the value distribution; extracts the policy via advantage-weighted regression.[12] |

Conservative Q-Learning is among the most widely used. Its authors report that "CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return," especially on complex, multi-modal data.[10] The Decision Transformer, by contrast, sidesteps value functions altogether: its authors note that, "unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer," yet still matches or exceeds state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.[11] This connection to sequence models links offline RL to the broader [Transformer](/wiki/transformer) and [imitation learning](/wiki/imitation_learning) literature.
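
The "conditioning on desired return" idea rests on a simple preprocessing step: each logged trajectory is relabeled with its return-to-go, the sum of rewards from that timestep onward, and then interleaved with states and actions into one sequence. The sketch below shows only that data preparation, under the assumption of undiscounted returns; the function names and the tuple-tagged token format are illustrative, and no Transformer is involved.

```python
def returns_to_go(rewards):
    """Undiscounted sum of the current and all future rewards at each timestep."""
    out = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        out.append(running)
    out.reverse()
    return out

def to_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) per timestep into one token list."""
    seq = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        seq.extend([("return_to_go", g), ("state", s), ("action", a)])
    return seq

rewards = [1.0, 0.0, 2.0]
states = ["s0", "s1", "s2"]
actions = ["left", "right", "left"]

rtg = returns_to_go(rewards)
tokens = to_sequence(states, actions, rewards)
```

At training time a sequence model learns to predict each action token from the tokens before it. At test time the first return-to-go is set to the return the user wants, and it is decremented by each observed reward as the episode proceeds.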

### How does offline RL differ from offline supervised learning?

Offline supervised learning predicts labels from features. Offline RL trains a policy to take actions that maximize cumulative reward, using a sequence of state, action, reward, next state tuples generated by some earlier behavior policy. The goal is to produce a new policy that is better than the one that generated the data. That counterfactual element, asking what would have happened if the agent had acted differently, is what makes offline RL harder than ordinary batch supervised learning. It also distinguishes offline RL from plain [imitation learning](/wiki/imitation_learning), which only tries to copy the behavior in the data rather than improve on it.

## Common pitfalls

A few patterns come up repeatedly in offline ML systems and are worth flagging.

**Online and offline skew.** If features are computed differently at training time and at serving time, the model can degrade silently even when the data distribution has not changed. Production teams often invest in a [feature store](/wiki/feature_store) to compute features the same way in both paths.[13]

**Stale labels.** Offline training assumes the snapshot labels reflect the world the model will encounter at serving time. When ground truth arrives with a long delay, for example a churn label confirmed 90 days after the prediction, the training set can lag the production distribution by months.

**Cache invalidation.** Batch inference works only as long as someone refreshes the cache. A common incident is a forgotten batch job that has been failing for weeks, with the application happily serving increasingly stale predictions.

**Confusing offline RL with offline supervised learning.** Despite the shared word, the two settings have different theory and different failure modes.[5] Treating an offline RL problem as a regression task on logged actions usually leaves a lot of performance on the table.
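
The cache invalidation pitfall above can be guarded against by storing a write timestamp with every batch and refusing to serve entries that are too old. This is a minimal sketch: the 36-hour tolerance, the entry layout, and the function names are assumptions for a hypothetical nightly job.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=36)  # assumed tolerance for a job that should run nightly

def is_fresh(written_at, now, max_age=MAX_AGE):
    """True if the batch that produced this entry is recent enough to serve."""
    return now - written_at <= max_age

def serve_checked(store, key, now):
    """Return the cached prediction, or None if it is missing or stale."""
    entry = store.get(key)
    if entry is None or not is_fresh(entry["written_at"], now):
        return None  # caller should fall back to another model and raise an alert
    return entry["prediction"]

batch_time = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
store = {"u1": {"prediction": 0.42, "written_at": batch_time}}

next_morning = batch_time + timedelta(hours=8)
three_weeks_later = batch_time + timedelta(days=21)
```

With this guard, a batch job that silently stops running turns into visible cache misses within a day or two instead of weeks of stale predictions.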

## Explain like I'm 5

Imagine you are baking cookies for a school bake sale. You can bake them all the night before and put them in a box, ready to hand out. That is offline: do the work once, store the results, and serve them later. Or you can wait at the table and bake each cookie fresh whenever someone walks up. That is online: do the work on demand, every time. The first way is cheaper and faster to serve, but you cannot make a flavor for a kid you did not expect. The second way handles surprises but is much more work to keep going. Machine learning systems make the same choice when they decide whether to train and predict ahead of time or on the fly.

## References

1. Google. [Machine Learning Glossary](https://developers.google.com/machine-learning/glossary). Google for Developers.
2. Google. [Machine Learning Glossary: ML Fundamentals](https://developers.google.com/machine-learning/glossary/fundamentals). Google for Developers.
3. Google. [Production ML systems: Static versus dynamic training](https://developers.google.com/machine-learning/crash-course/production-ml-systems/static-vs-dynamic-training). Machine Learning Crash Course.
4. Google. [Production ML systems: Static versus dynamic inference](https://developers.google.com/machine-learning/crash-course/production-ml-systems/static-vs-dynamic-inference). Machine Learning Crash Course.
5. Levine, S., Kumar, A., Tucker, G., and Fu, J. (2020). [Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems](https://arxiv.org/abs/2005.01643). arXiv:2005.01643.
6. Kumar, A. and Levine, S. (2020). [Offline Reinforcement Learning: How Conservative Algorithms Can Enable New Applications](https://bair.berkeley.edu/blog/2020/12/07/offline/). Berkeley Artificial Intelligence Research Blog.
7. Patruno, L. (2019). [Batch Inference vs Online Inference](https://mlinproduction.com/batch-inference-vs-online-inference/). ML in Production.
8. Fu, J., Kumar, A., Nachum, O., Tucker, G., and Levine, S. (2020). [D4RL: Datasets for Deep Data-Driven Reinforcement Learning](https://arxiv.org/abs/2004.07219). arXiv:2004.07219.
9. Fujimoto, S., Meger, D., and Precup, D. (2019). [Off-Policy Deep Reinforcement Learning without Exploration](https://arxiv.org/abs/1812.02900). Proceedings of the 36th International Conference on Machine Learning (ICML). arXiv:1812.02900.
10. Kumar, A., Zhou, A., Tucker, G., and Levine, S. (2020). [Conservative Q-Learning for Offline Reinforcement Learning](https://arxiv.org/abs/2006.04779). Advances in Neural Information Processing Systems 33 (NeurIPS). arXiv:2006.04779.
11. Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., Abbeel, P., Srinivas, A., and Mordatch, I. (2021). [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345). Advances in Neural Information Processing Systems 34 (NeurIPS). arXiv:2106.01345.
12. Kostrikov, I., Nair, A., and Levine, S. (2021). [Offline Reinforcement Learning with Implicit Q-Learning](https://arxiv.org/abs/2110.06169). arXiv:2110.06169.
13. Tecton. [Reducing Online/Offline Skew for Reliable Machine Learning Predictions](https://www.tecton.ai/blog/reducing-online-offline-skew-for-reliable-machine-learning-predictions/).

