Offline
Last reviewed
May 11, 2026
Sources
8 citations
Review status
Source-backed
Revision
v2 · 2,194 words
See also: Machine learning terms
In machine learning, offline describes operations that happen ahead of time on a fixed dataset rather than continuously on live data. Google's Machine Learning Glossary lists "offline" as a direct synonym for static, and treats both as the counterpart to online (also called dynamic). The term shows up in offline training, offline inference, and offline reinforcement learning. All three share the same idea: do the heavy work once on a known batch of data, then serve the results, instead of reacting to new inputs as they arrive.
Google's developer documentation is a widely cited reference for this terminology, and its definition is short. The glossary entry for "offline" simply reads: synonym for static. It then treats offline inference and static inference as two names for the same operation, and likewise offline training and static training. "Batch" is sometimes added as a third synonym, especially in the inference setting, where "batch inference" is the dominant phrase in industry blogs and cloud documentation.
The practical implication is that an article about "static inference," "offline inference," or "batch inference" describes roughly the same system: a job that runs over a chunk of inputs on a schedule and writes the predictions somewhere for later retrieval. The counterparts, "online," "dynamic," and "real-time," are also used interchangeably, and they describe systems that compute a prediction in response to a single request, usually in milliseconds.
Offline training, sometimes called static training or batch training, is the standard textbook picture of how a model gets built. You collect a dataset, freeze it, run the optimization loop until the loss stops improving, evaluate on a held-out set, and then ship the trained weights. From that moment until you decide to retrain, the model never sees a new training example.
A typical offline training pipeline collects raw data, cleans and labels it, splits it into training and evaluation sets, fits the model with an optimizer such as stochastic gradient descent, and produces a snapshot of weights. The snapshot is then handed to a serving system, usually a separate process. The model itself does not change while it serves requests. To update it, you run the pipeline again on a daily, weekly, or quarterly cadence and replace the served weights with the new snapshot.
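The pipeline above can be sketched end to end in a few lines of Python. This is a minimal sketch, not Google's or any framework's API: the dataset is synthetic (a hypothetical `y = 3x + 1` relationship plus noise), the model is a one-feature linear regression fitted by plain SGD, and serializing the weights with `json` stands in for whatever snapshot format a real serving system would load.

```python
import json
import random


def make_dataset(n, seed=0):
    """Hypothetical frozen dataset: y = 3x + 1 plus Gaussian noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        rows.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return rows


def split(rows, eval_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and cut off a held-out evaluation set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - eval_fraction))
    return rows[:cut], rows[cut:]


def train_sgd(rows, epochs=50, lr=0.05, seed=0):
    """Fit y ~ w*x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    rows = list(rows)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(rows)
        for x, y in rows:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return {"w": w, "b": b}


def mse(params, rows):
    """Mean squared error of the linear model on the given rows."""
    return sum((params["w"] * x + params["b"] - y) ** 2 for x, y in rows) / len(rows)


def run_pipeline():
    data = make_dataset(500)        # collect: a fixed snapshot
    train, held_out = split(data)   # split into train and held-out sets
    params = train_sgd(train)       # fit
    snapshot = json.dumps(params)   # freeze weights for the serving system
    return snapshot, mse(params, held_out)


snapshot, eval_mse = run_pipeline()
print(snapshot, f"held-out MSE={eval_mse:.4f}")
```

Retraining on a daily or weekly cadence amounts to calling `run_pipeline()` again on a newer snapshot and swapping the served weights; nothing in the serving path updates the model on its own.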
Most production ML systems use offline training because it is easier to reason about. The training run is reproducible, evaluation metrics are comparable across versions, and you can roll back to an earlier snapshot if the new one regresses. Google's production ML course puts it bluntly: static training is simpler to develop, test, and maintain than dynamic training. It also requires less monitoring of the training job itself, because the job either finishes or it does not.
Offline training is also usually cheaper. You spin up a cluster, run the job, and tear it down, rather than keeping a streaming pipeline alive at all times and engineering the training loop to be robust to bursts of bad data arriving in real time.
The main weakness is staleness. A model trained offline only knows about the data that existed when the snapshot was taken. If user behavior, product mix, or the underlying distribution changes after that, the model has no way to notice. Google's documentation illustrates the problem with a flower purchase model trained only on July and August data: it performs well in late summer but predicts badly around Valentine's Day, when buying patterns change sharply. This kind of failure is what concept drift and data drift usually refer to in production ML.
The usual mitigation is to retrain on a schedule and to monitor the input distribution at inference time so that you notice drift before it shows up as user complaints. Static training does not mean training once forever; it just means you batch retraining into discrete jobs rather than letting the model learn continuously.
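One common way to monitor the input distribution is the population stability index (PSI), which compares binned feature frequencies at training time with those seen at serving time. The sketch below is illustrative and rests on assumptions: PSI is a swapped-in technique that Google's documentation does not name, the month-of-purchase feature only mirrors the flower example, and the 0.2 alert threshold is a common rule of thumb rather than a standard.

```python
import math


def bin_fractions(values, edges):
    """Fraction of values per bin; bins are split at the sorted `edges`."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        i = 0
        while i < len(edges) and v >= edges[i]:
            i += 1
        counts[i] += 1
    # Floor empty bins so the log ratio below stays finite.
    return [max(c / len(values), 1e-4) for c in counts]


def psi(expected, actual, edges):
    """Population stability index between a training and a serving sample."""
    e = bin_fractions(expected, edges)
    a = bin_fractions(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


def drift_alert(train_values, serving_values, edges, threshold=0.2):
    """Hypothetical policy: flag a retrain when PSI exceeds the threshold."""
    return psi(train_values, serving_values, edges) > threshold


# Month-of-purchase feature: one bin per month 1..12.
MONTH_EDGES = [m + 0.5 for m in range(1, 12)]
july_august = [7] * 50 + [8] * 50   # distribution the model was trained on
late_summer = [7] * 45 + [8] * 55   # similar serving traffic
valentines = [2] * 100              # February serving traffic

print(round(psi(july_august, late_summer, MONTH_EDGES), 3))
print(drift_alert(july_august, valentines, MONTH_EDGES))
```

Run on the output of each nightly scoring job, a check like this surfaces the Valentine's Day shift as an alert rather than as a silent drop in quality.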
Offline inference, also called static inference or batch inference, is the prediction-time counterpart to offline training. Instead of computing a prediction at the moment a user asks for it, the system precomputes predictions for many inputs in advance, stores them in a database or cache, and serves them by lookup.
A batch inference job typically runs on a schedule, for example every night. It pulls a list of items or users to score, runs the model over all of them, and writes the results into a key-value store, a feature store, or a database table. The serving application then retrieves the cached prediction when it needs one. Because the lookup is a simple database read, response times track the latency of the storage layer, not the model.
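A nightly batch job and its serving path can be sketched as below. Everything here is hypothetical: a Python `dict` stands in for the key-value store, and `score` is a toy linear rule standing in for a model that might take hours to run over all users.

```python
import time


def score(features):
    """Hypothetical stand-in for an expensive model."""
    return 2 * features["visits"] + 5 * features["purchases"]


def run_batch_job(users, store, now=None):
    """Score every known user once and write the results to the store."""
    scored_at = time.time() if now is None else now
    for user_id, features in users.items():
        store[user_id] = {"score": score(features), "scored_at": scored_at}
    return len(users)


def serve(store, user_id):
    """Request path: a plain lookup; the model is never executed here."""
    entry = store.get(user_id)
    return None if entry is None else entry["score"]


users = {
    "u1": {"visits": 10, "purchases": 2},
    "u2": {"visits": 3, "purchases": 0},
}
store = {}
run_batch_job(users, store)
print(serve(store, "u1"))        # 30
print(serve(store, "newcomer"))  # None: this user was never scored
```

The `None` for an unscored user is the coverage gap that every batch system has to handle somehow.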
Netflix-style nightly recommendation refreshes are a familiar example. A large model can spend hours generating a fresh recommendation list for every active user, and the front end serves any one of those lists in milliseconds. Lead scoring, churn prediction, and credit risk scoring for known customers follow the same pattern.
Google's Machine Learning Crash Course summarizes the upside as low inference cost, the ability to verify predictions before serving them, and near-instant response time for end users. Because the predictions are sitting in storage, you can audit them, sanity-check their distributions, run business rules over them, and even discard or override a batch before it goes live. You can also use a much heavier model than would be feasible for real-time serving, since latency at request time depends only on the cache, not the model.
The tradeoff is freshness and coverage. A batch system can only serve predictions for inputs it has already seen and scored. If a brand new user arrives, the cache has no prediction for them. If a known user's situation changes between batch runs, the cached prediction is stale. Update latency is usually measured in hours or days, which is fine for nightly product recommendations but unacceptable for fraud detection or pricing decisions that have to react within a transaction.
This is why batch inference is often paired with a simpler fallback model that runs online for unknown inputs, or with a hybrid system where the heavy model runs offline and a lightweight model handles the long tail in real time.
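A minimal sketch of the fallback pattern, assuming a cache already filled by a batch job: known users get the precomputed score, and anyone missing from the cache is scored on the spot by a cheap online model. The cache contents, the `came_from_ad` feature, and the rule inside `cheap_online_model` are all hypothetical.

```python
from typing import Callable, Dict, Tuple


def make_predictor(
    cache: Dict[str, float], fallback: Callable[[dict], float]
) -> Callable[[str, dict], Tuple[float, str]]:
    """Serve batch predictions when available, otherwise score online."""

    def predict(user_id: str, live_features: dict) -> Tuple[float, str]:
        if user_id in cache:
            return cache[user_id], "batch"
        return fallback(live_features), "online"

    return predict


def cheap_online_model(features: dict) -> float:
    """Hypothetical lightweight model for the long tail of unknown users."""
    return 0.6 if features.get("came_from_ad") else 0.3


cache = {"u1": 0.92, "u2": 0.15}  # written by the nightly heavy model
predict = make_predictor(cache, cheap_online_model)
print(predict("u1", {}))                             # (0.92, 'batch')
print(predict("new_user", {"came_from_ad": True}))   # (0.6, 'online')
```

Returning the path label alongside the score makes it easy to count how often the fallback is used.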
The choice between offline and online operation is usually decided by three questions: how quickly does the input distribution change, how fresh do predictions need to be, and how much engineering complexity can the team afford.
| Dimension | Offline (static) | Online (dynamic) |
|---|---|---|
| Training data | Fixed snapshot | Streamed or frequently refreshed |
| Update cadence | Hours to weeks | Continuous |
| Inference latency | Cache read time | Model execution time |
| Engineering effort | Lower | Higher |
| Handling of new inputs | Limited to scored items | Can score on demand |
| Sensitivity to drift | Higher | Lower if retraining is frequent |
| Cost profile | Periodic batch compute | Steady streaming compute |
In practice many production systems are hybrids. A recommender might train offline weekly, precompute candidate lists nightly, and run a small online model to rerank a handful of candidates per request. A search engine might cache offline embeddings for documents and compute query embeddings online. The real question is rarely offline or online; it is which parts of the pipeline live on which side of the line.
A related use of the word is offline evaluation: evaluating a model on a held-out dataset before deployment, as opposed to online evaluation through A/B tests on live traffic. Offline evaluation is cheap and fast and is how most candidate models are filtered, but it does not always predict live performance, because the evaluation set is itself a snapshot. Serious production teams run both: offline metrics to decide what to ship, and online metrics to decide whether shipping it actually helped.
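The offline half of that loop, filtering candidates on a held-out set, is simple enough to sketch. The candidate models and held-out rows below are invented for illustration, and the online half (an A/B test on live traffic) is not shown.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[float], int]


def accuracy(model: Model, rows: List[Tuple[float, int]]) -> float:
    """Fraction of rows where the model's label matches the true label."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def pick_candidate(candidates: Dict[str, Model], held_out: List[Tuple[float, int]]):
    """Score every candidate on the same frozen held-out set; keep the best."""
    scores = {name: accuracy(m, held_out) for name, m in candidates.items()}
    return max(scores, key=scores.get), scores


held_out = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.7, 0)]
candidates = {
    "threshold_0.5": lambda x: int(x >= 0.5),
    "always_positive": lambda x: 1,
}
best, scores = pick_candidate(candidates, held_out)
print(best, scores)
```

Because `held_out` is itself a snapshot, a high score here is evidence, not proof, that the same candidate will win the live test.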
Offline reinforcement learning is a related but distinct subfield, and it deserves its own treatment because the meaning of "offline" there is technically narrower. In standard reinforcement learning, an agent learns a policy by interacting with an environment, taking actions, and observing rewards. Offline RL, also called batch RL, removes that interaction. The agent learns a policy purely from a fixed dataset of previously logged transitions, with no ability to try new actions in the environment during training.
Sergey Levine and collaborators, in their 2020 tutorial and review on offline reinforcement learning, define it as reinforcement learning algorithms that use previously collected data without additional online data collection. The Berkeley AI Research blog uses similar language: learning skills solely from previously collected datasets, without any active environment interaction.
The motivation is that environment interaction is expensive or dangerous in many real domains. You cannot let a learning agent freely experiment with treatment plans for hospital patients, driving policies for self-driving cars, or trading strategies in live markets, but you may have years of logged decisions and outcomes from those domains. Offline RL aims to convert that logged data into a useful policy without ever rolling out an unsafe one.
Offline RL has a specific failure mode known as distributional shift. A standard RL algorithm can, in principle, assign a high value to actions that were never tried in the dataset, because nothing in the data contradicts that estimate. When the learned policy then acts on those overestimated actions, the results can be much worse than the data would suggest. Levine's tutorial frames this as the central challenge of the subfield, and most modern offline RL methods, including Conservative Q-Learning (CQL) and behavior-regularized approaches, are designed specifically to keep the learned policy from drifting too far from the logged behavior.
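A tabular toy makes the problem concrete. The sketch runs fitted Q iteration over a fixed list of logged transitions, with no environment calls, and compares a naive backup (max over all actions) with a support-constrained backup that only bootstraps from actions present in the data. The constraint is in the spirit of batch-constrained methods such as BCQ; it is not CQL, which instead adds a regularizer that pushes down the values of actions outside the data. The states, rewards, and the 5.0 starting value for the never-tried action `y` (a stand-in for a function approximator's extrapolation error) are hypothetical.

```python
from collections import defaultdict

GAMMA = 0.9

# Logged transitions: (state, action, reward, next_state, done).
DATASET = [
    ("s0", "a", 0.0, "s1", False),
    ("s1", "x", 1.0, "end", True),
    ("s0", "b", 1.0, "end", True),
]
ACTIONS = {"s0": ["a", "b"], "s1": ["x", "y"]}  # "y" never appears in the log


def initial_q(state, action):
    """Hypothetical extrapolation error: the unseen action looks great."""
    return 5.0 if (state, action) == ("s1", "y") else 0.0


def fitted_q(dataset, actions, constrain, iters=20):
    """Repeatedly apply Bellman backups using only the logged transitions."""
    seen = defaultdict(set)
    for s, a, *_ in dataset:
        seen[s].add(a)
    q = {(s, a): initial_q(s, a) for s in actions for a in actions[s]}
    for _ in range(iters):
        for s, a, r, s_next, done in dataset:
            if done:
                target = r
            else:
                # Naive: bootstrap from every action. Constrained: logged ones only.
                pool = seen[s_next] if constrain else actions[s_next]
                target = r + GAMMA * max(q[(s_next, b)] for b in pool)
            q[(s, a)] = target
    return q, seen


def greedy(q, actions, seen, state, constrain):
    """Pick the highest-valued action, optionally restricted to logged ones."""
    pool = sorted(seen[state]) if constrain else actions[state]
    return max(pool, key=lambda a: q[(state, a)])


for constrain in (False, True):
    q, seen = fitted_q(DATASET, ACTIONS, constrain)
    path = [greedy(q, ACTIONS, seen, s, constrain) for s in ("s0", "s1")]
    print("constrained" if constrain else "naive", round(q[("s0", "a")], 2), path)
```

The naive learner values `a` at 4.5 and ends up choosing `y`, an action nobody ever tried, because its value came only from the initialization. The constrained learner values `a` at 0.9 and prefers `b`, the best option the data actually supports.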
Offline RL also differs from offline supervised learning in what it learns. Offline supervised learning predicts labels from features. Offline RL trains a policy to take actions that maximize cumulative reward, using a sequence of (state, action, reward, next state) tuples generated by some earlier behavior policy. The goal is to produce a new policy that is better than the one that generated the data. That counterfactual element, asking what would have happened if the agent had acted differently, is what makes offline RL harder than ordinary batch supervised learning.
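The gap between copying logged behavior and improving on it shows up even in a one-step (contextual bandit) toy. The sketch contrasts behavior cloning, which treats the log as a supervised classification problem, with choosing the logged action that has the highest average reward. The log is hypothetical, and averaging rewards per action is only a stand-in for a real offline RL method; multi-step tasks would also need bootstrapped value estimates.

```python
from collections import Counter, defaultdict

# Logged (state, action, reward): the behavior policy mostly chose "b".
LOG = [("s", "b", 0.2)] * 8 + [("s", "a", 1.0)] * 2


def behavior_cloning(log):
    """Supervised view: predict the most frequent logged action per state."""
    counts = defaultdict(Counter)
    for s, a, _ in log:
        counts[s][a] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def best_logged_action(log):
    """Reward-aware view: pick the logged action with the highest mean reward."""
    rewards = defaultdict(lambda: defaultdict(list))
    for s, a, r in log:
        rewards[s][a].append(r)
    return {
        s: max(acts, key=lambda a: sum(acts[a]) / len(acts[a]))
        for s, acts in rewards.items()
    }


print(behavior_cloning(LOG))    # {'s': 'b'}
print(best_logged_action(LOG))  # {'s': 'a'}
```

Behavior cloning faithfully reproduces the mediocre behavior policy; any method that uses the rewards can at least notice that `a` did better whenever it was tried.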
A few patterns come up repeatedly in offline ML systems and are worth flagging.
Online and offline skew, also called training-serving skew. If features are computed differently at training time and at serving time, the model can degrade silently even when the data distribution has not changed. Production teams often invest in a feature store so that features are computed the same way in both paths.
Stale labels. Offline training assumes the snapshot labels reflect the world the model will encounter at serving time. When ground truth arrives with a long delay, for example a churn label confirmed 90 days after the prediction, the training set can lag the production distribution by months.
Cache invalidation. Batch inference works only as long as someone refreshes the cache. A common incident is a forgotten batch job that has been failing for weeks, with the application happily serving increasingly stale predictions.
Confusing offline RL with offline supervised learning. Despite the shared word, the two settings have different theory and different failure modes. Treating an offline RL problem as a regression task on logged actions usually leaves a lot of performance on the table.
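Two of these guards are cheap to write down. The sketch below shows a single feature function meant to be imported by both the training job and the server, which removes one source of skew, and a freshness check that fails loudly when the last successful batch run is too old. The feature names and the 36-hour limit are hypothetical; a real system would read the last-success timestamp from its job scheduler or from the cache itself.

```python
class StaleBatchError(RuntimeError):
    """Raised when cached predictions are older than the allowed age."""


MAX_AGE_S = 36 * 3600  # hypothetical: nightly job plus a 12-hour grace period


def featurize(raw: dict) -> list:
    """One feature definition, shared by the training and serving code paths."""
    visits = raw["visits"]
    return [float(visits), raw["purchases"] / max(visits, 1)]


def check_batch_freshness(last_success_ts: float, now: float,
                          max_age_s: float = MAX_AGE_S) -> float:
    """Return the batch age in seconds, or raise if it exceeds the limit."""
    age = now - last_success_ts
    if age > max_age_s:
        raise StaleBatchError(f"batch predictions are {age / 3600:.1f} h old")
    return age


print(featurize({"visits": 4, "purchases": 2}))  # [4.0, 0.5]
try:
    check_batch_freshness(last_success_ts=0.0, now=40 * 3600.0)
except StaleBatchError as err:
    print("alert:", err)  # alert: batch predictions are 40.0 h old
```

Wiring the freshness check into the request path or a monitoring job turns a forgotten, failing batch job into a visible alert instead of weeks of quietly stale predictions.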
Imagine you are baking cookies for a school bake sale. You can bake them all the night before and put them in a box, ready to hand out. That is offline: do the work once, store the results, and serve them later. Or you can wait at the table and bake each cookie fresh whenever someone walks up. That is online: do the work on demand, every time. The first way is cheaper and faster to serve, but you cannot make a flavor for a kid you did not expect. The second way handles surprises but is much more work to keep going. Machine learning systems make the same choice when they decide whether to train and predict ahead of time or on the fly.