Static

Static in machine learning

In machine learning, the word static almost always means offline, or batch. Google's Machine Learning Glossary uses it as a synonym for offline in three closely related places: a static model is one that is trained offline and does not change; static training is the act of training that model once on a fixed dataset; and static inference (also called offline inference or batch inference) is the practice of generating predictions ahead of time and caching them for later lookup. The opposite term in every case is dynamic, which covers continuously retrained models, online training, and on-demand inference.

The label is useful because production ML systems must answer two separate questions about the time dimension: when does the model learn, and when does it produce predictions. Static answers "in advance" for both. Dynamic answers "as data arrives" or "as requests arrive." Most real systems mix the two, but the static approach handles many workloads at lower operational cost, which is why it remains the default starting point for many teams.

Static training (offline training)

Static training, also called offline training, means the model is fit once on a fixed snapshot of data and then deployed. There is no online update loop. If the team wants the model to reflect newer data, someone schedules a fresh training run, often weekly or monthly, and replaces the deployed artifact. Between those runs the parameters do not move.

Google's own framing of this is blunt: "you train a model only once. You then serve that same trained model for a while."

The advantages of static training are mostly practical:

The pipeline only has to work once per training cycle, so you can verify the model carefully before it goes into production. That includes offline evaluation on a held-out set, fairness checks, calibration plots, and red-team probes that would be slow or risky to run on a moving target.
Operational overhead is lower. There is no streaming training infrastructure to babysit, no checkpoint rollbacks to design, and no continuous monitoring of the training job itself.
It is cheaper. Compute is paid for in scheduled bursts rather than as a constant background expense.
Reproducibility is easier. A static model is just a file, plus the data and code that produced it. You can rebuild it, diff it against an earlier version, and audit it.
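
As a minimal sketch of the "train once, ship a file" idea, the example below fits a one-feature linear model on a frozen snapshot and stores a fingerprint of the training data next to the parameters. The snapshot values, the function names, and the JSON artifact layout are all assumptions made for illustration; a real pipeline would use its own training library and artifact format.

```python
import hashlib
import json

# Hypothetical frozen training snapshot: (feature, label) pairs.
SNAPSHOT = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]

def fit_linear(rows):
    """Ordinary least squares for y = w * x + b on a fixed dataset."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    var = sum((x - mean_x) ** 2 for x, _ in rows)
    w = cov / var
    return {"w": w, "b": mean_y - w * mean_x}

def build_artifact(rows):
    """Train once and return the parameters plus a hash of the training data."""
    data_hash = hashlib.sha256(json.dumps(rows).encode()).hexdigest()
    return {"params": fit_linear(rows), "data_sha256": data_hash}

def predict(artifact, x):
    """Serve from the frozen parameters; nothing is updated here."""
    p = artifact["params"]
    return p["w"] * x + p["b"]

artifact = build_artifact(SNAPSHOT)
# The serialized string stands in for the file that gets deployed.
serialized = json.dumps(artifact, sort_keys=True)
```

Because the fit is deterministic, rebuilding from the same snapshot yields the same artifact, which is what makes diffing and auditing straightforward.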

The trap is staleness. The world the model was trained on stops matching the world the model has to predict. Google's textbook example is a model that estimates the probability a user will buy flowers, trained only on data from July and August. The model behaves reasonably for several months and then makes terrible predictions in February, because Valentine's Day flips user behavior in a way the training data never saw. The same pattern shows up with snow shovel sales, holiday travel, fashion trends, and almost anything where seasonality, news events, or new product launches matter.

For that reason, static training is a good fit when:

The distribution of inputs is genuinely stable, or at least stable over the horizon between retrains.
The cost of being slightly out of date is small.
Strict pre-release verification matters more than freshness, for example in regulated domains or safety-critical systems.

It is a bad fit when behavior is seasonal, when an adversary is actively trying to game the model (spam, fraud), or when the catalog of items the model has to score changes daily. In those cases a dynamic model with online or near-online training tends to win, even though it costs more to run.

Even with static training, you still need to monitor the inputs at serving time. The model is frozen, but the data is not, and the cheapest sign that a static model has gone stale is a drift in the distribution of features it sees in production.
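
One common way to quantify that kind of input drift is the Population Stability Index (PSI), which compares binned feature distributions from training and serving. The sketch below is one possible implementation under stated assumptions: the bin edges, the sample values, and the 0.2 alert threshold (a widely quoted rule of thumb, not something the sources cited here prescribe) are all illustrative.

```python
import math

def psi(expected, actual, edges):
    """Population Stability Index between two samples over shared bin edges."""
    def fractions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges at or below v.
            idx = sum(1 for e in edges if v >= e)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = fractions(expected), fractions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

# Hypothetical feature values seen at training time and in production.
training_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
serving_values = [7, 8, 9, 10, 11, 12, 8, 9, 10, 11]

drift_score = psi(training_values, serving_values, edges=[3.5, 6.5])
needs_review = drift_score > 0.2  # assumed threshold, tune per feature
```

Identical samples score exactly zero, and the score grows as production values pile up in bins that were sparse during training.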

Static inference (offline inference, batch inference)

Static inference is the prediction side of the same idea. Instead of running the model in response to a user request, you run it ahead of time on a known set of inputs, write the results to a key-value store or a database, and then have the application look up the answer when it needs one. Google's glossary calls this "the process of a model generating a batch of predictions and then caching (saving) those predictions."

The canonical example is a recommendation system. A streaming service can compute a top-100 list of likely titles for every active user every night, write those 100 IDs per user into a fast key-value store, and serve them from cache when the user opens the app. The model itself never runs at request time. The user sees a sub-100-millisecond response because the answer was already there.
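
The nightly job and the request path can be sketched roughly as follows. Everything here is a toy stand-in: the taste table, the four-title catalog, the scoring function, and the in-memory dict that plays the role of the key-value store are assumptions, and a real system would keep 100 IDs per user rather than 2.

```python
from heapq import nlargest

# Hypothetical data: per-user genre affinity and a tiny catalog.
USER_TASTE = {
    "u1": {"drama": 0.9, "comedy": 0.2},
    "u2": {"drama": 0.1, "comedy": 0.8},
}
CATALOG = {"t1": "drama", "t2": "comedy", "t3": "drama", "t4": "comedy"}

def score(user_id, title_id):
    """Stand-in for an expensive model forward pass."""
    return USER_TASTE[user_id][CATALOG[title_id]]

def run_nightly_batch(user_ids, top_n):
    """Score every title for every user and keep the top_n title IDs per user."""
    cache = {}
    for user_id in user_ids:
        cache[user_id] = nlargest(
            top_n, CATALOG, key=lambda title_id: score(user_id, title_id)
        )
    return cache

def serve(cache, user_id):
    """Request path: a lookup only; the model is never called here."""
    return cache.get(user_id, [])

cache = run_nightly_batch(USER_TASTE, top_n=2)
```

Note that `serve` returns an empty list for a user the batch never saw, which is the coverage gap that pure static inference has to live with.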

Netflix has described a hybrid version of this in detail. Candidate items are generated in batch, embeddings are refreshed by nightly batch inference jobs, and then a smaller online model re-ranks the precomputed candidates using the user's current session context. The heavy work is static; only the final personalization step is dynamic. This pattern is common across recommender systems, lead scoring pipelines, churn prediction for marketing, and any workload where the population of users or items to score is mostly known in advance.

Advantages of static inference

Latency at serving time is essentially the latency of a cache lookup. The expensive model never runs on the request path.
The cost of inference becomes a scheduling problem rather than a scaling problem. You can run the batch on cheap, off-peak compute and size for throughput instead of for tail latency.
You can verify the predictions before exposing them. That includes spot-checking, fairness audits, content policy filters, and removing obvious mistakes. With dynamic inference there is no equivalent moment to inspect the output before users see it.
Failure modes are easier to reason about. If a new batch misbehaves, the previous batch's predictions can stay in the cache and the application keeps working; you can roll the cache back without redeploying a service.
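
The rollback point in the last item can be illustrated with a cache that keeps each batch run as a separate version behind an "active" pointer. This is a sketch of one possible design, not a description of any particular store; the class name, method names, and version labels are hypothetical.

```python
class VersionedPredictionCache:
    """Keeps each batch run as a separate version; lookups use the newest one."""

    def __init__(self):
        self._versions = {}  # version label -> {key: prediction}
        self._history = []   # publication order, newest last

    def publish(self, version, predictions):
        """Store a finished batch and make it the active version."""
        self._versions[version] = dict(predictions)
        self._history.append(version)

    def rollback(self):
        """Point lookups back at the previously published batch."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()

    @property
    def active(self):
        # Assumes at least one batch has been published.
        return self._history[-1]

    def get(self, key, default=None):
        return self._versions[self.active].get(key, default)

store = VersionedPredictionCache()
store.publish("batch-1", {"u1": 0.12})
store.publish("batch-2", {"u1": 0.97})  # suppose this batch looks wrong
store.rollback()                        # lookups now read batch-1 again
```

The serving code only ever calls `get`, so swapping versions needs no redeploy of the application itself.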

Limitations of static inference

You can only serve predictions you precomputed. If the input is unusual, for example a brand-new user, a brand-new product, or a query no one has run before, the cache has nothing for it. This is the long-tail problem, and it is the reason pure static inference does not work for systems that have to handle arbitrary queries.
Freshness is bounded by the batch cadence. If the batch runs nightly, a prediction can be up to 24 hours old when it is served, plus however long the batch itself took to run. For many systems that is fine. For ad ranking, fraud detection, or any system reacting to a news cycle, it is not.
The space of inputs you have to enumerate can be huge. Computing predictions for 200 million users times 100 candidate items per user is 20 billion predictions per run, and that storage and compute budget has to be planned for.
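
The arithmetic in the last item is easy to turn into a rough storage estimate. The user and candidate counts come from the example above; the 12 bytes per prediction (an 8-byte item ID plus a 4-byte score) is purely an assumption for illustration.

```python
USERS = 200_000_000        # from the example above
CANDIDATES_PER_USER = 100  # from the example above
BYTES_PER_PREDICTION = 12  # assumption: 8-byte item ID + 4-byte float score

predictions_per_run = USERS * CANDIDATES_PER_USER
raw_bytes = predictions_per_run * BYTES_PER_PREDICTION
raw_terabytes = raw_bytes / 1e12

print(f"{predictions_per_run:,} predictions per run")
# prints: 20,000,000,000 predictions per run
print(f"{raw_terabytes:.2f} TB before keys, indexes, and replication")
# prints: 0.24 TB before keys, indexes, and replication
```

The raw payload is modest under these assumptions; in practice the compute needed to score 20 billion pairs per run, plus key overhead and replication, tends to dominate the budget.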

Many teams fall back to a hybrid: use static inference for the common cases, and call the model dynamically for the long tail. That gives you the cost profile of batch with a safety net for inputs the batch did not anticipate.
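
A minimal sketch of that hybrid, assuming a dict-like cache and a callable model: look in the cache first, and only run the model on a miss. The function names, the sample keys, and the choice to write fallback results back into the cache are illustrative assumptions.

```python
def make_hybrid_predictor(cache, model_fn):
    """Return a predict function that prefers the cache and falls back to the model."""
    stats = {"hits": 0, "misses": 0}

    def predict(key):
        if key in cache:
            stats["hits"] += 1
            return cache[key]
        stats["misses"] += 1
        result = model_fn(key)  # dynamic inference for the long tail
        cache[key] = result     # optional: remember it until the next batch
        return result

    return predict, stats

# Hypothetical precomputed results and a stand-in for the live model.
cache = {"returning_user": ["t1", "t2"]}

def live_model(key):
    return ["popular_1", "popular_2"]

predict, stats = make_hybrid_predictor(cache, live_model)
```

Tracking the hit and miss counts is worth the few extra lines: the miss rate tells you how much dynamic serving capacity the safety net actually needs.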

Static features

A related but narrower use of the word: a static feature is an input attribute that does not change, or changes only rarely, over the life of an example. Birth year, country of registration, manufacturer, and product category are static in a way that current location, recent click history, or live price are not. Static features are usually cheaper to store and less prone to training-serving skew and temporal leakage, because their values at inference time match the values that were available at training time. Many feature stores explicitly separate static (or slowly changing) features from streaming features for this reason.
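
To make the separation concrete, here is a small sketch of assembling a feature vector from the two kinds of source. The table contents, feature names, and function name are hypothetical; real feature stores expose their own lookup APIs.

```python
# Hypothetical static feature table, loaded once and refreshed rarely.
STATIC_FEATURES = {"u1": {"birth_year": 1990, "country": "DE"}}

def build_feature_vector(user_id, streaming):
    """Merge rarely changing attributes with values that arrive per request."""
    features = dict(STATIC_FEATURES.get(user_id, {}))  # copy, never mutate the table
    features.update(streaming)
    return features

row = build_feature_vector("u1", {"recent_clicks": 3})
```

The static half can be cached aggressively, while the streaming half has to be read fresh on every request.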

Static versus dynamic at a glance

Dimension	Static (offline, batch)	Dynamic (online, on demand)
Training	One fit on a fixed dataset, repeated on a schedule	Continuous or frequent updates as new data arrives
Inference	Precomputed predictions, served from cache	Model runs at request time
Operational cost	Lower; scheduled compute and simpler pipelines	Higher; always on training and serving infrastructure
Latency at serving	Cache lookup latency	Bounded by model forward pass and request handling
Freshness	Stale between batches	Reflects very recent data
Verification before deploy	Easy; the artifact is fixed	Harder; the model and the data are both moving
Long tail inputs	Poor; only cached cases are covered	Good; the model can score anything
Typical risks	Concept drift, seasonality, novelty	Training instability, feedback loops, drift in the loss

Most systems do not sit cleanly on one side. The point is to make the trade-off explicit: static buys you stability and lower running costs, at the price of freshness and coverage. Dynamic gives you freshness and coverage, at the price of complexity.

Staleness, drift, and when to retrain

The central failure mode of static training is concept drift: the statistical relationship between inputs and the label changes over time, so a model that was accurate last quarter starts making worse predictions now. Drift can be gradual (slow shifts in user behavior), seasonal (the flower example), or sudden (a new competitor product, a policy change, a viral news story).

The usual operational response to drift in a static setup is some combination of:

Monitoring input distributions and prediction distributions in production, and alerting when they diverge from the training data.
Tracking a holdout of labeled production data so the team can compute live accuracy or AUC and watch it trend.
Scheduling regular retrains, weekly or monthly being common defaults, and treating each retrain as a small release with its own evaluation and rollback plan.
Adding features that explicitly encode the time of year, the day of week, or other cyclic signals, so the static model does not have to relearn seasonality every cycle.
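
For the last item, one widely used encoding maps each cyclic quantity onto a unit circle with a sine and cosine pair, so that the end of a cycle sits next to its start. The sketch below assumes day of week and day of year are the signals of interest; the function and feature names are illustrative.

```python
import calendar
import math
from datetime import date

def cyclic_features(day):
    """Encode day of week and day of year as points on a unit circle."""
    dow = day.weekday()                # 0 = Monday ... 6 = Sunday
    doy = day.timetuple().tm_yday - 1  # 0-based day of year
    year_len = 366 if calendar.isleap(day.year) else 365
    return {
        "dow_sin": math.sin(2 * math.pi * dow / 7),
        "dow_cos": math.cos(2 * math.pi * dow / 7),
        "doy_sin": math.sin(2 * math.pi * doy / year_len),
        "doy_cos": math.cos(2 * math.pi * doy / year_len),
    }

new_years_eve = cyclic_features(date(2023, 12, 31))
new_years_day = cyclic_features(date(2024, 1, 1))
```

With a plain integer day-of-year feature, December 31 and January 1 look maximally far apart; in the encoded space they are neighbors, which is what lets a model trained on a limited window generalize across the year boundary.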

This is where most production ML lives. The model is technically static between releases, but the system around it is doing a lot of dynamic work to keep it honest. When the cost of that monitoring rivals the cost of just retraining online, teams usually flip to a dynamic model.

When to choose static

Google's crash course gives a clean default: if your dataset truly is not changing over time, choose static training because it is cheaper to create and maintain than dynamic training. The same logic applies to inference. If the set of inputs is small, stable, and known in advance, precompute the answers and serve from cache.

Good candidates for a fully static stack:

Demographic or actuarial models where behavior changes slowly and audits are common.
Lead scoring batches that only need to run overnight to feed a sales team the next morning.
Recommendation backbones where the catalog turns over slowly and a thin online layer handles personalization.
Content moderation classifiers retrained quarterly as new policy categories appear.

Good candidates for going dynamic instead:

Adversarial inputs (fraud, spam, abuse), where the distribution shifts because the other side is reacting to your model.
Real-time bidding, ranking, and search, where freshness is the product.
News feeds, where the catalog of items barely existed an hour ago.
Systems where novel inputs are common and a cache miss is unacceptable.

Most production systems are a mix. Static where you can, dynamic where you must.

Explain like I'm 5 (ELI5)

Imagine the cafeteria is going to serve lunch to the whole school. There are two ways to do it. One way is to figure out yesterday what every kid likes and put their tray together in advance, so when they walk up the food is already there. That is static. It is fast at lunch time, but if a new kid shows up, or someone changes their mind, the tray is wrong. The other way is to wait until each kid is at the counter and make their plate then. That is dynamic. It handles surprises, but the line moves slower and the cafeteria has to be busy the whole time. Static machine learning does the cafeteria's prep work the night before: train the model once, or precompute the answers once, and serve them out of a cache the next day.

References

Google for Developers, Machine Learning Glossary. https://developers.google.com/machine-learning/glossary
Google for Developers, Machine Learning Crash Course, Production ML systems: Static versus dynamic training. https://developers.google.com/machine-learning/crash-course/production-ml-systems/static-vs-dynamic-training
Google for Developers, Machine Learning Crash Course, Production ML systems: Static versus dynamic inference. https://developers.google.com/machine-learning/crash-course/production-ml-systems/static-vs-dynamic-inference
Google for Developers, Machine Learning Glossary: ML Fundamentals. https://developers.google.com/machine-learning/glossary/fundamentals
Wikipedia, Concept drift. https://en.wikipedia.org/wiki/Concept_drift
IBM, What is model drift? https://www.ibm.com/think/topics/model-drift
Google Cloud, What is batch inference? https://cloud.google.com/discover/what-is-batch-inference
Netflix Technology Blog, Integrating Netflix's Foundation Model into Personalization applications. https://netflixtechblog.medium.com/integrating-netflixs-foundation-model-into-personalization-applications-cf176b5860eb
