See also: Machine learning terms
A dynamic model in machine learning is a model that is retrained frequently or continuously as new data arrives, so that its parameters track changes in the underlying data distribution over time. The Google Machine Learning Glossary defines a dynamic model as one that is "frequently (maybe even continuously) retrained," describes it as a "lifelong learner that constantly adapts to evolving data," and notes that the term is synonymous with online model [1]. The opposite is a static model, which is trained once on a snapshot of historical data and then served unchanged for some period.
Dynamic models are the natural choice for production environments where the relationship between features and labels changes over hours, minutes, or seconds. Common examples include recommender systems, online ad ranking, fraud detection, dynamic pricing, news ranking, and short-video feed personalization. The general term for the process of routinely refitting a model on fresh data is continuous training (CT), which sits alongside continuous integration and continuous delivery (CI/CD) in the MLOps pipeline [2].
A static model fixes its parameters at training time and treats inference as a separate stage that may run for days, months, or years on the same weights. A dynamic model collapses that boundary. New observations stream into a training process that is either always running or kicked off on a short cadence (every few minutes, every hour, or every day), and the resulting weights replace or augment the model in production.
The Google Machine Learning Crash Course frames the choice as static training versus dynamic training. In static training, a model is trained once on a fixed dataset and then served for a while; in dynamic training, the model is trained continuously or at least frequently, and the most recently trained version is the one that gets served [3]. The crash course also distinguishes static inference (predictions are computed offline and cached) from dynamic inference (predictions are computed on demand at request time). A system can be dynamic in training, dynamic in inference, or both. A common production pattern is dynamic training with dynamic inference, since the value of fresh weights would otherwise be lost in a stale prediction cache.
In the online machine learning literature, the same idea is described as processing data "in a sequential order" and updating the predictor at each step, in contrast with batch learning that fits a model in one pass over the entire training set [4]. From this point of view, a dynamic model is the operational embodiment of an online learning algorithm running indefinitely against a live data stream.
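In code, the sequential protocol is a predict-then-learn loop: every example is first used to score the current model, then to update it. The sketch below uses a running-mean regressor as a stand-in model; the names and structure are illustrative, not any particular library's API.

```python
# Minimal predict-then-learn loop: each example is first used for
# evaluation, then for an incremental parameter update. The "model"
# here is just a running mean, a deliberately trivial placeholder.
def online_loop(stream):
    n, mean, abs_err = 0, 0.0, 0.0
    for y in stream:
        pred = mean                   # 1. predict before seeing the label
        abs_err += abs(y - pred)      # 2. score the prediction
        n += 1
        mean += (y - mean) / n        # 3. update the model incrementally
    return mean, abs_err / n

model, mae = online_loop([2.0, 4.0, 6.0, 4.0])
```

Because evaluation always happens before the update, the accumulated error is an honest out-of-sample estimate even though no holdout set exists.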
Static and dynamic models differ along several axes. The differences matter because they determine which engineering investments a team has to make and what failure modes they have to monitor.
| Property | Static (offline) model | Dynamic (online) model |
|---|---|---|
| Training cadence | Once, or every few weeks | Continuous, hourly, or per-minute |
| Data assumption | Stationary distribution | Distribution can shift |
| Memory of past data | Full historical pass each time | Single pass or small replay buffer |
| Hardware footprint | Periodic large training jobs | Always-on training pipeline |
| Time to incorporate new label | Hours to weeks | Seconds to minutes |
| Failure modes | Drift, staleness | Drift detection bugs, runaway feedback, label leakage |
| Validation strategy | Train, validate, holdout, ship | Shadow deployment, A/B testing, progressive validation |
| Rollback model | Redeploy previous artifact | Snapshot weights and revert |
| Monitoring requirement | Input distribution and label shift | Input distribution, label shift, training health, model freshness |
| Typical use case | Image classification, demand forecasting at weekly cadence | Ad CTR prediction, fraud scoring, short-form video feed |
Google's documentation makes the trade-off explicit. Static training is "simpler to build and test," but "if you train offline, then the model has no way to incorporate new data as it arrives," which leads to staleness when the distribution shifts. Dynamic training keeps the model fresh but "requires continuous building, testing, and releasing cycles" and a heavier monitoring stack [3]. Even teams that pick static training are advised to monitor input distributions in production, because data drift can degrade a frozen model just as easily as a live one.
Chip Huyen's Designing Machine Learning Systems (O'Reilly, 2022) makes a related distinction between stateless retraining and stateful training. In stateless retraining, each training run starts from scratch on a fresh window of data; in stateful training, each run continues from the previous model's weights, only doing a partial update on new examples. Both fall under the dynamic model umbrella, but stateful training is the lower-cost option once a baseline exists [5].
Dynamic models depend on algorithms that can update parameters incrementally without revisiting the entire training set. The online learning literature provides several families of such algorithms, all sharing the property that they consume one example (or a small minibatch) at a time and produce a new parameter vector after each update.
Stochastic gradient descent (SGD) is the workhorse of online learning. After each example, SGD computes a gradient of the loss with respect to the parameters and takes a step in the negative gradient direction. The pure online form of SGD has constant memory cost O(d) for d parameters and constant time per example, which makes it scale to data streams of arbitrary length [4]. Most production dynamic models, including ad CTR predictors and embedding-based recommenders, are SGD-based variants.
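A minimal sketch of the pure online form in plain Python, with an illustrative learning rate and a noiseless synthetic stream; a production system would use minibatches and adaptive step sizes:

```python
import random

# One-pass SGD for linear regression on a stream: O(d) memory and
# constant work per example. Learning rate is an illustrative choice.
def sgd_stream(stream, d, lr=0.05):
    w = [0.0] * d
    for x, y in stream:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        g = pred - y                      # gradient of 1/2 squared error
        for i in range(d):
            w[i] -= lr * g * x[i]         # single incremental step
    return w

# Synthetic stream drawn from y = 2*x0 - 1*x1 with no noise.
random.seed(0)
stream = []
for _ in range(5000):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    stream.append((x, 2 * x[0] - 1 * x[1]))
w = sgd_stream(stream, d=2)   # w converges toward [2, -1]
```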
The perceptron, originally proposed by Frank Rosenblatt in 1958, is one of the earliest online learning algorithms. It updates a linear classifier only when it makes a mistake on the current example. Novikoff's 1962 mistake-bound theorem shows that on a linearly separable dataset with margin gamma and example radius R, the perceptron makes at most (R/gamma)^2 mistakes regardless of how many examples it sees [6]. The mistake bound framework introduced by the perceptron analysis still anchors the way online algorithms are evaluated theoretically.
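The mistake-driven update is only a few lines. The toy stream below is separable by the hyperplane x0 + x1 = 0, so the Novikoff bound guarantees a finite number of mistakes no matter how long the stream runs:

```python
# Mistake-driven perceptron on a linearly separable toy stream.
# The weights change only when the current example is misclassified.
def perceptron(stream, d):
    w = [0.0] * d
    mistakes = 0
    for x, y in stream:                   # y in {-1, +1}
        score = sum(wi * xi for wi, xi in zip(w, x))
        if y * score <= 0:                # mistake (or tie): update
            mistakes += 1
            for i in range(d):
                w[i] += y * x[i]
    return w, mistakes

# Separable data, repeated 50 times to simulate a long stream.
stream = [([1.0, 2.0], 1), ([2.0, 0.5], 1),
          ([-1.0, -1.5], -1), ([-2.0, -0.5], -1)] * 50
w, mistakes = perceptron(stream, d=2)     # mistake count stays bounded
```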
Online passive-aggressive (PA) algorithms, introduced by Crammer, Dekel, Keshet, Shalev-Shwartz, and Singer in 2006, take a more aggressive update than the perceptron. After each example, PA chooses the smallest weight change that satisfies a margin constraint on the current example. This produces a closed-form analytical update and tight regret bounds for binary classification, regression, multi-class classification, uniclass prediction, and sequence labelling [7]. The PA family is a common choice when the training stream contains both correctly and incorrectly classified examples and the model needs to react sharply to surprises.
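A sketch of the closed-form update for the PA-I classification variant; C is the aggressiveness cap and its value here is illustrative:

```python
# Passive-aggressive (PA-I) update for binary classification: the
# smallest weight change that restores a unit margin on the current
# example, with the step size capped by the aggressiveness parameter C.
def pa_update(w, x, y, C=1.0):
    score = sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - y * score)              # hinge loss
    if loss > 0.0:
        tau = min(C, loss / sum(xi * xi for xi in x))
        w = [wi + tau * y * xi for wi, xi in zip(w, x)]
    return w

w = pa_update([0.0, 0.0], [1.0, 1.0], +1)
```

After this single update the example sits exactly on the margin (when the cap C does not bind), which is what "aggressive" means here: zero loss on the example just seen.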
Martin Zinkevich's 2003 paper "Online Convex Programming and Generalized Infinitesimal Gradient Ascent" generalized SGD to the online convex optimization (OCO) framework. In OCO, an adversary picks a sequence of convex losses, the learner picks parameters before seeing each loss, and the goal is to minimize regret against the best fixed comparator in hindsight. Zinkevich proved that simple projected gradient descent with stepsize eta_t = 1/sqrt(t) achieves O(sqrt(T)) regret on Lipschitz convex losses, which is the standard reference rate for online learning [8].
Follow the Regularized Leader (FTRL) is the dual of online gradient descent. At each step, FTRL solves an optimization that minimizes the cumulative loss seen so far plus a regularization term. The FTRL-Proximal variant, introduced by H. Brendan McMahan and colleagues at Google, was designed for click-through rate (CTR) prediction at scale and combines L1 regularization (for sparsity) with per-coordinate adaptive learning rates similar to AdaGrad. The KDD 2013 paper "Ad Click Prediction: a View from the Trenches" is the canonical reference for production-grade FTRL-Proximal on a live ad serving system [9].
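A compact sketch of the per-coordinate FTRL-Proximal update for logistic regression, following the form of the update in the KDD 2013 paper; the hyperparameter values and the toy training loop are illustrative, not the paper's:

```python
import math

# Per-coordinate FTRL-Proximal for logistic regression. L1 drives
# coordinates to exact zero (sparsity); per-coordinate accumulated
# squared gradients give AdaGrad-style adaptive learning rates.
class FTRLProximal:
    def __init__(self, d, alpha=0.1, beta=1.0, l1=1.0, l2=0.1):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * d        # accumulated adjusted gradients
        self.n = [0.0] * d        # accumulated squared gradients

    def weights(self):
        w = []
        for zi, ni in zip(self.z, self.n):
            if abs(zi) <= self.l1:
                w.append(0.0)     # L1 keeps this coordinate at zero
            else:
                sign = 1.0 if zi > 0 else -1.0
                w.append(-(zi - sign * self.l1) /
                         ((self.beta + math.sqrt(ni)) / self.alpha + self.l2))
        return w

    def update(self, x, y):       # y in {0, 1}
        w = self.weights()
        p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
        for i, xi in enumerate(x):
            g = (p - y) * xi      # logistic loss gradient
            sigma = (math.sqrt(self.n[i] + g * g)
                     - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w[i]
            self.n[i] += g * g
        return p

model = FTRLProximal(d=2)
for _ in range(1000):
    model.update([1.0, 1.0], 1)   # positives carry feature 0
    model.update([0.0, 1.0], 0)   # negatives do not
w = model.weights()               # feature 0 ends up with positive weight
```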
Recursive least squares (RLS) is the online analog of ordinary least squares for linear regression. It maintains an inverse covariance estimate that is updated by the Sherman-Morrison formula after each example, achieving O(d^2) per-step time and O(d^2) memory. RLS is a natural choice when an online linear model needs second-order information without the full cost of refitting an offline regression [4].
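A pure-Python sketch of the RLS recursion; `lam` sets the initial scale of the inverse-covariance estimate, an initialization choice equivalent to a small ridge penalty:

```python
# Recursive least squares via the Sherman-Morrison rank-one update.
# P tracks the inverse covariance; each example costs O(d^2) time.
def rls(stream, d, lam=1000.0):
    w = [0.0] * d
    P = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    for x, y in stream:
        Px = [sum(P[i][j] * x[j] for j in range(d)) for i in range(d)]
        denom = 1.0 + sum(x[i] * Px[i] for i in range(d))
        k = [pi / denom for pi in Px]                    # gain vector
        err = y - sum(w[i] * x[i] for i in range(d))
        w = [w[i] + k[i] * err for i in range(d)]
        # P <- P - k (x^T P): Sherman-Morrison downdate
        xP = [sum(x[i] * P[i][j] for i in range(d)) for j in range(d)]
        P = [[P[i][j] - k[i] * xP[j] for j in range(d)] for i in range(d)]
    return w

# Noiseless data from y = 3*x0 + 2*x1: RLS recovers the coefficients.
stream = [([1.0, 0.0], 3.0), ([0.0, 1.0], 2.0),
          ([1.0, 1.0], 5.0), ([2.0, 1.0], 8.0)]
w = rls(stream, d=2)
```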
Decision trees pose a special challenge for online learning because each split decision in principle depends on the entire dataset. Domingos and Hulten's 2000 paper "Mining High-Speed Data Streams" introduced the Very Fast Decision Tree (VFDT), also called the Hoeffding tree, which uses the Hoeffding bound to prove that a small sample is sufficient to choose a split with high probability. The result is a tree that can ingest tens of thousands of examples per second on commodity hardware while approximating the tree that batch training on the same data would have produced [10]. Hoeffding trees and their adaptive successors (HAT, EFDT) remain the most common online tree learners in streaming pipelines.
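The bound VFDT relies on is a one-liner. The example below computes how tight the split criterion becomes after 1,000 examples for two-class information gain (range R = 1); the delta value is an illustrative confidence setting:

```python
import math

# Hoeffding bound as used by VFDT: with probability 1 - delta, the
# true mean of a range-R statistic lies within eps of the sample
# mean of n observations. A split is committed once the observed
# gain gap between the two best attributes exceeds eps.
def hoeffding_eps(R, delta, n):
    return math.sqrt(R * R * math.log(1.0 / delta) / (2.0 * n))

# Two-class information gain has range R = log2(2) = 1.
eps_1k = hoeffding_eps(R=1.0, delta=1e-7, n=1000)   # roughly 0.09
```

Because eps shrinks like 1/sqrt(n), a stream of a few thousand examples usually suffices to make split decisions that match what batch training would have chosen.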
For settings where memory is bounded, several variants of online learners maintain a fixed-size pool of support vectors or prototypes. Examples include the Forgetron, Randomized Budget Perceptron, and Online Passive-Aggressive on a Budget. These are useful in resource-constrained dynamic models where the support set cannot be allowed to grow without bound.
Dynamic models are valuable precisely because real-world data shifts. The literature distinguishes several kinds of shift, and the choice of detection algorithm depends on which kind is dominant in the application.
| Type | What changes | Also called | Typical example |
|---|---|---|---|
| Covariate shift / data drift | P(X) | Data drift, feature drift | Customer demographic mix changes after product launch |
| Label shift | P(Y) | Prior probability shift | Fraud rate rises during a holiday weekend |
| Concept drift | P(Y\|X) | Real concept drift | Definition of spam evolves as spammers change tactics |
| Posterior shift | P(X\|Y) | Conditional shift | New imaging device produces brighter X-rays |
The distinction between data drift and concept drift is the most cited in production ML literature. Data drift occurs when the input distribution changes but the input-output relationship stays the same; concept drift occurs when the input-output relationship itself changes [11]. Both can degrade a static model, but only retraining on fresh labels (a dynamic model) can fix concept drift.
Concept drift can be abrupt (an overnight regime change), gradual (an old concept fades while a new one rises), incremental (a continuous shift in the boundary), or recurring (seasonal patterns that re-emerge). Different detectors react well to different patterns.
| Detector | Year | Authors | Idea | Best for |
|---|---|---|---|---|
| DDM (Drift Detection Method) | 2004 | Gama, Medas, Castillo, Rodrigues | Monitor the binomial error rate; alarm when it exceeds a threshold based on its own minimum | Abrupt drift on classifiers with declining error |
| EDDM (Early Drift Detection Method) | 2006 | Baena-Garcia, del Campo-Avila, Fidalgo, Bifet, Gavalda, Morales-Bueno | Monitor the distance between consecutive errors instead of the raw error rate | Gradual drift |
| Page-Hinkley test | 1954 / streaming use 2000s | E. S. Page (original); revived by Gama et al. | Cumulative sum test on the difference between current accuracy and a moving average | Smooth drift in numerical signals |
| ADWIN (Adaptive Windowing) | 2007 | Bifet, Gavalda | Maintain a variable-length sliding window; cut the window when statistics on its two halves differ | Any drift, with rigorous false-positive bounds |
| HDDM | 2014 | Frias-Blanco, del Campo-Avila, Ramos-Jimenez, Morales-Bueno, Ortiz-Diaz, Caballero-Mota | Hoeffding-bound-based test on weighted moving averages | Streams with non-stationary noise |
| KSWIN | 2020 | Raab, Heusinger, Schleif | Kolmogorov-Smirnov test on a sliding window | Distribution-free drift on numeric features |
ADWIN is a particularly common choice in production streaming systems because Bifet and Gavalda's 2007 SDM paper provided rigorous bounds on both false-positive and false-negative rates, and because ADWIN can be plugged in as a black-box monitor for either model error or any individual feature [12]. The drift_detection module of the river library implements ADWIN, DDM, EDDM, HDDM, KSWIN, and Page-Hinkley with a uniform interface.
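As a concrete example of the detector family, here is a toy Page-Hinkley-style detector for an upward shift in a monitored signal such as model error. The delta and lambda values are illustrative tuning choices, not recommended defaults, and this is a from-scratch sketch rather than any library's implementation:

```python
# Toy Page-Hinkley detector: accumulate deviations of the signal from
# its running mean and alarm when the cumulative sum rises more than
# `lam` above its historical minimum.
class PageHinkley:
    def __init__(self, delta=0.005, lam=5.0):
        self.delta, self.lam = delta, lam
        self.n, self.mean = 0, 0.0
        self.cum, self.cum_min = 0.0, 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.lam   # True = drift alarm

# Stable error around 0.2 for 200 steps, then an abrupt jump to 0.8.
ph = PageHinkley()
alarms = []
for t in range(300):
    alarms.append(ph.update(0.2 if t < 200 else 0.8))
```

On this stream the detector stays silent through the stable phase and raises an alarm shortly after the jump, which is the behavior a monitoring pipeline would route to a retraining trigger or a human operator.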
When a detector raises an alarm, the dynamic model has several options. It can reset its weights and start retraining from scratch, increase the learning rate to react more quickly, switch to a buffered alternative model that has been training in parallel, or simply log the event for a human operator. Production systems usually combine these: minor drift triggers an automatic catch-up update, major drift pages a human.
A production dynamic model needs more than an online algorithm; it needs an entire pipeline that can ingest streams, update parameters, evaluate on the fly, and serve predictions. Several open-source frameworks specialize in this layer.
| Framework | Language | First released | Maintainer | Specialty |
|---|---|---|---|---|
| river | Python | 2020 (merger of creme + scikit-multiflow) | online-ml community | General-purpose online ML in Python with progressive validation |
| Vowpal Wabbit | C++ | 2007 | Microsoft Research (originally Yahoo Research) | Massive-scale online learning, contextual bandits, hashing trick |
| MOA (Massive Online Analysis) | Java | 2010 | University of Waikato | Stream classification, clustering, drift detection, evaluation tools |
| Apache Flink ML | Java / Scala | 2015 | Apache Software Foundation | Online learning on top of Flink streaming runtime |
| Spark Streaming MLlib | Scala / Python | 2014 | Apache Software Foundation | Streaming linear regression, k-means, batch-style streaming |
| scikit-learn (partial_fit) | Python | 2010 (incremental support) | scikit-learn community | SGDClassifier, SGDRegressor, Naive Bayes, MiniBatchKMeans |
| TensorFlow Extended (TFX) | Python | 2017 | Google | End-to-end ML pipelines with continuous training support |
| ByteDance Monolith | Python / C++ | Open-sourced 2022 | ByteDance | Real-time recommendation training with collisionless embeddings |
River is the result of a 2020 merger between two earlier projects: creme (started in 2018 at Telecom ParisTech) and scikit-multiflow (started by Bifet's group at the University of Waikato). The library provides online versions of linear models, decision trees and random forests, k-nearest neighbors, anomaly detectors, drift detectors, recommender systems, time-series models, factorization machines, and bandits. The JMLR 2021 paper "River: machine learning for streaming data in Python" by Montiel, Halford, Mastelini, and others is the canonical reference [13]. Version 0.24.2 was released on April 15, 2026.
Vowpal Wabbit (VW), started by John Langford at Yahoo Research and now maintained at Microsoft Research, is the heavyweight open-source online learner. VW uses SGD-based online learning combined with the hashing trick (32-bit MurmurHash3 of feature names into a fixed-size weight vector) to scale to billions of features and billions of examples [14]. It supports binary and multiclass classification, regression, contextual bandits, active learning, and reductions for structured prediction. VW has been used to learn a tera-feature (10^12) dataset on 1000 nodes in one hour, which remains a reference point for raw online-learning throughput.
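The hashing trick itself is nearly a one-liner. The sketch below uses `zlib.crc32` as a stand-in hash purely because it ships with the Python standard library (VW itself uses MurmurHash); the bit width and feature names are illustrative:

```python
import zlib

# The hashing trick: map arbitrary feature names into a fixed-size
# weight vector by hashing, so the model never needs a feature
# dictionary and memory stays bounded regardless of vocabulary size.
def hash_features(named_features, bits=18):
    mask = (1 << bits) - 1                  # 2**bits weight slots
    indexed = {}
    for name, value in named_features.items():
        idx = zlib.crc32(name.encode()) & mask
        indexed[idx] = indexed.get(idx, 0.0) + value   # collisions add up
    return indexed

x = hash_features({"user=42": 1.0, "ad=snack-pretzel": 1.0, "hour=14": 1.0})
```

Hash collisions introduce a small amount of noise, but in high-dimensional sparse settings the effect on accuracy is typically negligible compared with the memory savings.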
MOA is the Java equivalent of WEKA for streaming data, created in 2010 by Bifet, Holmes, Kirkby, and Pfahringer at the University of Waikato. The framework provides Hoeffding trees, ADWIN, drift detectors, online ensembles (online bagging, leveraging bagging, ARF), clusterers, and evaluation tools for prequential and holdout-on-stream evaluation [15]. MOA is the most cited reference for academic streaming ML benchmarks.
Flink ML is the machine learning library built on top of Apache Flink's streaming runtime. It supports online versions of common preprocessors (OnlineStandardScaler, OnlineKMeans), agglomerative clustering, and online linear models. The library reached 2.2.0 in April 2023 and is in production use at Alibaba for real-time clustering and feature engineering on log data [16]. Its main appeal is the unified pipeline API that lets the same algorithm run on bounded (offline) and unbounded (online) data streams.
Apache Spark MLlib supports streaming linear regression and streaming k-means via the StreamingLinearRegressionWithSGD and StreamingKMeans classes. The implementation runs SGD on each Spark Streaming batch, so it is closer to mini-batch online learning than to true per-example online learning [17]. For teams already on Spark, this is the smallest-effort path to a dynamic model.
scikit-learn does not advertise itself as a streaming library, but a number of its estimators expose a partial_fit method that supports incremental training: SGDClassifier, SGDRegressor, PassiveAggressiveClassifier, PassiveAggressiveRegressor, Perceptron, MultinomialNB, BernoulliNB, MiniBatchKMeans, and others. Combined with dask-ml's Incremental wrapper, partial_fit is the path of least resistance for adding online learning to an existing scikit-learn pipeline.
Beyond open-source libraries, every large platform that depends on dynamic models maintains its own internal training infrastructure. ByteDance's Monolith (open-sourced 2022) is built on TensorFlow with a Worker / Parameter-Server architecture and uses Cuckoo hashmaps for collisionless embedding tables, allowing TikTok to update its recommendation model on a minute scale [18]. Google uses TensorFlow Extended (TFX) and an internal continuous training system that integrates FTRL-Proximal for ad ranking and other linear models. Meta uses PyTorch with FBLearner Flow for batch training and a separate online training stack for ranking. Netflix uses a hybrid of offline training plus online fine-tuning for portions of its recommendation stack.
Dynamic models are not appropriate for every problem; they pay back the engineering cost only when the data distribution changes faster than the static-retrain cadence can keep up with. The most common production use cases share that property.
Large-scale recommender systems are the canonical home for dynamic models. User preferences shift in real time, new items appear hourly, and the value of a recommendation depends on freshness. Most production recommenders combine offline-trained candidate generation models (refreshed daily or weekly) with online-trained ranking models (refreshed on a minute scale).
Click-through rate prediction is the second canonical home. Ads, queries, and creatives turn over at hour-by-hour rates, and a 1% lift in CTR translates into eight or nine figures of revenue at scale. Google's KDD 2013 paper documents an FTRL-Proximal-based dynamic model serving ad CTR predictions in production [9]. Comparable systems are described in publications from Yahoo, Microsoft, Meta, and ByteDance.
Fraud detection is a textbook concept-drift problem: adversaries change tactics in response to detection. Static models become obsolete as soon as they ship. Production fraud detection systems combine an online learning model trained on labeled fraud reports with a separate anomaly detector tuned to flag previously unseen patterns.
Financial markets and dynamic pricing systems both deal with non-stationary data where the cost of staleness is measured in basis points or in lost revenue per minute. Most production trading systems combine slow offline-trained risk models with fast online-trained execution models.
News ranking, short-form video ranking (TikTok, Reels, Shorts), and feed personalization all share the property that the inventory turns over within hours. TikTok's Monolith paper documents a system that incorporates user interactions into the model within minute-scale latency, which the authors directly attribute to its real-time online training pipeline [18]. Netflix has reported moving portions of its recommendation stack from batch to online training because batch-trained models created "regret as many members over a long period did not benefit from the better experience."
Spam filtering is one of the oldest production applications of online learning, dating back to early Bayesian filters. Modern email and chat platforms run a continuous training loop that incorporates user-flagged spam labels into a refreshed model every few hours.
Industrial sensor streams (vibration, temperature, current draw) are non-stationary because equipment ages, environmental conditions vary, and operating modes change. Online learning algorithms with drift detectors are well-matched to this regime. Apache Flink ML is used at Alibaba for online clustering of log data, and a similar pattern shows up in factory sensor monitoring with MOA, river, or custom Flink jobs.
| Platform | Application | Approach | Reference |
|---|---|---|---|
| Google Ads | Sponsored search CTR prediction | FTRL-Proximal with per-coordinate learning rates | McMahan et al., KDD 2013 [9] |
| Google Play | App recommendation | Wide & Deep with online fine-tuning | Cheng et al., DLRS 2016 |
| TikTok | For You feed ranking | Monolith real-time training, minute-scale updates | Liu et al., arXiv 2022 [18] |
| Netflix | Homepage ranking | Hybrid offline plus online fine-tuning | Netflix Tech Blog |
| YouTube | Video ranking | Two-stage candidate generation plus online ranker | Covington et al., RecSys 2016 |
| Spotify | Daily Mix and Discover Weekly | Weekly batch retrains plus online bandit layer | Spotify Engineering |
| LinkedIn | Feed ranking | Offline GLMix plus online ranker fine-tuning | LinkedIn Engineering |
| Alibaba | Real-time clustering on log streams | Flink ML OnlineKMeans | Apache Flink blog [16] |
| Meta (Facebook) | News Feed and Ads | Continuous training on PyTorch with FBLearner Flow | Meta Engineering |
These systems vary in how aggressively they update their models. TikTok pushes new weights on roughly a one-minute cadence. Google Ads CTR predictors update on the order of a few minutes. Netflix's recommendation models historically updated every few hours but have moved closer to minute-scale for some surfaces. Spotify's Discover Weekly remains a weekly batch update because the user expectation is a weekly playlist drop, not a continuously shifting one.
Dynamic models cost more to operate than static ones, and they introduce failure modes that static models do not have. The decision to go dynamic should be driven by a measured cost of staleness, not by aesthetics.
A dynamic model requires an always-on training pipeline. That means continuous data ingestion, feature computation, label joining (often the hardest part), gradient computation, parameter updates, and rollouts to inference servers. Each of these layers needs to be scaled, monitored, and on-call rotated. A typical production dynamic model is two to five engineers' worth of operational ownership beyond the model itself.
Online learning depends on quickly observed labels. CTR prediction has labels available within seconds (a click happens or it doesn't). Fraud detection has labels available after minutes to days, since chargebacks take time. Long-horizon prediction problems (lifetime value, churn) have labels that arrive too late for online learning to be useful, and for those problems static or hybrid models are usually the right choice.
A dynamic model that ranks the items it sees can amplify its own biases. If a recommender is trained on the clicks generated by its own previous predictions, its training distribution becomes self-conditioned. The standard mitigation is to mix in exploration via contextual bandits or randomized impressions, and to log propensity-corrected rewards for off-policy evaluation.
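A minimal sketch of the mitigation: epsilon-greedy exploration at serving time, logging the propensity of the chosen item so rewards can be corrected off-policy later. The function and parameter names are hypothetical, and real systems typically use richer contextual-bandit policies:

```python
import random

# Epsilon-greedy serving: a small fraction of impressions are chosen
# uniformly at random so the training stream is not purely generated
# by the model's own previous predictions. The propensity of the
# chosen item is logged alongside the reward for off-policy correction.
def choose(scores, rng, epsilon=0.05):
    best = max(range(len(scores)), key=scores.__getitem__)
    if rng.random() < epsilon:
        item = rng.randrange(len(scores))   # explore uniformly
    else:
        item = best                         # exploit the top-scored item
    # Probability that this policy selects `item`, logged for evaluation.
    propensity = epsilon / len(scores) + (1.0 - epsilon) * (item == best)
    return item, propensity

rng = random.Random(0)
item, p = choose([0.1, 0.9, 0.3], rng)
```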
Classic train / validation / test splits do not apply directly to streaming models. The standard alternatives are prequential evaluation (test each example before training on it, then update), interleaved test-then-train, and holdout-on-stream (set aside a small fraction of the stream for evaluation only). Production systems also rely on shadow deployments, A/B tests, canary releases, and interleaving experiments, all of which are described in detail in Huyen's Designing Machine Learning Systems [5].
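Prequential evaluation is straightforward to implement: score each example before training on it, and accumulate the results into a running metric. The sketch below uses a trivial majority-class model as a placeholder; only the test-then-train ordering matters:

```python
# Prequential (test-then-train) evaluation: every example is scored
# before the model learns from it, so the running accuracy is an
# honest out-of-sample estimate with no separate holdout set.
def prequential_accuracy(stream):
    counts, correct, n = {}, 0, 0
    for x, y in stream:
        pred = max(counts, key=counts.get) if counts else None  # test first
        correct += (pred == y)
        n += 1
        counts[y] = counts.get(y, 0) + 1                        # then train
    return correct / n

acc = prequential_accuracy([(None, "a"), (None, "a"), (None, "b"),
                            (None, "a"), (None, "a")])
```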
Neural networks trained online can forget old patterns quickly when the input distribution shifts, a phenomenon known as catastrophic forgetting or catastrophic interference. Mitigations include replay buffers (keeping a sample of older data and mixing it into updates), elastic weight consolidation, and architectural choices that protect specific subspaces of weights from rapid updates. The continual learning literature exists largely to address this issue.
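A replay buffer based on reservoir sampling keeps a uniform sample of the entire stream in fixed memory, so each online update can mix in a few older examples. The sketch below is a generic illustration of the data structure, not any specific system's implementation:

```python
import random

# Replay buffer via reservoir sampling (Algorithm R): after N adds,
# every item ever seen has equal probability capacity/N of being in
# the buffer, using only O(capacity) memory.
class ReplayBuffer:
    def __init__(self, capacity, seed=0):
        self.capacity, self.seen = capacity, 0
        self.items = []
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:                                # replace with prob capacity/seen
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        return self.rng.sample(self.items, min(k, len(self.items)))

buf = ReplayBuffer(capacity=100)
for i in range(10_000):
    buf.add(i)
old_mix = buf.sample(8)   # older examples to mix into the next update
```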
A static model can be monitored by checking its prediction distribution and a few business metrics. A dynamic model needs all of that plus monitoring for training health (loss curves, gradient norms), data freshness (how stale is the most recent example), label freshness (how stale is the most recent label), drift detection signals, and rollback readiness (can the system swap to a previous snapshot if today's training run goes off the rails). Drift-aware monitoring tools, often built around ADWIN or KSWIN under the hood, are part of every well-run dynamic model.
A static model trained from a fixed dataset and a fixed seed is reproducible to the bit. A dynamic model that has been training continuously for six months has consumed billions of examples in a specific order from a specific stream, and reproducing it exactly is usually impossible. The replacement is not bit-level reproducibility but process-level reproducibility: the same training pipeline applied to the same window of data should produce a model with the same statistical behavior.
A short decision guide for picking between the two regimes:

- Prefer a static model when the distribution is stable, labels arrive slowly, and a periodic batch retrain keeps staleness within acceptable bounds.
- Prefer a dynamic model when the distribution shifts faster than any feasible batch-retrain cadence and labels arrive quickly enough to learn from.
- In either case, measure the cost of staleness first: freeze a trained model and track how quickly its metrics decay on fresh data before committing to the heavier pipeline.
The choice is rarely binary in practice. Most large production systems run a portfolio of models with different cadences: a daily-batch model for stable signals, an hourly online model for fast-moving signals, and a per-request rerank step that uses the latest interaction context.
In 2026, dynamic models are the default in any consumer-facing recommendation, advertising, or feed ranking application at scale. The continuing rise of short-form video platforms, real-time chat assistants, and personalized agentic interfaces has only increased the cost of staleness. At the same time, the tooling has matured: river, Flink ML, Vowpal Wabbit, and MOA cover the open-source side, while every major cloud provider offers a managed continuous training service.
The rise of large language models has not displaced dynamic modelling. Although foundation models themselves are usually trained statically (with periodic full retrains rather than continuous updates), the systems that use them often wrap them in online learning loops at the application layer. A search ranker that uses an LLM as a feature extractor is still a dynamic model in the sense that its ranker weights update continuously based on user interactions. Personalization layers, retrieval indexes, and post-training adapters all use online updates even when the base model is frozen.
The research frontier has moved toward neural online learning, online deep learning with replay and consolidation, online fine-tuning of LLMs, federated dynamic models that learn from decentralized data, and graph-based online learning for evolving social networks. The 2007 ADWIN paper, the 2003 Zinkevich paper, and the 2013 FTRL-Proximal paper remain the standard references for the underlying theory.
Imagine you have a robot that picks the best snack to give you each day. A regular robot is trained once, when it is first built. It learns that you like pretzels and apples and never changes its mind. After a few months it is still offering you pretzels even though you got tired of pretzels weeks ago.
A dynamic robot watches what you actually eat every day. If you start liking grapes, the dynamic robot notices and starts offering grapes too. If you stop liking pretzels, it stops offering pretzels. It is always learning, a little bit at a time, instead of being frozen forever after one big lesson.
The trade-off is that the dynamic robot is more work to take care of. Someone has to make sure it is not learning weird things, that the snacks are still real snacks, and that it does not suddenly forget you are allergic to peanuts. The plain robot is simpler but it gets boring. The dynamic robot keeps up with you, as long as someone keeps an eye on what it is learning.