ARIMA

Updated Jun 28, 2026

ARIMA (Autoregressive Integrated Moving Average) is a class of statistical models for analyzing and forecasting time series data, specified by three non-negative integer orders written as ARIMA(p, d, q): p is the number of autoregressive lags, d is the order of differencing applied to make the series stationary, and q is the number of moving-average lags applied to past forecast errors. ARIMA was popularized by George E. P. Box and Gwilym M. Jenkins in their 1970 book Time Series Analysis: Forecasting and Control (Holden-Day, San Francisco), which supplied the complete model-fitting workflow now known as the Box-Jenkins methodology.^[1]

For most of the late twentieth century ARIMA was the default forecasting method in economics, finance, demand planning, and operations research. It still ships in every major statistical package, it remains competitive on small, clean univariate datasets, and it is still the baseline that new methods are expected to beat. The M-competitions, run by Spyros Makridakis and colleagues since 1982, repeatedly found that simple statistical methods, including ARIMA and exponential smoothing, were hard to beat in practice. That finding only began to shift with M4 (2018, 100,000 series), which was won by a hybrid of exponential smoothing and a recurrent neural network, and M5 (2020, 42,840 series), where machine-learning entries based on gradient-boosted trees took the top spots.^[23]^[24]^[25] In the era of deep-learning forecasters and time-series foundation models, ARIMA is no longer state of the art, but it is still where most practitioners and most textbooks start.

Facts

Field	Value
Model class	Linear, parametric, univariate time-series model
Year formalized	1970 (Box & Jenkins, Time Series Analysis: Forecasting and Control)
Methodology	Box-Jenkins (identification, estimation, diagnostic checking)
Parameters	p (AR order), d (differencing order), q (MA order); seasonal extension adds (P, D, Q) at period s
Estimation	Conditional sum of squares or maximum likelihood, usually via Kalman filter on state-space form
Stationarity tests	Augmented Dickey-Fuller, KPSS, Phillips-Perron
Order selection	AIC, AICc, BIC, HQIC; Hyndman-Khandakar (2008) auto.arima algorithm
Key references	Box & Jenkins (1970, 1976, 1994 with Reinsel, 2015 with Reinsel & Ljung); Hyndman & Athanasopoulos Forecasting: Principles and Practice (otexts.com/fpp3)
Standard implementations	R `forecast::auto.arima`, `fable::ARIMA`; Python `statsmodels` `SARIMAX`, `pmdarima.auto_arima`, Nixtla `statsforecast`

What is ARIMA?

ARIMA is a linear, parametric model that forecasts a single time series from its own past. It combines three older ideas: an autoregressive (AR) component that regresses today's value on its own recent values, an integrated (I) component that differences the series to remove trends, and a moving-average (MA) component that corrects for recent forecast errors. In its base form the model is purely univariate; information from other variables enters only through the ARIMAX and SARIMAX extensions. Its enduring appeal is that the coefficients are interpretable, the forecasts come with well-defined prediction intervals, and on typical series the whole thing fits and predicts in milliseconds.

ELI5: explain like I'm 5

Imagine you are tracking the temperature outside every day. ARIMA is a recipe for guessing tomorrow's temperature using three ingredients. First, today's temperature usually looks a lot like yesterday's, so you give some weight to the last few days (that is the autoregressive or AR part). Second, weather has trends and you cannot just use the raw numbers; you look at how much the temperature changed from one day to the next, because changes are more stable than absolute levels (that is the integrated or I part, called differencing). Third, you remember how wrong your guesses were on the last few days and you correct for those mistakes (that is the moving-average or MA part).

You pick three small numbers that say how many days back you want to look for each of the three ingredients. The computer then finds the best weights to combine them. After that, you can ask the model what tomorrow looks like, or next week, or next month. The further out you go, the more uncertain the forecast gets, which is why the prediction comes with a fan of error bars rather than a single line.

What do the p, d, and q parameters mean?

The three orders are the heart of the notation ARIMA(p, d, q):

Order	Component	What it counts
`p`	Autoregressive (AR)	Number of lagged values of the series used as predictors
`d`	Integrated (I)	Number of times the series is differenced to reach stationarity
`q`	Moving average (MA)	Number of lagged forecast errors used as predictors

Most series in practice need d = 0 or d = 1. A second difference (d = 2) is occasionally needed, for example when a series' growth rate itself wanders, but it is rare, and choosing d higher than necessary over-differences the series and inflates its variance.^[7] A few named special cases make the notation concrete: ARIMA(0, 0, 0) is white noise, ARIMA(0, 1, 0) is a random walk, ARIMA(0, 1, 1) without a constant is equivalent to simple exponential smoothing, and ARIMA(0, 2, 2) is equivalent to Holt's linear method.
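The equivalence between ARIMA(0, 1, 1) and simple exponential smoothing can be checked numerically. The sketch below is plain Python with hypothetical helper names; it assumes no constant, ties the MA coefficient to the smoothing parameter by θ = α - 1, and starts both recursions from the same initial forecast (the first observation, an arbitrary choice).

```python
def ses_forecasts(y, alpha, init):
    """One-step-ahead forecasts from simple exponential smoothing."""
    forecasts = [init]
    for value in y:
        # New forecast = alpha * latest observation + (1 - alpha) * previous forecast.
        forecasts.append(alpha * value + (1 - alpha) * forecasts[-1])
    return forecasts


def arima011_forecasts(y, theta, init):
    """One-step-ahead forecasts from ARIMA(0, 1, 1) without a constant.

    Model: y_t - y_{t-1} = e_t + theta * e_{t-1}, so the forecast of y_{t+1}
    is y_t + theta * e_t, where e_t = y_t - (forecast of y_t).
    """
    forecasts = [init]
    for value in y:
        error = value - forecasts[-1]
        forecasts.append(value + theta * error)
    return forecasts


series = [10.0, 12.0, 11.5, 13.0, 12.2, 14.1]
alpha = 0.3
ses = ses_forecasts(series, alpha, init=series[0])
arima = arima011_forecasts(series, theta=alpha - 1, init=series[0])
print(max(abs(a - b) for a, b in zip(ses, arima)))  # at floating-point rounding level
```

Algebraically, y_t + θ(y_t - ŷ_t) = (1 + θ) y_t - θ ŷ_t, which is the smoothing update α y_t + (1 - α) ŷ_t when α = 1 + θ.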

When was ARIMA developed?

The components of ARIMA are older than the name. Autoregressive models go back to George Udny Yule's 1927 paper on sunspot numbers,^[10] with Gilbert Walker's 1931 paper extending the approach to higher-order autoregressions; the equations linking AR coefficients to autocorrelations are now called the Yule-Walker equations.^[11] Eugen Slutsky's 1927 paper showed that random shocks summed in a moving-average pattern can produce convincing-looking cycles, a result Slutsky later presented in his 1937 Econometrica article.^[12] Herman Wold's 1938 doctoral thesis proved a decomposition theorem that any covariance-stationary process can be written as a sum of a deterministic part and an infinite moving average of uncorrelated shocks, which is why ARMA representations are so general.^[9] Differencing as a way to handle nonstationary series was already standard practice in econometrics by the 1950s.

The contribution of Box and Jenkins was to package these ingredients into a single, repeatable procedure. Their 1970 book laid out an iterative three-stage workflow (identification, estimation, diagnostic checking), introduced the seasonal extension SARIMA, and supplied worked examples on series like airline passengers, IBM stock prices, and gas furnace data that became standard datasets for decades.^[1] The book has gone through five editions: the original Holden-Day edition in 1970,^[1] a revised edition in 1976,^[2] a third edition with Gregory Reinsel in 1994 (Prentice Hall),^[3] a fourth edition (Wiley) in 2008, and the fifth edition (Wiley, 2015) co-authored with Reinsel and Greta Ljung.^[4] Box and Jenkins were not the only people working on this in the late 1960s, but their book made the methodology teachable and reproducible, and the term Box-Jenkins models became more or less synonymous with ARIMA in industry.

From the late 1970s through the 1990s, ARIMA was extended in several directions: vector versions for multivariate series (Tiao and Box, 1981), fractional differencing for long-memory processes (Granger and Joyeux 1980; Hosking 1981),^[13]^[14] volatility models through ARCH (Engle 1982) and GARCH (Bollerslev 1986),^[15]^[16] and nonlinear cousins like threshold AR (Tong 1978) and smooth transition AR.^[18] State-space methods, championed by Harvey (1989) and others, emphasized that every ARIMA model can be written in state-space form, so its exact likelihood can be computed with the Kalman filter, which is how most modern software actually fits them.^[19]

The Hyndman-Khandakar algorithm published in 2008 (the engine behind R's auto.arima in the forecast package by Rob J. Hyndman) automated the order-selection step that had previously required a human looking at autocorrelation plots.^[8] Automated order selection of this kind is now a common way to fit ARIMA in practice.

How is ARIMA defined mathematically?

ARIMA builds up from three simpler models. Throughout, y_t denotes the observed value at time t, c is a constant, and ε_t is white noise (uncorrelated with mean zero and constant variance).

Autoregressive model AR(p)

An autoregressive process of order p regresses the current value on its own previous p values:

y_t = c + φ_1 y_{t-1} + φ_2 y_{t-2} + ... + φ_p y_{t-p} + ε_t

The coefficients φ_1, ..., φ_p are estimated from data. AR(1) is a simple persistence model where today depends on yesterday; AR(2) can produce damped oscillations; higher orders can fit richer dynamics. For the process to be stationary, the roots of the characteristic polynomial 1 - φ_1 z - ... - φ_p z^p must lie outside the unit circle.
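The root condition is easy to check numerically. This sketch assumes NumPy and uses a hypothetical helper name; statsmodels offers a comparable check through the `isstationary` property of `ArmaProcess` in `statsmodels.tsa.arima_process`.

```python
import numpy as np


def ar_is_stationary(phi):
    """True if every root of 1 - phi_1 z - ... - phi_p z^p lies outside the unit circle."""
    phi = np.asarray(phi, dtype=float)
    # np.roots expects coefficients from the highest power of z down to the constant term.
    coeffs = np.concatenate([-phi[::-1], [1.0]])
    return bool(np.all(np.abs(np.roots(coeffs)) > 1.0))


print(ar_is_stationary([0.5]))        # True: AR(1) with root z = 2
print(ar_is_stationary([1.0]))        # False: random walk, root exactly on the unit circle
print(ar_is_stationary([0.5, 0.3]))   # True
print(ar_is_stationary([1.2, -0.1]))  # False: one root (about 0.90) inside the unit circle
```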

Moving-average model MA(q)

A moving-average process of order q regresses the current value on its own previous q forecast errors:

y_t = c + ε_t + θ_1 ε_{t-1} + θ_2 ε_{t-2} + ... + θ_q ε_{t-q}

This is not the same thing as a moving-average filter on the data; it is a regression on past shocks. The coefficients θ_1, ..., θ_q are estimated, and the process is invertible (the shocks can be recovered from the observations) when the roots of 1 + θ_1 z + ... + θ_q z^q lie outside the unit circle. Note that the sign convention varies across textbooks and software; some authors write the MA terms with minus signs.
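Invertibility matters because the shocks have to be rebuilt from the data starting from an unknown pre-sample value. This plain-Python sketch (hypothetical names, an MA(1) with the plus-sign convention above and a zero constant) starts the recursion from a wrong guess for the pre-sample shock and measures how far the last recovered shock ends up from the truth.

```python
import random


def recover_ma1_shocks(y, c, theta, eps0_guess=0.0):
    """Rebuild MA(1) shocks by recursion: e_t = y_t - c - theta * e_{t-1}.

    The pre-sample shock is unknown, so the recursion starts from a guess; an
    error of size delta in that guess becomes (-theta)**t * delta at step t.
    """
    shocks, prev = [], eps0_guess
    for value in y:
        prev = value - c - theta * prev
        shocks.append(prev)
    return shocks


def recovery_error(theta, n=60, c=0.0, seed=0):
    """Simulate an MA(1), recover its shocks from a wrong start, and return the final error."""
    rng = random.Random(seed)
    true = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]  # true[0] is the pre-sample shock
    y = [c + true[t] + theta * true[t - 1] for t in range(1, n + 1)]
    recovered = recover_ma1_shocks(y, c, theta, eps0_guess=0.0)
    return abs(recovered[-1] - true[-1])


print(recovery_error(0.5))  # invertible (|theta| < 1): the start-up error dies out
print(recovery_error(2.0))  # not invertible (|theta| > 1): the start-up error explodes
```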

ARMA(p, q)

Combining the two:

y_t = c + φ_1 y_{t-1} + ... + φ_p y_{t-p} + ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}

ARMA models are appropriate for stationary series. Wold's decomposition theorem says that any covariance-stationary process can be written as a deterministic component plus an infinite moving average of white-noise shocks; an ARMA model with finite p and q approximates that infinite moving average with only a few parameters, which is the theoretical justification for the whole framework.^[9]
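One way to see the link to Wold's representation is to expand a stationary ARMA model into its infinite moving average, y_t - μ = Σ_j ψ_j ε_{t-j}. The sketch below computes the ψ weights with the standard recursion; the function name is hypothetical, and statsmodels provides a comparable helper, `arma2ma`, in `statsmodels.tsa.arima_process` (with its own polynomial sign convention).

```python
def psi_weights(phi, theta, n_weights):
    """Coefficients psi_0, psi_1, ... of the MA(infinity) form of a stationary ARMA(p, q).

    Recursion: psi_0 = 1 and
    psi_j = theta_j + sum_{i=1}^{min(j, p)} phi_i * psi_{j-i}, with theta_j = 0 for j > q.
    """
    psi = [1.0]
    for j in range(1, n_weights):
        value = theta[j - 1] if j <= len(theta) else 0.0
        for i in range(1, min(j, len(phi)) + 1):
            value += phi[i - 1] * psi[j - i]
        psi.append(value)
    return psi


print(psi_weights([0.5], [0.4], 5))    # ARMA(1, 1): 1, 0.9, 0.45, 0.225, ... shrinking by 0.5
print(psi_weights([], [0.3, 0.2], 4))  # pure MA(2): the weights stop after lag 2
```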

ARIMA(p, d, q)

If the series is not stationary, you difference it d times until it is, then fit an ARMA model to the differenced series. Differencing once means working with Δy_t = y_t - y_{t-1}. Differencing twice means working with Δ²y_t = Δy_t - Δy_{t-1}. Using the lag operator L (defined by L y_t = y_{t-1}), the ARIMA(p, d, q) model can be written compactly as:

(1 - φ_1 L - ... - φ_p L^p)(1 - L)^d y_t = c + (1 + θ_1 L + ... + θ_q L^q) ε_t

or in shorter notation:

φ(L) (1 - L)^d y_t = c + θ(L) ε_t

where φ(L) = 1 - φ_1 L - ... - φ_p L^p and θ(L) = 1 + θ_1 L + ... + θ_q L^q.
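Expanding (1 - L)^d with binomial coefficients gives the differencing filter explicitly, for example (1 - L)^2 = 1 - 2L + L^2. The NumPy sketch below (hypothetical helper names) applies that expansion, checks it against `np.diff`, and shows the inverse step that turns forecasts of a once-differenced series back into levels.

```python
from math import comb

import numpy as np


def difference(y, d):
    """Apply (1 - L)^d y_t = sum_{k=0}^{d} (-1)^k * C(d, k) * y_{t-k}; the first d values are lost."""
    y = np.asarray(y, dtype=float)
    weights = [(-1) ** k * comb(d, k) for k in range(d + 1)]
    return np.array([sum(w * y[t - k] for k, w in enumerate(weights)) for t in range(d, len(y))])


def undifference_once(diff_forecasts, last_level):
    """Invert one difference: levels are the last observed level plus cumulative changes."""
    return last_level + np.cumsum(diff_forecasts)


t = np.arange(10, dtype=float)
quadratic = 3.0 + 0.5 * t + 2.0 * t**2
print(difference(quadratic, 2))  # all 4.0: two differences remove a quadratic trend
print(np.allclose(difference(quadratic, 2), np.diff(quadratic, n=2)))  # True
print(undifference_once([2.0, 3.0, 4.0], last_level=1.0))  # levels 3, 6, 10
```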

Model variants

Special cases and extensions of ARIMA include:

Notation	Meaning
AR(p)	Pure autoregressive on stationary series
MA(q)	Pure moving average
ARMA(p, q)	Mixed AR and MA on stationary series
ARIMA(0, 0, 0)	White noise
ARIMA(0, 1, 0)	Random walk
ARIMA(0, 1, 0) with drift	Random walk with drift
ARIMA(p, 0, 0)	Pure AR(p) on a stationary series
ARIMA(0, 0, q)	Pure MA(q)
ARIMA(0, 1, 1)	Equivalent to simple exponential smoothing
ARIMA(0, 2, 2)	Equivalent to Holt's linear method
SARIMA(p, d, q)(P, D, Q)_s	Seasonal ARIMA with seasonal period s
ARIMAX	ARIMA with exogenous regressors X
SARIMAX	Seasonal ARIMA with exogenous regressors
VAR(p)	Vector autoregression for multiple series
VARMA(p, q)	Vector ARMA
VARIMA(p, d, q)	Vector ARIMA
ARFIMA(p, d, q)	Fractional ARIMA, real-valued d
ARIMA-GARCH	ARIMA mean equation, GARCH conditional variance

What is seasonal ARIMA (SARIMA)?

Many real-world series have seasonal patterns: daily traffic peaks in the morning rush, retail sales spike in December, electricity load follows weekly cycles. Seasonal ARIMA, written SARIMA(p, d, q)(P, D, Q)_s, adds a seasonal block of orders (P, D, Q) at lag s (the seasonal period, for example 12 for monthly data with annual seasonality). The seasonal extension was introduced by Box and Jenkins in the original 1970 book.^[1] The full model is:

Φ(L^s) φ(L) (1 - L)^d (1 - L^s)^D y_t = c + Θ(L^s) θ(L) ε_t

Here Φ(L^s) and Θ(L^s) are the seasonal AR and MA polynomials in the seasonal lag operator. SARIMA can capture combined non-seasonal and seasonal autocorrelation but only one seasonality at a time. Hourly electricity data has at least three (daily, weekly, yearly), and that is one of the cases where ARIMA-style models start to creak.
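The multiplicative form means the seasonal and non-seasonal polynomials interact. Multiplying them out shows which lags the model actually uses: for example, (1 - 0.5L)(1 - 0.3L^12) = 1 - 0.5L - 0.3L^12 + 0.15L^13, so a model with only two AR parameters still puts weight on lag 13. The plain-Python sketch below does the multiplication; the coefficients are illustrative and the helper names hypothetical.

```python
def poly_mul(a, b):
    """Multiply two lag polynomials stored as coefficient lists [c_0, c_1, ...] in powers of L."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def seasonal_ar_poly(Phi, s):
    """Seasonal AR polynomial 1 - Phi_1 L^s - Phi_2 L^(2s) - ... as a dense coefficient list."""
    out = [0.0] * (len(Phi) * s + 1)
    out[0] = 1.0
    for k, coef in enumerate(Phi, start=1):
        out[k * s] = -coef
    return out


# AR side of a SARIMA(1, 0, 0)(1, 0, 0)_12 with illustrative phi_1 = 0.5 and Phi_1 = 0.3.
phi_poly = [1.0, -0.5]                  # 1 - 0.5 L
Phi_poly = seasonal_ar_poly([0.3], 12)  # 1 - 0.3 L^12
product = poly_mul(phi_poly, Phi_poly)
print({lag: coef for lag, coef in enumerate(product) if coef != 0.0})
# nonzero lags 0, 1, 12 and 13; the lag-13 term 0.15 = 0.5 * 0.3 comes from the product
```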

ARIMAX and SARIMAX

ARIMAX adds external predictor variables (the X stands for exogenous) to a non-seasonal ARIMA model. SARIMAX is the seasonal version. These let the forecaster include things like price, advertising spend, holiday flags, or weather covariates. In the standard regression-with-ARIMA-errors formulation, the response is a linear regression on the exogenous variables whose error term follows an ARIMA process, with the regression and ARIMA coefficients estimated jointly:

y_t = β'x_t + n_t,    where n_t follows ARIMA(p, d, q)

In statsmodels and many other packages, SARIMAX is the most general estimator and the plain ARIMA classes are special cases.^[38]
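The structure y_t = β'x_t + n_t can be illustrated with a two-step NumPy sketch: ordinary least squares for β, then an AR(1) fitted to the regression residuals. This is a simplification chosen for clarity, not how the packages work: SARIMAX and R's `Arima` estimate β and the ARIMA coefficients jointly by maximum likelihood, which is more efficient and gives valid standard errors. The simulated data and parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta_true, phi_true = 5000, 2.0, 0.6

# Simulate y_t = beta * x_t + n_t, where the error n_t follows an AR(1), i.e. ARIMA(1, 0, 0).
x = rng.standard_normal(n)
shocks = rng.standard_normal(n)
errors = np.zeros(n)
for t in range(1, n):
    errors[t] = phi_true * errors[t - 1] + shocks[t]
y = beta_true * x + errors

# Step 1: ordinary least squares of y on an intercept and x.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

# Step 2: least-squares AR(1) coefficient of the regression residuals.
phi_hat = np.dot(resid[1:], resid[:-1]) / np.dot(resid[:-1], resid[:-1])
print(coef[1], phi_hat)  # close to the true 2.0 and 0.6
```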

What is the Box-Jenkins methodology?

Box and Jenkins prescribed a three-step iterative procedure.^[1] The steps are usually run by hand on small datasets and automated on large ones.

1. Identification

The goal of identification is to pick (p, d, q) (and the seasonal orders if relevant). Standard tools:

Plot the series. Look for trend, seasonality, structural breaks, obvious outliers.
Apply unit-root tests to decide d. The augmented Dickey-Fuller (ADF) test has a null of unit root;^[21] the KPSS test (Kwiatkowski-Phillips-Schmidt-Shin) has a null of stationarity.^[22] Running both is standard because they answer the question from opposite sides. The Phillips-Perron test shares the ADF null but corrects for serial correlation nonparametrically instead of adding lagged differences.
Difference the series until it looks stationary, then examine the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the differenced series.
A pure AR(p) typically has an ACF that decays geometrically and a PACF that cuts off after lag p. A pure MA(q) shows the opposite pattern: an ACF that cuts off after lag q and a PACF that decays. Mixed ARMA models show decays in both. In practice these clean patterns rarely appear and the ACF/PACF are used as suggestions rather than proof.

For seasonal data, the ACF and PACF are also examined at the seasonal lags to set (P, D, Q).
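The sample ACF and PACF behind these rules of thumb can be computed directly. This NumPy sketch (hypothetical helper names) uses the Durbin-Levinson recursion for the PACF and a simulated AR(1) with φ = 0.7, so the ACF should decay roughly as 0.7^k while the PACF cuts off after lag 1. In practice `statsmodels.tsa.stattools.acf` and `pacf` (or R's `acf` and `pacf`) do this job.

```python
import numpy as np


def sample_acf(x, nlags):
    """Sample autocorrelations r_0 = 1, r_1, ..., r_nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n, denom = len(x), np.dot(x, x)
    return np.array([1.0] + [np.dot(x[k:], x[: n - k]) / denom for k in range(1, nlags + 1)])


def sample_pacf(x, nlags):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = sample_acf(x, nlags)
    pacf = [1.0]
    phi = np.array([])  # coefficients of the AR(k-1) fit, phi_{k-1,1..k-1}
    for k in range(1, nlags + 1):
        phi_kk = (r[k] - np.dot(phi, r[k - 1 : 0 : -1])) / (1.0 - np.dot(phi, r[1:k]))
        phi = np.append(phi - phi_kk * phi[::-1], phi_kk)
        pacf.append(phi_kk)
    return np.array(pacf)


rng = np.random.default_rng(42)
e = rng.standard_normal(5000)
y = np.zeros_like(e)
for t in range(1, len(e)):
    y[t] = 0.7 * y[t - 1] + e[t]  # simulated AR(1)

print(np.round(sample_acf(y, 4), 2))   # decays: roughly 1, 0.7, 0.49, 0.34, 0.24
print(np.round(sample_pacf(y, 4), 2))  # cuts off: roughly 1, 0.7, then near 0
print(1.96 / np.sqrt(len(y)))          # approximate 95% band for a zero autocorrelation
```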

2. Estimation

Given (p, d, q), the coefficients are estimated from the differenced series. The two main objective functions are conditional sum of squares (CSS) and full maximum likelihood estimation (MLE). Most modern software writes the model in state-space form and runs the Kalman filter to evaluate the likelihood, which handles missing data and unobserved components naturally.^[19] Standard errors come from the inverse Hessian or from outer-product-of-gradient (OPG) approximations.
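Conditional sum of squares is the simplest of these objectives to write down. The sketch below, assuming NumPy, a zero-mean MA(1), and a pre-sample shock fixed at zero, rebuilds the residuals recursively for each candidate θ and keeps the value with the smallest sum of squares. The grid search is a stand-in for the numerical optimizers real packages use; exact maximum likelihood differs mainly in how it treats the start-up values.

```python
import numpy as np


def css_ma1(y, theta):
    """Conditional sum of squares of a zero-mean MA(1), with the pre-sample shock set to 0."""
    resid, prev = np.empty(len(y)), 0.0
    for t, value in enumerate(y):
        prev = value - theta * prev  # e_t = y_t - theta * e_{t-1}
        resid[t] = prev
    return float(np.dot(resid, resid))


rng = np.random.default_rng(1)
e = rng.standard_normal(2001)
y = e[1:] + 0.5 * e[:-1]  # simulated MA(1) with theta = 0.5

# Grid over the invertible region |theta| < 1, standing in for a numerical optimizer.
grid = np.linspace(-0.95, 0.95, 191)
theta_hat = grid[np.argmin([css_ma1(y, th) for th in grid])]
print(theta_hat)  # close to 0.5
```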

3. Diagnostic checking

After fitting, the residuals should look like white noise. Standard checks:

ACF of residuals: bars should mostly stay inside the approximate 95% band of ±1.96/√n.
Ljung-Box portmanteau test for joint autocorrelation up to lag h.^[20]
Normal Q-Q plot of residuals.
Plot of standardized residuals over time, looking for changing variance (a sign that you might need a GARCH layer).

If the residuals fail any of these, you go back to identification, change the orders, refit, and recheck. The whole loop is iterative, which is why Box and Jenkins drew it as a flowchart with arrows pointing back to step 1.^[1]
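The Ljung-Box statistic is Q = n(n + 2) Σ_{k=1}^{h} r_k² / (n - k), where r_k is the lag-k autocorrelation of the residuals. The NumPy sketch below (hypothetical helper name) computes it for simulated white noise and for a strongly autocorrelated AR(1) series standing in for the residuals of an under-fitted model.

```python
import numpy as np


def ljung_box_q(resid, h):
    """Ljung-Box statistic Q = n(n + 2) * sum_{k=1}^{h} r_k**2 / (n - k)."""
    x = np.asarray(resid, dtype=float)
    x = x - x.mean()
    n, denom = len(x), np.dot(x, x)
    r = np.array([np.dot(x[k:], x[: n - k]) / denom for k in range(1, h + 1)])
    return n * (n + 2) * np.sum(r**2 / (n - np.arange(1, h + 1)))


rng = np.random.default_rng(7)
white = rng.standard_normal(2000)  # residuals from an adequate model should look like this
under_fit = np.zeros(2000)         # stands in for residuals with leftover AR(1) structure
for t in range(1, 2000):
    under_fit[t] = 0.7 * under_fit[t - 1] + white[t]

print(ljung_box_q(white, h=10))      # typically near 10, its degrees of freedom
print(ljung_box_q(under_fit, h=10))  # very large: strong leftover autocorrelation
```

Under the null of no autocorrelation, Q is approximately chi-square with h degrees of freedom, reduced to h - p - q when it is applied to the residuals of a fitted ARMA(p, q); a p-value follows from `scipy.stats.chi2.sf(Q, df)`. statsmodels packages the whole test as `acorr_ljungbox` in `statsmodels.stats.diagnostic`.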

How does stationarity and differencing work in ARIMA?

ARIMA assumes weak (covariance) stationarity of the differenced series: constant mean, constant variance, and an autocovariance that depends only on the lag. Most economic and financial series are visibly nonstationary, with trends, drifts, or shifting variances. Two common transformations:

Differencing handles trend and unit roots. One difference removes a deterministic linear trend; two remove a quadratic one. Seasonal differencing (y_t - y_{t-s}) removes a pattern that repeats every s periods and handles seasonal unit roots.
Logarithms or Box-Cox transformations stabilize variance when the series shows multiplicative behavior (variance growing with level).
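Both transformations are one-liners. In the NumPy sketch below (hypothetical helper names), the Box-Cox transform uses a λ chosen by hand, and the seasonal difference is applied to a synthetic quarterly series with a fixed seasonal pattern plus a linear trend: the pattern cancels and the trend leaves a constant 0.5 * 4 = 2. `scipy.stats.boxcox` can estimate λ from data when it is not known.

```python
import numpy as np


def box_cox(y, lam):
    """Box-Cox transform: log(y) if lam == 0, else (y**lam - 1) / lam. Requires y > 0."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam


def seasonal_difference(y, s):
    """Seasonal difference y_t - y_{t-s}; the first s values are lost."""
    y = np.asarray(y, dtype=float)
    return y[s:] - y[:-s]


pattern = np.array([5.0, -2.0, 1.0, -4.0])  # repeating quarterly effect
t = np.arange(16)
quarterly = 50.0 + 0.5 * t + pattern[t % 4]
print(seasonal_difference(quarterly, 4))  # all 2.0: the pattern cancels, the trend leaves 0.5 * 4
print(box_cox([1.0, np.e], 0))            # log transform: approximately 0 and 1
print(box_cox([4.0, 9.0], 0.5))           # (sqrt(y) - 1) / 0.5: 2 and 4
```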

Over-differencing is a real risk: it inflates the variance of the differenced series and often introduces a spurious negative autocorrelation at lag 1. A common heuristic is to choose the smallest d for which a KPSS test no longer rejects stationarity (or, with ADF, the smallest d for which the unit-root null is rejected). Hyndman and Khandakar's auto.arima implementation uses successive KPSS tests for the non-seasonal d and successive Canova-Hansen or OCSB tests for the seasonal D.^[8]
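The lag-1 signature of over-differencing follows from a short calculation: differencing white noise gives ε_t - ε_{t-1}, whose variance is 2σ² and whose lag-1 autocovariance is -σ², so its lag-1 autocorrelation is -0.5. The NumPy sketch below checks this on simulated white noise, a series for which d = 0 is already correct; the helper name is hypothetical.

```python
import numpy as np


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[1:], x[:-1]) / np.dot(x, x)


rng = np.random.default_rng(3)
white = rng.standard_normal(5000)  # already stationary, so d = 0 is the right choice
over = np.diff(white)              # an unnecessary first difference

print(lag1_autocorr(white))      # close to 0
print(lag1_autocorr(over))       # close to -0.5
print(over.var() / white.var())  # close to 2: the variance roughly doubles
```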

The ARIMA framework cannot handle structural breaks or regime changes natively. If the data-generating process changes (a new tax regime, a pandemic, a policy shift), refitting on a window after the break is the usual workaround.

Stationarity tests at a glance

Test	Null hypothesis	Notes
Augmented Dickey-Fuller (ADF)	Series has a unit root (nonstationary)	Lag selection by AIC or BIC; multiple variants for trend and intercept
KPSS	Series is trend or level stationary	Run alongside ADF; agreement gives more confidence
Phillips-Perron (PP)	Unit root	Nonparametric correction for serial correlation and heteroskedasticity
Canova-Hansen	Stationarity at the seasonal lag	Used for seasonal differencing decisions
OCSB (Osborn-Chui-Smith-Birchenhall)	Series has a seasonal unit root	Default seasonal-differencing test in pmdarima's auto_arima
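The core of the ADF test is an ordinary regression, Δy_t = a + γ y_{t-1} + Σ_{i=1}^{k} δ_i Δy_{t-i} + e_t, and the statistic is the usual t-ratio on γ. Under the unit-root null that ratio does not follow a t or normal distribution, so it is compared with Dickey-Fuller critical values (asymptotically about -2.86 at the 5% level for this intercept-only version). The NumPy sketch below computes only the statistic, with a fixed number of lagged differences and a hypothetical helper name; for real work, `statsmodels.tsa.stattools.adfuller` and `kpss` add lag selection and p-values.

```python
import numpy as np


def adf_tstat(y, lags=1):
    """t-ratio on gamma in the regression dy_t = a + gamma * y_{t-1} + sum_i delta_i * dy_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    target = dy[lags:]
    columns = [np.ones_like(target), y[lags:-1]]  # intercept and lagged level y_{t-1}
    for i in range(1, lags + 1):
        columns.append(dy[lags - i : len(dy) - i])  # lagged difference dy_{t-i}
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])


rng = np.random.default_rng(5)
e = rng.standard_normal(1000)
random_walk = np.cumsum(e)
stationary = np.zeros(1000)
for t in range(1, 1000):
    stationary[t] = 0.5 * stationary[t - 1] + e[t]

print(adf_tstat(random_walk))  # usually above -2.86: the unit root is not rejected
print(adf_tstat(stationary))   # far below -2.86: the unit root is rejected
```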

How do you choose the ARIMA model order?

With manual identification you pick orders by reading ACF/PACF plots. With automated selection you pick orders by minimizing an information criterion on a grid of candidates. The standard criteria are:

Criterion	Formula	Behavior
AIC (Akaike)	`-2 log L + 2k`	Picks slightly larger models, asymptotically efficient for prediction
AICc (corrected AIC)	`AIC + 2k(k+1)/(n-k-1)`	Recommended when n/k < 40
BIC (Schwarz)	`-2 log L + k log n`	Penalizes complexity more heavily, consistent for true model order
HQIC (Hannan-Quinn)	`-2 log L + 2k log log n`	A compromise

Here L is the maximized likelihood, k the number of parameters, and n the sample size. None of the criteria is universally best; they encode different priors about complexity. Cross-validation on rolling forecast windows (time-series cross-validation) is more expensive but measures out-of-sample accuracy directly.
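The four criteria are simple functions of the maximized log-likelihood. The plain-Python sketch below computes them, using the standard small-sample correction for AICc; the log-likelihood values are hypothetical. Packages differ in what they count in k (for example, whether the noise variance is included), so criteria should be compared only across models fitted by the same package to the same data, and only among models with the same d, because differencing changes the data the likelihood is computed on.

```python
import math


def information_criteria(loglik, k, n):
    """AIC, AICc, BIC and HQIC from a maximized log-likelihood, k parameters and n observations."""
    aic = -2.0 * loglik + 2.0 * k
    return {
        "AIC": aic,
        "AICc": aic + 2.0 * k * (k + 1) / (n - k - 1),  # small-sample correction
        "BIC": -2.0 * loglik + k * math.log(n),
        "HQIC": -2.0 * loglik + 2.0 * k * math.log(math.log(n)),
    }


# Hypothetical log-likelihoods for two candidates fitted to the same 60 observations.
for name, loglik, k in [("ARIMA(1,1,0)", -120.4, 2), ("ARIMA(2,1,2)", -117.9, 5)]:
    scores = information_criteria(loglik, k, n=60)
    print(name, {crit: round(value, 2) for crit, value in scores.items()})
```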

What is auto.arima and the Hyndman-Khandakar algorithm?

Rob J. Hyndman and Yeasmin Khandakar published a stepwise search algorithm in 2008 in the Journal of Statistical Software (volume 27, issue 3, pages 1-22) that combines unit-root testing for d and D with a fast greedy search over (p, q, P, Q) minimizing AICc.^[8] It starts from a small set of seed models, tries neighboring models whose orders differ by ±1, and keeps moving to a better neighbor until none improves. The implementation in the R forecast package became the de facto reference for automated ARIMA fitting^[39] and was ported to Python as pmdarima.^[40]

auto.arima is not a replacement for thinking. It can pick odd models on noisy data, it does not know about structural breaks, its default settings cap the orders it will consider (the caps are user-adjustable), and the stepwise search can miss the model with the lowest AICc. But for routine forecasting on hundreds of series, it is hard to beat in terms of effort per forecast.
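The greedy idea can be sketched in a few lines of plain Python. This is a simplified, non-seasonal illustration, not the forecast package's code: `score(p, q)` is a hypothetical callable standing in for "fit ARIMA(p, d, q) and return its AICc", the seeds are the four non-seasonal starting models described for the algorithm, the search always moves to the best neighbor, and the real algorithm's constant toggle and seasonal moves are omitted.

```python
def stepwise_search(score, max_p=5, max_q=5):
    """Greedy (p, q) search in the spirit of Hyndman-Khandakar, simplified to the non-seasonal case.

    score(p, q) should return an information criterion (lower is better) for the
    fitted model; results are cached so each order is fitted at most once.
    """
    cache = {}

    def evaluate(order):
        if order not in cache:
            cache[order] = score(*order)
        return cache[order]

    seeds = [(p, q) for p, q in [(2, 2), (0, 0), (1, 0), (0, 1)] if p <= max_p and q <= max_q]
    best = min(seeds, key=evaluate)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1), (1, -1), (-1, 1)]
    while True:
        neighbours = [(best[0] + dp, best[1] + dq) for dp, dq in moves]
        neighbours = [(p, q) for p, q in neighbours if 0 <= p <= max_p and 0 <= q <= max_q]
        candidate = min(neighbours, key=evaluate)
        if evaluate(candidate) >= evaluate(best):
            return best, len(cache)  # no neighbour improves: stop
        best = candidate


# A hypothetical score surface standing in for "fit ARIMA(p, d, q) and return its AICc".
best, n_fits = stepwise_search(lambda p, q: (p - 3) ** 2 + (q - 1) ** 2)
print(best, n_fits)  # (3, 1) 17: far fewer fits than the 36 orders with p, q <= 5
```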

What are the main variants and extensions of ARIMA?

ARFIMA (fractional ARIMA)

Granger and Joyeux (1980) and Hosking (1981) generalized differencing to fractional orders, allowing d to take any real value (stationary long memory corresponds to 0 < d < 0.5).^[13]^[14] ARFIMA models long-memory processes whose autocorrelation decays as a power law rather than exponentially. They show up in hydrology, network traffic, and some financial volatility series.
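Fractional differencing expands (1 - L)^d as an infinite series Σ_k π_k L^k with π_0 = 1 and π_k = π_{k-1}(k - 1 - d)/k. For integer d the series terminates; for fractional d the weights decay slowly and never reach zero, which is how the model carries long memory (in practice the expansion is truncated). The plain-Python sketch below, with a hypothetical helper name, computes the weights.

```python
def frac_diff_weights(d, n_weights):
    """First n_weights coefficients pi_k of (1 - L)^d = sum_k pi_k L^k, for real d.

    Recursion: pi_0 = 1 and pi_k = pi_{k-1} * (k - 1 - d) / k.
    """
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


print(frac_diff_weights(1.0, 4))  # integer d = 1: 1, -1, then zeros (the ordinary 1 - L)
print([round(w, 4) for w in frac_diff_weights(0.4, 6)])  # d = 0.4: slowly decaying, never exactly zero
```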

VAR and VARIMA

Vector autoregression (VAR), popularized by Christopher Sims's 1980 paper Macroeconomics and Reality, generalizes AR(p) to multiple series at once: each variable is regressed on lags of itself and all the others.^[17] VARIMA adds differencing, but in practice the multivariate world has been dominated by VAR rather than VARIMA, partly because identifying multivariate MA components is harder and partly because VAR has a clean interpretation in terms of impulse-response functions.
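A VAR(1) is compact enough to show in full. The NumPy sketch below uses a hypothetical two-variable system y_t = c + A y_{t-1} + e_t with illustrative values: forecasts iterate the mean equation, and the non-orthogonalized impulse response h periods after a unit shock is the matrix power A^h. Estimation and orthogonalized (Cholesky) impulse responses are left to packages such as `statsmodels.tsa.api.VAR` or R's `vars`.

```python
import numpy as np

# Hypothetical two-variable VAR(1): y_t = c + A @ y_{t-1} + e_t (illustrative values).
c = np.array([0.1, 0.2])
A = np.array([[0.5, 0.1],
              [0.0, 0.3]])


def var1_forecast(y_last, steps):
    """Iterate the VAR(1) mean equation forward from the last observation."""
    path, y = [], np.asarray(y_last, dtype=float)
    for _ in range(steps):
        y = c + A @ y
        path.append(y)
    return np.array(path)


def var1_irf(horizon):
    """Non-orthogonalized impulse responses: the response h periods after a unit shock is A**h."""
    return [np.linalg.matrix_power(A, h) for h in range(horizon + 1)]


print(var1_forecast([1.0, 2.0], steps=3))
print(var1_irf(2)[2])  # column j: both variables' responses two periods after a unit shock to variable j
```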

Cointegration analysis, through the Engle-Granger two-step method and Johansen's VAR-based procedure, extends this framework; Clive Granger shared the 2003 Nobel Prize in economics for the cointegration work. When several nonstationary series share a common stochastic trend, you can model their differences and the long-run equilibrium together in a vector error-correction model (VECM).

GARCH and ARIMA-GARCH

ARIMA models the conditional mean and assumes constant variance. Robert Engle's 1982 ARCH model (Nobel Prize 2003) and Tim Bollerslev's 1986 generalization to GARCH model the conditional variance, which is essential for financial returns.^[15]^[16] A common pipeline is to fit ARIMA to the mean equation and GARCH to the residuals, giving an ARIMA-GARCH composite. See GARCH.
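The GARCH(1,1) variance equation is σ²_t = ω + α ε²_{t-1} + β σ²_{t-1}. The plain-Python sketch below runs that recursion over hypothetical ARIMA residuals with assumed parameter values, starting from the unconditional variance ω / (1 - α - β); one large residual raises the next period's variance, which then decays gradually. Fitting ω, α, and β is normally done by maximum likelihood, for example with the third-party Python `arch` package.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances sigma2_t = omega + alpha * e_{t-1}**2 + beta * sigma2_{t-1}.

    The recursion starts from the unconditional variance omega / (1 - alpha - beta),
    which exists only when alpha + beta < 1.
    """
    sigma2 = [omega / (1.0 - alpha - beta)]
    for e in residuals[:-1]:
        sigma2.append(omega + alpha * e**2 + beta * sigma2[-1])
    return sigma2


# Hypothetical ARIMA residuals with one large shock at t = 3.
resid = [0.1, -0.2, 0.0, 3.0, 0.1, -0.1, 0.2]
variances = garch11_variance(resid, omega=0.1, alpha=0.1, beta=0.8)
print([round(v, 3) for v in variances])  # jumps at t = 4 (the period after the shock), then decays
```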

Threshold and nonlinear AR

Howell Tong's 1978 threshold autoregressive (TAR) model lets the AR coefficients switch between regimes depending on whether a state variable crosses a threshold.^[18] Smooth transition AR (STAR) replaces the hard switch with a smooth transition function, typically logistic. These nonlinear cousins of ARIMA capture asymmetries that linear models cannot, but they need more data and are harder to identify.
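The difference between a hard switch and a smooth one is easy to see for a two-regime AR(1). The plain-Python sketch below assumes the self-exciting case, in which the state variable is y_{t-1} itself, and a logistic transition for the STAR version; the coefficients, threshold, γ, and helper names are illustrative. As γ grows, the logistic weight approaches a step function and the STAR prediction approaches the TAR one.

```python
import math


def setar_mean(y_prev, phi_low, phi_high, threshold):
    """Two-regime self-exciting threshold AR(1): the coefficient switches at the threshold."""
    return phi_low * y_prev if y_prev <= threshold else phi_high * y_prev


def lstar_mean(y_prev, phi_low, phi_high, threshold, gamma):
    """Logistic smooth-transition AR(1): a weighted blend of the two regimes."""
    weight = 1.0 / (1.0 + math.exp(-gamma * (y_prev - threshold)))  # 0 = low regime, 1 = high regime
    return (1.0 - weight) * phi_low * y_prev + weight * phi_high * y_prev


# Illustrative coefficients: strong persistence below 0, weak persistence above it.
for y_prev in (-2.0, -0.2, 0.2, 2.0):
    hard = setar_mean(y_prev, 0.9, 0.2, 0.0)
    smooth = lstar_mean(y_prev, 0.9, 0.2, 0.0, gamma=5.0)
    print(f"y_prev={y_prev:+.1f}  TAR {hard:+.3f}  STAR {smooth:+.3f}")
```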

State-space and unobserved components

Every ARIMA model has an equivalent state-space representation. Andrew Harvey's Forecasting, Structural Time Series Models and the Kalman Filter (1989) reformulated the field around the state-space view, which makes irregular time stamps, missing data, time-varying coefficients, and exogenous regressors much easier to handle.^[19] A Bayesian counterpart is BSTS (Bayesian Structural Time Series), used at Google for causal impact analysis.
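The state-space route can be illustrated with the smallest possible case: a zero-mean AR(1) treated as a one-dimensional state, with an optional measurement-noise variance. The plain-Python sketch below (hypothetical helper names and parameter values) runs the Kalman filter to accumulate the Gaussian log-likelihood from one-step prediction errors, checks it against the closed-form exact AR(1) likelihood, and shows that a gap of missing values is handled by skipping the update step. Real packages apply the same idea with a state vector for general ARIMA models.

```python
import math
import random


def ar1_kalman_loglik(y, phi, sigma2, obs_var=0.0):
    """Gaussian log-likelihood of a zero-mean AR(1) state, via the Kalman filter.

    State:       x_t = phi * x_{t-1} + w_t,  w_t ~ N(0, sigma2)
    Observation: y_t = x_t + v_t,            v_t ~ N(0, obs_var)
    NaN observations are treated as missing: the update step is skipped.
    """
    a, P = 0.0, sigma2 / (1.0 - phi**2)  # stationary mean and variance of the initial state
    loglik = 0.0
    for obs in y:
        if not math.isnan(obs):
            F = P + obs_var  # variance of the one-step prediction error
            v = obs - a      # one-step prediction error (innovation)
            loglik -= 0.5 * (math.log(2.0 * math.pi * F) + v * v / F)
            K = P / F        # Kalman gain
            a, P = a + K * v, P * (1.0 - K)
        a, P = phi * a, phi**2 * P + sigma2  # predict the next state
    return loglik


def ar1_exact_loglik(y, phi, sigma2):
    """Closed-form exact AR(1) log-likelihood, used here only to check the filter."""
    v0 = sigma2 / (1.0 - phi**2)
    ll = -0.5 * (math.log(2.0 * math.pi * v0) + y[0] ** 2 / v0)
    for t in range(1, len(y)):
        ll -= 0.5 * (math.log(2.0 * math.pi * sigma2) + (y[t] - phi * y[t - 1]) ** 2 / sigma2)
    return ll


rng = random.Random(0)
y = [0.0]
for _ in range(199):
    y.append(0.6 * y[-1] + rng.gauss(0.0, 1.0))

print(ar1_kalman_loglik(y, 0.6, 1.0), ar1_exact_loglik(y, 0.6, 1.0))  # equal up to rounding

y_gappy = y.copy()
y_gappy[50:60] = [float("nan")] * 10  # ten missing observations
print(ar1_kalman_loglik(y_gappy, 0.6, 1.0))  # still a finite log-likelihood
```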

What software implements ARIMA?

Package	Language	Notes
`forecast` (`auto.arima`, `Arima`)	R	Hyndman-Khandakar reference implementation
`fable`	R	Successor to forecast in the tidyverts
`statsmodels.tsa.arima.model.ARIMA`	Python	Standard ARIMA class in statsmodels
`statsmodels.tsa.statespace.sarimax.SARIMAX`	Python	More general state-space estimator
`pmdarima.auto_arima`	Python	Port of auto.arima
`darts`	Python	Wraps statsmodels and exposes ARIMA alongside neural forecasters
`statsforecast` (Nixtla)	Python	Fast vectorized AutoARIMA for parallel fitting
`prophet` (Stan backend)	Python / R	Not strictly ARIMA but the most common alternative for the same problems
`StateSpaceModels.jl`	Julia	SARIMA and other state-space models
PROC ARIMA	SAS	Used heavily in pharma and finance
EViews ARIMA / SARIMAX	EViews	Standard in academic econometrics
`arima` command	Stata	Estimation by ML or CSS
`fpp3` (book companion package)	R	Companion to Hyndman and Athanasopoulos's open-access Forecasting: Principles and Practice (3rd ed.), which uses fable throughout

Most of these compute the likelihood with a Kalman filter on the state-space form, which means they handle missing observations naturally.^[19] Coefficient estimates and standard errors usually agree closely across packages when the optimizer converges, but default starting values, convergence tolerances, and order-selection heuristics differ, so results are not always identical.

How does ARIMA compare to modern deep-learning forecasters?

The last decade brought a wave of neural and foundation-model forecasters. They sit at different points on the bias-variance and effort-to-deploy trade-offs.

Method	Year	Type	Multivariate	Probabilistic	Notes
ARIMA	1970	Linear, parametric	No (use VAR or SARIMAX)	Gaussian residuals	Fast, interpretable, baseline
Exponential smoothing (ETS)	1957-1985	Linear, parametric	No	Gaussian residuals	Often tied with ARIMA in M-competitions
Theta method	2000	Decomposition	No	Gaussian residuals	Won M3; surprisingly hard to beat
Prophet	2017	Additive components	No	Bayesian intervals	Built at Facebook for business series with multiple seasonalities
DeepAR	2017	Autoregressive RNN	Cross-series	Parametric likelihood (e.g. Gaussian, negative binomial)	Built at Amazon, trains across many related series; uses LSTM
N-BEATS	2019	Residual MLP stack	No	Point forecasts	Yoshua Bengio's group at Element AI; outperformed M4 winner
Temporal Fusion Transformer	2019	Attention model	Yes	Quantile	Google Cloud AI and University of Oxford; handles known and unknown future inputs
N-HiTS	2022	Hierarchical interpolation	No	Point forecasts	Faster N-BEATS variant
TimesNet	2023	2D inception	Yes	Point forecasts	Tsinghua group
Lag-Llama	2023	Decoder-only transformer	No	Probabilistic	Open foundation model trained on diverse series
TimeGPT	2023	Encoder-decoder transformer	Yes	Quantile	Nixtla; an early commercial time-series foundation model API
Chronos	2024	T5 over tokenized values	No	Probabilistic	Built at Amazon, zero-shot
Moirai	2024	Masked encoder	Yes	Probabilistic	Salesforce, any-frequency any-variate
TimesFM	2024	Decoder transformer	No	Point forecasts	Built at Google, zero-shot
Time-LLM	2024	Reprogrammed LLM	No	Point forecasts	Wraps a frozen LLM with a series adapter

A few honest comparisons:

On a single short clean univariate series (a few hundred to a few thousand points), a well-chosen ARIMA or ETS often matches or beats neural methods. The bias-variance trade-off favors low-variance estimators when there is little data to estimate from.
On large panels of related series (thousands of SKUs, retail stores, energy meters), neural models that share parameters across series usually win. DeepAR was an early, influential demonstration of this.^[26] ARIMA fits each series independently and cannot borrow strength.
On series with multiple seasonalities, holiday effects, and known future regressors, methods like Prophet, TFT, and NBEATSx tend to handle the structure more naturally than SARIMAX without extra feature engineering.^[28]^[29]
Foundation models (Chronos, Moirai, TimesFM, TimeGPT, Lag-Llama) offer something ARIMA cannot: zero-shot forecasting on a new series with no fitting at all.^[30]^[31]^[32]^[33]^[34] The trade-off is that they are large, slow, opaque, and sometimes wrong in ways that are hard to diagnose.
Hybrid models that fit ARIMA first and a neural network to its residuals (Zhang 2003, using a feedforward network, was an early proposal; later variants use LSTMs) sometimes outperform either component alone on series that combine linear and nonlinear structure.^[36]

What do the M-competitions say about ARIMA?

The M-competitions, organized by Spyros Makridakis since 1982, are large empirical forecasting bake-offs. They are the closest thing the field has to a leaderboard.

Competition	Year	Series	Winner	ARIMA finish
M (M1)	1982	1,001	Simple methods	Mid-pack; no clear ARIMA advantage
M2	1993	29	Mixed	Mixed
M3	2000	3,003	Theta method (ForecastPro also among the top performers)	ARIMA competitive on yearly series, weaker on monthly
M4	2018	100,000	Slawek Smyl's hybrid ES-RNN (Uber)	Pure ARIMA below ensemble baselines, still beat many ML methods
M5	2020	42,840 (Walmart)	LightGBM-based gradient boosting ensembles	ARIMA in the bottom half
M6	2022	100 (50 stocks, 50 ETFs)	Forecasting plus investment	Not the focus

The M3 paper (Makridakis and Hibon, 2000) became famous for the conclusion that "statistically sophisticated or complex methods do not necessarily produce more accurate forecasts than simpler ones," which was a hard sell for machine-learning advocates at the time.^[23] The M4 results (2018) complicated this picture: the winning ES-RNN, a hybrid of exponential smoothing and a recurrent neural network, was roughly 10% more accurate than the competition's combination benchmark by sMAPE, while 12 of the 17 most accurate methods were combinations of mostly statistical approaches and the pure machine-learning entries performed poorly.^[24] The M5 result was more decisive for machine learning, with gradient-boosting ensembles dominating its 42,840 Walmart series.^[25] Even so, the M5 organizers noted that the simple combination benchmark (a mix of statistical methods including ARIMA and ETS) beat the majority of the submitted entries.^[25]

When should you use ARIMA?

Despite the press around foundation models, ARIMA holds up in several settings:

Small datasets. With a few dozen to a few hundred observations and no related series to borrow from, low-parameter linear models are often the best you can do. Neural models overfit on small data.
Univariate series with clean structure. A monthly series with a single annual seasonality and a few hundred points is the ideal SARIMA target.
Interpretable forecasts. Coefficients have a meaning, confidence intervals are well-defined, and the diagnostic plots tell you when the model is wrong.
Latency-sensitive deployment. ARIMA fits and predicts in milliseconds, and a fitted model is a handful of coefficients that ships in a few KB. Foundation models often run to hundreds of megabytes or more.
Regulatory or audit settings. In pharma trial monitoring, financial risk reporting, or any setting where you have to explain the model to a regulator, the linear-Gaussian story is much easier than "a transformer trained on five billion time points."
As a baseline. Even when it loses, ARIMA is the reference that any new method has to clear. If a new model does not beat auto.arima, it has not yet proven anything.

What are the limitations of ARIMA?

ARIMA's weaknesses are essentially the inverse of its strengths:

Linear and Gaussian. Real series often have heavy tails, asymmetries, and nonlinear dynamics that ARIMA cannot capture. The standard answer is to wrap a GARCH layer for volatility or move to TAR / STAR for nonlinearity.
Single seasonality. SARIMA only handles one period at a time. Hourly electricity load needs at least daily, weekly, and yearly cycles, which is awkward in SARIMA but natural in TBATS, Prophet, or neural models.
Univariate by default. Adding exogenous variables through SARIMAX works, but cross-series learning (where the model trains on a panel of related series and shares parameters) is not part of the framework. DeepAR and global neural models are built for that.^[26]
Stationarity assumption. The differenced series has to be stationary. Series with structural breaks, regime changes, or evolving variance violate this. In practice you fit on a window after the most recent break.
Long-range dependence. ARIMA's autocorrelation decays exponentially, which is too fast for long-memory series. ARFIMA fixes this at the cost of harder estimation.^[14]
Order selection is fragile. ACF/PACF identification is more art than science, and auto.arima can pick odd orders on noisy data. Slightly different (p, d, q) choices can give noticeably different forecasts.
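Forecast uncertainty grows with horizon. For models with d ≥ 1 the prediction intervals keep widening as the horizon grows; for stationary models they level off at the unconditional spread of the series. Either way, long-horizon intervals are often too wide to be useful. This is honest but unhelpful when stakeholders want a single number.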
Manual tuning effort. Even with auto.arima, tuning seasonal orders, exogenous regressors, and Box-Cox transformations on a new dataset is not point-and-click. Foundation models bypass that step entirely at the cost of opacity.
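The contrast between growing and bounded uncertainty follows from the ψ weights of the model: the h-step forecast error variance is σ² Σ_{j=0}^{h-1} ψ_j². For a random walk every ψ_j is 1, so the variance is hσ²; for a stationary AR(1), ψ_j = φ^j, so the variance is σ²(1 - φ^{2h})/(1 - φ²), which levels off at σ²/(1 - φ²). The plain-Python sketch below prints 95% interval half-widths under the assumptions of Gaussian errors, known parameters, σ² = 1, and φ = 0.6; the helper names are hypothetical.

```python
import math


def rw_forecast_var(h, sigma2):
    """h-step forecast error variance of a random walk, ARIMA(0, 1, 0): grows linearly in h."""
    return h * sigma2


def ar1_forecast_var(h, phi, sigma2):
    """h-step forecast error variance of a stationary AR(1): sigma2 * (1 - phi**(2h)) / (1 - phi**2)."""
    return sigma2 * (1.0 - phi ** (2 * h)) / (1.0 - phi**2)


for h in (1, 4, 12, 48):
    rw = 1.96 * math.sqrt(rw_forecast_var(h, 1.0))
    ar = 1.96 * math.sqrt(ar1_forecast_var(h, 0.6, 1.0))
    print(f"h={h:2d}  random walk ±{rw:.2f}  AR(1) ±{ar:.2f}")
```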

What is ARIMA used for?

ARIMA still has a long tail of working deployments:

Demand forecasting in retail and supply chain, where it competes with gradient-boosted trees and neural models.
Macroeconomic forecasting at central banks, often inside larger structural or VAR models.
Inventory planning, where the classical Box-Jenkins workflow is taught in operations research courses.
Health-care monitoring, including hospital admissions, drug demand, and disease incidence.
Energy load forecasting at short horizons, where it shows up in ensembles.
Network traffic engineering, especially where ARFIMA's long-memory variants matter.
Climate and weather as a baseline, although numerical weather prediction dominates operational forecasting and machine-learning models such as GraphCast are increasingly competitive.

It is also one of the most widely taught time-series methods in MBA programs, statistics departments, and engineering curricula, which means most working analysts have an ARIMA model in their toolkit even if they reach for something else first.

References

Box, G. E. P., & Jenkins, G. M. (1970). *Time Series Analysis: Forecasting and Control*. Holden-Day, San Francisco.
Box, G. E. P., & Jenkins, G. M. (1976). *Time Series Analysis: Forecasting and Control* (revised ed.). Holden-Day.
Box, G. E. P., Jenkins, G. M., & Reinsel, G. C. (1994). *Time Series Analysis: Forecasting and Control* (3rd ed.). Prentice Hall.
Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). *Time Series Analysis: Forecasting and Control* (5th ed.). Wiley.
Brockwell, P. J., & Davis, R. A. (1991). *Time Series: Theory and Methods* (2nd ed.). Springer.
Hamilton, J. D. (1994). *Time Series Analysis*. Princeton University Press.
Hyndman, R. J., & Athanasopoulos, G. (2021). *Forecasting: Principles and Practice* (3rd ed.). OTexts. https://otexts.com/fpp3/
Hyndman, R. J., & Khandakar, Y. (2008). Automatic time series forecasting: the forecast package for R. *Journal of Statistical Software*, 27(3), 1-22. https://doi.org/10.18637/jss.v027.i03
Wold, H. (1938). *A Study in the Analysis of Stationary Time Series*. Almqvist & Wiksell.
Yule, G. U. (1927). On a method of investigating periodicities in disturbed series, with special reference to Wolfer's sunspot numbers. *Philosophical Transactions of the Royal Society A*, 226, 267-298.
Walker, G. T. (1931). On periodicity in series of related terms. *Proceedings of the Royal Society A*, 131(818), 518-532.
Slutsky, E. (1937). The summation of random causes as the source of cyclic processes. *Econometrica*, 5(2), 105-146.
Granger, C. W. J., & Joyeux, R. (1980). An introduction to long-memory time series models and fractional differencing. *Journal of Time Series Analysis*, 1(1), 15-29.
Hosking, J. R. M. (1981). Fractional differencing. *Biometrika*, 68(1), 165-176.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. *Econometrica*, 50(4), 987-1007.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. *Journal of Econometrics*, 31(3), 307-327.
Sims, C. A. (1980). Macroeconomics and reality. *Econometrica*, 48(1), 1-48.
Tong, H. (1978). On a threshold model. In C. H. Chen (ed.), *Pattern Recognition and Signal Processing*. Sijthoff & Noordhoff.
Harvey, A. C. (1989). *Forecasting, Structural Time Series Models and the Kalman Filter*. Cambridge University Press.
Ljung, G. M., & Box, G. E. P. (1978). On a measure of lack of fit in time series models. *Biometrika*, 65(2), 297-303.
Dickey, D. A., & Fuller, W. A. (1979). Distribution of the estimators for autoregressive time series with a unit root. *Journal of the American Statistical Association*, 74(366), 427-431.
Kwiatkowski, D., Phillips, P. C. B., Schmidt, P., & Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root. *Journal of Econometrics*, 54(1-3), 159-178.
Makridakis, S., & Hibon, M. (2000). The M3-Competition: results, conclusions and implications. *International Journal of Forecasting*, 16(4), 451-476.
Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2018). The M4 Competition: results, findings, conclusion and way forward. *International Journal of Forecasting*, 34(4), 802-808.
Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2022). M5 accuracy competition: results, findings, and conclusions. *International Journal of Forecasting*, 38(4), 1346-1364.
Salinas, D., Flunkert, V., Gasthaus, J., & Januschowski, T. (2020). DeepAR: probabilistic forecasting with autoregressive recurrent networks. *International Journal of Forecasting*, 36(3), 1181-1191.
Oreshkin, B. N., Carpov, D., Chapados, N., & Bengio, Y. (2020). N-BEATS: neural basis expansion analysis for interpretable time series forecasting. *International Conference on Learning Representations*.
Lim, B., Arik, S. O., Loeff, N., & Pfister, T. (2021). Temporal fusion transformers for interpretable multi-horizon time series forecasting. *International Journal of Forecasting*, 37(4), 1748-1764.
Taylor, S. J., & Letham, B. (2018). Forecasting at scale. *The American Statistician*, 72(1), 37-45.
Garza, A., Mergenthaler-Canseco, M., & Challu, C. (2023). TimeGPT-1. *Nixtla preprint*. https://docs.nixtla.io/
Ansari, A. F., Stella, L., Turkmen, C., Zhang, X., et al. (2024). Chronos: learning the language of time series. *Amazon Science*.
Das, A., Kong, W., Sen, R., & Zhou, Y. (2024). A decoder-only foundation model for time-series forecasting. *International Conference on Machine Learning*.
Woo, G., Liu, C., Kumar, A., Xiong, C., et al. (2024). Unified training of universal time series forecasting transformers (Moirai). *International Conference on Machine Learning*.
Rasul, K., Ashok, A., Williams, A. R., Khorasani, A., et al. (2023). Lag-Llama: towards foundation models for time series forecasting. *NeurIPS Time Series Workshop*.
Jin, M., Wang, S., Ma, L., Chu, Z., et al. (2024). Time-LLM: time series forecasting by reprogramming large language models. *International Conference on Learning Representations*.
Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network model. *Neurocomputing*, 50, 159-175.
Assimakopoulos, V., & Nikolopoulos, K. (2000). The theta model: a decomposition approach to forecasting. *International Journal of Forecasting*, 16(4), 521-530.
statsmodels documentation: ARIMA and SARIMAX. https://www.statsmodels.org/
R `forecast` package documentation: auto.arima. https://pkg.robjhyndman.com/forecast/
pmdarima documentation: auto_arima. https://alkaline-ml.com/pmdarima/
