ARIMA
Last reviewed
May 2, 2026
Sources
40 citations
Review status
Source-backed
Revision
v2 · 5,735 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 2, 2026
Sources
40 citations
Review status
Source-backed
Revision
v2 · 5,735 words
Add missing citations, update stale details, or suggest a clearer explanation.
See also: Time series, Time series analysis, Forecasting, Stationarity, Linear regression
ARIMA (Autoregressive Integrated Moving Average) is a class of statistical models for analyzing and forecasting time series data. The model is specified by three non-negative integer orders written as ARIMA(p, d, q): p is the number of autoregressive lags, d is the order of differencing applied to make the series stationary, and q is the number of moving-average lags applied to past forecast errors. ARIMA was popularized by George E. P. Box and Gwilym M. Jenkins in their 1970 book Time Series Analysis: Forecasting and Control, which gave the field a complete workflow for fitting these models and which is why the procedure is still called the Box-Jenkins methodology.
For most of the late twentieth century ARIMA was the default forecasting method in economics, finance, demand planning, and operations research. It still ships in every serious statistical package, it still wins on small clean univariate datasets, and it still appears as the baseline that every modern method has to beat. The M-competitions, run by Spyros Makridakis and colleagues since 1982, repeatedly found that simple statistical methods including ARIMA and exponential smoothing were hard to beat in practice, and that finding only began to flip when the M4 (2018) and M5 (2020) competitions saw hybrid and pure machine-learning entries take the top spots. In the era of deep learning forecasters and time-series foundation models, ARIMA is no longer state of the art, but it is still where most practitioners and most textbooks start.
| Field | Value |
|---|---|
| Model class | Linear, parametric, univariate time-series model |
| Year formalized | 1970 (Box & Jenkins, Time Series Analysis: Forecasting and Control) |
| Methodology | Box-Jenkins (identification, estimation, diagnostic checking) |
| Parameters | p (AR order), d (differencing order), q (MA order); seasonal extension adds (P, D, Q) at period s |
| Estimation | Conditional sum of squares or maximum likelihood, usually via Kalman filter on state-space form |
| Stationarity tests | Augmented Dickey-Fuller, KPSS, Phillips-Perron |
| Order selection | AIC, AICc, BIC, HQIC; Hyndman-Khandakar (2008) auto.arima algorithm |
| Key references | Box & Jenkins (1970, 1976, 1994 with Reinsel, 2015 with Reinsel & Ljung); Hyndman & Athanasopoulos Forecasting: Principles and Practice (otexts.com/fpp3) |
| Standard implementations | R forecast::auto.arima, fable::ARIMA; Python statsmodels.SARIMAX, pmdarima.auto_arima, nixtla statsforecast |
Imagine you are tracking the temperature outside every day. ARIMA is a recipe for guessing tomorrow's temperature using three ingredients. First, today's temperature usually looks a lot like yesterday's, so you give some weight to the last few days (that is the autoregressive or AR part). Second, weather has trends and you cannot just use the raw numbers; you look at how much the temperature changed from one day to the next, because changes are more stable than absolute levels (that is the integrated or I part, called differencing). Third, you remember how wrong your guesses were on the last few days and you correct for those mistakes (that is the moving-average or MA part).
You pick three small numbers that say how many days back you want to look for each of the three ingredients. The computer then finds the best weights to combine them. After that, you can ask the model what tomorrow looks like, or next week, or next month. The further out you go, the more uncertain the forecast gets, which is why the prediction comes with a fan of error bars rather than a single line.
The components of ARIMA are older than the name. Autoregressive models go back to George Udny Yule's 1927 paper on sunspot numbers, with Gilbert Walker's 1931 paper on the Indian monsoon adding the higher-order AR formulation now associated with the Yule-Walker equations. Eugen Slutsky's 1927 paper showed that random shocks summed in a moving-average pattern can produce convincing-looking cycles, the result that Slutsky later expanded in his 1937 Econometrica article. Herman Wold's 1938 doctoral thesis proved a decomposition theorem that any covariance-stationary process can be written as a sum of a deterministic part and an infinite moving average of uncorrelated shocks, which is why ARMA representations are so general. Differencing as a way to handle nonstationary series was already standard practice in econometrics by the 1950s.
The contribution of Box and Jenkins was to package these ingredients into a single, repeatable procedure. Their 1970 book laid out an iterative three-stage workflow (identification, estimation, diagnostic checking), introduced the seasonal extension SARIMA, and supplied worked examples on series like airline passengers, IBM stock prices, and gas furnace data that became standard datasets for decades. The book has gone through five editions: the original Holden-Day edition in 1970, a revised printing in 1976, a third edition with Gregory Reinsel in 1994 (Prentice Hall), a fourth edition in 2008, and the fifth edition (Wiley, 2015) co-authored with Reinsel and Greta Ljung. Box and Jenkins were not the only people working on this in the late 1960s, but their book made the methodology teachable and reproducible, and the term Box-Jenkins models became more or less synonymous with ARIMA in industry.
The 1980s and 1990s extended ARIMA in several directions: vector versions for multivariate series (Tiao and Box, 1981), fractional differencing for long-memory processes (Granger and Joyeux 1980; Hosking 1981), volatility extensions through GARCH (Engle 1982; Bollerslev 1986), and nonlinear cousins like threshold AR (Tong 1978) and smooth transition AR. State-space representations, championed by Harvey (1989) and others, showed that almost every ARIMA model could be cast as a Kalman filter problem, which is how most modern software actually fits them.
The Hyndman-Khandakar algorithm published in 2008 (the engine behind R's auto.arima in the forecast package by Rob J. Hyndman) automated the order-selection step that had previously required a human looking at autocorrelation plots. That is the version of ARIMA most people actually use today.
ARIMA builds up from three simpler models. Throughout, y_t denotes the observed value at time t, c is a constant, and ε_t is white noise (uncorrelated with mean zero and constant variance).
An autoregressive process of order p regresses the current value on its own previous p values:
y_t = c + φ_1 y_{t-1} + φ_2 y_{t-2} + ... + φ_p y_{t-p} + ε_t
The coefficients φ_1, ..., φ_p are estimated from data. AR(1) is a simple persistence model where today depends on yesterday; AR(2) can produce damped oscillations; higher orders can fit richer dynamics. For the process to be stationary, the roots of the characteristic polynomial 1 - φ_1 z - ... - φ_p z^p must lie outside the unit circle.
A moving-average process of order q regresses the current value on its own previous q forecast errors:
y_t = c + ε_t + θ_1 ε_{t-1} + θ_2 ε_{t-2} + ... + θ_q ε_{t-q}
This is not the same thing as a moving-average filter on the data; it is a regression on past shocks. The coefficients θ_1, ..., θ_q are estimated, and the process is invertible (the shocks can be recovered from the observations) when the roots of 1 + θ_1 z + ... + θ_q z^q lie outside the unit circle. Note that the sign convention varies across textbooks and software; some authors write the MA terms with minus signs.
Combining the two:
y_t = c + φ_1 y_{t-1} + ... + φ_p y_{t-p} + ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}
ARMA models are appropriate for stationary series. By Wold's decomposition theorem, any covariance-stationary process can be approximated arbitrarily well by an ARMA model with sufficiently large p and q, which is the theoretical justification for the whole framework.
If the series is not stationary, you difference it d times until it is, then fit an ARMA model to the differenced series. Differencing once means working with Δy_t = y_t - y_{t-1}. Differencing twice means working with Δ²y_t = Δy_t - Δy_{t-1}. Using the lag operator L (defined by L y_t = y_{t-1}), the ARIMA(p, d, q) model can be written compactly as:
(1 - φ_1 L - ... - φ_p L^p)(1 - L)^d y_t = c + (1 + θ_1 L + ... + θ_q L^q) ε_t
or in shorter notation:
φ(L) (1 - L)^d y_t = c + θ(L) ε_t
where φ(L) = 1 - φ_1 L - ... - φ_p L^p and θ(L) = 1 + θ_1 L + ... + θ_q L^q. Most series in practice need d = 0 or d = 1; d = 2 shows up for things like cumulative log-prices but is rare. Higher orders of d over-difference the series and inflate variance.
Special cases and extensions of ARIMA include:
| Notation | Meaning |
|---|---|
| AR(p) | Pure autoregressive on stationary series |
| MA(q) | Pure moving average |
| ARMA(p, q) | Mixed AR and MA on stationary series |
| ARIMA(0, 0, 0) | White noise |
| ARIMA(0, 1, 0) | Random walk |
| ARIMA(0, 1, 0) with drift | Random walk with drift |
| ARIMA(p, 0, 0) | Pure AR(p) on a stationary series |
| ARIMA(0, 0, q) | Pure MA(q) |
| ARIMA(0, 1, 1) | Equivalent to simple exponential smoothing |
| ARIMA(0, 2, 2) | Equivalent to Holt's linear method |
| SARIMA(p, d, q)(P, D, Q)_s | Seasonal ARIMA with seasonal period s |
| ARIMAX | ARIMA with exogenous regressors X |
| SARIMAX | Seasonal ARIMA with exogenous regressors |
| VAR(p) | Vector autoregression for multiple series |
| VARMA(p, q) | Vector ARMA |
| VARIMA(p, d, q) | Vector ARIMA |
| ARFIMA(p, d, q) | Fractional ARIMA, real-valued d |
| ARIMA-GARCH | ARIMA mean equation, GARCH conditional variance |
Many real-world series have seasonal patterns: daily traffic peaks in the morning rush, retail sales spike in December, electricity load follows weekly cycles. SARIMA adds a seasonal block of orders (P, D, Q) at lag s (the seasonal period, for example 12 for monthly data with annual seasonality). The full model is:
Φ(L^s) φ(L) (1 - L)^d (1 - L^s)^D y_t = c + Θ(L^s) θ(L) ε_t
Here Φ(L^s) and Θ(L^s) are the seasonal AR and MA polynomials in the seasonal lag operator. SARIMA can capture combined non-seasonal and seasonal autocorrelation but only one seasonality at a time. Hourly electricity data has at least three (daily, weekly, yearly), and that is one of the cases where ARIMA-style models start to creak.
ARIMAX adds external predictor variables (the X stands for exogenous) to a non-seasonal ARIMA model. SARIMAX is the seasonal version. These let the forecaster include things like price, advertising spend, holiday flags, or weather covariates. In the standard SARIMAX formulation, the response is regressed on the exogenous variables and ARIMA is fit to the residuals, giving:
y_t = β'x_t + n_t, where n_t follows ARIMA(p, d, q)
In statsmodels and many other packages, SARIMAX is the most general estimator and the plain ARIMA classes are special cases.
Box and Jenkins prescribed a three-step iterative procedure. The steps are usually run by hand on small datasets and automated on large ones.
The goal of identification is to pick (p, d, q) (and the seasonal orders if relevant). Standard tools:
d. The augmented Dickey-Fuller (ADF) test has a null of unit root; the KPSS test (Kwiatkowski-Phillips-Schmidt-Shin) has a null of stationarity. Running both is standard because they answer the question from opposite sides. The Phillips-Perron test is a robust variant of ADF.p. A pure MA(q) shows the opposite pattern. Mixed ARMA models show decays in both. In practice these clean patterns rarely appear and the ACF/PACF are used as suggestions rather than proof.For seasonal data, the ACF and PACF are also examined at the seasonal lags to set (P, D, Q).
Given (p, d, q), the coefficients are estimated from the differenced series. The two main objective functions are conditional sum of squares (CSS) and full maximum likelihood estimation (MLE). Most modern software writes the model in state-space form and runs the Kalman filter to evaluate the likelihood, which handles missing data and unobserved components naturally. Standard errors come from the inverse Hessian or from outer-product-of-gradient (OPG) approximations.
After fitting, the residuals should look like white noise. Standard checks:
If the residuals fail any of these, you go back to identification, change the orders, refit, and recheck. The whole loop is iterative, which is why Box and Jenkins drew it as a flowchart with arrows pointing back to step 1.
ARIMA assumes weak (covariance) stationarity of the differenced series: constant mean, constant variance, and an autocovariance that depends only on the lag. Most economic and financial series are visibly nonstationary, with trends, drifts, or shifting variances. Two common transformations:
y_t - y_{t-s}) handles seasonal trends.Over-differencing is a real risk and inflates the variance of the differenced series, often introducing a spurious negative autocorrelation at lag 1. A common heuristic is to choose the smallest d that makes the unit-root test fail to reject stationarity. Hyndman and Khandakar's auto.arima implementation uses successive KPSS tests for the non-seasonal d and successive Canova-Hansen or OCSB tests for the seasonal D.
The ARIMA framework cannot handle structural breaks or regime changes natively. If the data-generating process changes (a new tax regime, a pandemic, a policy shift), refitting on a window after the break is the usual workaround.
| Test | Null hypothesis | Notes |
|---|---|---|
| Augmented Dickey-Fuller (ADF) | Series has a unit root (nonstationary) | Lag selection by AIC or BIC; multiple variants for trend and intercept |
| KPSS | Series is trend or level stationary | Run alongside ADF; agreement gives more confidence |
| Phillips-Perron (PP) | Unit root | Nonparametric correction for serial correlation and heteroskedasticity |
| Canova-Hansen | Stationarity at the seasonal lag | Used for seasonal differencing decisions |
| OCSB (Osborn-Chui-Smith-Birchenhall) | No seasonal unit root | Default in fable's ARIMA |
With manual identification you pick orders by reading ACF/PACF plots. With automated selection you pick orders by minimizing an information criterion on a grid of candidates. The standard criteria are:
| Criterion | Formula | Behavior |
|---|---|---|
| AIC (Akaike) | -2 log L + 2k | Picks slightly larger models, asymptotically efficient for prediction |
| AICc (corrected AIC) | AIC plus a small-sample bias correction | Recommended when n/k < 40 |
| BIC (Schwarz) | -2 log L + k log n | Penalizes complexity more heavily, consistent for true model order |
| HQIC (Hannan-Quinn) | -2 log L + 2k log log n | A compromise |
Here L is the maximized likelihood, k the number of parameters, and n the sample size. None of the criteria is universally best; they encode different priors about complexity. Cross-validation on rolling forecast windows is more expensive but tends to give better out-of-sample performance.
Rob J. Hyndman and Yeasmin Khandakar published a step-wise search algorithm in 2008 (Journal of Statistical Software) that combines unit-root testing for d and D with a fast greedy search over (p, q, P, Q) minimizing AICc. It starts from a small set of seed models, perturbs the orders one at a time, and keeps the best until no neighbor improves. The implementation in the R forecast package became the de-facto reference for automated ARIMA fitting and was ported to Python as pmdarima.
auto.arima is not a replacement for thinking. It can pick weird models on noisy data, it does not know about structural breaks, and its default settings impose a maximum on the orders that the user can change. But for routine forecasting on hundreds of series, it is hard to beat in terms of effort-per-forecast.
Granger and Joyeux (1980) and Hosking (1981) generalized differencing to fractional orders, allowing d to take any real value (typically 0 < d < 0.5 for stationarity). ARFIMA models long-memory processes whose autocorrelation decays as a power law rather than exponentially. They show up in hydrology, network traffic, and some financial volatility series.
Vector autoregression (VAR), popularized by Christopher Sims's 1980 paper Macroeconomics and Reality, generalizes AR(p) to multiple series at once: each variable is regressed on lags of itself and all the others. VARIMA adds differencing, but in practice the multivariate world has been dominated by VAR rather than VARIMA, partly because identifying multivariate MA components is harder and partly because VAR has a clean interpretation in terms of impulse-response functions.
Cointegration, the Nobel-winning Engle-Granger and Johansen frameworks, sits on top of VAR. When several nonstationary series share a common stochastic trend, you can model their differences and the long-run equilibrium together in a vector error-correction model (VECM).
ARIMA models the conditional mean and assumes constant variance. Robert Engle's 1982 ARCH model (Nobel Prize 2003) and Tim Bollerslev's 1986 generalization to GARCH model the conditional variance, which is essential for financial returns. A common pipeline is to fit ARIMA to the mean equation and GARCH to the residuals, giving an ARIMA-GARCH composite. See GARCH.
Howard Tong's 1978 threshold autoregressive (TAR) model lets the AR coefficients switch between regimes depending on whether a state variable crosses a threshold. Smooth transition AR (STAR) replaces the hard switch with a logistic-like function. These nonlinear cousins of ARIMA capture asymmetries that linear models cannot, but they need more data and are harder to identify.
Almost every ARIMA model has an equivalent state-space representation. Andrew Harvey's Forecasting, Structural Time Series Models and the Kalman Filter (1989) reformulated the field around the state-space view, which makes irregular time stamps, missing data, time-varying coefficients, and exogenous regressors much easier to handle. The Bayesian counterpart is BSTS (Bayesian Structural Time Series), used at Google for causal impact analysis.
| Package | Language | Notes |
|---|---|---|
forecast (auto.arima, Arima) | R | Hyndman-Khandakar reference implementation |
fable | R | Successor to forecast in the tidyverts |
statsmodels.tsa.arima.model.ARIMA | Python | Standard ARIMA class in statsmodels |
statsmodels.tsa.statespace.SARIMAX | Python | More general state-space estimator |
pmdarima.auto_arima | Python | Port of auto.arima |
darts | Python | Wraps statsmodels and exposes ARIMA alongside neural forecasters |
nixtla statsforecast | Python | Fast vectorized AutoARIMA for parallel fitting |
prophet (Stan backend) | Python / R | Not strictly ARIMA but the most common alternative for the same problems |
TimeSeries.jl, StateSpaceModels.jl | Julia | Julia ecosystem ARIMA tooling |
| PROC ARIMA | SAS | Used heavily in pharma and finance |
| EViews ARIMA / SARIMAX | EViews | Standard in academic econometrics |
arima command | Stata | Estimation by ML or CSS |
forecast::Arima, prophet, fpp3 book code | R / book | Hyndman and Athanasopoulos's open-access Forecasting: Principles and Practice uses fable throughout |
Most of these compute the likelihood with a Kalman filter on the state-space form, which means they handle missing observations and irregular data naturally. The coefficient estimates and standard errors should agree across packages to several decimal places when the optimizer converges, but the default starting values, convergence tolerances, and order-selection heuristics differ.
The last decade brought a wave of neural and foundation-model forecasters. They sit at different points on the bias-variance and effort-to-deploy trade-offs.
| Method | Year | Type | Multivariate | Probabilistic | Notes |
|---|---|---|---|---|---|
| ARIMA | 1970 | Linear, parametric | No (use VAR or SARIMAX) | Gaussian residuals | Fast, interpretable, baseline |
| Exponential smoothing (ETS) | 1957-1985 | Linear, parametric | No | Gaussian residuals | Often tied with ARIMA in M-competitions |
| Theta method | 2000 | Decomposition | No | Gaussian residuals | Won M3; surprisingly hard to beat |
| Prophet | 2017 | Additive components | No | Bayesian intervals | Built at Facebook for business series with multiple seasonalities |
| DeepAR | 2017 | Autoregressive RNN | Cross-series | Quantile or Gaussian | Built at Amazon, trains across many related series; uses LSTM |
| N-BEATS | 2019 | Residual MLP stack | No | Point forecasts | Yoshua Bengio's group at Element AI; outperformed M4 winner |
| Temporal Fusion Transformer | 2019 | Attention model | Yes | Quantile | Google Brain, handles known and unknown future inputs |
| N-HiTS | 2022 | Hierarchical interpolation | No | Point forecasts | Faster N-BEATS variant |
| TimesNet | 2023 | 2D inception | Yes | Point forecasts | Tsinghua group |
| Lag-Llama | 2023 | Decoder-only transformer | No | Probabilistic | Open foundation model trained on diverse series |
| TimeGPT | 2023 | Encoder-decoder transformer | Yes | Quantile | Nixtla; first commercial time-series foundation model API |
| Chronos | 2024 | T5 over tokenized values | No | Probabilistic | Built at Amazon, zero-shot |
| Moirai | 2024 | Masked encoder | Yes | Probabilistic | Salesforce, any-frequency any-variate |
| TimesFM | 2024 | Decoder transformer | No | Point forecasts | Built at Google, zero-shot |
| Time-LLM | 2024 | Reprogrammed LLM | No | Point forecasts | Wraps a frozen LLM with a series adapter |
A few honest comparisons:
The M-competitions, organized by Spyros Makridakis since 1982, are large empirical forecasting bake-offs. They are the closest thing the field has to a leaderboard.
| Competition | Year | Series | Winner | ARIMA finish |
|---|---|---|---|---|
| M (M1) | 1982 | 1,001 | Simple methods | Mid-pack; no clear ARIMA advantage |
| M2 | 1993 | 29 | Mixed | Mixed |
| M3 | 2000 | 3,003 | Theta method (later beaten by ForecastPro) | ARIMA competitive on yearly series, weaker on monthly |
| M4 | 2018 | 100,000 | Slawek Smyl's hybrid ES-RNN (Uber) | Pure ARIMA below ensemble baselines, still beat many ML methods |
| M5 | 2020 | 42,840 (Walmart) | LightGBM-based gradient boosting ensembles | ARIMA in the bottom half |
| M6 | 2022 | 50 (financial) | Forecasting plus investment | Not the focus |
The M3 paper (Makridakis and Hibon, 2000) became famous for the conclusion that "statistically sophisticated or complex methods do not necessarily produce more accurate forecasts than simpler ones," which was a hard sell for machine-learning advocates at the time. The M4 paper (2018) flipped this: the top six entries were all hybrids of statistical and machine-learning methods, and the pure-statistical baselines came in below them. The M5 result was even more decisive for machine learning, with gradient-boosting ensembles dominating. Even so, the M5 organizers noted that the simple combination benchmark (a mix of statistical methods including ARIMA and ETS) beat the majority of the 5,558 submitted entries.
Despite the press around foundation models, ARIMA holds up in several settings:
ARIMA's weaknesses are essentially the inverse of its strengths:
(p, d, q) choices can give noticeably different forecasts.ARIMA still has a long tail of working deployments:
It is also still the most-taught time-series method in MBA programs, statistics departments, and engineering curricula, which means most working analysts have an ARIMA model in their toolkit even if they reach for something else first.