A quantile is a value that splits a probability distribution or a dataset into intervals containing equal portions of the probability mass or observations. For a number q with 0 < q < 1, the q-quantile is the value below which a fraction q of the data falls. Quantiles are foundational to descriptive statistics, robust regression, probabilistic forecasting, and modern uncertainty quantification methods such as conformal prediction. They appear throughout machine learning whenever a practitioner needs more than a point estimate, for example when they want a prediction interval, a robust summary, or a metric that resists outliers.
Let X be a real-valued random variable with cumulative distribution function F(x) = P(X ≤ x). The q-quantile of X, written Q(q), is defined by the generalized inverse of F:
Q(q) = inf{x ∈ ℝ : F(x) ≥ q}, 0 < q < 1.
When F is strictly increasing and continuous, Q(q) = F⁻¹(q) and the definition reduces to a plain inverse. The function Q is called the quantile function or inverse distribution function. Two basic properties follow directly from the definition: Q is non-decreasing in q, and F(Q(q)) ≥ q with equality when F is continuous.
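For standard distributions this inverse is available directly; SciPy, for example, exposes it as ppf (the "percent point function"). A minimal check of the identity:

```python
from scipy.stats import norm

# SciPy's ppf is the quantile function Q(q) = F^{-1}(q) for a continuous,
# strictly increasing CDF.
q = norm.ppf(0.975)    # ~1.96, the familiar two-sided 95% multiplier
print(norm.cdf(q))     # F(Q(0.975)) recovers 0.975, up to floating point
```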
For a finite dataset of n observations, the empirical CDF F̂ₙ(x) counts the proportion of points at or below x:
F̂ₙ(x) = (1/n) Σᵢ₌₁ⁿ 𝟙(Xᵢ ≤ x).
The sample q-quantile is then Q̂(q) = inf{x : F̂ₙ(x) ≥ q}. Because F̂ₙ is a step function that is flat between order statistics, many other estimators, including ones that interpolate between consecutive order statistics, are equally defensible, which is why software packages disagree on sample quantile estimates (see Sample quantile estimation below).
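The generalized-inverse definition translates directly into code: take the smallest order statistic whose empirical CDF reaches q. This is Hyndman and Fan's type 1 estimator (NumPy's inverted_cdf); a minimal sketch:

```python
import numpy as np

def sample_quantile(data, q):
    """Sample q-quantile by the generalized-inverse definition:
    the smallest order statistic whose empirical CDF reaches q."""
    x = np.sort(np.asarray(data))
    n = len(x)
    # F̂ₙ(x₍ᵢ₎) = i/n, so take the smallest i with i/n >= q
    # (floating-point edge cases are ignored in this sketch).
    i = int(np.ceil(q * n))        # 1-indexed order statistic
    return x[i - 1]

print(sample_quantile([2, 4, 4, 5, 7, 9, 10, 12, 13, 15], 0.25))  # -> 4
```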
Several named quantiles appear so often that they have their own labels.
| Quantile | Value of q | Common name | Notes |
|---|---|---|---|
| Median | 0.5 | Second quartile, Q2 | The 50th percentile; robust measure of central tendency |
| First quartile | 0.25 | Q1, lower quartile | Bottom of the middle 50% |
| Third quartile | 0.75 | Q3, upper quartile | Top of the middle 50% |
| Tercile | 1/3, 2/3 | Lower and upper tercile | Splits data into thirds |
| Quartile | 0.25, 0.5, 0.75 | Q1, Q2, Q3 | Splits data into quarters |
| Quintile | 0.2, 0.4, 0.6, 0.8 | Five equal groups | Often used in income statistics |
| Decile | 0.1, 0.2, ..., 0.9 | Ten equal groups | Common in education and finance |
| Percentile | 0.01, 0.02, ..., 0.99 | Hundred equal groups | Standard in test scoring and growth charts |
| Permille | 0.001, ..., 0.999 | Thousand equal groups | Used in extreme-event analysis |
A derived quantity is the interquartile range (IQR), defined as Q3 − Q1. The IQR is a scale measure that ignores the outer 25% of the distribution on each side, which makes it robust to outliers and a standard ingredient of box plots. The Tukey rule flags points outside [Q1 − 1.5 IQR, Q3 + 1.5 IQR] as potential outliers and points outside [Q1 − 3 IQR, Q3 + 3 IQR] as far outliers.
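As a concrete sketch with NumPy, using a small sample with a planted outlier at 40:

```python
import numpy as np

data = np.array([2, 4, 4, 5, 7, 9, 10, 12, 13, 15, 40])  # 40 is a planted outlier

q1, q3 = np.quantile(data, [0.25, 0.75])
iqr = q3 - q1
inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # Tukey inner fences
outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr   # Tukey outer fences

print(data[(data < inner_lo) | (data > inner_hi)])    # potential outliers: [40]
print(data[(data < outer_lo) | (data > outer_hi)])    # far outliers: [40]
```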
Given n sorted observations x₍₁₎ ≤ x₍₂₎ ≤ ... ≤ x₍ₙ₎, there is no single right way to estimate Q(q) when nq is not an integer. Hyndman and Fan (1996) catalogued nine distinct definitions used by statistical packages of the time, three based on rounding and six based on linear interpolation between consecutive order statistics. They expressed each definition in a unified form Q̂(p) = (1 − γ) x₍ⱼ₎ + γ x₍ⱼ₊₁₎ and recommended their type 8 estimator, which is approximately median-unbiased regardless of the underlying distribution.
NumPy's numpy.quantile and the equivalent numpy.percentile follow the same taxonomy, plus a few discontinuous variants of their own. The default is linear, which corresponds to Hyndman and Fan's type 7 (the method keyword that selects the estimator replaced the older interpolation argument in NumPy 1.22). The full list of methods supported through the method argument:
| NumPy method | Behaviour | Hyndman-Fan type |
|---|---|---|
| linear (default) | Linear interpolation between order statistics; the weight γ is the fractional part of (n − 1)q | Type 7 |
| inverted_cdf | Inverse of the empirical CDF, i.e. the generalized-inverse definition above | Type 1 |
| averaged_inverted_cdf | Inverse empirical CDF, averaging at discontinuities | Type 2 |
| closest_observation | Inverse empirical CDF, rounding to the nearest observation | Type 3 |
| interpolated_inverted_cdf | Linear interpolation of the inverse empirical CDF | Type 4 |
| hazen | Continuous interpolation with plotting position (i − 0.5)/n | Type 5 |
| weibull | Plotting position estimate with weight i/(n + 1) | Type 6 |
| median_unbiased | Approximately median-unbiased for any distribution | Type 8 |
| normal_unbiased | Approximately unbiased for normally distributed data | Type 9 |
| lower | Returns x₍ᵢ₎ with i = ⌊(n − 1)q⌋ + 1 | None (discontinuous NumPy extension) |
| higher | Returns x₍ⱼ₎ with j = ⌈(n − 1)q⌉ + 1 | None (discontinuous NumPy extension) |
| midpoint | Returns (x₍ᵢ₎ + x₍ⱼ₎) / 2 | None (discontinuous NumPy extension) |
| nearest | Returns whichever of x₍ᵢ₎ and x₍ⱼ₎ is closer | None (discontinuous NumPy extension) |
R's quantile(), SAS, Stata, Excel, and SciPy all expose a similar choice of estimator. R defaults to type 7, the same as NumPy. Different defaults can produce visibly different answers, especially for small samples or extreme quantiles, which is why reproducible research notes the convention used.
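A quick way to see the disagreement is to request the same quantile under several conventions (the method keyword needs NumPy 1.22 or later):

```python
import numpy as np

x = [2, 4, 4, 5, 7, 9, 10, 12, 13, 15]

# The same 0.25-quantile under several conventions; with only ten
# points the estimators visibly disagree.
for m in ["linear", "lower", "higher", "nearest",
          "inverted_cdf", "median_unbiased"]:
    print(f"{m:>15}  {np.quantile(x, 0.25, method=m)}")
```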
Consider the sorted dataset {2, 4, 4, 5, 7, 9, 10, 12, 13, 15} with n = 10. Under the type 7 (NumPy default) rule, the q-quantile sits at the 1-indexed position h = (n − 1)q + 1, interpolating between the order statistics on either side of h.
For q = 0.25, h = 9 · 0.25 + 1 = 3.25, so Q(0.25) = x₍₃₎ + 0.25 · (x₍₄₎ − x₍₃₎) = 4 + 0.25 · (5 − 4) = 4.25.
For q = 0.75, h = 9 · 0.75 + 1 = 7.75, so Q(0.75) = x₍₇₎ + 0.75 · (x₍₈₎ − x₍₇₎) = 10 + 0.75 · (12 − 10) = 11.5.
The IQR is then 11.5 − 4.25 = 7.25, and the median is the average of x₍₅₎ and x₍₆₎, which gives 8. A different convention, say type 1, which returns order statistics without interpolating, would give nearby order statistics instead and produce slightly different results.
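The same numbers drop out of NumPy's default method, a quick sanity check for hand calculations:

```python
import numpy as np

x = [2, 4, 4, 5, 7, 9, 10, 12, 13, 15]
q1, med, q3 = np.quantile(x, [0.25, 0.5, 0.75])  # default method="linear", type 7
print(q1, med, q3, q3 - q1)                      # 4.25 8.0 11.5 7.25
```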
A Q-Q plot is a graphical tool that compares two distributions by plotting their quantiles against each other. The simplest case plots empirical sample quantiles against theoretical quantiles of a reference distribution, often the standard normal. Each point has the form (theoretical quantile at p, sample quantile at p) for a grid of p values. If the two distributions match up to a linear transformation, the points fall on a straight line. Curvature in the tails reveals skewness, S-shaped patterns indicate heavier or lighter tails than the reference, and isolated outlying points highlight individual extreme observations.
Q-Q plots are widely used in regression diagnostics to check the normality of residuals, in finance to compare asset return distributions, and in genomics to detect inflation in p-value distributions from genome-wide association studies. They are usually preferred over histograms or kernel density plots when the question is about distributional shape rather than density.
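A normal Q-Q plot takes only a few lines with NumPy, SciPy, and Matplotlib. The sketch below uses Hazen plotting positions and a rough reference line; the heavy-tailed t sample should produce the characteristic S-shape:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sample = np.sort(rng.standard_t(df=3, size=500))   # heavy-tailed sample

# Hazen plotting positions give the probability grid for both axes.
p = (np.arange(1, sample.size + 1) - 0.5) / sample.size
theoretical = stats.norm.ppf(p)

plt.scatter(theoretical, sample, s=8)
plt.axline((0, 0), slope=sample.std(), color="red")  # rough reference line
plt.xlabel("Theoretical normal quantiles")
plt.ylabel("Sample quantiles")
plt.show()
```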
Box plots summarise a distribution using five quantile-based numbers. The central box spans Q1 to Q3 and the line inside it marks the median. In the simplest version the whiskers run to the minimum and maximum; in Tukey's version they extend from the box to the most extreme data point within 1.5 IQR of the nearest quartile, and points beyond the whiskers are drawn individually as suspected outliers. Box plots, introduced by John Tukey in 1977, are a compact way to compare distributions across groups and have become standard in exploratory data analysis.
Linear regression estimates the conditional mean E[Y | X = x]. Quantile regression, introduced by Roger Koenker and Gilbert Bassett Jr. in their 1978 Econometrica paper Regression Quantiles, instead estimates the conditional q-quantile Q(q | X = x). Where ordinary least squares minimises squared residuals, quantile regression minimises the asymmetric pinball loss, also called check loss:
L_q(y, ŷ) = max(q (y − ŷ), (q − 1) (y − ŷ)).
Equivalently, the loss penalises positive residuals with weight q and negative residuals with weight 1 − q. At q = 0.5 it reduces to half the absolute error, so median regression is also called least absolute deviations (LAD) regression. The optimisation problem is a linear program and can be solved with simplex or interior-point methods.
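The loss itself is a one-liner in NumPy; in the toy check below the asymmetry at q = 0.9 makes under-predictions cost nine times as much as over-predictions:

```python
import numpy as np

def pinball_loss(y, y_hat, q):
    """Mean check loss: weight q on under-prediction, 1 - q on over-prediction."""
    r = np.asarray(y) - y_hat
    return np.mean(np.maximum(q * r, (q - 1) * r))

y = np.array([1.0, 2.0, 3.0, 4.0])
print(pinball_loss(y, 2.0, q=0.5))   # 0.5 = half the mean absolute error
print(pinball_loss(y, 2.0, q=0.9))   # under-predictions now dominate the loss
```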
Quantile regression has several practical advantages over mean regression. It is robust to outliers in Y because it uses absolute rather than squared losses. It naturally captures heteroscedasticity, since fitting the 0.1 and 0.9 quantiles separately produces a prediction interval that widens where the data are noisier. It does not require normality of errors and gives a fuller picture of the conditional distribution, not just its centre. The downside is that fitting many quantiles separately can produce non-monotone curves, a problem addressed by techniques such as quantile crossing penalties or simultaneous quantile estimation.
Major implementations include quantreg in R (maintained by Koenker), statsmodels.regression.quantile_regression.QuantReg and scikit-learn's QuantileRegressor in Python, and quantile objectives in gradient boosting libraries LightGBM, XGBoost, and CatBoost. The gradient boosting variants fit a separate model per requested quantile and are widely used in retail forecasting and energy demand prediction.
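As a sketch of the workflow, scikit-learn's GradientBoostingRegressor accepts a quantile objective through its loss and alpha parameters; fitting three quantiles on synthetic heteroscedastic data yields a band that widens with the noise:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1 + 0.05 * X[:, 0])  # noise grows with x

# One model per quantile, each minimising the pinball loss at its own q.
models = {q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(X, y)
          for q in (0.1, 0.5, 0.9)}

# The 0.1-0.9 band should be wider at x = 9 than at x = 1,
# tracking the heteroscedastic noise.
X_new = np.array([[1.0], [9.0]])
for q, m in models.items():
    print(q, m.predict(X_new).round(2))
```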
Many modern time-series neural networks output multiple quantiles simultaneously rather than a point forecast. Amazon's DeepAR model (Salinas et al., 2017) parameterises a probability distribution, such as a Gaussian or negative binomial, at each forecast step and reads prediction quantiles off Monte Carlo sample paths. Google's Temporal Fusion Transformer (TFT, Lim et al. 2019) trains an attention-based architecture by minimising the pinball loss summed across a chosen set of quantiles, often {0.1, 0.5, 0.9}. The result is a set of prediction quantiles that can be turned into prediction intervals, point forecasts (the median), or shortfall estimates without retraining.
The M5 forecasting competition on Walmart sales had a dedicated uncertainty track judged on the weighted scaled pinball loss across nine quantiles. Top entries combined gradient boosting models with custom quantile reconciliation across the product hierarchy, and the competition cemented quantile prediction as the default interface for probabilistic forecasting in industry.
Conformal prediction, developed by Vladimir Vovk, Alex Gammerman, and Glenn Shafer in the early 2000s, builds prediction sets that are guaranteed to contain the true response with a user-chosen probability, regardless of the underlying data distribution, as long as the data are exchangeable. The core construction takes a base predictor, defines a nonconformity score on a held-out calibration set, and uses a quantile of those scores to set the prediction set's threshold. Concretely, for a target miscoverage rate α, the threshold is the ⌈(n+1)(1−α)⌉ / n quantile of the calibration scores.
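A minimal sketch of split conformal regression with absolute-residual scores, using synthetic stand-ins for the calibration labels and base-model predictions:

```python
import numpy as np

def conformal_qhat(scores, alpha):
    """Threshold: the ceil((n+1)(1-alpha))/n empirical quantile of the scores."""
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")

# Toy calibration set: y_cal and y_pred_cal stand in for held-out labels
# and base-model predictions.
rng = np.random.default_rng(0)
y_cal = rng.normal(size=200)
y_pred_cal = y_cal + rng.normal(scale=0.5, size=200)
qhat = conformal_qhat(np.abs(y_cal - y_pred_cal), alpha=0.1)

# A new prediction y_pred gets the interval [y_pred - qhat, y_pred + qhat],
# which covers the truth with probability at least 90% under exchangeability.
print(qhat)
```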
For regression, the simplest variant uses absolute residuals as nonconformity scores and produces a constant-width interval around any point prediction. Romano, Patterson, and Candès introduced conformalized quantile regression (CQR) at NeurIPS 2019, which uses an underlying quantile regression model to predict an initial interval [ŷ₍α∕₂₎, ŷ₍₁₋α∕₂₎] and then conformalises it using a quantile of signed gap scores on the calibration set. The result is an adaptive interval that widens where the data are noisier yet still enjoys finite-sample marginal coverage.
The definitive practitioner reference is the tutorial A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification by Anastasios Angelopoulos and Stephen Bates, published as a Foundations and Trends monograph in 2023. Their open-source repository implements conformal prediction on real datasets and has helped drive adoption in industry.
In quantitative finance, quantiles of the loss distribution underpin two of the most widely used risk metrics. Value at Risk (VaR) at level α is the α-quantile of the portfolio loss distribution over a fixed horizon, often one or ten trading days. A 99% one-day VaR of $10 million means losses are expected to exceed $10 million on no more than 1% of days. Regulators including the Basel Committee on Banking Supervision require banks to compute value at risk for market-risk capital charges.
Expected Shortfall (ES), also called Conditional VaR (CVaR), Average VaR, or Expected Tail Loss, is the conditional mean of the loss given that the loss exceeds VaR. ES at level α averages the worst 100(1 − α)% of outcomes and is therefore sensitive to the tail shape, while VaR is not. The Basel III framework adopted ES at the 97.5% level as the regulatory replacement for 99% VaR for trading-book capital, partly because ES is coherent (subadditive) while VaR is not.
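Both metrics reduce to a quantile plus a tail mean of historical or simulated losses; a Monte Carlo sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
losses = rng.standard_t(df=4, size=100_000)   # heavy-tailed one-day losses

alpha = 0.99
var = np.quantile(losses, alpha)          # 99% VaR: the 0.99 loss quantile
es = losses[losses >= var].mean()         # ES: average loss beyond VaR
print(f"VaR = {var:.2f}, ES = {es:.2f}")  # ES > VaR, reflecting tail weight
```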
In genomics, quantile normalization (Bolstad et al., 2003, Bioinformatics) makes the distribution of values in many samples identical by replacing each value with the average of all values that share its rank. The procedure is: sort each column independently, take the row means of the sorted matrix, and then write those means back into the original positions. The result is that every column has the same empirical distribution while the rank order within each column is preserved. Quantile normalization became a standard preprocessing step for Affymetrix microarrays through the RMA pipeline and has since been extended to RNA-seq and methylation arrays.
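The whole procedure fits in a few lines of NumPy. The sketch below uses the double-argsort rank trick and breaks ties arbitrarily rather than averaging them as the full Bolstad et al. method does:

```python
import numpy as np

def quantile_normalize(X):
    """Columns of X are samples; give every column the same distribution
    while preserving within-column rank order."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)  # 0-based ranks per column
    means = np.sort(X, axis=0).mean(axis=1)            # row means of sorted columns
    return means[ranks]                                # write means back by rank

X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
print(quantile_normalize(X))   # every column now holds {2, 3, 4.67, 5.67}
```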
Quantiles also show up in several other corners of machine learning and statistics. Quantile binning, often called bucketing, replaces a continuous feature with categorical bins of equal frequency, which can stabilise tree models and is a standard preprocessing recipe in tools such as scikit-learn's KBinsDiscretizer. Robust scaling subtracts the median and divides by the IQR rather than using mean and standard deviation, which gives a feature scale that is insensitive to outliers. Winsorisation clips extreme values at chosen quantiles, for example replacing everything below the 1st percentile with the 1st-percentile value. Quantile transforms map any feature distribution to a uniform or Gaussian distribution by passing it through its empirical CDF, which is useful before models that assume Gaussian inputs.
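scikit-learn ships an estimator for most of these recipes (winsorisation lives in scipy.stats.mstats.winsorize); a brief sketch of the other three:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer, RobustScaler, QuantileTransformer

X = np.random.default_rng(0).lognormal(size=(1000, 1))   # skewed feature

# Equal-frequency (quantile) binning into four buckets
binned = KBinsDiscretizer(n_bins=4, encode="ordinal",
                          strategy="quantile").fit_transform(X)

# Centre on the median, scale by the IQR
scaled = RobustScaler().fit_transform(X)

# Push the feature through its empirical CDF, then the normal quantile function
gaussian = QuantileTransformer(output_distribution="normal",
                               n_quantiles=100).fit_transform(X)
```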
In reinforcement learning, distributional methods such as QR-DQN (Dabney et al., 2018) represent the value function as a discrete set of quantile estimates rather than a single expectation, which improves stability and lets the agent reason about return distributions.
Imagine you have a bag of differently colored candies. If you want to divide the candies into equal parts, you can use the concept of quantiles. For example, if you want to divide them into 4 equal parts, you would use quartiles (like cutting a cake into 4 pieces). If you want to divide them into 100 equal parts, you would use percentiles.
In machine learning, quantiles help us understand and organize data better. We use them to clean up data, build new features, and make decisions. They also help us understand how different factors might affect an outcome.