A quantile is a value that splits a probability distribution or a dataset into intervals containing equal portions of the probability mass or observations. For a number q with 0 < q < 1, the q-quantile is the value below which a fraction q of the data falls. Quantiles are foundational to descriptive statistics, robust regression, probabilistic forecasting, and modern uncertainty quantification methods such as conformal prediction. They appear throughout machine learning whenever a practitioner needs more than a point estimate, for example when they want a prediction interval, a robust summary, or a metric that resists outliers.
Let X be a real-valued random variable with cumulative distribution function F(x) = P(X ≤ x). The q-quantile of X, written Q(q), is defined by the generalized inverse of F:
Q(q) = inf{x ∈ ℝ : F(x) ≥ q}, 0 < q < 1.
When F is strictly increasing and continuous, Q(q) = F⁻¹(q) and the definition reduces to a plain inverse. The function Q is called the quantile function or inverse distribution function. Two basic properties follow directly from the definition: Q is non-decreasing in q, and F(Q(q)) ≥ q with equality when F is continuous.
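For standard distributions this inverse is available directly; SciPy, for example, exposes it as ppf (the "percent point function"). A minimal check of the identity:

```python
from scipy.stats import norm

# SciPy's ppf is the quantile function Q(q) = F^{-1}(q) for a continuous,
# strictly increasing CDF.
q = norm.ppf(0.975)    # ~1.96, the familiar two-sided 95% multiplier
print(norm.cdf(q))     # F(Q(0.975)) recovers 0.975, up to floating point
```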
For a finite dataset of n observations, the empirical CDF F̂ₙ(x) counts the proportion of points at or below x:
F̂ₙ(x) = (1/n) Σᵢ₌₁ⁿ 𝟙(Xᵢ ≤ x).
The sample q-quantile is then Q̂(q) = inf{x : F̂ₙ(x) ≥ q}. Because F̂ₙ is a step function that is flat between order statistics, many other estimators, including ones that interpolate between consecutive order statistics, are equally defensible, which is why software packages disagree on sample quantile estimates (see Sample quantile estimation below).
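The generalized-inverse definition translates directly into code: take the smallest order statistic whose empirical CDF reaches q. This is Hyndman and Fan's type 1 estimator (NumPy's inverted_cdf); a minimal sketch:

```python
import numpy as np

def sample_quantile(data, q):
    """Sample q-quantile by the generalized-inverse definition:
    the smallest order statistic whose empirical CDF reaches q."""
    x = np.sort(np.asarray(data))
    n = len(x)
    # F̂ₙ(x₍ᵢ₎) = i/n, so take the smallest i with i/n >= q
    # (floating-point edge cases are ignored in this sketch).
    i = int(np.ceil(q * n))        # 1-indexed order statistic
    return x[i - 1]

print(sample_quantile([2, 4, 4, 5, 7, 9, 10, 12, 13, 15], 0.25))  # -> 4
```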
Several named quantiles appear so often that they have their own labels.
| Quantile | Value of q | Common name | Notes |
|---|---|---|---|
| Median | 0.5 | Second quartile, Q2 | The 50th percentile; robust measure of central tendency |
| First quartile | 0.25 | Q1, lower quartile | Bottom of the middle 50% |
| Third quartile | 0.75 | Q3, upper quartile | Top of the middle 50% |
| Tercile | 1/3, 2/3 | Lower and upper tercile | Splits data into thirds |
| Quartile | 0.25, 0.5, 0.75 | Q1, Q2, Q3 | Splits data into quarters |
| Quintile | 0.2, 0.4, 0.6, 0.8 | Five equal groups | Often used in income statistics |
| Decile | 0.1, 0.2, ..., 0.9 | Ten equal groups | Common in education and finance |
| Percentile | 0.01, 0.02, ..., 0.99 | Hundred equal groups | Standard in test scoring and growth charts |
| Permille | 0.001, ..., 0.999 | Thousand equal groups | Used in extreme-event analysis |
A derived quantity is the interquartile range (IQR), defined as Q3 − Q1. The IQR is a scale measure that ignores the outer 25% of the distribution on each side, which makes it robust to outliers and a standard ingredient of box plots. The Tukey rule flags points outside [Q1 − 1.5 IQR, Q3 + 1.5 IQR] as potential outliers and points outside [Q1 − 3 IQR, Q3 + 3 IQR] as far outliers.
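As a concrete sketch with NumPy, using a small sample with a planted outlier at 40:

```python
import numpy as np

data = np.array([2, 4, 4, 5, 7, 9, 10, 12, 13, 15, 40])  # 40 is a planted outlier

q1, q3 = np.quantile(data, [0.25, 0.75])
iqr = q3 - q1
inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # Tukey inner fences
outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr   # Tukey outer fences

print(data[(data < inner_lo) | (data > inner_hi)])    # potential outliers: [40]
print(data[(data < outer_lo) | (data > outer_hi)])    # far outliers: [40]
```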
Given n sorted observations x₍₁₎ ≤ x₍₂₎ ≤ ... ≤ x₍ₙ₎, there is no single right way to estimate Q(q) when nq is not an integer. Hyndman and Fan (1996) catalogued nine distinct definitions used by statistical packages of the time, three based on rounding and six based on linear interpolation between consecutive order statistics. They expressed each definition in a unified form Q̂(p) = (1 − γ) x₍ⱼ₎ + γ x₍ⱼ₊₁₎ and recommended their type 8 estimator, which is approximately median-unbiased regardless of the underlying distribution.
NumPy's numpy.quantile and the equivalent numpy.percentile follow the same taxonomy, plus a few discontinuous variants of their own. The default is linear, which corresponds to Hyndman and Fan's type 7 (the method keyword that selects the estimator replaced the older interpolation argument in NumPy 1.22). The full list of methods supported through the method argument:
| NumPy method | Behaviour | Hyndman-Fan type |
|---|---|---|
| linear (default) | Linear interpolation between order statistics; the weight γ is the fractional part of (n − 1)q | Type 7 |
| inverted_cdf | Inverse of the empirical CDF, i.e. the generalized-inverse definition above | Type 1 |
| averaged_inverted_cdf | Inverse empirical CDF, averaging at discontinuities | Type 2 |
| closest_observation | Inverse empirical CDF, rounding to the nearest observation | Type 3 |
| interpolated_inverted_cdf | Linear interpolation of the inverse empirical CDF | Type 4 |
| hazen | Continuous interpolation with plotting position (i − 0.5)/n | Type 5 |
| weibull | Plotting position estimate with weight i/(n + 1) | Type 6 |
| median_unbiased | Approximately median-unbiased for any distribution | Type 8 |
| normal_unbiased | Approximately unbiased for normally distributed data | Type 9 |
| lower | Returns x₍ᵢ₎ with i = ⌊(n − 1)q⌋ + 1 | None (discontinuous NumPy extension) |
| higher | Returns x₍ⱼ₎ with j = ⌈(n − 1)q⌉ + 1 | None (discontinuous NumPy extension) |
| midpoint | Returns (x₍ᵢ₎ + x₍ⱼ₎) / 2 | None (discontinuous NumPy extension) |
| nearest | Returns whichever of x₍ᵢ₎ and x₍ⱼ₎ is closer | None (discontinuous NumPy extension) |
R's quantile(), SAS, Stata, Excel, and SciPy all expose a similar choice of estimator. R defaults to type 7, the same as NumPy. Different defaults can produce visibly different answers, especially for small samples or extreme quantiles, which is why reproducible research notes the convention used.
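A quick way to see the disagreement is to request the same quantile under several conventions (the method keyword needs NumPy 1.22 or later):

```python
import numpy as np

x = [2, 4, 4, 5, 7, 9, 10, 12, 13, 15]

# The same 0.25-quantile under several conventions; with only ten
# points the estimators visibly disagree.
for m in ["linear", "lower", "higher", "nearest",
          "inverted_cdf", "median_unbiased"]:
    print(f"{m:>15}  {np.quantile(x, 0.25, method=m)}")
```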
Consider the sorted dataset {2, 4, 4, 5, 7, 9, 10, 12, 13, 15} with n = 10. Under the type 7 (NumPy default) rule, the q-quantile sits at the 1-indexed position h = (n − 1)q + 1, interpolating between the order statistics on either side of h.
For q = 0.25, h = 9 · 0.25 + 1 = 3.25, so Q(0.25) = x₍₃₎ + 0.25 · (x₍₄₎ − x₍₃₎) = 4 + 0.25 · (5 − 4) = 4.25.
For q = 0.75, h = 9 · 0.75 + 1 = 7.75, so Q(0.75) = x₍₇₎ + 0.75 · (x₍₈₎ − x₍₇₎) = 10 + 0.75 · (12 − 10) = 11.5.
The IQR is then 11.5 − 4.25 = 7.25, and the median is the average of x₍₅₎ and x₍₆₎, which gives 8. A different convention, say type 1, which returns order statistics without interpolating, would give nearby order statistics instead and produce slightly different results.
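The same numbers drop out of NumPy's default method, a quick sanity check for hand calculations:

```python
import numpy as np

x = [2, 4, 4, 5, 7, 9, 10, 12, 13, 15]
q1, med, q3 = np.quantile(x, [0.25, 0.5, 0.75])  # default method="linear", type 7
print(q1, med, q3, q3 - q1)                      # 4.25 8.0 11.5 7.25
```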
A Q-Q plot is a graphical tool that compares two distributions by plotting their quantiles against each other. The simplest case plots empirical sample quantiles against theoretical quantiles of a reference distribution, often the standard normal. Each point has the form (theoretical quantile at p, sample quantile at p) for a grid of p values. If the two distributions match up to a linear transformation, the points fall on a straight line. Curvature in the tails reveals skewness, S-shaped patterns indicate heavier or lighter tails than the reference, and isolated outlying points highlight individual extreme observations.
Q-Q plots are widely used in regression diagnostics to check the normality of residuals, in finance to compare asset return distributions, and in genomics to detect inflation in p-value distributions from genome-wide association studies. They are usually preferred over histograms or kernel density plots when the question is about distributional shape rather than density.
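A normal Q-Q plot takes only a few lines with NumPy, SciPy, and Matplotlib. The sketch below uses Hazen plotting positions and a rough reference line; the heavy-tailed t sample should produce the characteristic S-shape:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sample = np.sort(rng.standard_t(df=3, size=500))   # heavy-tailed sample

# Hazen plotting positions give the probability grid for both axes.
p = (np.arange(1, sample.size + 1) - 0.5) / sample.size
theoretical = stats.norm.ppf(p)

plt.scatter(theoretical, sample, s=8)
plt.axline((0, 0), slope=sample.std(), color="red")  # rough reference line
plt.xlabel("Theoretical normal quantiles")
plt.ylabel("Sample quantiles")
plt.show()
```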
Box plots summarise a distribution using five quantile-based numbers. The central box spans Q1 to Q3 and the line inside it marks the median. In the simplest version the whiskers run to the minimum and maximum; in Tukey's version they extend from the box to the most extreme data point within 1.5 IQR of the nearest quartile, and points beyond the whiskers are drawn individually as suspected outliers. Box plots, introduced by John Tukey in 1977, are a compact way to compare distributions across groups and have become standard in exploratory data analysis.
Linear regression estimates the conditional mean E[Y | X = x]. Quantile regression, introduced by Roger Koenker and Gilbert Bassett Jr. in their 1978 Econometrica paper Regression Quantiles, instead estimates the conditional q-quantile Q(q | X = x). Where ordinary least squares minimises squared residuals, quantile regression minimises the asymmetric pinball loss, also called check loss:
L_q(y, ŷ) = max(q (y − ŷ), (q − 1) (y − ŷ)).
Equivalently, the loss penalises positive residuals with weight q and negative residuals with weight 1 − q. At q = 0.5 it reduces to half the absolute error, so median regression is also called least absolute deviations (LAD) regression. The optimisation problem is a linear program and can be solved with simplex or interior-point methods.
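The loss itself is a one-liner in NumPy; in the toy check below the asymmetry at q = 0.9 makes under-predictions cost nine times as much as over-predictions:

```python
import numpy as np

def pinball_loss(y, y_hat, q):
    """Mean check loss: weight q on under-prediction, 1 - q on over-prediction."""
    r = np.asarray(y) - y_hat
    return np.mean(np.maximum(q * r, (q - 1) * r))

y = np.array([1.0, 2.0, 3.0, 4.0])
print(pinball_loss(y, 2.0, q=0.5))   # 0.5 = half the mean absolute error
print(pinball_loss(y, 2.0, q=0.9))   # under-predictions now dominate the loss
```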
Quantile regression has several practical advantages over mean regression. It is robust to outliers in Y because it uses absolute rather than squared losses. It naturally captures heteroscedasticity, since fitting the 0.1 and 0.9 quantiles separately produces a prediction interval that widens where the data are noisier. It does not require normality of errors and gives a fuller picture of the conditional distribution, not just its centre. The downside is that fitting many quantiles separately can produce non-monotone curves, a problem addressed by techniques such as quantile crossing penalties or simultaneous quantile estimation.
Major implementations include quantreg in R (maintained by Koenker), statsmodels.regression.quantile_regression.QuantReg and scikit-learn's QuantileRegressor in Python, and quantile objectives in gradient boosting libraries LightGBM, XGBoost, and CatBoost. The gradient boosting variants fit a separate model per requested quantile and are widely used in retail forecasting and energy demand prediction.
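As a sketch of the workflow, scikit-learn's GradientBoostingRegressor accepts a quantile objective through its loss and alpha parameters; fitting three quantiles on synthetic heteroscedastic data yields a band that widens with the noise:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1 + 0.05 * X[:, 0])  # noise grows with x

# One model per quantile, each minimising the pinball loss at its own q.
models = {q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(X, y)
          for q in (0.1, 0.5, 0.9)}

# The 0.1-0.9 band should be wider at x = 9 than at x = 1,
# tracking the heteroscedastic noise.
X_new = np.array([[1.0], [9.0]])
for q, m in models.items():
    print(q, m.predict(X_new).round(2))
```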
Many modern time-series neural networks output multiple quantiles simultaneously rather than a point forecast. Amazon's DeepAR model (Salinas et al., 2017) parameterises a probability distribution, such as a Gaussian or negative binomial, at each forecast step and reads prediction quantiles off Monte Carlo sample paths. Google's Temporal Fusion Transformer (TFT, Lim et al. 2019) trains an attention-based architecture by minimising the pinball loss summed across a chosen set of quantiles, often {0.1, 0.5, 0.9}. The result is a set of prediction quantiles that can be turned into prediction intervals, point forecasts (the median), or shortfall estimates without retraining.
The M5 forecasting competition on Walmart sales had a dedicated uncertainty track judged on the weighted scaled pinball loss across nine quantiles. Top entries combined gradient boosting models with custom quantile reconciliation across the product hierarchy, and the competition cemented quantile prediction as the default interface for probabilistic forecasting in industry.
Conformal prediction, developed by Vladimir Vovk, Alex Gammerman, and Glenn Shafer in the early 2000s, builds prediction sets that are guaranteed to contain the true response with a user-chosen probability, regardless of the underlying data distribution, as long as the data are exchangeable. The core construction takes a base predictor, defines a nonconformity score on a held-out calibration set, and uses a quantile of those scores to set the prediction set's threshold. Concretely, for a target miscoverage rate α, the threshold is the ⌈(n+1)(1−α)⌉ / n quantile of the calibration scores.
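A minimal sketch of split conformal regression with absolute-residual scores, using synthetic stand-ins for the calibration labels and base-model predictions:

```python
import numpy as np

def conformal_qhat(scores, alpha):
    """Threshold: the ceil((n+1)(1-alpha))/n empirical quantile of the scores."""
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")

# Toy calibration set: y_cal and y_pred_cal stand in for held-out labels
# and base-model predictions.
rng = np.random.default_rng(0)
y_cal = rng.normal(size=200)
y_pred_cal = y_cal + rng.normal(scale=0.5, size=200)
qhat = conformal_qhat(np.abs(y_cal - y_pred_cal), alpha=0.1)

# A new prediction y_pred gets the interval [y_pred - qhat, y_pred + qhat],
# which covers the truth with probability at least 90% under exchangeability.
print(qhat)
```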
For regression, the simplest variant uses absolute residuals as nonconformity scores and produces a constant-width interval around any point prediction. Romano, Patterson, and Candès introduced conformalized quantile regression (CQR) at NeurIPS 2019, which uses an underlying quantile regression model to predict an initial interval [ŷ₍α∕₂₎, ŷ₍₁₋α∕₂₎] and then conformalises it using a quantile of signed gap scores on the calibration set. The result is an adaptive interval that widens where the data are noisier yet still enjoys finite-sample marginal coverage.
The definitive practitioner reference is the tutorial A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification by Anastasios Angelopoulos and Stephen Bates, published as a Foundations and Trends monograph in 2023. Their open-source repository implements conformal prediction on real datasets and has helped drive adoption in industry.
In quantitative finance, quantiles of the loss distribution underpin two of the most widely used risk metrics. Value at Risk (VaR) at level α is the α-quantile of the portfolio loss distribution over a fixed horizon, often one or ten trading days. A 99% one-day VaR of $10 million means losses are expected to exceed $10 million on no more than 1% of days. Regulators including the Basel Committee on Banking Supervision require banks to compute value at risk for market-risk capital charges.
Expected Shortfall (ES), also called Conditional VaR (CVaR), Average VaR, or Expected Tail Loss, is the conditional mean of the loss given that the loss exceeds VaR. ES at level α averages the worst 100(1 − α)% of outcomes and is therefore sensitive to the tail shape, while VaR is not. The Basel III framework adopted ES at the 97.5% level as the regulatory replacement for 99% VaR for trading-book capital, partly because ES is coherent (subadditive) while VaR is not.
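Both metrics reduce to a quantile plus a tail mean of historical or simulated losses; a Monte Carlo sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
losses = rng.standard_t(df=4, size=100_000)   # heavy-tailed one-day losses

alpha = 0.99
var = np.quantile(losses, alpha)          # 99% VaR: the 0.99 loss quantile
es = losses[losses >= var].mean()         # ES: average loss beyond VaR
print(f"VaR = {var:.2f}, ES = {es:.2f}")  # ES > VaR, reflecting tail weight
```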
In genomics, quantile normalization (Bolstad et al., 2003, Bioinformatics) makes the distribution of values in many samples identical by replacing each value with the average of all values that share its rank. The procedure is: sort each column independently, take the row means of the sorted matrix, and then write those means back into the original positions. The result is that every column has the same empirical distribution while the rank order within each column is preserved. Quantile normalization became a standard preprocessing step for Affymetrix microarrays through the RMA pipeline and has since been extended to RNA-seq and methylation arrays.
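The whole procedure fits in a few lines of NumPy. The sketch below uses the double-argsort rank trick and breaks ties arbitrarily rather than averaging them as the full Bolstad et al. method does:

```python
import numpy as np

def quantile_normalize(X):
    """Columns of X are samples; give every column the same distribution
    while preserving within-column rank order."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)  # 0-based ranks per column
    means = np.sort(X, axis=0).mean(axis=1)            # row means of sorted columns
    return means[ranks]                                # write means back by rank

X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
print(quantile_normalize(X))   # every column now holds {2, 3, 4.67, 5.67}
```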
Quantiles also show up in several other corners of machine learning and statistics. Quantile binning, often called bucketing, replaces a continuous feature with categorical bins of equal frequency, which can stabilise tree models and is a standard preprocessing recipe in tools such as scikit-learn's KBinsDiscretizer. Robust scaling subtracts the median and divides by the IQR rather than using mean and standard deviation, which gives a feature scale that is insensitive to outliers. Winsorisation clips extreme values at chosen quantiles, for example replacing everything below the 1st percentile with the 1st-percentile value. Quantile transforms map any feature distribution to a uniform or Gaussian distribution by passing it through its empirical CDF, which is useful before models that assume Gaussian inputs.
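scikit-learn ships an estimator for most of these recipes (winsorisation lives in scipy.stats.mstats.winsorize); a brief sketch of the other three:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer, RobustScaler, QuantileTransformer

X = np.random.default_rng(0).lognormal(size=(1000, 1))   # skewed feature

# Equal-frequency (quantile) binning into four buckets
binned = KBinsDiscretizer(n_bins=4, encode="ordinal",
                          strategy="quantile").fit_transform(X)

# Centre on the median, scale by the IQR
scaled = RobustScaler().fit_transform(X)

# Push the feature through its empirical CDF, then the normal quantile function
gaussian = QuantileTransformer(output_distribution="normal",
                               n_quantiles=100).fit_transform(X)
```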
In reinforcement learning, distributional methods such as QR-DQN (Dabney et al., 2018) represent the value function as a discrete set of quantile estimates rather than a single expectation, which improves stability and lets the agent reason about return distributions.
Imagine you have a bag of differently colored candies. If you want to divide the candies into equal parts, you can use the concept of quantiles. For example, if you want to divide them into 4 equal parts, you would use quartiles (like cutting a cake into 4 pieces). If you want to divide them into 100 equal parts, you would use percentiles.
In machine learning, quantiles help us understand and organize data better. We use them to clean up data, build new features, and make decisions. They also help us understand how different factors might affect an outcome.