# Quantile

> Source: https://aiwiki.ai/wiki/quantile
> Updated: 2026-06-27
> Categories: Machine Learning, Statistics
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

*See also: [Machine learning terms](/wiki/machine_learning_terms)*

A **quantile** is a cut point that divides a probability distribution or a sorted dataset into intervals containing equal portions of the probability or the observations. For a number q with 0 < q < 1, the q-quantile is the value below which a fraction q of the data falls: the 0.5 quantile is the [median](/wiki/median), the 0.25 and 0.75 quantiles are the quartiles, and the 0.01 through 0.99 quantiles are the [percentiles](/wiki/percentile). Quantiles are foundational to descriptive statistics, robust regression, probabilistic forecasting, and modern uncertainty quantification methods such as conformal prediction. They appear throughout machine learning wherever a single mean or point estimate is not enough: in prediction intervals, robust summaries and outlier-resistant metrics, equal-frequency [feature](/wiki/feature_engineering) buckets, and even information-efficient schemes for quantizing neural-network weights. [7][16]

The inverse view is just as important: the function that maps a probability q back to the value Q(q) is the **quantile function**, also called the inverse cumulative distribution function (inverse CDF). Sampling the quantile function at evenly spaced probabilities is the basic operation behind quantile bucketing, quantile normalization, and quantile (NF4) quantization. [7][15]

## What is a quantile? (Formal definition)

Let X be a real-valued random variable with [cumulative distribution function](/wiki/cumulative_distribution_function) F(x) = P(X ≤ x). The q-quantile of X, written Q(q), is defined by the generalized inverse of F:

*Q(q) = inf{x ∈ ℝ : F(x) ≥ q},  0 < q < 1.*

When F is strictly increasing and continuous, Q(q) = F⁻¹(q) and the definition reduces to a plain inverse. The function Q is called the *quantile function* or *inverse distribution function*. Two basic properties follow directly from the definition: Q is non-decreasing in q, and F(Q(q)) ≥ q with equality when F is continuous.

For a finite dataset of n observations, the *empirical CDF* F̂ₙ(x) counts the proportion of points at or below x:

*F̂ₙ(x) = (1/n) Σ 𝟙(Xᵢ ≤ x).*

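The plug-in sample q-quantile is Q̂(q) = inf{x : F̂ₙ(x) ≥ q}, which always returns one of the observations, namely the ⌈nq⌉-th smallest. Because F̂ₙ is a step function, however, this choice is somewhat arbitrary: when nq is an integer, every value between two adjacent order statistics splits the data in the required proportion, and many estimators prefer to interpolate between order statistics instead. This is why software packages disagree on sample quantile estimates (see *How are sample quantiles estimated?* below).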
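The two definitions translate almost line for line into code. The following is a minimal plain-Python sketch; `ecdf` and `sample_quantile` are our own illustrative names, not a library API. The plug-in estimator it implements is Hyndman and Fan's type 1, which NumPy exposes as `method="inverted_cdf"`.

```python
import math
from bisect import bisect_right

def ecdf(sample, x):
    """Empirical CDF F_n(x): the fraction of observations at or below x."""
    data = sorted(sample)
    return bisect_right(data, x) / len(data)

def sample_quantile(sample, q):
    """Plug-in quantile inf{x : F_n(x) >= q}, i.e. the ceil(n*q)-th smallest observation.

    Note: n * q is computed in floating point, so for some q the product can land
    just above an integer and shift the result by one order statistic.
    """
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    data = sorted(sample)
    k = math.ceil(len(data) * q)   # 1-based rank of the returned order statistic
    return data[k - 1]

data = [2, 4, 4, 5, 7, 9, 10, 12, 13, 15]
print(ecdf(data, 4))                 # 0.3
print(sample_quantile(data, 0.25))   # 4
print(sample_quantile(data, 0.75))   # 12
```

Note that F̂ₙ(Q̂(0.25)) = 0.3 ≥ 0.25 here, matching the property F(Q(q)) ≥ q from the population definition.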

## What is the difference between quantiles, quartiles, deciles, and percentiles?

Quartiles, deciles, and percentiles are all quantiles; they differ only in how many equal-sized groups they cut the data into. Quartiles split the data into 4 groups, deciles into 10, and percentiles into 100, so a percentile is a finer-grained quantile than a quartile. The median is the 0.5 quantile, the second quartile (Q2), and the 50th percentile all at once. Several named quantiles appear so often that they have their own labels.

| Quantile | Value of q | Common name | Notes |
|---|---|---|---|
| [Median](/wiki/median) | 0.5 | Second quartile, Q2 | The 50th [percentile](/wiki/percentile); robust measure of central tendency |
| First quartile | 0.25 | Q1, lower quartile | Bottom of the middle 50% |
| Third quartile | 0.75 | Q3, upper quartile | Top of the middle 50% |
| Tercile | 1/3, 2/3 | Lower and upper tercile | Splits data into thirds |
| Quartile | 0.25, 0.5, 0.75 | Q1, Q2, Q3 | Splits data into quarters |
| Quintile | 0.2, 0.4, 0.6, 0.8 | Five equal groups | Often used in income statistics |
| Decile | 0.1, 0.2, ..., 0.9 | Ten equal groups | Common in education and finance |
| Percentile | 0.01, 0.02, ..., 0.99 | Hundred equal groups | Standard in test scoring and growth charts |
| Permille | 0.001, ..., 0.999 | Thousand equal groups | Used in extreme-event analysis |

A derived quantity is the **interquartile range** (IQR), defined as Q3 − Q1. The IQR is a scale measure that ignores the outer 25% of the distribution on each side, which makes it robust to outliers and a standard ingredient of [box plots](/wiki/box_plot). The Tukey rule flags points outside [Q1 − 1.5 IQR, Q3 + 1.5 IQR] as potential outliers and points outside [Q1 − 3 IQR, Q3 + 3 IQR] as far outliers. [11]
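The rule is short enough to sketch directly. This NumPy version (the name `tukey_labels` is ours) assumes NumPy's default type 7 quartiles; Tukey's original "hinges" can differ slightly on small samples. On the example data below, Q1 = 4.75 and Q3 = 13.5, so IQR = 8.75 and the upper fences sit at 26.625 (1.5 IQR) and 39.75 (3 IQR).

```python
import numpy as np

def tukey_labels(x):
    """Label each value 'inlier', 'outlier' (beyond 1.5 IQR) or 'far' (beyond 3 IQR)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.quantile(x, [0.25, 0.75])   # NumPy's default (type 7) quartiles
    iqr = q3 - q1
    far = (x < q1 - 3 * iqr) | (x > q3 + 3 * iqr)
    outlier = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)
    # Far outliers take precedence over ordinary outliers
    return np.where(far, "far", np.where(outlier, "outlier", "inlier"))

data = [2, 4, 4, 5, 7, 9, 10, 12, 13, 15, 30, 100]
labels = tukey_labels(data)
print(labels)   # 30 is flagged 'outlier', 100 is flagged 'far'
```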

## How are sample quantiles estimated?

Given n sorted observations x₍₁₎ ≤ x₍₂₎ ≤ ... ≤ x₍ₙ₎, there is no single right way to estimate Q(q) when the target position falls between two order statistics. Hyndman and Fan (1996) catalogued nine distinct definitions used by statistical packages of the time, three based on rounding to an order statistic and six based on linear interpolation between consecutive order statistics. They expressed each definition in a unified form Q̂(q) = (1 − γ) x₍ⱼ₎ + γ x₍ⱼ₊₁₎, where the index j and the weight γ ∈ [0, 1] depend on n, q, and the chosen definition, and recommended their type 8 estimator, which is approximately median-unbiased regardless of the underlying distribution. [1]
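For the six interpolating types, the unified form reduces to choosing an offset m in the position h = nq + m, with j = ⌊h⌋ and γ = h − j. A minimal sketch, using the offsets as we recall them from Hyndman and Fan (type 4: 0, type 5: 1/2, type 6: q, type 7: 1 − q, type 8: (q + 1)/3, type 9: q/4 + 3/8) and checking them against NumPy 1.22 or later; `hf_quantile`, `M_OFFSET`, and `NUMPY_METHOD` are our own names.

```python
import math
import numpy as np

# Offset m(q) for Hyndman and Fan's continuous types 4-9: the (1-based) position of the
# q-quantile is h = n*q + m(q), and the estimate interpolates x_(j), x_(j+1) with j = floor(h).
M_OFFSET = {
    4: lambda q: 0.0,
    5: lambda q: 0.5,
    6: lambda q: q,
    7: lambda q: 1.0 - q,
    8: lambda q: (q + 1.0) / 3.0,
    9: lambda q: q / 4.0 + 3.0 / 8.0,
}

# Matching NumPy method names
NUMPY_METHOD = {4: "interpolated_inverted_cdf", 5: "hazen", 6: "weibull",
                7: "linear", 8: "median_unbiased", 9: "normal_unbiased"}

def hf_quantile(sample, q, qtype=7):
    """Q(q) = (1 - gamma) * x_(j) + gamma * x_(j+1) for Hyndman-Fan types 4-9."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    h = n * q + M_OFFSET[qtype](q)
    j = math.floor(h)
    gamma = h - j
    if j < 1:        # position before the first order statistic: clamp
        return x[0]
    if j >= n:       # position at or past the last order statistic: clamp
        return x[-1]
    return (1 - gamma) * x[j - 1] + gamma * x[j]   # x[j - 1] is x_(j) in 1-based notation

data = np.random.default_rng(0).normal(size=23)
for t, method in NUMPY_METHOD.items():
    # The last two columns should agree up to floating-point rounding
    print(t, method, hf_quantile(data, 0.3, t), np.quantile(data, 0.3, method=method))
```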

NumPy's `numpy.quantile` and the equivalent `numpy.percentile` (which takes percentages instead of fractions) follow the same taxonomy. Since NumPy 1.22 the estimator is selected with the `method` argument, which replaced the older `interpolation` argument; the default, `linear`, corresponds to Hyndman and Fan's type 7. The supported methods are: [7]

| NumPy method | Behaviour | Hyndman-Fan type |
|---|---|---|
| `inverted_cdf` | Generalized inverse of the empirical CDF; returns x₍ₖ₎ with k = ⌈nq⌉ | Type 1 |
| `averaged_inverted_cdf` | Like `inverted_cdf`, but averages the two neighbouring order statistics when nq is an integer | Type 2 |
| `closest_observation` | Order statistic x₍ₖ₎ whose index k is nearest to nq; ties go to even k | Type 3 |
| `interpolated_inverted_cdf` | Linear interpolation; x₍ₖ₎ sits at probability k/n | Type 4 |
| `hazen` | Linear interpolation; x₍ₖ₎ sits at (k − 0.5)/n | Type 5 |
| `weibull` | Linear interpolation; x₍ₖ₎ sits at k/(n + 1) | Type 6 |
| `linear` (default) | Linear interpolation; x₍ₖ₎ sits at (k − 1)/(n − 1) | Type 7 |
| `median_unbiased` | Linear interpolation; x₍ₖ₎ sits at (k − 1/3)/(n + 1/3); approximately median-unbiased for any continuous distribution | Type 8 |
| `normal_unbiased` | Linear interpolation; x₍ₖ₎ sits at (k − 3/8)/(n + 1/4); approximately unbiased for normal data | Type 9 |
| `lower` | x₍ᵢ₎ with i = ⌊(n − 1)q⌋ + 1 | None (legacy discontinuous variant of `linear`) |
| `higher` | x₍ⱼ₎ with j = ⌈(n − 1)q⌉ + 1 | None (legacy variant of `linear`) |
| `nearest` | Whichever of x₍ᵢ₎ and x₍ⱼ₎ is nearer; ties rounded half to even | None (legacy variant of `linear`) |
| `midpoint` | (x₍ᵢ₎ + x₍ⱼ₎) / 2 | None (legacy variant of `linear`) |

R's `quantile()` exposes all nine types through its `type` argument and, like NumPy, defaults to type 7. SAS, Stata, Excel, and SciPy implement their own, often smaller, sets of conventions. Different conventions can produce visibly different answers, especially for small samples or extreme quantiles, which is why reproducible analyses should state the one they use.

## Worked example

Consider the sorted dataset {2, 4, 4, 5, 7, 9, 10, 12, 13, 15}, with n = 10. The type 7 rule (NumPy's default) computes the fractional position h = (n − 1)q + 1 and interpolates linearly between the order statistics at positions ⌊h⌋ and ⌊h⌋ + 1, with weight h − ⌊h⌋ on the upper one.

For q = 0.25, h = 9 · 0.25 + 1 = 3.25, so Q(0.25) = x₍₃₎ + 0.25 · (x₍₄₎ − x₍₃₎) = 4 + 0.25 · (5 − 4) = 4.25.

For q = 0.75, h = 9 · 0.75 + 1 = 7.75, so Q(0.75) = x₍₇₎ + 0.75 · (x₍₈₎ − x₍₇₎) = 10 + 0.75 · (12 − 10) = 11.5.

The IQR is then 11.5 − 4.25 = 7.25, and the median (h = 5.5) is the average of x₍₅₎ and x₍₆₎, which gives 8. Other conventions give different answers on the same data: type 4, which places x₍ₖ₎ at probability k/n, yields Q(0.25) = 4 and Q(0.75) = 11, while type 6 yields 4 and 12.25.
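The same numbers can be checked with NumPy (version 1.22 or later, for the `method` argument):

```python
import numpy as np

data = np.array([2, 4, 4, 5, 7, 9, 10, 12, 13, 15])

# Type 7 (NumPy's default, method="linear")
q1, med, q3 = np.quantile(data, [0.25, 0.5, 0.75])
print(q1, med, q3, q3 - q1)   # 4.25 8.0 11.5 7.25

# Two other Hyndman-Fan conventions on the same data
type4 = np.quantile(data, [0.25, 0.75], method="interpolated_inverted_cdf")
type6 = np.quantile(data, [0.25, 0.75], method="weibull")
print(type4, type6)           # quartiles 4 and 11 (type 4), 4 and 12.25 (type 6)
```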

## What is a quantile-quantile (Q-Q) plot?

A **Q-Q plot** is a graphical tool that compares two distributions by plotting their quantiles against each other. The simplest case plots empirical sample quantiles against theoretical quantiles of a reference distribution, often the standard normal. Each point has the form (theoretical quantile at p, sample quantile at p) for a grid of p values. If the two distributions match up to a linear transformation, the points fall on a straight line. A systematic bow-shaped curve indicates skewness relative to the reference, an S-shaped pattern indicates heavier or lighter tails than the reference, and isolated points far from the line highlight individual extreme observations.

Q-Q plots are widely used in regression diagnostics to check the normality of residuals, in finance to compare asset return distributions, and in genomics to detect inflation in p-value distributions from genome-wide association studies. They are often preferred over histograms or kernel density plots for judging whether data follow a particular reference distribution, because departures in the tails that are hard to see in a density plot show up clearly as points leaving the line.
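The coordinates of a normal Q-Q plot take only a few lines to compute. This sketch pairs the sorted sample with standard normal quantiles at Hazen plotting positions (k − 0.5)/n, one common choice among several; `normal_qq_points` is a hypothetical helper name, and SciPy's `scipy.stats.probplot` is a ready-made alternative with its own plotting positions. For normally distributed data, a least-squares line through the points should have slope close to the standard deviation and intercept close to the mean.

```python
import numpy as np
from statistics import NormalDist

def normal_qq_points(sample):
    """(theoretical, sample) quantile pairs for a normal Q-Q plot."""
    y = np.sort(np.asarray(sample, dtype=float))
    n = len(y)
    probs = (np.arange(1, n + 1) - 0.5) / n   # Hazen plotting positions (k - 0.5)/n
    z = np.array([NormalDist().inv_cdf(p) for p in probs])
    return z, y

sample = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=2000)
z, y = normal_qq_points(sample)
slope, intercept = np.polyfit(z, y, deg=1)
print(slope, intercept)   # roughly 2 and 5: the sample's scale and location
```

Passing `z` and `y` to any scatter-plot function gives the plot itself.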

## Box plots

[Box plots](/wiki/box_plot) are built on the five-number summary: the minimum, Q1, the median, Q3, and the maximum. The central box spans Q1 to Q3 and the line inside it marks the median. In the common Tukey-style variant the whiskers need not reach the minimum and maximum: each whisker extends from the box to the most extreme data point within 1.5 IQR of the nearest quartile, and points beyond the whiskers are drawn individually as suspected outliers. Box plots, popularised by John Tukey in his 1977 book *Exploratory Data Analysis*, are a compact way to compare distributions across groups and have become standard in exploratory data analysis. [11]
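The whisker positions follow from the quartiles. A minimal NumPy sketch (the helper `box_plot_stats` is ours; NumPy's default quartiles are assumed, and the 1.5 multiplier is exposed as `whis`, the name Matplotlib's `boxplot` uses for the same setting):

```python
import numpy as np

def box_plot_stats(x, whis=1.5):
    """Quartiles, whisker ends and outliers for a Tukey-style box plot."""
    x = np.sort(np.asarray(x, dtype=float))
    q1, median, q3 = np.quantile(x, [0.25, 0.5, 0.75])   # NumPy's default (type 7) quartiles
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= low_fence) & (x <= high_fence)]      # never empty for the default whis=1.5
    return {
        "whisker_low": inside.min(),    # most extreme point within the lower fence
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside.max(),   # most extreme point within the upper fence
        "outliers": x[(x < low_fence) | (x > high_fence)].tolist(),
    }

stats = box_plot_stats([2, 4, 4, 5, 7, 9, 10, 12, 13, 15, 30, 100])
print(stats)   # whiskers end at 2 and 15; 30 and 100 are drawn as outliers
```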

## What is quantile regression?

Linear regression estimates the conditional mean E[Y | X = x]. **Quantile regression**, introduced by Roger [Koenker](/wiki/koenker) and Gilbert Bassett Jr. in their 1978 *Econometrica* paper *Regression Quantiles*, instead estimates the conditional q-quantile of Y, Q(q | X = x). Where ordinary least squares minimises squared residuals, quantile regression minimises the asymmetric **pinball loss**, also called check loss: [2]

*L_q(y, ŷ) = max(q (y − ŷ), (q − 1) (y − ŷ)).*

Equivalently, the loss penalises positive residuals with weight q and negative residuals with weight 1 − q. At q = 0.5 it reduces to half the absolute error, so median regression is also called least absolute deviations (LAD) regression. The optimisation problem is a linear program and can be solved with simplex or interior-point methods.
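A direct NumPy transcription of the pinball loss, with a quick check of the property that motivates it: the constant prediction minimising the average pinball loss at level q is a q-quantile of the targets. `pinball_loss` is our own helper; scikit-learn provides a comparable metric as `sklearn.metrics.mean_pinball_loss`.

```python
import numpy as np

def pinball_loss(y, y_hat, q):
    """Mean pinball (check) loss at quantile level q."""
    r = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)   # residual y - y_hat
    return np.mean(np.maximum(q * r, (q - 1) * r))

# Search constant predictions c for the targets 0, 1, ..., 100
y = np.arange(101.0)
candidates = np.arange(101.0)
losses = [pinball_loss(y, c, 0.9) for c in candidates]
best = candidates[int(np.argmin(losses))]
print(best)   # 90.0, the 0.9-quantile of y
```

With 101 targets and q = 0.9, the average loss decreases until 91 of the targets lie at or below c, which happens exactly at c = 90.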

Quantile regression has several practical advantages over mean regression. It is robust to outliers in Y because the pinball loss grows linearly rather than quadratically in the residual. It naturally captures heteroscedasticity: fitting the 0.1 and 0.9 quantiles separately produces a prediction interval that widens where the data are noisier. It does not require normally distributed errors and gives a fuller picture of the conditional distribution, not just its centre. The downside is that quantile curves fitted separately can cross, for example when the predicted 0.9 quantile falls below the predicted 0.5 quantile for some inputs. Remedies include non-crossing constraints, simultaneous estimation of several quantiles, and post-hoc monotone rearrangement, which sorts the predicted quantiles at each input.
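Monotone rearrangement is the simplest crossing remedy to sketch: at each input, sort the predicted quantiles so they are non-decreasing in q. The array layout below (rows are inputs, columns are increasing quantile levels) and the prediction values are assumptions of this example.

```python
import numpy as np

def rearrange(quantile_preds):
    """Remove quantile crossing by sorting each row of predictions.

    quantile_preds: array of shape (n_inputs, n_quantiles), columns in increasing order of q.
    """
    return np.sort(np.asarray(quantile_preds, dtype=float), axis=1)

# Hypothetical predictions at q = 0.1, 0.5, 0.9 for two inputs; the second row crosses
preds = np.array([[1.0, 2.0, 3.0],
                  [2.5, 2.0, 4.0]])
fixed = rearrange(preds)
print(fixed)   # second row becomes 2.0, 2.5, 4.0
```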

Major implementations include `quantreg` in R (maintained by Koenker), `statsmodels.regression.quantile_regression.QuantReg` and [scikit-learn](/wiki/scikit-learn)'s `QuantileRegressor` in Python, and quantile objectives in the gradient boosting libraries LightGBM, XGBoost, and CatBoost. [14] LightGBM's `quantile` objective fits a separate model per requested quantile, while recent XGBoost and CatBoost releases can also fit several quantiles in a single model. Gradient-boosted quantile models are widely used in retail forecasting and energy demand prediction.

## How are quantiles used in probabilistic forecasting?

Many modern time-series neural networks output multiple quantiles simultaneously rather than a point forecast. Amazon's *DeepAR* model, published by Salinas et al. in 2017, parameterises a likelihood at each forecast step (for example Gaussian for real-valued series and negative binomial for count data) and derives forecast quantiles from Monte Carlo sample paths. [9] Google's *Temporal Fusion Transformer* (TFT, Lim et al. 2019) instead outputs quantiles directly, training a Transformer-based architecture by minimising the [pinball loss](/wiki/pinball_loss) summed across a chosen set of quantiles, often {0.1, 0.5, 0.9}. [8] Either way, the predicted quantiles can be turned into prediction intervals, point forecasts (the median), or tail-risk estimates without retraining.

The M5 forecasting competition on Walmart sales had a dedicated *uncertainty* track judged on the weighted scaled pinball loss across nine quantiles. Top entries combined gradient boosting models with custom quantile reconciliation across the product hierarchy, and the competition helped establish quantile prediction as a standard interface for [probabilistic forecasting](/wiki/probabilistic_forecasting) in industry.

## How does conformal prediction use quantiles?

**Conformal prediction**, developed by Vladimir Vovk, Alex Gammerman, and Glenn Shafer in the early 2000s, builds prediction sets that contain the true response with at least a user-chosen probability, regardless of the underlying data distribution, as long as the data are exchangeable. The guarantee is marginal: it holds on average over calibration and test points, not separately for every input. [6] The core construction takes a base predictor, computes a *nonconformity score* on each of n held-out calibration points, and uses a quantile of those scores as the prediction set's threshold. Concretely, for a target miscoverage rate α, the threshold is the empirical ⌈(n+1)(1−α)⌉ / n quantile of the calibration scores, that is, the ⌈(n+1)(1−α)⌉-th smallest score.
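Split conformal prediction for regression fits in a few lines. In this sketch the data are synthetic, the base predictor is a cubic polynomial fit (any model would do), and `conformal_threshold` is our own helper implementing the ⌈(n+1)(1−α)⌉-th-smallest-score rule; it returns infinity when the calibration set is too small for the requested α.

```python
import math
import numpy as np

def conformal_threshold(scores, alpha):
    """The ceil((n+1)(1-alpha))-th smallest calibration score, or inf if that rank exceeds n."""
    s = np.sort(np.asarray(scores, dtype=float))
    n = len(s)
    k = math.ceil((n + 1) * (1 - alpha))
    return np.inf if k > n else s[k - 1]

rng = np.random.default_rng(0)

def make_data(n):
    # Synthetic regression data (an assumption of this example)
    x = rng.uniform(-3, 3, size=n)
    return x, np.sin(x) + rng.normal(scale=0.3, size=n)

x_train, y_train = make_data(1000)
x_cal, y_cal = make_data(1000)
x_test, y_test = make_data(5000)

coef = np.polyfit(x_train, y_train, deg=3)   # base predictor: any fitted model works

def predict(x):
    return np.polyval(coef, x)

alpha = 0.1
q_hat = conformal_threshold(np.abs(y_cal - predict(x_cal)), alpha)   # absolute-residual scores
# Interval for each test point: [predict(x) - q_hat, predict(x) + q_hat]
coverage = np.mean(np.abs(y_test - predict(x_test)) <= q_hat)
print(q_hat, coverage)   # coverage should come out near 0.9
```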

For regression, the simplest variant uses absolute residuals |y − ŷ(x)| as nonconformity scores and produces a constant-width interval around any point prediction. Romano, Patterson, and Candès introduced *conformalized quantile regression* (CQR) at NeurIPS 2019, which fits quantile regression models at levels α/2 and 1 − α/2 to obtain an initial interval [q̂_lo(x), q̂_hi(x)]. Its scores are the signed distances max(q̂_lo(x) − y, y − q̂_hi(x)), which are negative when y falls inside the interval, and the conformal quantile Q̂ of these scores widens (or, if negative, shrinks) the interval to [q̂_lo(x) − Q̂, q̂_hi(x) + Q̂]. [4] The result is an adaptive interval that widens where the data are noisier yet still enjoys finite-sample marginal coverage.
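The CQR calibration step on top of an existing pair of quantile models is equally short. This sketch assumes the lower and upper quantile predictions for the calibration and test points have already been computed; `cqr_correction` and `cqr_interval` are hypothetical helper names, and the toy calibration values are made up.

```python
import math
import numpy as np

def cqr_correction(lo_cal, hi_cal, y_cal, alpha):
    """Conformal quantile of the CQR scores max(lo - y, y - hi) on a calibration set."""
    lo, hi, y = (np.asarray(a, dtype=float) for a in (lo_cal, hi_cal, y_cal))
    scores = np.sort(np.maximum(lo - y, y - hi))   # negative when y lies inside [lo, hi]
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return np.inf if k > n else scores[k - 1]

def cqr_interval(lo_test, hi_test, correction):
    """Widen (or, for a negative correction, shrink) the quantile-regression interval."""
    return (np.asarray(lo_test, dtype=float) - correction,
            np.asarray(hi_test, dtype=float) + correction)

# Toy calibration set: the models predicted [0, 1] everywhere; one response (3.0) falls outside
lo_cal, hi_cal = np.zeros(9), np.ones(9)
y_cal = np.array([0.5] * 8 + [3.0])
correction = cqr_correction(lo_cal, hi_cal, y_cal, alpha=0.1)
print(correction, cqr_interval([0.0], [1.0], correction))   # 2.0, interval [-2, 3]
```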

A widely used practitioner reference is the tutorial *A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification* by Anastasios Angelopoulos and Stephen Bates, first posted to arXiv in 2021 and published as the Foundations and Trends monograph *Conformal Prediction: A Gentle Introduction* in 2023. [5] It is accompanied by an open-source repository that applies [conformal prediction](/wiki/conformal_prediction) to real datasets.

## How are quantiles used as risk metrics in finance?

In quantitative finance, quantiles of the loss distribution underpin two of the most widely used risk metrics. **Value at Risk** (VaR) with tail probability α, equivalently at confidence level 1 − α, is the (1 − α)-quantile of the portfolio loss distribution over a fixed horizon, often one or ten trading days. A 99% one-day VaR (α = 0.01) of $10 million means that, according to the risk model, one-day losses exceed $10 million on no more than 1% of days. Regulators, including the Basel Committee on Banking Supervision, have used [value at risk](/wiki/value_at_risk) as a basis for market-risk capital charges. [12]

**Expected Shortfall** (ES), also called Conditional VaR (CVaR), Average VaR, or Expected Tail Loss, is the conditional mean of the loss given that the loss exceeds VaR. ES with tail probability α averages the worst 100α% of outcomes, so it reflects how large the losses beyond VaR are, whereas VaR depends only on where the tail begins. The Basel III market-risk framework adopted ES at the 97.5% level as the regulatory replacement for 99% VaR for trading-book capital, partly because ES is *coherent* (in particular subadditive) while VaR is not subadditive in general. [12]
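Both metrics can be estimated by historical simulation, reading quantiles directly off a sample of past losses. The sketch below uses one simple convention among several: VaR is the empirical (1 − α)-quantile (type 1), and ES averages the worst ⌈nα⌉ losses. The helper name and the loss data are made-up placeholders.

```python
import math
import numpy as np

def historical_var_es(losses, alpha=0.01):
    """Historical-simulation VaR and ES at tail probability alpha (confidence level 1 - alpha).

    Positive values are losses. VaR is the ceil(n(1-alpha))-th smallest loss (a type 1
    empirical quantile); ES is the mean of the ceil(n*alpha) largest losses.
    """
    x = np.sort(np.asarray(losses, dtype=float))
    n = len(x)
    var = x[math.ceil(n * (1 - alpha)) - 1]
    k = math.ceil(n * alpha)
    es = x[n - k:].mean()
    return var, es

# Placeholder data: 100 daily losses of 1, 2, ..., 100 (arbitrary units), shuffled
losses = np.random.default_rng(0).permutation(np.arange(1, 101))
var_95, es_95 = historical_var_es(losses, alpha=0.05)
print(var_95, es_95)   # 95.0 98.0
```

ES exceeds VaR here because it averages the five losses beyond the VaR level (96 through 100) rather than reporting only where the tail starts.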

## Quantile normalization

In genomics, *quantile normalization* (Bolstad et al., 2003, *Bioinformatics*) makes the distribution of values in many samples identical by replacing each value with the average of all values that share its rank. [10] The procedure is: sort each column independently, take the row means of the sorted matrix, and then write those means back into the original positions. The result is that every column has the same empirical distribution while the rank order within each column is preserved. Quantile normalization became a standard preprocessing step for Affymetrix microarrays through the RMA pipeline and has since been extended to RNA-seq and methylation arrays.
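The three-step procedure maps directly onto NumPy. In this sketch (`quantile_normalize` is our own name), columns are samples and rows are features; ties are broken arbitrarily by `argsort`, whereas production implementations typically give tied values a shared, averaged value.

```python
import numpy as np

def quantile_normalize(X):
    """Quantile-normalize the columns (samples) of X, whose rows are features such as genes."""
    X = np.asarray(X, dtype=float)
    # Steps 1-2: sort each column, then average across columns at each rank
    reference = np.sort(X, axis=0).mean(axis=1)
    # Step 3: write the rank means back into each value's original position
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)   # ties broken arbitrarily
    return reference[ranks]

X = np.array([[3.0, 10.0],
              [1.0, 30.0],
              [2.0, 20.0]])
Xn = quantile_normalize(X)
print(Xn)   # columns become [16.5, 5.5, 11] and [5.5, 16.5, 11]
```

Both columns now contain exactly the values {5.5, 11, 16.5}, while each column keeps its original rank order.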

## How are quantiles used in machine learning?

Beyond regression and forecasting, quantiles power several standard preprocessing and modeling recipes in machine learning.

**Quantile bucketing (binning).** In [feature engineering](/wiki/feature_engineering), quantile bucketing replaces a continuous feature with categorical bins of equal frequency instead of equal width, which keeps every bin populated even when the raw feature is heavily skewed. Google's Machine Learning Crash Course describes the technique directly: "Quantile bucketing creates bucketing boundaries such that the number of examples in each bucket is exactly or nearly equal." [16] The recipe is implemented as `strategy='quantile'` in scikit-learn's `KBinsDiscretizer` and is documented on the dedicated [quantile bucketing](/wiki/quantile_bucketing) page.
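A minimal NumPy sketch of quantile bucketing (`quantile_buckets` is our own helper): bucket edges are the evenly spaced quantiles of the feature, and each value is assigned to the bucket between its neighbouring edges. scikit-learn's `KBinsDiscretizer(strategy="quantile")` follows the same idea but may place edges slightly differently.

```python
import numpy as np

def quantile_buckets(x, n_buckets):
    """Assign each value to one of n_buckets equal-frequency buckets, numbered 0..n_buckets-1."""
    x = np.asarray(x, dtype=float)
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_buckets + 1))   # evenly spaced quantiles
    # Search only the interior edges; side="right" sends a value equal to an edge upward
    return np.searchsorted(edges[1:-1], x, side="right"), edges

x = np.arange(100.0) ** 2          # a heavily skewed feature
buckets, edges = quantile_buckets(x, 4)
print(np.bincount(buckets))        # [25 25 25 25]
```

With many tied values, neighbouring edges can coincide and some buckets end up empty or uneven, which is why the quoted definition says "exactly or nearly equal".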

**Robust scaling and Winsorisation.** Robust scaling subtracts the median and divides by the IQR rather than using mean and standard deviation, which gives a feature scale that is insensitive to outliers (scikit-learn's `RobustScaler`). Winsorisation clips extreme values at chosen quantiles, for example replacing everything below the 1st percentile with the 1st-percentile value.
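Both recipes are a few lines of NumPy. The helper names are ours; scikit-learn's `RobustScaler` implements the first with its default `quantile_range=(25.0, 75.0)`, and the 1st/99th percentile limits in `winsorize` are only a common default, not a rule.

```python
import numpy as np

def robust_scale(x):
    """Subtract the median and divide by the interquartile range (Q3 - Q1).

    A constant feature gives IQR = 0; real code should guard against that case.
    """
    x = np.asarray(x, dtype=float)
    q1, med, q3 = np.quantile(x, [0.25, 0.5, 0.75])
    return (x - med) / (q3 - q1)

def winsorize(x, lower=0.01, upper=0.99):
    """Clip values below the `lower` quantile and above the `upper` quantile."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.quantile(x, [lower, upper])
    return np.clip(x, lo, hi)

print(robust_scale([2, 4, 4, 5, 7, 9, 10, 12, 13, 15]))   # median 8 and IQR 7.25 give (x - 8) / 7.25
print(winsorize(list(range(1, 101)) + [1000]).max())     # the outlier 1000 is clipped to 100.0
```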

**Quantile transforms.** A quantile transform maps a feature to an approximately uniform distribution by passing it through its empirical CDF and, if a Gaussian output is wanted, then through the standard normal quantile function. It is a non-linear but rank-preserving form of [normalization](/wiki/normalization) that is useful before models that work best with Gaussian-like inputs (scikit-learn's `QuantileTransformer`).
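A rank-based sketch of the quantile transform, assuming distinct values: ranks are mapped to plotting positions (r + 0.5)/n strictly inside (0, 1), which approximate the empirical CDF, and then optionally through the standard normal quantile function from Python's `statistics.NormalDist`. `quantile_transform` is our own name; scikit-learn's `QuantileTransformer` differs in details, for example by interpolating between a fixed number of reference quantiles.

```python
import numpy as np
from statistics import NormalDist

def quantile_transform(x, output="normal"):
    """Map values to approximately uniform or standard normal scores via their ranks."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ranks = np.argsort(np.argsort(x))   # 0..n-1; assumes distinct values
    u = (ranks + 0.5) / n               # empirical-CDF positions, strictly inside (0, 1)
    if output == "uniform":
        return u
    inv_cdf = NormalDist().inv_cdf      # standard normal quantile function
    return np.array([inv_cdf(p) for p in u])

x = np.exp(np.random.default_rng(0).normal(size=1001))   # a skewed, log-normal feature
z = quantile_transform(x)
u = quantile_transform(x, output="uniform")
print(z.mean(), z.min(), z.max())   # mean essentially 0, extremes about -3.3 and 3.3
```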

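**Distributional reinforcement learning.** In reinforcement learning, distributional methods such as QR-DQN (Dabney et al., 2018) represent the return distribution of each action by a fixed set of N quantile estimates, at levels (2i − 1)/(2N) for i = 1, ..., N, rather than by a single expected value, and train them with a quantile-regression (quantile Huber) loss. This improved performance on the Atari benchmark in the original paper and lets the agent reason about the full return distribution. [13]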
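The quantile levels and the loss can be sketched in NumPy. This follows the quantile Huber loss form described in Dabney et al. (2018) as we understand it, with κ = 1 by default; some later implementations also divide the Huber term by κ. The function names are ours, and a real agent would compute this on batches of network outputs and Bellman targets inside an autodiff framework.

```python
import numpy as np

def quantile_midpoints(n):
    """tau_hat_i = (2i - 1) / (2n), i = 1..n: the levels the n outputs estimate."""
    return (2 * np.arange(1, n + 1) - 1) / (2.0 * n)

def quantile_huber_loss(theta, targets, kappa=1.0):
    """Quantile Huber loss between predicted quantiles theta (n,) and target samples (m,)."""
    theta = np.asarray(theta, dtype=float)
    targets = np.asarray(targets, dtype=float)
    tau = quantile_midpoints(len(theta))
    u = targets[None, :] - theta[:, None]                  # pairwise TD errors, shape (n, m)
    abs_u = np.abs(u)
    huber = np.where(abs_u <= kappa, 0.5 * u ** 2, kappa * (abs_u - 0.5 * kappa))
    weight = np.abs(tau[:, None] - (u < 0))                # asymmetric quantile weighting
    return (weight * huber).mean(axis=1).sum()             # average over targets, sum over quantiles

print(quantile_midpoints(4))                      # 0.125, 0.375, 0.625, 0.875
print(quantile_huber_loss([0.0, 0.0], [1.0]))     # 0.5
```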

## How is quantile quantization (NF4) used in large language models?

Quantiles also drive a state-of-the-art weight [quantization](/wiki/quantization) scheme for large language models. **4-bit NormalFloat (NF4)**, introduced in the [QLoRA](/wiki/qlora) paper by Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer (2023), is a *quantile quantization* data type: its 16 representable levels are placed at quantiles of a standard normal distribution rather than at uniformly spaced values. [15] Because the weights of a pretrained network are roughly zero-mean and normally distributed, placing levels at equal-probability quantiles gives each quantization bin an equal expected share of the weights: levels cluster near zero, where the density is high, and thin out in the tails. The level locations are computed by evaluating the quantile function (inverse CDF) of the normal distribution at evenly spaced probabilities and then normalising the result into the range [-1, 1]; the construction is slightly asymmetric so that zero is represented exactly.
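The construction can be reproduced with the standard library's normal quantile function. This sketch follows, to the best of our knowledge, the approach of the `create_normal_map` helper in bitsandbytes: 8 positive and 7 negative quantiles at evenly spaced probabilities between an offset and 0.5, plus an exact zero, divided by the largest value. The offset 0.9677083 is taken from that helper and should be treated as an assumption; the resulting levels should be close to the NF4 values used in bitsandbytes (for example, roughly 0.7230 for the second-largest level and −0.6962 for the second-smallest).

```python
import numpy as np
from statistics import NormalDist

def nf4_levels(offset=0.9677083):
    """NF4-style codebook: 8 positive and 7 negative normal quantiles plus an exact zero, in [-1, 1]."""
    ppf = NormalDist().inv_cdf                                      # standard normal quantile function
    positive = [ppf(p) for p in np.linspace(offset, 0.5, 9)[:-1]]   # 8 levels above zero
    negative = [-ppf(p) for p in np.linspace(offset, 0.5, 8)[:-1]]  # 7 levels below zero
    levels = np.sort(np.array(positive + [0.0] + negative))
    return levels / levels.max()                                    # largest level becomes 1

levels = nf4_levels()
print(np.round(levels, 4))   # 16 values from -1 to 1, densest around 0
```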

The authors describe NF4 as "a new data type that is information theoretically optimal for normally distributed weights," and it is one of the innovations, together with double quantization (quantizing the quantization constants) and paged optimizers, that let QLoRA "finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance." [15] NF4 uses a block size of 64 weights per scaling factor and ships in the bitsandbytes library used by Hugging Face PEFT. It is a concrete example of the quantile function being used not for analysis but for data compression: the same inverse-CDF idea behind quantile bucketing, reused to lay out floating-point codes.
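Block-wise absmax quantization against a codebook can be sketched as follows. This is a simplified illustration, not the bitsandbytes kernel: each block of weights is divided by its largest absolute value, every weight is mapped to the nearest codebook level, and dequantization multiplies back. Double quantization of the scales and the packing of two 4-bit codes per byte are omitted, and the toy 5-level codebook stands in for the 16 NF4 levels; the function names are ours.

```python
import numpy as np

def quantize_blockwise(w, levels, block_size=64):
    """Absmax block-wise quantization to the nearest entry of a sorted codebook in [-1, 1].

    Simplified sketch: len(w) must be a multiple of block_size, codes are stored one per
    byte, and the per-block scales are kept in full precision.
    """
    levels = np.asarray(levels, dtype=float)
    blocks = np.asarray(w, dtype=float).reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True)       # one absmax scale per block
    scales[scales == 0] = 1.0                                 # leave all-zero blocks unscaled
    normalized = blocks / scales                              # each block now lies in [-1, 1]
    # Index of the nearest codebook level for every weight
    codes = np.abs(normalized[..., None] - levels).argmin(axis=-1).astype(np.uint8)
    return codes, scales

def dequantize_blockwise(codes, scales, levels):
    """Map codes back to levels and rescale each block by its absmax."""
    return (np.asarray(levels, dtype=float)[codes] * scales).ravel()

# Toy example with a hypothetical 5-level codebook and blocks of 4 weights
toy_levels = [-1.0, -0.5, 0.0, 0.5, 1.0]
w = np.array([0.0, 1.2, -2.0, 4.0, 0.5, -0.5, 0.1, 0.0])
codes, scales = quantize_blockwise(w, toy_levels, block_size=4)
w_hat = dequantize_blockwise(codes, scales, toy_levels)
print(w_hat)   # 1.2 rounds to 2.0 (block scale 4) and 0.1 rounds to 0.0 (block scale 0.5)
```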

## References

1. Hyndman, R. J., & Fan, Y. (1996). Sample Quantiles in Statistical Packages. *The American Statistician*, 50(4), 361-365. https://www.tandfonline.com/doi/abs/10.1080/00031305.1996.10473566
2. Koenker, R., & Bassett, G. (1978). Regression Quantiles. *Econometrica*, 46(1), 33-50. https://www.econometricsociety.org/publications/econometrica/1978/01/01/regression-quantiles
3. Koenker, R. (2005). *Quantile Regression*. Cambridge University Press.
4. Romano, Y., Patterson, E., & Candès, E. J. (2019). Conformalized Quantile Regression. *Advances in Neural Information Processing Systems*, 32, 3538-3548. https://arxiv.org/abs/1905.03222
5. Angelopoulos, A. N., & Bates, S. (2023). Conformal Prediction: A Gentle Introduction. *Foundations and Trends in Machine Learning*, 16(4), 494-591. https://arxiv.org/abs/2107.07511
6. Vovk, V., Gammerman, A., & Shafer, G. (2005). *Algorithmic Learning in a Random World*. Springer.
7. NumPy Developers. numpy.quantile reference documentation. https://numpy.org/doc/stable/reference/generated/numpy.quantile.html
8. Lim, B., Arık, S. Ö., Loeff, N., & Pfister, T. (2019). Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting. *International Journal of Forecasting*. https://arxiv.org/abs/1912.09363
9. Salinas, D., Flunkert, V., & Gasthaus, J. (2017). DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. *International Journal of Forecasting*. https://arxiv.org/abs/1704.04110
10. Bolstad, B. M., Irizarry, R. A., Astrand, M., & Speed, T. P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. *Bioinformatics*, 19(2), 185-193. https://academic.oup.com/bioinformatics/article/19/2/185/372664
11. Tukey, J. W. (1977). *Exploratory Data Analysis*. Addison-Wesley.
12. Basel Committee on Banking Supervision (2019). *Minimum capital requirements for market risk*. Bank for International Settlements. https://www.bis.org/bcbs/publ/d457.htm
13. Dabney, W., Rowland, M., Bellemare, M. G., & Munos, R. (2018). Distributional Reinforcement Learning with Quantile Regression. *AAAI Conference on Artificial Intelligence*. https://arxiv.org/abs/1710.10044
14. scikit-learn developers. QuantileRegressor documentation. https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.QuantileRegressor.html
15. Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. *Advances in Neural Information Processing Systems 36*. https://arxiv.org/abs/2305.14314
16. Google for Developers. Numerical data: Binning (quantile bucketing). Machine Learning Crash Course. https://developers.google.com/machine-learning/crash-course/numerical-data/binning

## Explain Like I'm 5 (ELI5)

Imagine you have a bag of candies of different sizes. Line them up from smallest to biggest. Quantiles are the places where you cut the line so that every group gets the same number of candies. Cutting it into 4 equal groups gives quartiles (like cutting a cake into 4 pieces), cutting it into 100 groups gives percentiles, and the single cut right in the middle is the median.

In machine learning, quantiles help us understand and organize data. We use them to spot values that are unusually big or small, to sort numbers into groups of equal size, and to predict not just one likely outcome but a range of outcomes that could happen.

