# Non-Response Bias

> Source: https://aiwiki.ai/wiki/non-response_bias
> Updated: 2026-06-28
> Categories: Data & Datasets, Machine Learning, Statistics
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

Non-response bias is the error that arises when the people or units that do not respond to a survey, study, or data collection process differ systematically from those that do, so that estimates built from the respondents alone do not reflect the full target population. It is a type of [selection bias](/wiki/selection_bias). For a sample mean, the bias equals the non-response rate multiplied by the difference between respondents and non-respondents on the quantity being measured, so a high non-response rate by itself does not guarantee a large bias, and a low rate does not guarantee a small one. A landmark meta-analysis by Robert Groves and Emilia Peytcheva (2008), covering 959 nonresponse-bias estimates drawn from 59 methodological studies, found only a weak correlation (around 0.2) between a survey's nonresponse rate and the magnitude of its nonresponse bias.[1]

In [machine learning](/wiki/machine_learning) and statistics, this bias arises when [training](/wiki/training) data, survey responses, or experimental observations are incomplete because certain groups or types of observations are missing from the [dataset](/wiki/data_set_or_dataset). The result is that estimates, models, or conclusions drawn from the available data do not accurately reflect the target population. Non-response bias has been recognized as a persistent challenge across disciplines, from opinion polling and public health research to [recommendation systems](/wiki/recommender_system) and [natural language processing](/wiki/natural_language_processing), and it now matters directly for AI because survey corpora, opt-in datasets, and human-feedback rater pools all inherit it. When left unaddressed, it can lead to systematically distorted predictions, unfair algorithmic outcomes, and flawed policy decisions.

## ELI5 (Explain like I'm 5)

Imagine you ask everyone in your class what their favorite ice cream flavor is, but the kids who love chocolate are all out playing at recess and never answer your question. You look at your results and think, "Nobody likes chocolate!" But that is wrong. The chocolate lovers just did not respond. Non-response bias is what happens when the people who do not answer are different from the people who do answer, making your results lopsided.

In machine learning, the same thing happens with data. If a computer learns from data that is missing certain types of examples, it will make mistakes when it encounters those missing types in the real world.

## What is non-response bias?

Non-response bias is the systematic error introduced into a statistic when non-respondents differ from respondents on the variable being estimated. It is distinct from the simple loss of sample size: even a study that loses many participants can be unbiased if the people who drop out are just like the people who stay, while a study that loses only a few participants can be badly biased if those few are unusual on the measured outcome. For a sample mean, the bias has a simple closed form. Let $\theta$ be the mean that would be obtained if every sampled unit had responded, and let $\hat{\theta}$ be the estimate obtained from respondents only. The bias is:

$$\text{Bias} = \hat{\theta} - \theta = \frac{n_{nr}}{n}(\bar{Y}_r - \bar{Y}_{nr})$$

where $n$ is the total sample size, $n_{nr}$ is the number of non-respondents, $\bar{Y}_r$ is the mean of the variable among respondents, and $\bar{Y}_{nr}$ is the mean among non-respondents. Two conditions must hold simultaneously for non-response bias to occur: (1) the non-response rate $n_{nr}/n$ must be non-trivial, and (2) non-respondents must differ from respondents on the variable of interest ($\bar{Y}_r \neq \bar{Y}_{nr}$). If either component is zero, the bias vanishes. This decomposition is why methodologists stress that the non-response rate is only one of two multiplicative factors, never the whole story.[1][8]
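
As a quick numerical check of this decomposition, the sketch below computes the bias of a respondent-only mean in two ways: directly, and as the non-response rate times the gap between respondents and non-respondents. The counts and means are hypothetical values chosen for illustration, and `nonresponse_bias` is an illustrative helper, not a library function.

```python
def nonresponse_bias(n_r, n_nr, mean_r, mean_nr):
    """Bias of the respondent mean, computed directly and via the decomposition."""
    n = n_r + n_nr
    full_mean = (n_r * mean_r + n_nr * mean_nr) / n   # mean if everyone had responded
    direct = mean_r - full_mean                       # respondent mean minus full mean
    decomposed = (n_nr / n) * (mean_r - mean_nr)      # non-response rate x difference
    return direct, decomposed

# Hypothetical survey: 600 respondents (mean 50), 400 non-respondents (mean 40).
direct, decomposed = nonresponse_bias(600, 400, 50.0, 40.0)

# Hypothetical survey with 90% non-response but identical group means.
no_gap, _ = nonresponse_bias(100, 900, 50.0, 50.0)

print(direct, decomposed, no_gap)
```

In the first case both routes give a bias of 4 points. In the second, the bias is zero despite 90% non-response, because the two groups do not differ on the outcome.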

## Does a high non-response rate mean high bias?

No. A high non-response rate increases the potential for non-response bias, but it neither guarantees a large bias nor is required for one, because bias also depends on how different non-respondents are from respondents. The most cited evidence for this is the meta-analysis by Robert Groves and Emilia Peytcheva, "The Impact of Nonresponse Rates on Nonresponse Bias: A Meta-Analysis," published in *Public Opinion Quarterly* in 2008. The authors assembled 959 univariate nonresponse-bias estimates from 59 methodological studies spanning diverse topics and target populations (US national samples, physicians, university students, company customers, and others). They reported that the nonresponse rate explains very little of the variation in nonresponse bias, finding only a weak correlation of roughly 0.2 between a survey's nonresponse rate and the absolute relative nonresponse bias of its estimates, and that most of the variation in bias occurred across estimates within the same survey rather than between surveys with different response rates.[1]

In their words, the studies "permit exploration of which circumstances produce a relationship between nonresponse rates and nonresponse bias and which, do not," and the practical conclusion is that response rate alone is a poor proxy for data quality.[1] This finding reshaped survey practice: it shifted the field away from chasing ever-higher response rates as the sole quality target and toward directly investigating whether the response mechanism is correlated with the variables of interest. The same logic carries into machine learning. A large opt-in dataset with a low effective response rate may still be adequate for a target if the missing population does not differ on the labels being predicted, while a dataset with high coverage can still be biased on the few dimensions where non-participants differ sharply.

## Formal definition

Non-response bias can be expressed using the bias decomposition shown above. The same logic generalizes beyond a sample mean: whenever the propensity to be observed is correlated with the outcome, the respondent-based estimate is biased, and the size of the bias grows with the covariance between response propensity and the outcome relative to the average propensity. This is why two studies with identical response rates can have completely different levels of bias, and why reporting the response rate without any evidence about the non-respondents is uninformative about accuracy.[1][8]
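
To illustrate why identical response rates can hide different amounts of bias, the sketch below uses a common approximation from the survey literature, which is an assumption here and not stated in this article: the bias of the respondent mean is roughly the covariance between each unit's response propensity and its outcome, divided by the mean propensity. The outcome values and propensities are hypothetical.

```python
from statistics import fmean

def propensity_bias(y, p):
    """Approximate bias of the respondent mean: Cov(p, y) / mean(p)."""
    y_bar, p_bar = fmean(y), fmean(p)
    cov = fmean([(yi - y_bar) * (pi - p_bar) for yi, pi in zip(y, p)])
    return cov / p_bar

y = [10.0, 20.0, 30.0, 40.0]          # hypothetical outcomes for four units

# Both propensity vectors average 0.5, i.e. the same expected response rate.
p_unrelated = [0.5, 0.5, 0.5, 0.5]    # propensity unrelated to the outcome
p_related = [0.2, 0.4, 0.6, 0.8]      # propensity rises with the outcome

bias_unrelated = propensity_bias(y, p_unrelated)
bias_related = propensity_bias(y, p_related)
print(bias_unrelated, bias_related)
```

With the same 50% expected response rate, the first scenario produces no bias, while the second overstates the mean by about 5 because high-outcome units are more likely to respond.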

## Types of non-response

Non-response manifests in two distinct forms, each with different implications for analysis.

| Type | Description | Example | Typical impact |
|------|-------------|---------|----------------|
| **Unit non-response** | An entire observation or subject is absent from the dataset. The sampled individual cannot be contacted, refuses to participate, or is otherwise unreachable. | A patient drops out of a clinical trial before follow-up measurements are taken. | Reduces effective sample size and can shift population estimates if dropouts differ systematically from completers. |
| **Item non-response** | A subject participates but fails to provide answers for specific variables or questions. | A survey respondent skips the income question but answers all other items. | Creates partial missing data patterns; simpler to address than unit non-response if few items are missing. |

Unit non-response means the whole record is gone and there is no information about that subject except whatever the sampling frame contained; item non-response means the record exists but one or more fields are blank, so the other answers from that same respondent can be used to model or impute the missing field. In machine learning contexts, unit non-response corresponds to entire records missing from a [training set](/wiki/training_set), while item non-response corresponds to individual [features](/wiki/feature) or [labels](/wiki/label) being absent within otherwise complete records.

## Missing data mechanisms

Donald Rubin introduced a formal taxonomy for missing data in his 1976 paper "Inference and Missing Data," published in *Biometrika*. This framework classifies the reasons behind missing values into three categories, each with different consequences for statistical analysis and [model](/wiki/model) training.[2]

### Missing completely at random (MCAR)

Data are missing completely at random when the probability of a value being missing is entirely independent of both the observed and unobserved data. Under MCAR, the missing data represent a simple random subsample of the full data. Complete-case analysis (listwise deletion) remains unbiased under MCAR, though it sacrifices statistical power by discarding incomplete cases.

MCAR is the strongest and most convenient assumption, but it is also the least realistic in practice. A diagnostic check called Little's MCAR test (proposed by Roderick Little in 1988) compares observed variable means across different missing data patterns using the [expectation-maximization](/wiki/expectation_maximization) algorithm. A significant test result (p < 0.05) suggests the data are not MCAR, though the test has known limitations: it assumes multivariate normality, has low statistical power with few variables, and cannot distinguish between MAR and MNAR.[5]

### Missing at random (MAR)

Data are missing at random when the probability of missingness depends on observed variables but not on the missing values themselves, after conditioning on the observed data. For example, younger respondents may be less likely to answer a health survey, but among people of the same age, the probability of responding does not depend on their actual health status.

MAR is a weaker assumption than MCAR and cannot be tested directly from the data alone; it requires substantive judgment about the data-generating process. Under MAR, standard methods like [multiple imputation](/wiki/multiple_imputation) and maximum likelihood estimation produce consistent, unbiased estimates when the analysis model is correctly specified. Most modern missing-data methods assume MAR as a working assumption.

### Missing not at random (MNAR)

Data are missing not at random (also called nonignorable nonresponse) when the probability of a value being missing depends on the unobserved value itself. A classic example is income surveys where high earners are more likely to skip the income question precisely because of their high income. In clinical trials, patients experiencing severe side effects may drop out, and their unobserved outcomes differ from those who remain.

MNAR is the most difficult scenario to handle because the mechanism that drives missingness is entangled with the quantity being estimated. Standard imputation and likelihood methods are biased under MNAR unless the missing data model is explicitly specified. Approaches to MNAR data include selection models (such as the Heckman correction), pattern-mixture models, and sensitivity analyses that explore how conclusions change under varying assumptions about the missing data.

| Mechanism | Probability of missingness depends on | Testable? | Suitable methods |
|-----------|---------------------------------------|-----------|------------------|
| MCAR | Neither observed nor unobserved data | Partially (Little's test) | Complete-case analysis, any imputation method |
| MAR | Observed data only | No (requires domain knowledge) | [Multiple imputation](/wiki/multiple_imputation), maximum likelihood, inverse probability weighting |
| MNAR | Unobserved (missing) values | No | Selection models, pattern-mixture models, sensitivity analysis |
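
The practical difference between the three mechanisms can be seen in a small simulation. The sketch below is a toy setup under stated assumptions (a standard normal covariate `x`, an outcome `y = x + noise`, and hypothetical response probabilities of 0.8 and 0.2); it is not drawn from any of the cited studies. It compares the complete-case mean of `y` under each mechanism with the full-data mean, and shows that adjusting on the observed covariate repairs MAR.

```python
import random
from statistics import fmean

rng = random.Random(0)                      # fixed seed for repeatability
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]     # covariate, always observed
y = [xi + rng.gauss(0, 1) for xi in x]      # outcome, subject to missingness
true_mean = fmean(y)

def keep(prob):
    """Return True with the given probability (unit responds)."""
    return rng.random() < prob

# MCAR: response is unrelated to x and y.
mcar = [yi for yi in y if keep(0.5)]
# MAR: response depends only on the observed covariate x.
mar = [(xi, yi) for xi, yi in zip(x, y) if keep(0.8 if xi < 0 else 0.2)]
# MNAR: response depends on the outcome value itself.
mnar = [yi for yi in y if keep(0.8 if yi < 0 else 0.2)]

mcar_mean = fmean(mcar)
mar_mean = fmean([yi for _, yi in mar])
mnar_mean = fmean(mnar)

# MAR fix: average y within strata of x, then weight by full-sample stratum shares.
share_neg = sum(xi < 0 for xi in x) / n
mar_adjusted = (share_neg * fmean([yi for xi, yi in mar if xi < 0])
                + (1 - share_neg) * fmean([yi for xi, yi in mar if xi >= 0]))

print(true_mean, mcar_mean, mar_mean, mar_adjusted, mnar_mean)
```

The complete-case mean stays close to the full-data mean under MCAR, drifts under MAR until the analysis conditions on `x`, and remains biased under MNAR because no observed variable explains the missingness. The stratified fix works here only because the simulated response probability depends on the sign of `x` and nothing else.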

## Causes of non-response bias

Non-response bias arises from a wide range of mechanisms across different data collection contexts.

### Survey and observational research

- **Refusal to participate.** Individuals may decline to take part in a study for personal, cultural, or logistical reasons. People who refuse often differ from participants on key variables such as health status, political views, or socioeconomic position.
- **Inability to contact.** Some sampled individuals are unreachable due to incorrect contact information, geographic mobility, or institutional barriers (for example, incarcerated or hospitalized populations).
- **Topic sensitivity.** Questions about income, substance use, criminal behavior, or sexual health are more likely to be skipped or to deter participation altogether.
- **Survey fatigue.** Long or repetitive surveys produce higher item non-response and dropout rates, particularly toward the end of the instrument.
- **Mode effects.** Different survey modes (mail, phone, web, in-person) attract different types of respondents. Web surveys, for instance, systematically exclude populations without internet access.

### Machine learning and data science

- **Self-selection in data generation.** User-generated data (reviews, ratings, social media posts) is produced disproportionately by certain demographics. Platforms collect data only from their user base, which may not represent the broader population.
- **Sensor and device limitations.** IoT devices, medical monitors, or tracking systems may fail to record data from certain environments or conditions, creating systematic gaps.
- **Label availability.** In supervised learning, obtaining labels for all examples may be impractical. Unlabeled examples are effectively "non-responses" that may not be random; for instance, ambiguous cases are harder to label and may be left out.
- **Survivorship bias in datasets.** Datasets may only contain examples that survived some filtering process. A loan default [prediction](/wiki/prediction) model trained only on approved applications has no data on applicants who were denied, yet denied applicants may have been the most informative cases.
- **Platform-specific collection.** [Recommender systems](/wiki/recommender_system) trained on implicit feedback (clicks, views, purchases) observe interactions only for items that were shown to users. Items never surfaced to a user produce no signal, and this absence is missing not at random because the platform's own algorithms determined what was displayed.
- **Opt-in and rater-pool selection.** Datasets assembled from volunteers, crowdworkers, or paid annotators are produced only by people who chose to participate, so the resulting labels reflect that self-selected pool rather than the deployment population.

## How does non-response bias affect AI training data and RLHF?

Non-response bias enters modern AI through three channels: the corpora used for pretraining, the opt-in datasets used for fine-tuning, and the human-feedback rater pools used for alignment. Web-scraped and user-generated text over-represents populations that post online and under-represents those who do not, so a [large language model](/wiki/large_language_model) trained on it inherits the participation patterns of internet users rather than the general public. Opt-in datasets, in which contributors volunteer their data, are a textbook source of self-selection: the contributors are by definition not a random sample of the target population.

The effect is sharpest in [reinforcement learning from human feedback](/wiki/reinforcement_learning_from_human_feedback) (RLHF), where a [reward model](/wiki/reward_model) is trained on preference comparisons supplied by a finite pool of human raters. If that pool is recruited from a narrow demographic, region, or professional background, the reward model encodes the preferences and blind spots of that group, and any perspective absent from the rater pool is effectively a non-response that the model never learns to value. The composition of the rater pool, including demographics, domain expertise, and fatigue, shapes what the reward model rewards, which is a direct analogue of the survey-methodology insight that who responds determines what the data say. Practitioners mitigate this by diversifying and auditing annotator pools, documenting rater demographics, and re-weighting or aggregating preferences so that no single subgroup dominates the learned reward.[12]

## Real-world examples and consequences

Non-response bias has produced notable failures across multiple fields.

### Election polling

The 2016 and 2020 U.S. presidential elections exposed significant non-response bias in pre-election polling. In 2016, white voters without a college degree were less likely to participate in polls, causing many forecasts to underestimate support for Donald Trump. In 2020, polls again underestimated Trump's support, with post-election analyses suggesting that voters with low social trust systematically avoided participating in surveys. Historical examples date further back: the 1936 Literary Digest poll famously predicted Alf Landon would defeat Franklin Roosevelt, largely because the poll's sample (drawn from telephone directories and automobile registrations during the Great Depression) systematically excluded lower-income voters who favored Roosevelt.

### Public health and epidemiology

Health surveys consistently find that non-respondents tend to have worse health outcomes than respondents. The Belgian National Health Interview Survey (response rate 61.4%) estimated 19% lower prevalence of poor health compared to the Belgian census (response rate 96.5%).[10] In cardiovascular follow-up studies, females, older individuals, and those with higher education are more likely to participate in postal surveys, leading to underestimated health risks in the broader population. The National Health and Nutrition Examination Survey (NHANES III) found that non-respondents to glucose intolerance testing were 59% more likely to report fair or poor health than respondents.

### Clinical trials

The National Research Council, at the request of the U.S. Food and Drug Administration (FDA), issued a 2010 report titled "The Prevention and Treatment of Missing Data in Clinical Trials," noting that missing data had seriously compromised inferences from multiple clinical trials.[7] Patients who experience adverse effects are more likely to withdraw, creating MNAR data that biases estimates of treatment efficacy upward and safety risks downward.

### Recommender systems

In [collaborative filtering](/wiki/collaborative_filtering), users rate only items they choose to consume, producing ratings that are missing not at random.[11] Popular items receive disproportionately more ratings, and users tend to rate items they already expect to enjoy. This selection pattern biases recommendation models toward popular content and away from niche items, amplifying popularity bias and reducing the diversity of recommendations.

## Impact on machine learning

Non-response bias affects machine learning systems at every stage of the pipeline.

### Biased training data

When the [training set](/wiki/training_set) is not representative of the deployment population, a model learns patterns that apply to the observed subset but not the target distribution. This distribution shift between training data and real-world data is a fundamental source of poor [generalization](/wiki/generalization). For example, a medical diagnostic model trained primarily on data from urban hospitals may perform poorly on patients from rural areas who were underrepresented in the training data.

### Misleading evaluation metrics

If the [test set](/wiki/test_set) and [validation set](/wiki/validation_set) share the same non-response patterns as the training data, standard [accuracy](/wiki/accuracy), [precision](/wiki/precision), and [recall](/wiki/recall) metrics will not reveal the model's true performance on the full population. A model can achieve high test accuracy while systematically failing on subpopulations absent from the evaluation data.
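
A minimal sketch of this effect, assuming hypothetical per-group accuracies and group shares: overall accuracy is a share-weighted average of group accuracies, so a test set that under-samples a weak group reports a higher number than the deployment population will experience.

```python
def weighted_accuracy(group_accuracy, group_share):
    """Overall accuracy as the share-weighted average of per-group accuracy."""
    return sum(group_accuracy[g] * group_share[g] for g in group_accuracy)

# Hypothetical numbers: the model is weaker on the group that rarely appears in the data.
group_accuracy = {"well_covered": 0.95, "under_covered": 0.60}
test_share = {"well_covered": 0.95, "under_covered": 0.05}        # mirrors training data
population_share = {"well_covered": 0.70, "under_covered": 0.30}  # deployment population

reported = weighted_accuracy(group_accuracy, test_share)
deployed = weighted_accuracy(group_accuracy, population_share)
print(round(reported, 4), round(deployed, 4))
```

Here the reported test accuracy is about 0.93, while accuracy on the deployment mix is about 0.85. Reporting per-group metrics alongside the aggregate makes the gap visible.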

### Fairness and equity

Non-response bias is a direct driver of algorithmic unfairness. When certain demographic groups are underrepresented in training data due to non-response, models tend to perform worse for those groups. This can perpetuate or amplify existing disparities. In hiring algorithms, credit scoring, and criminal risk assessment tools, non-response bias in historical data has led to discriminatory outcomes. The relationship between non-response bias and [bias in AI ethics and fairness](/wiki/bias_ethics_fairness) is well documented.[12]

### Feedback loops

In deployed systems, non-response bias can create self-reinforcing feedback loops. A recommendation engine that underserves a population segment due to missing data will generate fewer interactions from that segment, producing even less data about them in future training rounds. This cycle progressively worsens representation over time.

## How do you detect non-response bias?

Identifying non-response bias requires both statistical tests and domain expertise, because the bias depends on the unobserved non-respondents, who by definition cannot be measured directly. The standard strategy is to find any auxiliary information about the people or units that did not respond and check whether it differs from the respondents.

### Comparing respondents and non-respondents

When auxiliary information is available about non-respondents (from administrative records, sampling frames, or census data), researchers can compare the two groups on known characteristics. Significant differences suggest, but do not prove, the presence of non-response bias.
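
One simple way to make this comparison concrete is a standardized mean difference on an auxiliary variable known for everyone in the sample. The sketch below assumes a hypothetical `age` field taken from the sampling frame; the helper name and the data are illustrative.

```python
from statistics import fmean, variance

def standardized_mean_difference(a, b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((variance(a) + variance(b)) / 2) ** 0.5
    return (fmean(a) - fmean(b)) / pooled_sd

# Hypothetical ages from the sampling frame, known for both groups.
respondent_age = [30, 40, 50, 60]
nonrespondent_age = [20, 25, 30, 35]

smd = standardized_mean_difference(respondent_age, nonrespondent_age)
print(round(smd, 2))
```

A large value indicates that respondents differ from non-respondents on the auxiliary variable. Whether that translates into bias depends on how strongly the auxiliary variable is related to the survey outcome.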

### Wave analysis

The continuum of resistance model posits that late respondents (those who require multiple follow-up contacts before responding) resemble non-respondents more closely than early respondents do. Successive wave analysis compares responses across contact attempts to estimate the direction and magnitude of potential bias.
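
A wave comparison can be sketched as a group-by over contact attempts. The records below are hypothetical `(wave, score)` pairs, where a higher wave number means more follow-up was needed before the person responded.

```python
from collections import defaultdict
from statistics import fmean

def mean_by_wave(records):
    """Average the outcome separately for each contact wave."""
    by_wave = defaultdict(list)
    for wave, value in records:
        by_wave[wave].append(value)
    return {wave: fmean(values) for wave, values in sorted(by_wave.items())}

# Hypothetical (wave, satisfaction score) pairs; wave 3 needed the most follow-up.
records = [(1, 8), (1, 7), (1, 9), (2, 6), (2, 7), (3, 4), (3, 5)]

wave_means = mean_by_wave(records)
drift = wave_means[3] - wave_means[1]   # the sign hints at the direction of the bias
print(wave_means, drift)
```

If scores fall steadily across waves, as in this toy data, the continuum of resistance model would suggest that non-respondents score lower still, so the respondent-only mean is probably too high.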

### Little's MCAR test

As described in the missing data mechanisms section, Little's test provides a formal statistical test of the MCAR assumption. A significant result indicates the data are not MCAR, prompting the use of more sophisticated handling methods.[5]

### Visualization and pattern analysis

Missing data patterns can be visualized using matrix plots, heatmaps, or dendrograms that display which variables and observations have missing values. Systematic patterns (for example, variables that are always missing together) suggest non-random missingness.
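
Before plotting, it often helps to tabulate the patterns. The sketch below counts which combinations of variables are missing together, using hypothetical survey rows in which `None` marks a skipped item.

```python
from collections import Counter

# Hypothetical survey rows; None marks an item the respondent skipped.
rows = [
    {"age": 34, "income": 52000, "health": 3},
    {"age": 51, "income": None, "health": None},
    {"age": 29, "income": None, "health": None},
    {"age": 45, "income": None, "health": 4},
    {"age": 62, "income": 48000, "health": 2},
]

def missing_pattern(row):
    """Return the sorted tuple of variable names that are missing in this row."""
    return tuple(sorted(name for name, value in row.items() if value is None))

patterns = Counter(missing_pattern(row) for row in rows)
per_variable = Counter(name for row in rows for name, value in row.items() if value is None)

print(patterns)
print(per_variable)
```

In this toy data `income` and `health` are missing together in two of the three incomplete rows, the kind of co-occurrence that a matrix plot or heatmap would also reveal.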

### Benchmarking against known populations

Comparing sample demographics to known population distributions (from census data, administrative records, or prior large-scale studies) reveals whether certain groups are underrepresented relative to expectations.
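
A benchmarking check can be as simple as dividing each group's share of the sample by its share of the population. The age bands, counts, and population shares below are hypothetical.

```python
def representation_ratio(sample_counts, population_share):
    """Sample share divided by population share; 1.0 means proportional coverage."""
    total = sum(sample_counts.values())
    return {
        group: (count / total) / population_share[group]
        for group, count in sample_counts.items()
    }

# Hypothetical respondent counts and census shares by age band.
sample_counts = {"18-34": 150, "35-64": 550, "65+": 300}
population_share = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

ratios = representation_ratio(sample_counts, population_share)
print({group: round(r, 2) for group, r in ratios.items()})
```

Ratios well below 1 flag under-represented groups (here the youngest band, at half its expected share), and ratios above 1 flag over-represented ones.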

## How do you correct non-response bias?

A range of techniques have been developed to prevent, reduce, or correct for non-response bias. These approaches span study design, weighting, imputation, and modeling, and the right choice depends on whether the missing data are plausibly MCAR, MAR, or MNAR.

### Prevention through study design

| Strategy | Description | Effectiveness |
|----------|-------------|---------------|
| Shorter instruments | Reducing survey length decreases respondent burden and dropout rates | High for item non-response |
| Incentives | Monetary or non-monetary rewards increase participation rates | Moderate to high; effect varies by population |
| Multiple contact modes | Combining web, phone, mail, and in-person approaches reaches different populations | High for unit non-response |
| Follow-up reminders | Successive contacts convert initial non-respondents into respondents | High; diminishing returns after 3-4 contacts |
| Pilot testing | Identifying problematic questions before deployment | Moderate; prevents avoidable item non-response |

### Weighting adjustments

**Inverse probability weighting (IPW)** assigns each respondent a weight equal to the inverse of their estimated probability of responding. This approach, rooted in the Horvitz-Thompson estimator from survey sampling, inflates the contribution of respondents who resemble non-respondents. IPW produces unbiased estimates under MAR when the response propensity model is correctly specified. However, extreme weights (when some estimated response probabilities are near zero) can make estimates unstable, and weight trimming or stabilization techniques are often applied.
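
A minimal sketch of IPW for a mean, assuming the response propensities are already estimated (here they are simply the hypothetical response rates of two groups). The estimator shown is the normalized (Hájek-style) form, which divides by the sum of the weights; the function name and data are illustrative.

```python
def ipw_mean(values, propensities):
    """Weighted mean with weights 1/p, normalized by the sum of the weights."""
    weights = [1.0 / p for p in propensities]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical sample of 200: 100 young (20 responded, outcome 10)
# and 100 old (80 responded, outcome 20).
values = [10.0] * 20 + [20.0] * 80
propensities = [0.2] * 20 + [0.8] * 80   # estimated probability of responding

naive = sum(values) / len(values)
adjusted = ipw_mean(values, propensities)
print(naive, adjusted)
```

The unweighted respondent mean is 18, while the weighted mean is 15, the value expected if non-respondents resemble respondents within each age group. Each young respondent carries a weight of 5, which illustrates how low propensities produce large and potentially unstable weights.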

**Post-stratification and raking** adjust sample weights so that the weighted sample matches known population marginals (for example, age, gender, and education distributions from census data). These methods correct for non-response bias to the extent that the non-response mechanism operates through the stratifying variables.
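
Raking can be sketched as iterative proportional fitting on a table of respondent counts: rows are scaled to match one set of population margins, then columns to match another, and the two steps repeat until both agree. The 2x2 table and the targets below are hypothetical, and `rake` is an illustrative helper, not a library API.

```python
def rake(counts, row_targets, col_targets, iterations=50):
    """Iterative proportional fitting on a two-way table of respondent counts."""
    weighted = [[float(cell) for cell in row] for row in counts]
    for _ in range(iterations):
        # Scale each row so its weighted total matches the row target.
        for i, target in enumerate(row_targets):
            factor = target / sum(weighted[i])
            weighted[i] = [cell * factor for cell in weighted[i]]
        # Scale each column so its weighted total matches the column target.
        for j, target in enumerate(col_targets):
            factor = target / sum(row[j] for row in weighted)
            for row in weighted:
                row[j] *= factor
    # Per-respondent weight in each cell = weighted total / raw count.
    weights = [[w / c for w, c in zip(w_row, c_row)]
               for w_row, c_row in zip(weighted, counts)]
    return weighted, weights

# Hypothetical respondents by age group (rows) and gender (columns).
counts = [[30, 20], [40, 10]]
row_targets = [40.0, 60.0]   # population margin for age, on a base of 100
col_targets = [50.0, 50.0]   # population margin for gender, on a base of 100

weighted, weights = rake(counts, row_targets, col_targets)
print(weighted)
print(weights)
```

Raking matches the margins only; it leaves the association between the two variables as observed among respondents, so it cannot correct bias that operates through variables outside the table.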

### Imputation methods

Imputation replaces missing values with estimated values to create a complete dataset for analysis. The choice of imputation method has significant consequences for the validity of downstream results.

| Method | Description | Strengths | Weaknesses |
|--------|-------------|-----------|------------|
| Mean/median imputation | Replaces missing values with the variable's observed mean or median | Simple to implement | Underestimates variance; biased unless MCAR; distorts correlations |
| Hot-deck imputation | Replaces missing values with observed values from similar respondents | Preserves marginal distribution | Does not account for imputation uncertainty |
| Regression imputation | Predicts missing values from a [regression](/wiki/regression) model using observed variables | Uses available information efficiently | Overstates correlations; underestimates variance |
| KNN imputation | Uses K nearest neighbors (based on observed features) to estimate missing values | Non-parametric; handles complex relationships | Computationally expensive for large datasets; sensitive to distance metric choice |
| Multiple imputation (MI) | Generates M (typically 5-100) plausible values for each missing observation, creating M complete datasets that are analyzed separately and pooled | Properly accounts for imputation uncertainty via Rubin's rules; valid under MAR | Requires careful model specification; computationally intensive |
| MICE | Multiple Imputation by Chained Equations; iteratively imputes each variable conditional on all others | Flexible; handles mixed variable types; widely available in R and Python | Lacks formal theoretical convergence guarantee; results can depend on imputation order |
| Expectation-maximization (EM) | Iteratively estimates parameters by computing expected sufficient statistics (E-step) and maximizing the likelihood (M-step) | Produces maximum likelihood estimates; computationally efficient | Point estimates only (no uncertainty quantification without additional procedures) |
| Random forest imputation (MissForest) | Uses [random forest](/wiki/random_forest) models to predict missing values iteratively | Handles nonlinear relationships and interactions; works with mixed data types | Computationally expensive; may overfit with small samples |
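
The variance shrinkage listed for mean imputation is easy to verify by hand. The sketch below uses four hypothetical observed values and two missing ones filled with the observed mean.

```python
from statistics import fmean, variance

observed = [2.0, 4.0, 6.0, 8.0]   # hypothetical observed values
n_missing = 2                     # two more values are missing

# Mean imputation: fill every gap with the observed mean.
imputed = observed + [fmean(observed)] * n_missing

var_observed = variance(observed)   # sample variance of the observed values
var_imputed = variance(imputed)     # sample variance after imputation
print(var_observed, var_imputed)
```

The mean is unchanged at 5, but the sample variance drops from about 6.67 to 4.0, because the filled-in values add observations without adding any spread.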

Rubin's 1987 book "Multiple Imputation for Nonresponse in Surveys" established the theoretical foundation for MI and provided the pooling rules (now called Rubin's rules) for combining results across imputed datasets.[4] Research has shown that even a small number of imputations (five or fewer) substantially improves estimation quality, though contemporary recommendations suggest 20 to 100 imputations for better performance.[9]
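
The core of Rubin's rules is short enough to sketch directly: the pooled estimate is the average across imputed datasets, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by a factor of (1 + 1/M). The five estimates and variances below are hypothetical, and degrees-of-freedom corrections are omitted.

```python
from statistics import fmean, variance

def pool_rubin(estimates, variances):
    """Pool M point estimates and their variances using Rubin's rules."""
    m = len(estimates)
    pooled = fmean(estimates)              # pooled point estimate
    within = fmean(variances)              # average within-imputation variance
    between = variance(estimates)          # between-imputation variance
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical results from M = 5 imputed datasets.
estimates = [10.0, 12.0, 11.0, 13.0, 14.0]
variances = [4.0, 4.0, 4.0, 4.0, 4.0]

pooled, total_variance = pool_rubin(estimates, variances)
print(pooled, total_variance)
```

The pooled estimate is 12 with a total variance of about 7, compared with 4 from any single imputed dataset. The extra variance reflects the uncertainty about the missing values themselves.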

### Model-based approaches

**Maximum likelihood estimation** under missing data uses full information maximum likelihood (FIML) to estimate model parameters directly from the incomplete data without explicitly imputing missing values. FIML produces asymptotically unbiased and efficient estimates under MAR.

**The Heckman selection model** (Heckman correction) addresses sample selection bias by jointly modeling the outcome of interest and the selection process that determines which observations are observed. Originally developed in econometrics by James Heckman (1979) to study female labor force participation, the two-step procedure first estimates the probability of being observed (selection equation) and then incorporates this information into the outcome equation via an inverse Mills ratio. Heckman received the Nobel Prize in Economics in 2000 partly for this work.[3]
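
The correction term itself is simple to compute. The sketch below shows only the inverse Mills ratio, the standard normal density divided by its cumulative distribution, evaluated at a fitted selection index `z`. Fitting the probit selection equation and the outcome regression is left out, and the function name is illustrative.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()   # standard normal: mean 0, standard deviation 1

def inverse_mills_ratio(z):
    """Standard normal pdf(z) / cdf(z), the correction term in the two-step procedure."""
    return _STD_NORMAL.pdf(z) / _STD_NORMAL.cdf(z)

# z is the fitted index from the selection equation; these values are hypothetical.
for z in (-2.0, 0.0, 2.0):
    print(z, round(inverse_mills_ratio(z), 4))
```

The ratio is large for units that were unlikely to be observed (negative `z`) and shrinks toward zero for units that were almost certain to be observed, so the added regressor matters most exactly where selection is strongest.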

**Pattern-mixture models** stratify the data by missing data patterns and estimate the outcome distribution separately within each pattern. These models are particularly useful for sensitivity analysis under MNAR assumptions, as they allow the analyst to specify different distributional assumptions for unobserved data.

### Sensitivity analysis

Because the MAR assumption cannot be verified from data alone, sensitivity analysis explores how conclusions change under departures from MAR. Tipping-point analysis is a widely used approach: after imputing under MAR, a shift parameter (delta) is added to the imputed values and progressively increased until the study's conclusion is overturned. If the required shift is implausibly large, the original conclusion is considered robust to violations of the MAR assumption. Regulatory agencies, including the FDA, recommend sensitivity analyses as a standard component of clinical trial reporting when missing data is present.[7]
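
A simplified tipping-point search, under loudly stated assumptions: a single imputed dataset, a conclusion defined as "the treatment mean exceeds the control mean" with no significance test, and hypothetical outcome values. Real analyses typically repeat this across multiply imputed datasets and track a test statistic.

```python
from statistics import fmean

def tipping_point(observed, imputed, control_mean, step=-1.0, max_steps=100):
    """Shift imputed values by increasing multiples of `step` until the effect vanishes."""
    for k in range(max_steps + 1):
        delta = k * step
        shifted = [value + delta for value in imputed]
        effect = fmean(observed + shifted) - control_mean
        if effect <= 0:
            return delta          # smallest shift that overturns the conclusion
    return None                   # conclusion survived every shift tried

# Hypothetical treatment arm: three observed outcomes, two imputed under MAR.
observed = [5.0, 6.0, 7.0]
imputed = [6.0, 6.0]
control_mean = 4.0

delta_star = tipping_point(observed, imputed, control_mean)
print(delta_star)
```

In this toy example the dropouts would need outcomes 5 points worse than MAR imputation predicts before the treatment advantage disappears. Whether a shift of that size is plausible is a substantive judgment, not a statistical one.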

### Machine learning-specific strategies

- **Stratified sampling and [oversampling](/wiki/oversampling).** When building training sets, stratified sampling ensures proportional representation of subgroups. Oversampling techniques (such as SMOTE for [class-imbalanced datasets](/wiki/class-imbalanced_dataset)) can be adapted to address non-response-driven underrepresentation.
- **[Data augmentation](/wiki/data_augmentation).** Synthetic data generation can supplement underrepresented groups, though care must be taken not to introduce unrealistic patterns.
- **Domain adaptation and [transfer learning](/wiki/transfer_learning).** When training data suffers from non-response bias, models pre-trained on more representative datasets can transfer learned representations to the biased setting.
- **Fairness-aware modeling.** Techniques like adversarial debiasing, reweighting, and counterfactual fairness constraints explicitly account for representation gaps during model training. MinDiff and Counterfactual Logit Pairing (available in the TensorFlow Model Remediation Library) are examples of in-training bias mitigation.
- **Propensity-based correction for recommender systems.** Inverse propensity scoring (IPS) re-weights observed interactions by the inverse probability of the item having been shown to the user, correcting for the MNAR nature of implicit feedback data.[11]
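
The propensity-based correction in the last item can be sketched for evaluation: each observed squared error is divided by the probability that the user-item pair was observed, and the sum is averaged over all pairs, observed or not. The propensities, errors, and pair counts below are hypothetical, and in practice the propensities must themselves be estimated.

```python
def naive_mse(observed):
    """Average squared error over observed interactions only."""
    return sum(error for error, _ in observed) / len(observed)

def ips_mse(observed, n_pairs):
    """IPS estimate of mean squared error over all user-item pairs."""
    return sum(error / propensity for error, propensity in observed) / n_pairs

# Hypothetical catalogue of 10 user-item pairs: 5 popular (shown with probability 0.8)
# and 5 niche (shown with probability 0.2). Each tuple is (squared error, propensity).
observed = [(1.0, 0.8)] * 4 + [(9.0, 0.2)]
n_pairs = 10

naive = naive_mse(observed)
corrected = ips_mse(observed, n_pairs)
print(naive, corrected)
```

The naive estimate (2.6) looks good because popular items, where the model does well, dominate the observed data. The IPS estimate (about 5.0) up-weights the rarely shown niche item and recovers the error expected over the whole catalogue.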

## Relationship to other biases

Non-response bias is one member of a family of related biases. Understanding the distinctions and overlaps helps practitioners identify the correct mitigation strategy.

| Bias type | Relationship to non-response bias |
|-----------|-----------------------------------|
| [Selection bias](/wiki/selection_bias) | Non-response bias is a subtype of selection bias. Selection bias is the broader category that includes any systematic difference between the study sample and the target population. |
| [Sampling bias](/wiki/sampling_bias) | Arises from the initial sample design (for example, using a non-random sampling frame). Non-response bias can occur even with a perfectly designed random sample if certain individuals do not participate. |
| [Coverage bias](/wiki/coverage_bias) | Occurs when the sampling frame does not include the full target population (for example, phone surveys exclude people without phones). Coverage bias operates at the frame level; non-response bias operates at the participation level. |
| [Convenience sampling](/wiki/convenience_sampling) | A sampling method that selects easily accessible subjects. Non-response bias in convenience samples is compounded because the initial sample is already non-representative. |
| Survivorship bias | Only "surviving" entities appear in the dataset. It is conceptually similar to non-response bias, with the non-surviving entities playing the role of non-respondents. |
| Attrition bias | A form of non-response bias specific to longitudinal studies where participants drop out over time. In the causal inference literature, attrition bias and informative censoring share the same underlying causal structure as non-response bias. |
| [Confirmation bias](/wiki/confirmation_bias) | A cognitive bias where researchers seek data confirming their hypotheses. Unlike non-response bias, confirmation bias is a property of the analyst rather than the data. |
| [Prediction bias](/wiki/prediction_bias) | A model-level bias where predictions systematically deviate from true values. Non-response bias in training data is one possible cause of prediction bias. |

## Historical development

The study of non-response bias has evolved significantly over the past century.

- **1936.** The Literary Digest poll failure highlighted the dangers of non-representative sampling and differential non-response, prompting the development of scientific polling methods.
- **1952.** The Horvitz-Thompson estimator formalized inverse probability weighting for unequal selection probabilities, laying groundwork for non-response adjustments.
- **1976.** Donald Rubin published "Inference and Missing Data" in *Biometrika*, establishing the MCAR/MAR/MNAR taxonomy that remains the standard framework for classifying missing data mechanisms.[2]
- **1977.** Dempster, Laird, and Rubin published the EM algorithm paper ("Maximum Likelihood from Incomplete Data via the EM Algorithm" in the *Journal of the Royal Statistical Society, Series B*), providing a general iterative method for parameter estimation with incomplete data. It is among the most cited papers in statistics.
- **1979.** James Heckman published "Sample Selection Bias as a Specification Error" in *Econometrica*, introducing the Heckman correction for selection bias. He later shared the 2000 Nobel Memorial Prize in Economic Sciences, awarded for his work on analyzing selective samples.[3]
- **1987.** Rubin published *Multiple Imputation for Nonresponse in Surveys*, providing the complete methodological framework and pooling rules for multiple imputation.[4]
- **1988.** Roderick Little proposed the MCAR test, giving researchers a formal diagnostic tool for assessing whether data are missing completely at random.[5]
- **2008.** Robert Groves and Emilia Peytcheva published "The Impact of Nonresponse Rates on Nonresponse Bias: A Meta-Analysis" in *Public Opinion Quarterly*, showing across 959 estimates from 59 studies that the nonresponse rate is only weakly related to nonresponse bias and is therefore a poor stand-alone indicator of survey quality.[1]
- **2010.** The National Research Council (at the request of the FDA) published *The Prevention and Treatment of Missing Data in Clinical Trials*, establishing guidelines for handling non-response in pharmaceutical research.[7]
- **2010s-2020s.** Machine learning researchers adapted classical non-response methods to algorithmic fairness, recommender systems, and causal inference. Inverse propensity scoring became a standard technique for debiasing recommendation models trained on implicit feedback, and rater-pool selection was recognized as a source of non-response-style bias in RLHF.[11][12]

## Software and tools

Several software packages implement methods for detecting and addressing non-response bias and missing data.

| Language/Platform | Package | Functionality |
|-------------------|---------|---------------|
| R | `mice` | Multiple Imputation by Chained Equations; the most widely used MI package |
| R | `Amelia` | Multiple imputation for cross-sectional and time-series data using a bootstrapped EM algorithm |
| R | `naniar` | Missing data visualization and diagnostics, including Little's MCAR test |
| R | `missForest` | Random forest-based single imputation for mixed-type data |
| Python | `scikit-learn` (SimpleImputer, KNNImputer, IterativeImputer) | Mean, median, KNN, and iterative imputation for ML pipelines |
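| Python | `fancyimpute` | Matrix completion and imputation methods including KNN, SoftImpute, and nuclear norm minimization; its earlier MICE implementation has been superseded by scikit-learn's IterativeImputer |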
| Python | `missingno` | Missing data visualization (matrix plots, heatmaps, dendrograms) |
| Stata | `mi`, `ice` | Multiple imputation and chained equations |
| SAS | `PROC MI`, `PROC MIANALYZE` | Multiple imputation generation and result pooling |
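
The "result pooling" step that tools such as `PROC MIANALYZE` and `mice` perform follows Rubin's rules: average the per-imputation estimates, then combine within-imputation and between-imputation variance. The sketch below is a minimal standard-library illustration of those rules, not the code of any listed package. The function name `pool_rubin` and the example numbers are assumptions, and the degrees-of-freedom adjustment that real packages also report is omitted.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float]:
    """Pool m per-imputation results with Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations of equal length")
    pooled = mean(estimates)          # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from m = 3 imputed datasets.
estimates = [10.0, 12.0, 11.0]
variances = [4.0, 4.0, 4.0]

pooled, total = pool_rubin(estimates, variances)
print(f"pooled estimate: {pooled:.1f}")       # pooled estimate: 11.0
print(f"total variance:  {total:.3f}")        # total variance:  5.333
print(f"pooled SE:       {sqrt(total):.3f}")  # pooled SE:       2.309
```

The total variance (5.333) exceeds the average within-imputation variance (4.0) because the between-imputation term carries the extra uncertainty due to the missing values, which single imputation would ignore.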

## Best practices

1. **Report missing data rates.** Always document the proportion and pattern of missing data for each variable. Transparency about non-response is a prerequisite for reproducibility.
2. **Do not treat the response rate as a quality score.** As Groves and Peytcheva showed, the non-response rate is only weakly correlated with bias; pair any rate with evidence about how non-respondents differ from respondents.[1]
3. **Investigate the missing data mechanism.** Use Little's MCAR test, pattern visualization, and domain knowledge to assess whether MCAR, MAR, or MNAR is the most plausible assumption.
4. **Avoid ad hoc deletion.** Listwise deletion yields unbiased estimates in general only under MCAR, and even then it discards information. For most real-world data, it introduces bias as well.
5. **Use principled imputation.** [Multiple imputation](/wiki/multiple_imputation) or maximum likelihood methods are preferred over single imputation (mean, median, mode) because they account for the uncertainty introduced by the missing data.
6. **Conduct sensitivity analyses.** Test whether conclusions are robust to plausible departures from the assumed missing data mechanism, especially when MNAR is a possibility.
7. **Design for non-response.** Build data collection systems with non-response mitigation in mind: use incentives, follow-up contacts, shorter instruments, and multiple collection modes.
8. **Monitor subgroup performance.** In ML systems, evaluate model performance separately across demographic and behavioral subgroups to detect whether non-response bias is driving differential accuracy.
9. **Audit deployed models and rater pools.** Continuously monitor production models, and document the composition of human-feedback rater pools, for signs that non-response bias in historical training data is producing unfair or inaccurate outcomes for underserved populations.[12]
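
Two of the practices above, reporting missing data rates (item 1) and running a sensitivity analysis (item 6), can be sketched with the standard library alone. The function names, the toy records, and the choice of a simple delta adjustment are assumptions for illustration. The adjustment assumes non-respondents' mean differs from respondents' mean by a chosen offset `delta`, so the full-sample mean shifts by the non-response rate times `delta`.

```python
from typing import Dict, List, Optional


def missing_rates(rows: List[Dict[str, Optional[float]]]) -> Dict[str, float]:
    """Fraction of records in which each variable is missing (None or absent)."""
    keys = sorted({key for row in rows for key in row})
    n = len(rows)
    return {k: sum(1 for row in rows if row.get(k) is None) / n for k in keys}


def delta_adjusted_mean(values: List[Optional[float]], delta: float) -> float:
    """Full-sample mean if non-respondents' mean were respondents' mean + delta.

    mean = resp_rate * resp_mean + nonresp_rate * (resp_mean + delta)
         = resp_mean + nonresp_rate * delta
    """
    observed = [v for v in values if v is not None]
    resp_mean = sum(observed) / len(observed)
    nonresp_rate = 1 - len(observed) / len(values)
    return resp_mean + nonresp_rate * delta


# Hypothetical survey records; None marks item non-response.
rows = [
    {"age": 30.0, "income": 50.0},
    {"age": None, "income": 70.0},
    {"age": 45.0, "income": None},
    {"age": 52.0, "income": None},
]

print(missing_rates(rows))  # {'age': 0.25, 'income': 0.5}

incomes = [row["income"] for row in rows]
for delta in (-20.0, 0.0, 20.0):
    # delta = 0 reproduces the respondent-only mean (an MCAR/MAR-style assumption)
    print(delta, delta_adjusted_mean(incomes, delta))
```

If a conclusion holds across the whole range of offsets that domain experts consider plausible, it is robust to that form of MNAR. If it flips, the missingness assumption is doing the work and should be reported as such.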

## See also

- [Bias (Ethics/Fairness)](/wiki/bias_ethics_fairness)
- [Sampling Bias](/wiki/sampling_bias)
- [Selection Bias](/wiki/selection_bias)
- [Coverage Bias](/wiki/coverage_bias)
- [Convenience Sampling](/wiki/convenience_sampling)
- [Class-Imbalanced Dataset](/wiki/class-imbalanced_dataset)
- [Oversampling](/wiki/oversampling)
- [Data Augmentation](/wiki/data_augmentation)
- [Prediction Bias](/wiki/prediction_bias)
- [Confirmation Bias](/wiki/confirmation_bias)

## References

1. Groves, R. M., & Peytcheva, E. (2008). "The Impact of Nonresponse Rates on Nonresponse Bias: A Meta-Analysis." *Public Opinion Quarterly*, 72(2), 167-189.
2. Rubin, D. B. (1976). "Inference and Missing Data." *Biometrika*, 63(3), 581-592.
3. Heckman, J. J. (1979). "Sample Selection Bias as a Specification Error." *Econometrica*, 47(1), 153-161.
4. Rubin, D. B. (1987). *Multiple Imputation for Nonresponse in Surveys*. John Wiley & Sons.
5. Little, R. J. A. (1988). "A Test of Missing Completely at Random for Multivariate Data with Missing Values." *Journal of the American Statistical Association*, 83(404), 1198-1202.
6. Little, R. J. A., & Rubin, D. B. (2002). *Statistical Analysis with Missing Data* (2nd ed.). John Wiley & Sons.
7. National Research Council. (2010). *The Prevention and Treatment of Missing Data in Clinical Trials*. The National Academies Press.
8. Peytchev, A. (2013). "Consequences of Survey Nonresponse." *The ANNALS of the American Academy of Political and Social Science*, 645(1), 88-111.
9. van Buuren, S. (2018). *Flexible Imputation of Missing Data* (2nd ed.). CRC Press.
10. Cheung, K. L., et al. (2017). "The Impact of Non-Response Bias Due to Sampling in Public Health Studies." *BMC Public Health*, 17, 276.
11. Saito, Y., et al. (2020). "Unbiased Recommender Learning from Missing-Not-At-Random Implicit Feedback." *Proceedings of the 13th International Conference on Web Search and Data Mining (WSDM)*, 501-509.
12. Gallegos, I. O., et al. (2024). "Bias and Fairness in Large Language Models: A Survey." *Computational Linguistics*, 50(3), 1097-1179.
13. Sterner, W. R. (2011). "What is Missing in Counseling Research? Reporting Missing Data." *Journal of Counseling & Development*, 89(1), 56-62.