# Sampling Bias

> Source: https://aiwiki.ai/wiki/sampling_bias
> Updated: 2026-07-11
> Categories: AI Ethics, Data & Datasets, Machine Learning, Statistics
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**Sampling bias** is a systematic error in statistics and [machine learning](/wiki/machine_learning) that occurs when a sample is collected so that some members of the intended population have a higher or lower probability of being included than others. The result is a non-representative sample, and conclusions drawn from it reflect the flawed data-collection method rather than the phenomenon under study. Unlike random [sampling error](/wiki/sampling_error), sampling bias is systematic: it cannot be removed by collecting more data in the same way, and a biased sample of ten million observations remains biased. In statistical terms it produces a non-zero bias, $$\text{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta$$, in estimates of population parameters, undermining the validity of any analysis built on the biased data.

The canonical demonstration is the 1936 *Literary Digest* poll, which mailed roughly 10 million questionnaires, received about 2.4 million responses, and still predicted the wrong winner of the U.S. presidential election by a wide margin, while George Gallup correctly called the result using a sample of just 50,000.[6] In [machine learning](/wiki/machine_learning), sampling bias is one of the most common sources of poor model performance in production: a model trained on a biased [dataset](/wiki/dataset) may achieve high [accuracy](/wiki/accuracy) on its [test data](/wiki/test_data) while failing on real-world inputs, because both the training and test sets share the same systematic gaps in coverage.[12] It is a primary driver of [algorithmic bias](/wiki/algorithmic_bias) and a core concern in AI [fairness](/wiki/fairness_metric).

## Explain like I'm 5 (ELI5)

Imagine you want to find out what everyone's favorite ice cream flavor is. But instead of asking kids at every school, you only ask kids at the school next to the chocolate ice cream factory. Most of those kids will probably say chocolate because they smell it every day and get free samples. If you then tell people that chocolate is the world's favorite flavor, you would be wrong, because you only asked a special group of kids who had a reason to like chocolate more.

Sampling bias works the same way. When you only look at information from certain kinds of people (or data points), you miss what everyone else thinks, and your answer ends up lopsided.

## What is sampling bias? Formal definition

In probability theory, let a population consist of *N* individuals, and let a sampling mechanism assign each individual *i* a selection probability *p_i*. A sampling design is **unbiased** if every individual is equally likely to be selected ($$p_i = 1/N$$ for all $$i$$ on a single draw, as in simple random sampling) or, more generally, if every individual has a known, nonzero probability of selection (probability sampling). Sampling bias occurs when the actual selection probabilities deviate from the intended ones, meaning that for some individuals *p_i* is systematically too high or too low, or even zero.

If $$\theta$$ is the population parameter of interest (for example, the mean) and $$\hat{\theta}$$ is the estimator calculated from the sample, then the bias is defined as:

$$
\text{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta
$$

When this bias is nonzero and arises from the way the sample was selected rather than from the estimator itself, it constitutes sampling bias. Because this bias is systematic, increasing the sample size *n* does not reduce it. Only changes to the sampling procedure or post-hoc corrections (such as reweighting) can address it.
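
The claim that a larger *n* does not remove sampling bias can be checked with a small simulation. The sketch below is illustrative only: the log-normal "income" population and the size-proportional selection mechanism (`biased_sample_mean`) are assumptions made for the example, not part of any cited study.

```python
import random
import statistics

def biased_sample_mean(population, n, rng):
    """Mean of n draws where selection probability is proportional to the value."""
    # Hypothetical selection mechanism: larger values are more likely to be sampled.
    sample = rng.choices(population, weights=population, k=n)
    return statistics.fmean(sample)

rng = random.Random(0)
# Synthetic population: 100,000 incomes from an assumed log-normal distribution.
population = [rng.lognormvariate(10, 0.5) for _ in range(100_000)]
true_mean = statistics.fmean(population)

for n in (100, 10_000, 200_000):
    estimate = biased_sample_mean(population, n, rng)
    print(f"n={n:>7}  estimate={estimate:10.0f}  error={estimate - true_mean:+10.0f}")
```

Under these assumptions the random scatter shrinks as *n* grows, but the estimate settles well above the true mean instead of converging to it, which is the behavior the definition describes.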

## How does sampling bias relate to selection bias?

Sampling bias is usually classified as a subtype of [selection bias](/wiki/selection_bias), though the two terms are often used interchangeably. A useful distinction is that sampling bias primarily threatens **external validity** (the ability to generalize results to the full population), while selection bias more broadly also threatens **internal validity** (whether differences within the sample reflect genuine effects or artifacts of how participants were chosen). In practice, the two concepts overlap considerably, and many sources treat them as synonyms.

## What are the types of sampling bias?

Sampling bias manifests in many forms depending on the mechanism that distorts the sample. The table below summarizes the most widely recognized types.

| Type | Description | Example |
|------|-------------|--------|
| Self-selection bias | Occurs when individuals volunteer to participate, and those who choose to participate differ systematically from those who do not | Online satisfaction surveys tend to attract people with strong opinions (very satisfied or very dissatisfied), while indifferent users rarely respond |
| [Non-response bias](/wiki/non-response_bias) | Arises when people who do not respond to a survey differ from those who do on variables of interest[9] | In the U.S. Census Bureau's American Community Survey during 2020, low-earning households were much less likely to respond, biasing income estimates upward and poverty estimates downward |
| Survivorship bias | Results from focusing only on subjects that "survived" a selection process while ignoring those that did not | Analyzing only currently successful companies to draw business lessons ignores the many companies that tried the same strategies and failed |
| Undercoverage bias | Occurs when certain segments of the population are excluded or underrepresented in the sampling frame | A telephone survey conducted only via landlines misses people who only use mobile phones, typically younger and lower-income individuals |
| Overcoverage bias | Occurs when some population members appear multiple times in the sampling frame, inflating their selection probability | A mailing list with duplicate entries causes certain individuals to receive multiple survey invitations, making them more likely to be counted |
| Convenience bias | Results from selecting participants based on ease of access rather than randomization | A psychology study that recruits only college undergraduates may not generalize to the broader adult population |
| [Reporting bias](/wiki/reporting_bias) | Occurs when certain outcomes are more likely to be published or reported, skewing the available evidence | Medical journals historically published positive drug trial results more often than negative ones, creating a skewed picture of treatment effectiveness |
| Healthy user bias | The study population is systematically healthier than the general population | Studies on occupational health among manual laborers miss workers who left the occupation due to illness, overestimating the health of the remaining workers |
| Berkson's bias (admission rate bias) | A spurious association between diseases observed in hospital-based studies, because having either condition increases the probability of hospitalization | A 1946 study by Joseph Berkson showed that hospital patients without diabetes appeared more likely to have cholecystitis, simply because they needed some reason to be admitted[3] |
| Temporal bias | Data collected during a specific time window does not represent the population across different time periods | Training a fraud detection model exclusively on holiday-season transaction data may cause poor performance during normal spending periods |
| [Participation bias](/wiki/participation_bias) | The act of participating in a study changes the behavior or characteristics of participants | Patients enrolled in clinical trials may receive more attentive care than the general patient population, independent of the treatment being studied |
| Pre-screening bias | How a study is advertised or screened determines who sees and responds to it | An online ad for a health study on a fitness website attracts health-conscious respondents who are not representative of the general population |

## What is survivorship bias?

Survivorship bias deserves special attention because of its pervasiveness and its often counterintuitive nature. The classic example comes from World War II. The U.S. military examined bullet damage on bombers returning from combat missions and initially proposed reinforcing the most heavily damaged areas (fuselage and wings). The mathematician Abraham Wald, working with the Statistical Research Group at Columbia University, recognized that this analysis suffered from survivorship bias. The planes being examined were the ones that **survived**; the bullet holes showed where a plane could take damage and still fly home. The areas with no damage on returning planes were likely the areas where hits proved fatal, because those planes never made it back. Wald recommended reinforcing the undamaged areas instead, and the military adopted his advice.[2]

Survivorship bias appears frequently in everyday reasoning:

- **Business advice**: Books and articles about successful entrepreneurs often highlight traits shared by people like Steve Jobs or Elon Musk without studying the many entrepreneurs who had those same traits and still failed.
- **Investment funds**: Mutual fund performance statistics often exclude funds that were shut down or merged due to poor returns, making the surviving funds' average performance look better than it actually is.
- **Historical artifacts**: Our understanding of prehistoric humans is biased toward cave-dwelling because cave paintings survived, while any art on trees, animal skins, or hillsides has long since decayed. This is sometimes called the "caveman effect."
- **Music and literature**: We remember only the best works from past centuries, leading to the false impression that older art was uniformly superior to modern art.

## What is self-selection bias?

Self-selection bias (also called volunteer bias) is one of the most common forms of sampling bias in research and data collection. It occurs when individuals decide for themselves whether to participate in a study, and the decision to participate is correlated with the variables being studied.

Research has consistently found that volunteers tend to:

- Have higher levels of education
- Come from higher socioeconomic backgrounds
- Be more extroverted and socially active
- Hold stronger opinions on the topic under study
- Be female (women volunteer for studies at higher rates than men in many contexts)

This type of bias is especially problematic in online surveys and phone-in polls, where participation is entirely voluntary. As a result, these instruments tend to produce a "polarization of responses," with extreme perspectives receiving disproportionate weight while moderate views are underrepresented.

In [machine learning](/wiki/machine_learning), self-selection bias appears when user-generated data forms the [training data](/wiki/training_data). For example, product review datasets are dominated by users who feel strongly enough to write a review, while the silent majority of satisfied (but not enthusiastic) customers are absent from the data.
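
The polarization effect in review data can be sketched with a toy simulation. Both the true rating distribution (`true_share`) and the per-rating probability of writing a review (`review_prob`) are invented numbers chosen only to illustrate the mechanism.

```python
import random

rng = random.Random(42)
# Assumed true satisfaction distribution over star ratings 1..5.
true_share = {1: 0.05, 2: 0.10, 3: 0.35, 4: 0.35, 5: 0.15}
# Assumed probability that a customer with each rating writes a review.
review_prob = {1: 0.30, 2: 0.05, 3: 0.01, 4: 0.02, 5: 0.20}

# Draw the full customer base, then keep only those who self-select into reviewing.
customers = rng.choices(list(true_share), weights=list(true_share.values()), k=100_000)
reviews = [r for r in customers if rng.random() < review_prob[r]]

def extreme_share(ratings):
    """Fraction of ratings that are 1 or 5 stars."""
    return sum(r in (1, 5) for r in ratings) / len(ratings)

print(f"extreme share among all customers: {extreme_share(customers):.2f}")
print(f"extreme share among reviewers:     {extreme_share(reviews):.2f}")
```

With these assumed response rates, 1-star and 5-star ratings make up a minority of customers but a clear majority of the reviews a model would be trained on.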

## What is Berkson's bias?

Berkson's bias (also known as Berkson's paradox or admission rate bias) is a form of sampling bias specific to studies conducted within hospitals or clinics. First described by the biostatistician Joseph Berkson in 1946, it occurs because a disease and an exposure each independently increase the probability of hospital admission.[3] When cases and controls are both drawn from a hospital population, this shared pathway to admission creates a spurious (usually negative) correlation between the disease and the exposure.

For example, suppose a researcher wants to study whether diabetes is associated with cholecystitis (gallbladder disease). If the study recruits both cases and controls from hospital patients, the controls (patients without diabetes) must have been admitted for some other reason, making them more likely to have cholecystitis. This creates a spurious association between diabetes and cholecystitis within the hospital sample, even if the two conditions are unrelated in the general population.

The solution to Berkson's bias is straightforward in principle: use population-based sampling rather than hospital-based sampling. When every member of the population has an equal chance of being selected, the distortion introduced by differential admission rates disappears.
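
A minimal simulation shows the effect, assuming two conditions that are independent in the population and an invented admission model (`admitted`) in which each condition separately raises the chance of hospitalization. The prevalences and admission probabilities are illustrative assumptions, not estimates from Berkson's paper.

```python
import random

def correlation(xs, ys):
    """Pearson correlation for two equal-length sequences of 0/1 values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

rng = random.Random(7)
population = []
for _ in range(200_000):
    diabetes = rng.random() < 0.10       # assumed prevalence
    cholecystitis = rng.random() < 0.10  # assumed prevalence, independent of diabetes
    population.append((int(diabetes), int(cholecystitis)))

def admitted(d, c):
    # Assumed admission model: 1% baseline for other reasons, and each
    # condition independently adds a 30% chance of hospitalization.
    p_not_admitted = (1 - 0.01) * (1 - 0.30 * d) * (1 - 0.30 * c)
    return rng.random() < 1 - p_not_admitted

hospital = [(d, c) for d, c in population if admitted(d, c)]

pop_corr = correlation(*zip(*population))
hosp_corr = correlation(*zip(*hospital))
print(f"population correlation: {pop_corr:+.3f}")
print(f"hospital correlation:   {hosp_corr:+.3f}")
```

In the full population the correlation is close to zero by construction, while in the hospital subset it is strongly negative, purely because admission depends on both conditions.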

## What are famous historical examples of sampling bias?

Several well-known historical cases illustrate the consequences of sampling bias.

### The Literary Digest poll of 1936

The *Literary Digest* magazine conducted one of the largest polls in history to predict the outcome of the 1936 U.S. presidential election between Franklin D. Roosevelt and Alf Landon. The magazine mailed roughly 10 million questionnaires and received approximately 2.4 million responses. Based on these responses, the Digest predicted that Landon would win decisively with about 57% of the vote.[6]

Roosevelt won in a landslide with about 61% of the popular vote, carrying 46 of 48 states (losing only Maine and Vermont).

The poll's failure stemmed from two forms of sampling bias. First, the mailing lists were drawn from sources such as telephone directories and automobile registration records. During the Great Depression, these sources over-represented wealthier Americans, who were more likely to favor the Republican candidate. Second, of the roughly 10 million people contacted, only about 2.4 million responded (a response rate near 24%), and Landon supporters were disproportionately motivated to return their ballots, introducing severe nonresponse bias, which subsequent research identified as the primary contributor to the error.[6]

Meanwhile, George Gallup's organization correctly predicted Roosevelt's victory using a carefully selected sample of just 50,000 citizens. The episode established a lasting lesson: a large sample does not compensate for a biased sampling method, because how a sample is selected matters far more than how large it is.[6]

### The 1948 "Dewey Defeats Truman" headline

In 1948, the *Chicago Tribune* published its famous "DEWEY DEFEATS TRUMAN" headline based on polls that predicted Thomas Dewey would defeat Harry Truman. The polls suffered from coverage bias because telephones were not yet universal; people who owned phones tended to be wealthier and more likely to vote Republican. Pollsters also stopped surveying too early, missing a late swing in voter sentiment.

### COVID-19 case fatality rate estimates

During the early months of the COVID-19 pandemic, wide variations in testing policies across countries introduced substantial sampling bias into case counts. Countries that tested primarily hospitalized patients reported much higher case fatality rates than countries that conducted broader community testing. These differences were largely artifacts of sampling rather than genuine differences in disease severity. Researchers showed that variations in sampling bias accounted for much of the observed international variation in both case fatality rates and the apparent age distribution of cases.[11]

## How does sampling bias affect machine learning?

Sampling bias is one of the most significant sources of [bias](/wiki/bias) in [machine learning](/wiki/machine_learning) systems.[10] Because ML models learn patterns from their [training data](/wiki/training_data), any systematic gaps or distortions in the data are directly reflected in the model's predictions.[12] In their widely cited 2021 survey of bias and fairness in machine learning, Mehrabi and colleagues define fairness as "the absence of any prejudice or favoritism toward an individual or group based on their inherent or acquired characteristics," a standard that biased sampling routinely violates.[10]

### How sampling bias affects model training

When a [dataset](/wiki/dataset) is not representative of the population the model will encounter in production, several problems arise:

1. **Poor [generalization](/wiki/generalization)**: The model may achieve high performance on its test set (which shares the same bias) but fail on real-world inputs from underrepresented groups.
2. **Systematic discrimination**: If certain demographic groups are underrepresented or absent from the training data, the model may produce less accurate or outright incorrect predictions for those groups.[10]
3. **Feedback loops**: In recommendation systems and ad-targeting, biased training data leads to biased recommendations, which generate biased user interaction data, reinforcing and amplifying the original bias over time.
4. **Miscalibrated confidence**: A model trained on biased data may produce overconfident predictions for underrepresented cases because it has never encountered evidence that would temper its confidence.

### Notable examples in AI

| System or dataset | Type of sampling bias | Consequence |
|---|---|---|
| [ImageNet](/wiki/imagenet) | Geographic and demographic undercoverage | The dataset over-represented lighter-skinned individuals from Western countries, leading to lower [accuracy](/wiki/accuracy) on images of people from other regions. In 2019, after the biases were exposed, ImageNet announced it would remove about 600,000 images from the "person" subtree of its hierarchy. |
| Commercial facial recognition | Demographic undercoverage | A 2018 study by Joy Buolamwini and Timnit Gebru found that gender [classification](/wiki/classification) error rates for darker-skinned women were up to 34.7%, compared to 0.8% for lighter-skinned men, in commercial systems from major vendors.[4] |
| COMPAS recidivism tool | Racial representation bias | ProPublica's 2016 analysis found that the COMPAS system's false positive rate (predicting recidivism when it did not occur) was significantly higher for Black defendants than for white defendants, raising questions about whether the training data reflected existing disparities in the criminal justice system. |
| Healthcare risk prediction | Historical utilization bias | A 2019 study published in *Science* found that a widely used algorithm for predicting healthcare needs assigned systematically lower risk scores to Black patients. The algorithm used healthcare spending as a proxy for health needs, but because Black patients historically had less access to healthcare, their lower spending did not reflect lower medical need. Correcting the proxy would have raised the share of Black patients flagged for extra care from 17.7% to 46.5%.[5] |
| Cardiac MRI segmentation | Demographic undercoverage | A [deep learning](/wiki/deep_learning) model trained on data that was 80% White achieved a Dice Similarity Coefficient of 93.5% for White subjects but only 84.5% for Black and Mixed-race subjects. |

Reflecting on the healthcare result, the study's lead author Ziad Obermeyer and colleagues wrote that the algorithm's reliance on cost as a proxy meant "less money is spent on Black patients who have the same level of need, and the algorithm thus falsely concludes that Black patients are healthier than equally sick White patients."[5]

### Temporal bias and concept drift

Temporal bias is a form of sampling bias where the training data reflects conditions from a specific time period that may not hold in the future. It is closely related to **concept drift**, where the statistical relationship between input features and output labels changes over time.

Examples include:

- A spam filter trained on 2015-era phishing emails may fail on modern phishing attempts that use more sophisticated language and social engineering.
- A credit scoring model trained during an economic boom may produce unreliable predictions during a recession because spending and repayment patterns shift.
- A recommendation system trained on pre-pandemic user behavior may not reflect the changed preferences that emerged during and after COVID-19 lockdowns.

Dealing with temporal bias typically requires continuous monitoring of model performance and periodic retraining on updated data. Many production ML systems implement automated **data drift detection** to alert engineers when the distribution of incoming data diverges significantly from the training distribution.
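
One common way to quantify such a divergence for a single numeric feature is the Population Stability Index (PSI). The sketch below is a simplified, assumed implementation: the equal-width binning over the reference range, the `1e-6` floor for empty bins, and the function name `psi` are choices made for this example, not a standard API.

```python
import math
import random

def psi(reference, current, n_bins=10):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if the reference is constant

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

rng = random.Random(1)
train = [rng.gauss(0, 1) for _ in range(5_000)]    # training-time feature values
same = [rng.gauss(0, 1) for _ in range(5_000)]     # production data, no drift
shifted = [rng.gauss(1, 1) for _ in range(5_000)]  # production data, mean shifted by 1

print(f"PSI without drift: {psi(train, same):.3f}")
print(f"PSI with drift:    {psi(train, shifted):.3f}")
```

A frequently quoted rule of thumb treats PSI values above roughly 0.25 as a significant shift, but the appropriate alert threshold depends on the feature and the application.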

## How do you detect sampling bias?

Identifying sampling bias is the first step toward addressing it. Several approaches are used in both traditional statistics and machine learning.

### Statistical methods

- **Comparing sample demographics to population demographics**: If census data or other population-level information is available, researchers can compare the distribution of key variables in their sample (age, gender, income, geography) against known population distributions.
- **Chi-squared tests**: These tests can determine whether observed differences in the distribution of categorical variables across sample subgroups are statistically significant.
- **t-tests and ANOVA**: For continuous variables, these tests compare means across groups to identify whether certain subgroups are over- or under-represented.
- **Nonresponse analysis**: Comparing early respondents to late respondents, or comparing respondents to known characteristics of nonrespondents, can reveal patterns of nonresponse bias.[9]
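
As an illustration of the first two methods, the sketch below computes a Pearson chi-squared goodness-of-fit statistic comparing sample counts against population shares. The age groups, counts, and census shares are hypothetical values invented for the example.

```python
def chi_squared_gof(observed_counts, population_shares):
    """Pearson chi-squared statistic comparing sample counts to expected shares."""
    n = sum(observed_counts.values())
    stat = 0.0
    for group, share in population_shares.items():
        expected = n * share
        stat += (observed_counts.get(group, 0) - expected) ** 2 / expected
    return stat

# Hypothetical age-group counts in a survey sample of 1,000 respondents.
sample_counts = {"18-34": 180, "35-54": 340, "55+": 480}
# Hypothetical census shares for the same groups.
census_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

stat = chi_squared_gof(sample_counts, census_shares)
CRITICAL_95_DF2 = 5.991  # chi-squared critical value for 2 degrees of freedom, alpha = 0.05

print(f"chi-squared = {stat:.2f}")  # prints: chi-squared = 96.57
print("sample differs from population" if stat > CRITICAL_95_DF2 else "no evidence of mismatch")
```

Here the statistic far exceeds the critical value, so the sample's age distribution is very unlikely to have come from random sampling of the assumed population. A significant result flags a mismatch but does not by itself identify the mechanism that caused it.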

### Machine learning methods

- **Fairness audits**: Tools like IBM's AI Fairness 360 provide metrics to detect bias in datasets and models, including [disparate impact](/wiki/disparate_impact) ratios, [equalized odds](/wiki/equalized_odds) assessments, and [demographic parity](/wiki/demographic_parity) checks.
- **Subgroup performance analysis**: Evaluating model performance separately for each demographic group or subpopulation can reveal disparities that aggregate metrics would hide. A model with 95% overall accuracy might have 99% accuracy for the majority group and only 70% for a minority group.
- **Distribution comparison**: Visualizing and statistically comparing the distributions of features in the training data against those in the production data can reveal systematic gaps.[13]
- **Counterfactual testing**: Changing protected attributes (such as race or gender) in input data and observing whether model outputs change can reveal sensitivity to variables that should ideally not influence predictions.
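
Subgroup performance analysis needs very little machinery. The helper below (`accuracy_by_group`, a name chosen for this sketch) reports overall and per-group accuracy; the labels, predictions, and group sizes are toy data constructed so that a high aggregate score hides a weak subgroup.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return overall accuracy and a dict of per-group accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    overall = sum(correct.values()) / sum(total.values())
    return overall, {g: correct[g] / total[g] for g in total}

# Toy data: 900 majority-group examples (9 errors), 100 minority-group examples (30 errors).
y_true = [1] * 1000
y_pred = [1] * 891 + [0] * 9 + [1] * 70 + [0] * 30
groups = ["majority"] * 900 + ["minority"] * 100

overall, per_group = accuracy_by_group(y_true, y_pred, groups)
print(f"overall:  {overall:.3f}")                # prints: overall:  0.961
print(f"majority: {per_group['majority']:.3f}")  # prints: majority: 0.990
print(f"minority: {per_group['minority']:.3f}")  # prints: minority: 0.700
```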

## How do you mitigate or correct sampling bias?

Mitigating sampling bias requires intervention at different stages of the data collection and modeling pipeline.

### At the data collection stage

| Technique | Description | When to use |
|-----------|-------------|-------------|
| Simple random sampling | Every member of the population has an equal probability of being selected | When a complete list of the population (sampling frame) is available |
| Stratified sampling | The population is divided into subgroups (strata) based on key characteristics, and random samples are drawn from each stratum in proportion to its population share | When specific subgroups must be adequately represented, such as minority demographic groups |
| Cluster sampling | The population is divided into clusters (often geographic), a random selection of clusters is chosen, and all or some members within those clusters are sampled | When a complete population list is unavailable but clusters can be identified |
| Systematic sampling | Every *k*th member of an ordered population list is selected after a random starting point | When the population can be listed but simple random selection is impractical |
| [Oversampling](/wiki/oversampling) minority groups | Intentionally sampling underrepresented groups at higher rates, then applying weights to produce population-representative estimates | When certain groups are rare in the population but must be well-represented for analysis |
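
Proportional stratified sampling, as described in the table, can be sketched as follows. The function name `stratified_sample`, the record layout, and the "at least one record per stratum" rule are assumptions made for this example.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, n, rng):
    """Proportionally allocated stratified sample of roughly n records."""
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    sample = []
    for members in strata.values():
        # Proportional allocation with at least one record per stratum;
        # rounding means the total can differ slightly from n.
        k = max(1, round(n * len(members) / len(records)))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

# Hypothetical sampling frame: 800 urban and 200 rural records.
records = [{"id": i, "region": "urban" if i < 800 else "rural"} for i in range(1000)]
sample = stratified_sample(records, key=lambda r: r["region"], n=100, rng=random.Random(0))

urban = sum(r["region"] == "urban" for r in sample)
print(f"urban: {urban}, rural: {len(sample) - urban}")  # prints: urban: 80, rural: 20
```

Unlike a simple random sample of 100, which would contain 20 rural records only on average, this allocation fixes each stratum's share exactly.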

### At the data preprocessing stage

| Technique | Description | Considerations |
|-----------|-------------|----------------|
| Sample reweighting | Assigning weights to each observation so that the weighted sample matches the population distribution. Each sample receives a weight equal to its population proportion divided by its sampling proportion.[13] | Requires knowledge of the true population distribution; can increase variance if weights are extreme |
| [Oversampling](/wiki/oversampling) (random) | Duplicating existing minority class observations to balance class proportions | Risks [overfitting](/wiki/overfitting) because the model sees identical copies of minority examples |
| [Undersampling](/wiki/undersampling) (random) | Removing majority class observations to balance class proportions | Loses potentially useful information from the majority class |
| SMOTE (Synthetic Minority Over-sampling Technique) | Generates synthetic minority class examples by interpolating between existing minority class neighbors rather than duplicating them[7] | Reduces overfitting risk compared to random [oversampling](/wiki/oversampling); should only be applied to [training data](/wiki/training_data), never to validation or test sets |
| Inverse probability weighting (IPW) | Weights each observation by the inverse of its estimated probability of being included in the sample | Widely used in causal inference; requires a correctly specified selection model[8] |
| Propensity score matching | Matches treated and untreated observations with similar estimated probabilities of treatment, creating a pseudo-randomized comparison | Useful in observational studies where randomization is not possible[8] |
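
The reweighting rule in the first row (population proportion divided by sampling proportion) can be worked through on a toy example. The group labels, values, and 50/50 population shares below are hypothetical.

```python
def poststratification_weights(sample_groups, population_shares):
    """Weight for each observation: population share / sample share of its group."""
    n = len(sample_groups)
    sample_shares = {g: sample_groups.count(g) / n for g in set(sample_groups)}
    return [population_shares[g] / sample_shares[g] for g in sample_groups]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical sample: the "high" group is over-represented (80% of the sample)
# although it makes up only 50% of the population.
groups = ["high"] * 80 + ["low"] * 20
values = [100.0] * 80 + [40.0] * 20
population_shares = {"high": 0.5, "low": 0.5}

weights = poststratification_weights(groups, population_shares)
print(f"unweighted mean: {sum(values) / len(values):.1f}")       # prints: unweighted mean: 88.0
print(f"weighted mean:   {weighted_mean(values, weights):.1f}")  # prints: weighted mean:   70.0
```

Each "low" observation receives a weight of 2.5 and each "high" observation 0.625, so the weighted mean matches the value a representative sample would give. The large weight on the small group also shows why extreme weights inflate variance: the 20 "low" observations now carry half of the estimate.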

### At the modeling stage

- **Cost-sensitive learning**: Assigning higher misclassification costs to underrepresented classes so the model pays more attention to them during training.
- **[Cross-validation](/wiki/cross-validation)**: Using techniques like k-fold [cross-validation](/wiki/cross-validation) helps assess whether model performance is consistent across different subsets of the data, though it does not correct bias in the data itself.
- **Fairness constraints**: Incorporating fairness metrics such as [equalized odds](/wiki/equalized_odds) or [demographic parity](/wiki/demographic_parity) directly into the model's objective function during training.
- **Ensemble methods**: Training multiple models on different subsets or resampled versions of the data and combining their predictions can sometimes reduce the effect of bias present in any single subset.
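
Cost-sensitive learning is often implemented through per-class weights. The sketch below uses the inverse-frequency heuristic `n / (k * count)`, the same formula scikit-learn applies for `class_weight="balanced"`, and shows how it changes the cost of a trivial majority-class predictor. The helper names and toy labels are assumptions for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_error(y_true, y_pred, class_weights):
    """Misclassification cost where each error counts by the weight of its true class."""
    cost = sum(class_weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return cost / sum(class_weights[t] for t in y_true)

# Toy imbalanced data: 95 negatives, 5 positives, and a model that always predicts 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

weights = balanced_class_weights(y_true)
plain_error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"plain error:    {plain_error:.2f}")                               # prints: plain error:    0.05
print(f"weighted error: {weighted_error(y_true, y_pred, weights):.2f}")  # prints: weighted error: 0.50
```

Under plain error the always-negative model looks 95% accurate; under the weighted cost it is no better than chance, which is the signal a cost-sensitive training objective uses to stop ignoring the rare class.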

### At the post-deployment stage

- **Continuous monitoring**: Tracking model performance across demographic groups and over time to detect emerging biases.
- **Data drift detection**: Implementing automated systems that compare incoming data distributions against training data distributions and trigger alerts or retraining when significant shifts occur.
- **Periodic retraining**: Refreshing models with new data on a regular schedule to reduce temporal bias.
- **Human-in-the-loop review**: Having domain experts review model predictions for high-stakes decisions, particularly for cases involving underrepresented populations.

## What is the Heckman correction?

The economist James Heckman developed a two-step statistical method for correcting sample selection bias, work for which he received the Nobel Memorial Prize in Economic Sciences in 2000.[1] The Heckman correction is widely used in [econometrics](/wiki/econometrics) and the social sciences.

The method works in two stages:

1. **Selection equation**: A probit model estimates the probability that each observation is included in the sample, based on observable characteristics.
2. **Outcome equation**: The inverse Mills ratio (derived from the selection equation) is included as an additional variable in the regression of interest. This ratio acts as a control function that accounts for the selection mechanism.

By explicitly modeling the probability of inclusion, the Heckman correction can recover consistent estimates even from non-randomly selected samples, provided the selection model is correctly specified. The method assumes that the errors in the selection and outcome equations follow a bivariate normal distribution.[1]
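
In a standard textbook formulation, the two stages can be written as follows. The notation is introduced here for illustration: $$z_i$$ and $$x_i$$ are the covariates of the selection and outcome equations, $$u_i$$ has unit variance, $$\rho$$ is the correlation between $$u_i$$ and $$\varepsilon_i$$, $$\sigma_\varepsilon$$ is the standard deviation of $$\varepsilon_i$$, and $$\phi$$ and $$\Phi$$ are the standard normal density and distribution function.

$$
\begin{aligned}
s_i &= \mathbf{1}\{ z_i^\top \gamma + u_i > 0 \} && \text{(selection equation, estimated by probit)} \\
y_i &= x_i^\top \beta + \varepsilon_i && \text{(outcome, observed only when } s_i = 1\text{)} \\
\mathbb{E}[y_i \mid x_i, z_i, s_i = 1] &= x_i^\top \beta + \rho\,\sigma_\varepsilon\,\lambda(z_i^\top \gamma), && \lambda(c) = \frac{\phi(c)}{\Phi(c)}
\end{aligned}
$$

The term $$\lambda(z_i^\top \gamma)$$ is the inverse Mills ratio. Regressing $$y_i$$ on $$x_i$$ alone omits it, which is why Heckman described selection bias as a specification error; when $$\rho = 0$$ the term vanishes and ordinary least squares on the selected sample is adequate.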

## How does sampling bias differ from other types of bias?

Sampling bias is one of several types of bias that can affect research and machine learning. The table below summarizes how it relates to other common biases.

| Type of bias | What it affects | How it differs from sampling bias |
|---|---|---|
| [Selection bias](/wiki/selection_bias) | Internal and external validity | Broader category that includes sampling bias; also covers biases arising from how participants are assigned to groups within a study |
| [Confirmation bias](/wiki/confirmation_bias) | Interpretation of results | A cognitive bias where researchers favor evidence that supports their preexisting beliefs; affects analysis rather than data collection |
| Measurement bias | Data accuracy | Arises from faulty instruments or inconsistent measurement procedures rather than from how subjects are selected |
| [Reporting bias](/wiki/reporting_bias) | Published evidence | Occurs when certain results (usually positive ones) are more likely to be published, regardless of how the sample was collected |
| [Implicit bias](/wiki/implicit_bias) | Data labeling and feature selection | Unconscious preferences that influence which features are collected and how [data labeling](/wiki/data_labeling) is performed |
| [Prediction bias](/wiki/prediction_bias) | Model output calibration | The difference between the average prediction and the average observation in a dataset; may result from sampling bias but can also arise from model architecture |
| [Coverage bias](/wiki/coverage_bias) | Sampling frame completeness | A subtype of sampling bias that occurs when the sampling frame does not match the target population |

## Best practices for avoiding sampling bias

1. **Define the target population precisely** before data collection begins, and verify that the sampling frame covers it adequately.
2. **Use probability sampling methods** whenever possible (simple random, stratified, cluster, or systematic sampling).
3. **Maximize response rates** through follow-up contacts, incentives, and accessible survey design, while recognizing that high response rates alone do not guarantee low bias.[9]
4. **Document the sampling procedure** in detail so that potential biases can be identified and assessed by others.
5. **Compare sample characteristics to population benchmarks** using census data or other reliable sources.
6. **Apply post-stratification weights** when certain groups are known to be underrepresented.
7. **Evaluate model performance across subgroups**, not just in aggregate, to catch disparities hidden by overall metrics.
8. **Use diverse data sources** rather than relying on a single collection method, which may systematically exclude certain populations.
9. **Be transparent about limitations**. Every sample has potential biases; acknowledging them helps consumers of the research interpret results appropriately.
10. **Monitor for temporal drift** in production ML systems and retrain models as the data distribution shifts.

## See also

- [Selection bias](/wiki/selection_bias)
- [Bias](/wiki/bias)
- [Algorithmic bias](/wiki/algorithmic_bias)
- [AI bias](/wiki/ai_bias)
- [Dataset](/wiki/dataset)
- [Class imbalance](/wiki/class_imbalance)
- [Oversampling](/wiki/oversampling)
- [Undersampling](/wiki/undersampling)
- [Data augmentation](/wiki/data_augmentation)
- [Fairness metric](/wiki/fairness_metric)
- [Reporting bias](/wiki/reporting_bias)
- [Coverage bias](/wiki/coverage_bias)
- [Confirmation bias](/wiki/confirmation_bias)
- [Overfitting](/wiki/overfitting)
- [Generalization](/wiki/generalization)
- [Convenience sampling](/wiki/convenience_sampling)
- [Non-response bias](/wiki/non-response_bias)

## References

1. Heckman, J. J. (1979). "Sample Selection Bias as a Specification Error." *Econometrica*, 47(1), 153-161.
2. Wald, A. (1943). "A Method of Estimating Plane Vulnerability Based on Damage of Survivors." Statistical Research Group, Columbia University. Republished in 1980 by the Center for Naval Analyses.
3. Berkson, J. (1946). "Limitations of the Application of Fourfold Table Analysis to Hospital Data." *Biometrics Bulletin*, 2(3), 47-53.
4. Buolamwini, J., & Gebru, T. (2018). "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification." *Proceedings of Machine Learning Research*, 81, 77-91.
5. Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). "Dissecting racial bias in an algorithm used to manage the health of populations." *Science*, 366(6464), 447-453.
6. Squire, P. (1988). "Why the 1936 Literary Digest Poll Failed." *Public Opinion Quarterly*, 52(1), 125-133.
7. Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). "SMOTE: Synthetic Minority Over-sampling Technique." *Journal of Artificial Intelligence Research*, 16, 321-357.
8. Angrist, J. D., & Pischke, J.-S. (2008). *Mostly Harmless Econometrics: An Empiricist's Companion*. Princeton University Press.
9. Groves, R. M. (2006). "Nonresponse Rates and Nonresponse Bias in Household Surveys." *Public Opinion Quarterly*, 70(5), 646-675.
10. Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). "A Survey on Bias and Fairness in Machine Learning." *ACM Computing Surveys*, 54(6), 1-35.
11. Griffith, G. J., et al. (2020). "Collider bias undermines our understanding of COVID-19 disease risk and severity." *Nature Communications*, 11(1), 5749.
12. Mangold, L., & Hangartner, D. (2024). "Sampling Bias in Machine Learning Models." *Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society*.
13. Cortes, C., & Mohri, M. (2014). "Domain Adaptation and Sample Bias Correction Theory and Algorithm for Regression." *Theoretical Computer Science*, 519, 103-126.