Participation bias is a systematic error that arises when the individuals who choose to take part in a study, survey, or data collection effort differ in meaningful ways from those who do not participate. Depending on the mechanism involved, it is also discussed under the names volunteer bias, self-selection bias, and non-response bias. Because the resulting sample is not representative of the target population, any conclusions drawn from it may be distorted. The problem appears across survey research, clinical trials, epidemiology, machine learning dataset construction, and social media analysis.
Participation bias is a subtype of selection bias. While selection bias refers broadly to any process that produces a non-representative sample, participation bias focuses specifically on the distortion introduced by who opts in (or opts out) of providing data.
Imagine you want to find out what flavor of ice cream kids in your school like best. You stand by the door at lunchtime and ask everyone who walks past. But the kids who love ice cream are excited to stop and answer, while the kids who prefer fruit or cookies just walk right past you. At the end of the day you might think "everyone loves chocolate ice cream!" but really you only heard from the kids who cared about ice cream in the first place. The kids who didn't stop might have given totally different answers. That is participation bias: the people who show up to answer are not the same as everyone you wanted to learn about.
The most famous early example of participation bias occurred during the 1936 United States presidential election. The Literary Digest mailed roughly 10 million straw-poll ballots to predict the outcome of the race between incumbent Franklin D. Roosevelt and Republican challenger Alf Landon. About 2.4 million ballots were returned, a response rate of approximately 24%. Based on those responses, the magazine predicted that Landon would win with 57% of the popular vote. Roosevelt went on to win in a landslide, carrying every state except Maine and Vermont and receiving over 60% of the popular vote.
Two distinct problems plagued the poll. First, the mailing lists were drawn from telephone directories, automobile registrations, and club memberships, all of which skewed toward wealthier individuals during the Great Depression (a sampling bias problem). Second, among those who received ballots, people who strongly opposed Roosevelt were more motivated to return them, introducing non-response bias. The failure of the Literary Digest poll, combined with George Gallup's successful prediction using a much smaller but more carefully selected sample of about 50,000 respondents, helped launch the era of modern scientific polling.
In the 1960s and 1970s, psychologists Robert Rosenthal and Ralph Rosnow conducted an extensive program of research on the characteristics of people who volunteer for experiments. Their 1975 review established that volunteers systematically differ from non-volunteers across a range of traits, including education, socioeconomic status, sociability, need for approval, openness to experience, and authoritarianism. They also found that the direction and magnitude of volunteer bias can depend on the nature of the task: males are more likely than females to volunteer for physically or psychologically stressful studies, while females volunteer at higher rates for non-stressful research. This body of work demonstrated that participation bias is not random noise but a structured, predictable phenomenon.
Participation bias takes several forms depending on how and when the non-representativeness enters the data.
| Type | Definition | Example |
|---|---|---|
| Non-response bias | People selected for a study decline to participate, and those who decline differ systematically from those who respond. | In a workplace satisfaction survey, employees with the heaviest workloads are too busy to complete the questionnaire, making average satisfaction appear higher than it truly is. |
| Volunteer bias | Individuals who volunteer for research possess characteristics that differ from the broader population. | Volunteers for a clinical trial tend to be healthier, more educated, and of higher socioeconomic status than the average patient with the same condition. |
| Self-selection bias | Participants choose whether to join a group or program, and this choice correlates with the outcomes being measured. | People who sign up for a weight-loss program may already be more motivated to lose weight, making the program appear more effective than it would be if applied to a random sample. |
| Attrition bias | Participants who drop out of a longitudinal study differ from those who remain. | In a multi-year drug trial, sicker patients are more likely to withdraw, so the final results overstate the drug's effectiveness. |
| Healthy volunteer effect | Volunteers in epidemiological and occupational health studies have lower mortality and morbidity than the general population. | Workers in occupational cohort studies often appear healthier than the general population partly because those healthy enough to hold a job are the ones enrolled. |
Several mechanisms drive participation bias.
People who have a personal stake in a topic are more likely to respond to surveys or volunteer for studies about it. A person who has strong opinions about gun control, for instance, is more likely to fill out a political questionnaire than someone who is indifferent. This tendency means that respondents are disproportionately drawn from the tails of the opinion distribution, leading to a polarized picture of public sentiment.
Participation requires time, access, and ability. Busy managers, shift workers, or caregivers may lack the time to complete surveys. In the context of online data collection, the digital divide means that older adults, people in rural areas, and those with lower incomes are less likely to be reached by web-based instruments. Bethlehem (2010) showed that under-coverage and self-selection jointly bias estimates from internet surveys because specific population subgroups have less access to the internet.
Longer surveys, invasive procedures, and topics perceived as sensitive reduce participation rates among certain groups. Research on AIDS-related surveys found that people who refused participation tended to be older, more religious, less trusting of survey confidentiality, and lower in sexual self-disclosure. The perceived burden or sensitivity of a study filters out particular demographic and attitudinal profiles.
Rosenthal and Rosnow's work showed that volunteers score higher on measures of need for social approval. This means that research samples may overrepresent individuals who want to be seen in a favorable light, which can compound response bias problems where participants also give socially desirable answers.
A common assumption in survey methodology is that higher response rates lead to lower bias. Robert M. Groves and Emilia Peytcheva challenged this assumption in a 2008 meta-analysis published in Public Opinion Quarterly. After analyzing 59 methodological studies, they found that the response rate explains only about 11% of the variation in non-response bias estimates. In other words, a survey with a 30% response rate is not necessarily more biased than one with a 70% response rate. What matters is whether the reasons for non-response are correlated with the variables being measured.
A separate meta-analysis of 44 studies confirmed this finding: methods that boosted response rates did not consistently reduce non-response bias. These results mean that researchers cannot rely on response rate alone as a quality indicator; they must investigate the specific patterns of who responds and who does not.
Several methods exist for detecting and quantifying non-response bias in surveys.
| Method | Description |
|---|---|
| Wave analysis | Compare responses from early responders (first quartile) with those from late responders (fourth quartile). If late responders resemble non-responders, large differences between waves suggest bias. |
| Known-population comparison | Compare the demographic profile of respondents against known population parameters (e.g., census data). Significant discrepancies indicate that certain groups are over- or under-represented. |
| Follow-up of non-respondents | Contact a random subsample of non-respondents by phone or in person and compare their answers to those of the original respondents. |
| Auxiliary variable analysis | When sampling frames contain variables for both respondents and non-respondents (such as administrative records), compare the two groups on those variables. |
| Propensity modeling | Build a logistic regression or classification model predicting the probability of response using available covariates. Use the estimated propensities to detect or correct for bias (see the sketch after this table). |
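To make the propensity-modeling row concrete, the sketch below fits a logistic regression predicting response from frame variables available for respondents and non-respondents alike, then compares auxiliary characteristics across propensity strata. The data, variable names, and stratification choice are illustrative assumptions, not a prescribed workflow.

```python
# Minimal sketch: detect non-response bias by modeling response propensity.
# Assumes a sampling frame with auxiliary variables (age, income_band) known
# for both respondents and non-respondents. All data here is synthetic.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000
frame = pd.DataFrame({
    "age": rng.integers(18, 85, n),
    "income_band": rng.integers(1, 6, n),   # 1 = lowest, 5 = highest
})
# Simulate a response mechanism that favors older, higher-income people.
p_respond = 1 / (1 + np.exp(-(-2.0 + 0.03 * frame["age"] + 0.3 * frame["income_band"])))
frame["responded"] = rng.random(n) < p_respond

# Fit the propensity model on variables available for everyone in the frame.
X = frame[["age", "income_band"]]
model = LogisticRegression().fit(X, frame["responded"])
frame["propensity"] = model.predict_proba(X)[:, 1]

# Large differences in auxiliary variables across propensity strata signal
# that response is not random, i.e. non-response bias is likely.
frame["stratum"] = pd.qcut(frame["propensity"], 5, labels=False)
print(frame.groupby("stratum")[["age", "income_band"]].mean())
```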
Once non-response bias is detected, several statistical techniques can reduce its impact.
Post-stratification weighting adjusts the sample so that the distribution of key demographic variables matches the known population distribution. For example, if young men are underrepresented among respondents, their responses can be up-weighted.
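A minimal sketch of post-stratification weighting, assuming the population distribution of a single demographic variable (age group) is known from a census; the groups, shares, and outcome values are made up for illustration.

```python
# Post-stratification sketch: reweight respondents so the sample's age-group
# shares match known population shares. Groups and numbers are illustrative.
import pandas as pd

respondents = pd.DataFrame({
    "age_group": ["18-34"] * 20 + ["35-64"] * 50 + ["65+"] * 30,
    "satisfaction": [7] * 20 + [6] * 50 + [5] * 30,   # toy outcome
})
population_share = {"18-34": 0.35, "35-64": 0.45, "65+": 0.20}  # e.g. from a census

sample_share = respondents["age_group"].value_counts(normalize=True)
respondents["weight"] = respondents["age_group"].map(
    lambda g: population_share[g] / sample_share[g]
)

unweighted = respondents["satisfaction"].mean()
weighted = (respondents["satisfaction"] * respondents["weight"]).sum() / respondents["weight"].sum()
print(f"unweighted mean: {unweighted:.2f}, post-stratified mean: {weighted:.2f}")
```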
Propensity score adjustment estimates each individual's probability of responding and then weights respondents by the inverse of that probability. Individuals who are unlikely to respond (and therefore underrepresented) receive higher weights. Research has shown that propensity models using as few as 10 well-chosen variables can reduce bias by a factor of four.
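A minimal sketch of inverse-probability weighting on synthetic data; the single-covariate propensity model and the response mechanism are assumptions chosen so that the naive respondent mean is visibly biased and the weighted estimate moves back toward the population mean.

```python
# Inverse-probability weighting sketch: respondents are weighted by
# 1 / estimated probability of responding. Synthetic data throughout.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5_000
df = pd.DataFrame({"age": rng.integers(18, 85, n)})
df["outcome"] = 50 + 0.2 * df["age"] + rng.normal(0, 5, n)   # outcome rises with age
p_respond = 1 / (1 + np.exp(-(-3.0 + 0.05 * df["age"])))      # older people respond more
df["responded"] = rng.random(n) < p_respond

propensity = LogisticRegression().fit(df[["age"]], df["responded"]).predict_proba(df[["age"]])[:, 1]
resp = df[df["responded"]].copy()
resp["w"] = 1 / propensity[df["responded"].to_numpy()]

print("true population mean: ", df["outcome"].mean().round(2))
print("naive respondent mean:", resp["outcome"].mean().round(2))
print("IPW-adjusted mean:    ", ((resp["outcome"] * resp["w"]).sum() / resp["w"].sum()).round(2))
```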
Doubly robust estimation combines propensity score weighting with an outcome model. The resulting estimator is consistent if either the propensity model or the outcome model is correctly specified, providing an extra layer of protection against model misspecification.
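A compact sketch of a doubly robust (augmented inverse-probability weighting) estimate of a population mean under non-response; the linear outcome model, logistic propensity model, and synthetic data are all illustrative assumptions.

```python
# Doubly robust (AIPW) sketch for a population mean with non-response.
# Combines an outcome model m(x) with a response-propensity model pi(x);
# the estimate is consistent if either model is correct. Synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=(n, 1))
y = 10 + 3 * x[:, 0] + rng.normal(0, 1, n)
responded = rng.random(n) < 1 / (1 + np.exp(-1.5 * x[:, 0]))   # response depends on x

pi = LogisticRegression().fit(x, responded).predict_proba(x)[:, 1]   # propensity model
m = LinearRegression().fit(x[responded], y[responded]).predict(x)    # outcome model

# AIPW: outcome-model prediction plus a propensity-weighted residual correction
# that is only non-zero for respondents.
aipw = np.mean(m + responded * (y - m) / pi)
print("true mean:", y.mean().round(2), " AIPW estimate:", aipw.round(2))
```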
Calibration weighting uses auxiliary information from external sources (census, administrative records) to constrain the weighted sample totals to match known population totals.
All of these techniques depend on the availability of auxiliary variables that are correlated with both the response mechanism and the survey outcomes. Without such variables, no weighting scheme can fully correct for participation bias.
In epidemiology, the "healthy volunteer effect" (HVE) refers to the well-documented observation that individuals who volunteer for research studies tend to be healthier than the general population. The UK Biobank, one of the world's largest biomedical cohort studies, provides a striking illustration. A 2024 study published in the International Journal of Epidemiology found that volunteer bias in the UK Biobank was "so severe that unweighted estimates had the opposite sign of the association in the target population" for some variables. For example, older participants in the Biobank reported better health than their peers in the UK Census, a reversal that occurred because healthier older people were more likely to volunteer.
Clinical trials use strict inclusion and exclusion criteria that, combined with the self-selection of volunteers, produce participant pools that differ from the patient populations who will ultimately use the treatments being tested. Trial participants tend to be younger, healthier, more educated, and more adherent to treatment protocols. This mismatch raises questions about external validity: a drug that works well in a trial population of motivated, compliant volunteers may perform differently in the real-world patient population.
Volunteer bias also operates across the timeline of a trial. A study of a probiotic supplementation trial found that as the study progressed, participants from the most economically deprived groups dropped out at higher rates, reducing the diversity of the sample over time.
A related form of participation bias in healthcare research is Berkson's bias (also called admission rate bias). This occurs when a study sample is drawn from hospitalized patients rather than the general population. Because hospitalization itself is associated with illness, hospital-based samples are systematically sicker than the population at large, which can create spurious associations between diseases or between exposures and outcomes.
Participation bias is one of the most common and consequential sources of error in machine learning systems, because biased training data produces biased models.
Many machine learning datasets are built from data contributed voluntarily by users, scraped from public platforms, or labeled by paid crowdworkers. Each of these collection methods introduces participation bias.
| Data source | Participation bias risk |
|---|---|
| User-contributed data (reviews, ratings, photos) | Users who contribute tend to hold strong opinions (very positive or very negative), be more digitally literate, and skew younger. Moderate viewpoints are underrepresented. |
| Web-scraped text | Text from the internet overrepresents English-speaking, younger, urban, and higher-income populations. Entire languages and cultural perspectives may be missing from corpora like Common Crawl. |
| Crowdsourced labels (Amazon Mechanical Turk, etc.) | Crowdworkers are not representative of the global population. Research shows that annotator demographics, political leanings, and personality traits systematically influence labeling decisions, particularly for subjective tasks like hate speech detection. |
| Clinical or sensor data | Patients who consent to data sharing tend to be healthier and wealthier. Wearable device data comes disproportionately from tech-savvy, health-conscious users. |
| Benchmark datasets | Even datasets created by government agencies to serve as benchmarks can underrepresent women and people of color, as documented in studies of facial recognition training data. |
The consequences of participation bias in training data are clearly visible in facial recognition and facial analysis systems. A landmark 2018 study by Joy Buolamwini and Timnit Gebru at MIT found that commercial facial analysis systems had gender-classification error rates of 0.8% for lighter-skinned males but up to 34.7% for darker-skinned females. The disparity was traced directly to benchmark and training datasets that consisted of 79% to 89% lighter-skinned subjects. Because darker-skinned individuals, and women in particular, were underrepresented in the data used to train these systems, the algorithms learned to recognize faces that looked like the majority of their training examples and performed poorly on everyone else.
In recommendation systems, participation bias creates self-reinforcing feedback loops. Users who rate, click, or purchase items generate the data that trains the recommendation algorithm. But these active users are not representative of all users; they tend to interact more with popular items. The algorithm then learns to recommend popular items more frequently, which generates even more interaction data for those items, further amplifying the bias. This "rich get richer" dynamic reduces aggregate diversity, homogenizes user experience, and has a disproportionate impact on users with niche preferences. Research by Mansoury et al. (2020) demonstrated that this feedback loop amplification effect is generally stronger for users who belong to minority groups on the platform.
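The feedback-loop dynamic can be illustrated with a toy simulation (not drawn from the cited paper): a hypothetical recommender that only surfaces the currently most-interacted items, retrained on the clicks it generates, quickly concentrates nearly all interactions on a handful of items.

```python
# Toy simulation of a popularity feedback loop in a recommender system.
# The "model" only recommends the currently most-interacted items; users
# click what they are shown, and those clicks become the next round's
# training data. Purely illustrative, with hypothetical parameters.
import numpy as np

rng = np.random.default_rng(3)
n_items, n_users, top_k, rounds = 50, 1_000, 10, 10
interactions = np.ones(n_items)                    # near-uniform starting history

for _ in range(rounds):
    top_items = np.argsort(interactions)[-top_k:]  # recommend current top-k items
    shown = rng.choice(top_items, size=n_users)    # users only see (and click) those
    interactions += np.bincount(shown, minlength=n_items)

share_top_k = np.sort(interactions)[-top_k:].sum() / interactions.sum()
print(f"share of all interactions held by the top {top_k} items: {share_top_k:.1%}")
```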
Natural language processing models trained on social media data inherit the participation biases of the platforms they draw from. A 2023 study from Dartmouth College introduced a formal framework for measuring participation bias on social media, defining it as "the skew in the demographics of the participants who opt in to discussions of a topic, compared to the demographics of the underlying social media platform." The researchers found that topic-level participation bias can be substantial and can even reverse the expected demographic composition. For example, while Twitter's overall user base leaned Democratic with roughly equal gender representation, the subset of users who discussed gun control was disproportionately Republican and male. NLP models trained on such topic-specific data would learn patterns that reflect the views of the participating subpopulation rather than the broader public.
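The paper's definition suggests a straightforward computation: compare the demographic distribution of users who post about a topic with the platform-wide distribution. The sketch below uses invented numbers and a simple total-variation summary, which is not necessarily the authors' exact metric.

```python
# Sketch of topic-level participation bias: compare the demographic mix of
# users who discuss a topic with the platform-wide mix. The numbers are
# invented, and the total-variation measure is only one possible summary.
platform = {"democrat": 0.60, "republican": 0.40}
topic_participants = {"democrat": 0.35, "republican": 0.65}   # e.g. gun-control posters

skew = {group: topic_participants[group] - platform[group] for group in platform}
total_variation = 0.5 * sum(abs(diff) for diff in skew.values())

print("per-group skew vs. platform baseline:", skew)
print(f"total variation distance: {total_variation:.2f}")
```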
Social media data has become a popular source for research in public health, political science, and marketing. However, social media users are not a representative cross-section of any national population. In the United States, more than 80% of adults under 50 use social media, but fewer than half of those aged 65 and older do. Income, education, and geography also correlate with platform usage.
Even within a platform's user base, participation varies enormously by topic. A small number of highly active users generate the bulk of posts and comments on any given subject, while the majority of users are passive consumers who rarely contribute content. This creates a second layer of participation bias on top of the platform's existing demographic skew. Researchers who treat social media posts as a window into public opinion risk confusing the views of a vocal minority with those of the broader population.
The implications extend to sentiment analysis applications, trend detection, and public health surveillance. During the COVID-19 pandemic, studies using Twitter data to track public sentiment found that the demographics of users posting about COVID-related topics did not match the demographics of the populations most affected by the disease, leading to misleading conclusions about attitudes toward vaccination and public health measures.
Participation bias threatens both internal validity and external validity of research findings.
| Validity type | How participation bias affects it |
|---|---|
| Internal validity | If participation is correlated with both the independent and dependent variables, the estimated relationship between them will be distorted. For example, if healthier people are more likely to join a trial and also more likely to respond well to treatment, the treatment effect will be overestimated. |
| External validity | Even if the internal estimates are correct for the sample, they cannot be generalized to the target population if participants differ from non-participants in ways that affect the outcomes. |
| Statistical power | When certain subgroups are underrepresented, estimates for those subgroups have wider confidence intervals and less reliability. |
| Reproducibility | If different studies on the same topic attract different volunteer populations, their results may conflict, reducing the apparent reproducibility of findings. |
No single technique can eliminate participation bias entirely, but a combination of design and analytical strategies can reduce its impact. It is also useful to distinguish participation bias from the related biases with which it is often conflated, summarized in the table below.
| Related bias | Relationship to participation bias |
|---|---|
| Selection bias | Participation bias is a subtype of selection bias. Selection bias encompasses any mechanism that produces a non-representative sample; participation bias specifically involves decisions by potential participants to opt in or out. |
| Sampling bias | Sampling bias arises from the method used to draw a sample (e.g., using phone books that exclude people without phones). Participation bias arises after sampling, when selected individuals choose not to respond. The two often co-occur. |
| Response bias | Response bias refers to inaccurate answers given by participants who do respond (e.g., due to social desirability). Participation bias is about who responds at all. Both can be present in the same study. |
| Survivorship bias | Survivorship bias occurs when analysis focuses only on subjects that passed a selection process (e.g., successful companies) while ignoring those that did not. It can be viewed as a temporal form of participation bias. |
| Confirmation bias | Researchers with strong prior beliefs may unintentionally design studies that attract participants who share those beliefs, compounding participation bias with confirmation bias. |
| Coverage bias | Coverage bias occurs when the sampling frame excludes part of the target population (e.g., an online survey that excludes people without internet). It is distinct from but often co-occurs with participation bias. |
| Example | Context | Impact |
|---|---|---|
| 1936 Literary Digest poll | The magazine predicted Alf Landon would defeat Franklin Roosevelt based on 2.4 million returned ballots from 10 million sent. | Roosevelt won in a landslide. Non-response bias from motivated anti-Roosevelt respondents and sampling bias from wealthy mailing lists both contributed. |
| UK Biobank volunteer bias | This large-scale biomedical study enrolled approximately 500,000 volunteers from the UK. | A 2024 reweighting study found that some unweighted associations had the opposite sign compared to the true population relationship, because healthier, wealthier individuals were more likely to volunteer. |
| Facial recognition disparities | Commercial facial recognition systems were trained on datasets that were 79% to 89% lighter-skinned subjects. | Error rates were 0.8% for lighter-skinned males and up to 34.7% for darker-skinned females, with the disparity directly attributable to training data imbalance. |
| Twitter gun control discussions | Dartmouth researchers measured demographic participation bias on Twitter by topic. | Although Twitter's user base leans Democratic with a near-equal gender split, gun control discussions were dominated by Republican males, making topic-specific analysis unrepresentative. |
| Supermarket manager workload survey | Managers with the heaviest workloads were least likely to respond to a survey about workload. | Results underestimated average workload because the busiest managers were systematically absent from the data. |