A sensitive attribute (also called a protected attribute or protected characteristic) is any feature in a dataset that corresponds to a legally or ethically protected personal trait, such as race, gender, age, religion, disability status, or national origin. In machine learning and artificial intelligence, sensitive attributes are central to the study of algorithmic fairness because models that directly or indirectly rely on these features can produce discriminatory outcomes. The concept bridges law, ethics, and technical practice: anti-discrimination statutes define which attributes are protected, while fairness researchers develop methods to detect and reduce the influence of those attributes on automated decisions.
Imagine a teacher is picking students for a school play. The teacher should choose based on how well each student acts, not based on whether the student is a boy or a girl, what color skin they have, or how old they are. A "sensitive attribute" is one of those personal things about a person (like their gender or race) that should not affect decisions about them. When computers make decisions, they sometimes accidentally pay attention to these things, which can make the decision unfair. So engineers have to be really careful to make sure the computer treats everyone fairly, no matter what group they belong to.
A sensitive attribute is a variable in a dataset that identifies membership in a group protected by law or ethical norms. When an automated system uses such a variable, whether directly or through correlated features, to make decisions about individuals, the resulting outcomes may be discriminatory.
Sensitive attributes differ from ordinary features in two ways. First, anti-discrimination law restricts how they may influence decisions in domains like employment, lending, housing, and criminal justice. Second, the groups they define often have historical experiences of systemic disadvantage, so errors that correlate with group membership can compound existing inequalities.
The term "sensitive attribute" is more common in the AI ethics and fairness literature, while "protected attribute" or "protected characteristic" tends to appear in legal contexts. Both refer to the same concept.
The following table lists the most frequently cited sensitive attributes in both legal frameworks and machine learning fairness research.
| Attribute | Description | Typical legal basis (U.S.) | Notes |
|---|---|---|---|
| Race / Ethnicity | Racial or ethnic group membership | Civil Rights Act of 1964, Title VII | One of the most studied attributes in bias research |
| Sex / Gender | Biological sex, gender identity, sexual orientation | Title VII, Title IX | Includes pregnancy, gender identity, and intersex status |
| Age | Chronological age or age group | Age Discrimination in Employment Act (ADEA) | Protects individuals 40 and older in employment |
| Disability | Physical or mental impairment | Americans with Disabilities Act (ADA) | Includes both visible and invisible disabilities |
| Religion | Religious belief or practice | Title VII, First Amendment | Includes lack of religious belief |
| National origin | Country of birth or ancestry | Title VII | Often correlated with language and immigration status |
| Skin color | Complexion independent of racial classification | Title VII | Legally distinct from race |
| Marital / Family status | Marriage, partnership, or parental status | Fair Housing Act | Relevant in housing and lending decisions |
| Genetic information | DNA, family medical history | Genetic Information Nondiscrimination Act (GINA) | Added to federal protection in 2008 |
| Veteran status | Military service history | Vietnam Era Veterans' Readjustment Assistance Act (VEVRAA) | Relevant in federal contracting and employment |
| Socioeconomic status | Income level, education, class | Varies by jurisdiction | Not uniformly protected under federal law but frequently studied |
Multiple legal regimes around the world restrict how sensitive attributes may be used in decision-making. These laws predate AI systems but apply to algorithmic decisions just as they apply to human decisions.
U.S. anti-discrimination law is organized around specific protected classes rather than a single omnibus statute.
The Equal Employment Opportunity Commission (EEOC) has published guidance stating that employers using AI tools in hiring remain liable for discriminatory outcomes, even when a third-party vendor supplies the algorithm. The four-fifths rule (or 80% rule) serves as a common rule of thumb: if the selection rate for a protected group falls below 80% of the rate for the group with the highest selection rate, the practice is generally treated as evidence of disparate impact.
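The check itself is a simple comparison of selection rates. A minimal sketch in Python; the column names and toy figures below are illustrative, not drawn from any real case:

```python
import pandas as pd

# Hypothetical hiring data: one row per applicant, with the group label
# and a binary "selected" outcome. Column names are illustrative.
df = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
    "selected": [1,   1,   1,   0,   1,   1,   0,   0,   0,   1],
})

# Selection rate per group: P(selected = 1 | group)
rates = df.groupby("group")["selected"].mean()

# Compare every group's rate to the group with the highest selection rate.
impact_ratios = rates / rates.max()

# The four-fifths rule flags groups whose ratio falls below 0.8.
print(rates)
print(impact_ratios)
print("Potential adverse impact:", list(impact_ratios[impact_ratios < 0.8].index))
```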
The EU takes a different approach, combining data protection regulation with AI-specific rules. The General Data Protection Regulation (GDPR) designates "special categories of personal data" in Article 9, including racial or ethnic origin, religious or philosophical beliefs, health data, sexual orientation, and genetic and biometric data, whose processing is prohibited by default, while the EU AI Act imposes additional obligations on high-risk AI systems such as those used in employment and credit decisions.
In Canada, the Canadian Human Rights Act protects attributes including race, national or ethnic origin, sex, sexual orientation, age, marital status, family status, disability, and pardoned criminal convictions. The United Kingdom's Equality Act 2010 identifies nine protected characteristics: age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, and sexual orientation. Australia's anti-discrimination laws protect race, sex, disability, and age at the federal level, with additional protections at the state level.
Removing a sensitive attribute from a dataset does not guarantee that a model will produce fair outcomes. Other features in the dataset may be statistically correlated with the removed attribute and serve as proxy variables, allowing the model to reconstruct the protected information indirectly.
Proxy variables emerge from the structure of social and economic life. Residential segregation means that zip codes correlate strongly with race. Occupation and industry correlate with gender. First names and surnames correlate with ethnicity. Educational institution names can signal socioeconomic background, gender (in the case of single-sex colleges), or religious affiliation.
Deep learning models are especially adept at discovering subtle proxy relationships because they can learn complex nonlinear interactions among features. A model may combine several individually weak proxies to reconstruct a sensitive attribute with high accuracy, even when no single feature is obviously discriminatory.
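A common audit for proxies is to train a secondary model to predict the sensitive attribute from the remaining features: accuracy well above chance indicates that the information survives removal. A minimal sketch on synthetic data; the feature names are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic example: the sensitive attribute is never given to the model,
# but "zip_income" and "occupation" (hypothetical features) are correlated
# with it and act as proxies.
n = 2000
sensitive = rng.integers(0, 2, size=n)              # hidden group label
zip_income = sensitive * 1.5 + rng.normal(size=n)   # strong proxy
occupation = sensitive * 0.8 + rng.normal(size=n)   # weaker proxy
noise_feat = rng.normal(size=n)                     # unrelated feature
X_without_attribute = np.column_stack([zip_income, occupation, noise_feat])

# Proxy audit: how well can the sensitive attribute be reconstructed from
# the remaining features alone? Accuracy far above 0.5 signals that
# removing the attribute did not remove the information.
auditor = LogisticRegression()
scores = cross_val_score(auditor, X_without_attribute, sensitive, cv=5)
print(f"Sensitive attribute recoverable with accuracy ~{scores.mean():.2f}")
```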
Redlining in lending. The term "redlining" originally referred to the practice of drawing red lines on maps around predominantly Black neighborhoods and refusing mortgage loans to residents within those areas. Although explicit racial redlining was outlawed by the Fair Housing Act of 1968, modern algorithmic lending systems can reproduce similar patterns by weighting zip codes, neighborhood characteristics, and local property values, all of which correlate with race. The Consumer Financial Protection Bureau (CFPB) has taken enforcement actions against lenders whose algorithmic systems produced disparate impacts through geographic proxies.
Amazon's hiring tool. In 2018, Reuters reported that Amazon had developed an AI-powered resume screening tool trained on ten years of hiring data. Because the technology industry workforce skews heavily male, the model learned to penalize resumes containing the word "women's" (as in "women's rugby team") and the names of women's colleges. Although gender was not an explicit input, textual proxies encoded gender information. Amazon abandoned the tool after repeated attempts to correct the bias failed.
Healthcare resource allocation. A widely cited 2019 study by Obermeyer et al., published in Science, found that a healthcare algorithm used by major U.S. hospitals to allocate care management resources was systematically biased against Black patients. The algorithm used healthcare spending as a proxy for health needs. Because Black patients historically faced barriers to accessing care and thus had lower spending levels, the algorithm assigned them lower risk scores than equally sick white patients. At a given risk score, Black patients were considerably sicker than white patients.
Kidney function estimation. For decades, the estimated glomerular filtration rate (eGFR) formula used in clinical medicine included a race-based correction factor that reported higher kidney function values for Black patients. This made Black patients appear healthier and delayed their access to specialist referrals and transplant waitlists. In 2021, the National Kidney Foundation and the American Society of Nephrology recommended removing race from the eGFR calculation. Studies have shown that this change made approximately 300,000 additional Black Americans eligible for nephrology referral and roughly 31,000 more eligible for transplant evaluation.
The approach of simply deleting sensitive attributes from a dataset before model training is known as fairness through unawareness. Research has repeatedly demonstrated that this approach is insufficient: proxy variables let a model reconstruct the removed attribute indirectly, historical bias remains encoded in the labels and remaining features, and without the attribute it becomes difficult or impossible to measure and audit disparities in outcomes.
Researchers have proposed numerous formal definitions of fairness, each specifying a different mathematical relationship between model predictions and sensitive attributes. These definitions often conflict with one another, and satisfying one may violate another.
Group fairness criteria compare aggregate outcomes across groups defined by a sensitive attribute.
| Criterion | Also known as | Mathematical condition | Interpretation |
|---|---|---|---|
| Demographic parity | Statistical parity, independence | P(Y_hat = 1 \| A = a) = P(Y_hat = 1 \| A = b) | Positive prediction rates are equal across groups |
| Equalized odds | Separation | P(Y_hat = 1 \| Y = y, A = a) = P(Y_hat = 1 \| Y = y, A = b) for all y | True positive and false positive rates are equal across groups |
| Equal opportunity | - | P(Y_hat = 1 \| Y = 1, A = a) = P(Y_hat = 1 \| Y = 1, A = b) | True positive rates are equal across groups |
| Predictive parity | Sufficiency | P(Y = 1 \| Y_hat = 1, A = a) = P(Y = 1 \| Y_hat = 1, A = b) | Positive predictive values are equal across groups |
| Calibration | - | P(Y = 1 \| S = s, A = a) = P(Y = 1 \| S = s, A = b) for all s | At each score level, actual positive rates are equal across groups |
A foundational result in the fairness literature, sometimes called the impossibility theorem (Kleinberg et al., 2016; Chouldechova, 2017), establishes that calibration and equalized odds cannot both be satisfied by an imperfect classifier unless the base rates of the positive outcome are identical across groups; more generally, any two of independence, separation, and sufficiency are mutually exclusive except in degenerate cases. In practice, base rates almost always differ, so practitioners must choose which fairness criterion best fits their application.
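The group criteria in the table above can be estimated directly from a model's predictions. A minimal sketch on toy data; the column names and values are illustrative:

```python
import pandas as pd

# Toy predictions for two groups; y is the true label, y_hat the prediction.
df = pd.DataFrame({
    "group": ["a"] * 6 + ["b"] * 6,
    "y":     [1, 1, 0, 0, 1, 0,   1, 0, 0, 1, 0, 0],
    "y_hat": [1, 1, 1, 0, 0, 0,   1, 0, 0, 0, 0, 1],
})

def group_rates(g):
    tp = ((g.y == 1) & (g.y_hat == 1)).sum()
    fp = ((g.y == 0) & (g.y_hat == 1)).sum()
    return pd.Series({
        "selection_rate": g.y_hat.mean(),          # demographic parity
        "tpr": tp / max((g.y == 1).sum(), 1),      # equal opportunity
        "fpr": fp / max((g.y == 0).sum(), 1),      # equalized odds (with tpr)
        "ppv": tp / max((g.y_hat == 1).sum(), 1),  # predictive parity
    })

by_group = df.groupby("group")[["y", "y_hat"]].apply(group_rates)
print(by_group)
print("Demographic parity difference:",
      by_group.selection_rate.max() - by_group.selection_rate.min())
```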
Individual fairness requires that similar individuals receive similar predictions, regardless of their group membership. The idea was formalized by Dwork et al. (2012), who proposed that a fair classifier should be a Lipschitz mapping: individuals who are "close" in a task-relevant similarity metric should receive predictions that are "close" as well. The challenge lies in defining the similarity metric, which is itself a value-laden choice.
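A rough empirical probe of this property is to compare each individual's prediction with those of their nearest neighbours under some feature-space metric. The sketch below uses a hypothetical representation and random scores purely to illustrate the mechanics; in practice the similarity metric itself must be designed deliberately, as noted above:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical task-relevant representation and model scores for 200 individuals.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))      # stands in for the similarity metric's feature space
scores = rng.uniform(size=200)     # stands in for model scores

nn = NearestNeighbors(n_neighbors=6).fit(X)   # each point plus its 5 nearest neighbours
_, idx = nn.kneighbors(X)

# Average absolute score gap between each individual and their neighbours;
# large values indicate that similar people receive dissimilar predictions.
mean_gap = np.abs(scores[idx[:, 1:]] - scores[:, None]).mean()
print(f"Mean neighbour score gap: {mean_gap:.3f}")
```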
Counterfactual fairness, introduced by Kusner et al. (2017), uses causal reasoning to define fairness at the individual level. A prediction is counterfactually fair if it would remain the same in a hypothetical world where the individual's sensitive attribute had been different, but everything else that is not causally downstream of the sensitive attribute had remained the same. Implementing counterfactual fairness requires specifying a causal model (typically a directed acyclic graph) that captures how the sensitive attribute influences other variables.
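One heavily simplified approximation, assuming an additive linear causal model, is to regress each feature on the sensitive attribute and train only on the residuals, which under that assumption do not change when the attribute is counterfactually flipped. The sketch below uses synthetic data and is not the full counterfactual procedure of Kusner et al.:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(2)
n = 1000
a = rng.integers(0, 2, size=n).astype(float)        # sensitive attribute
x1 = 2.0 * a + rng.normal(size=n)                   # causally affected by a
x2 = rng.normal(size=n)                             # independent of a
y = (x1 + x2 + rng.normal(size=n) > 1).astype(int)  # outcome
X = np.column_stack([x1, x2])

# Regress each feature on A and keep the residuals; under an additive linear
# causal model, these residuals are invariant to counterfactual changes in A.
A_col = a.reshape(-1, 1)
residuals = X - LinearRegression().fit(A_col, X).predict(A_col)

clf = LogisticRegression().fit(residuals, y)
print("Trained on residualized features; coefficients:", clf.coef_)
```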
Real-world individuals belong to multiple protected groups simultaneously. A person may be, for example, a Black woman, an elderly person with a disability, or a young immigrant. The concept of intersectionality, introduced by legal scholar Kimberlé Crenshaw in 1989, highlights that discrimination experienced at the intersection of multiple identities can differ qualitatively from discrimination along any single axis.
In machine learning, ensuring fairness for each sensitive attribute independently does not guarantee fairness for intersectional subgroups. This problem is known as fairness gerrymandering: a model can appear fair with respect to race and fair with respect to gender while still discriminating against a specific race-gender subgroup (for example, Black women).
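A toy illustration of the phenomenon: selection rates can be identical across races and identical across genders while differing sharply at the intersection. The data below is constructed purely to show the arithmetic:

```python
import pandas as pd

# Marginally "fair", intersectionally unfair: black men and white women are
# selected, black women and white men are not.
df = pd.DataFrame({
    "race":   ["black", "black", "white", "white"],
    "gender": ["woman", "man",   "woman", "man"],
    "y_hat":  [0,        1,       1,       0],
})

print(df.groupby("race")["y_hat"].mean())              # 0.5 for both races
print(df.groupby("gender")["y_hat"].mean())            # 0.5 for both genders
print(df.groupby(["race", "gender"])["y_hat"].mean())  # subgroup rates of 0.0 and 1.0
```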
Technical approaches to intersectional fairness include auditing over the combinatorially many subgroups defined by intersections of attributes (the subgroup fairness framework of Kearns et al., 2018), multicalibration, which requires scores to be calibrated on every identifiable subpopulation (Hébert-Johnson et al., 2018), and directly evaluating group fairness metrics on explicit intersectional subgroups when sample sizes allow.
A practical tension exists between the goals of intersectionality and statistical reliability. As subgroups become smaller, estimates of fairness metrics become noisier, making it harder to detect and correct bias for the most marginalized groups.
Methods for reducing the influence of sensitive attributes on model outcomes are typically organized by when in the machine learning pipeline they are applied.
Pre-processing techniques modify the training data before it is fed to a model.
| Technique | Description |
|---|---|
| Resampling | Oversampling underrepresented groups or undersampling overrepresented groups to balance the dataset |
| Reweighting | Assigning higher weights to training examples from disadvantaged groups so that the model pays more attention to them |
| Label correction | Identifying and correcting labels that reflect historical bias rather than ground truth |
| Fair representation learning | Transforming features into a latent space that retains predictive information while removing information about the sensitive attribute |
| Suppression | Removing the sensitive attribute and its known proxies from the dataset (limited effectiveness due to unknown proxies) |
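As an example of reweighting, the scheme of Kamiran and Calders assigns each (group, label) combination the weight P(group) · P(label) / P(group, label), so that group membership and the label look statistically independent once the weights are applied. A minimal sketch on toy data:

```python
import pandas as pd

# Toy training data: group "a" receives positive labels more often than group "b".
df = pd.DataFrame({
    "group": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "y":     [1,   1,   1,   0,   1,   0,   0,   0],
})

p_group = df["group"].value_counts(normalize=True)
p_label = df["y"].value_counts(normalize=True)
p_joint = df.groupby(["group", "y"]).size() / len(df)

# weight(group, y) = P(group) * P(y) / P(group, y)
weights = df.apply(
    lambda row: p_group[row["group"]] * p_label[row["y"]]
                / p_joint[(row["group"], row["y"])],
    axis=1,
)
print(df.assign(weight=weights))
# These weights can then be passed to most estimators via sample_weight.
```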
In-processing techniques incorporate fairness constraints into the model training process itself.
| Technique | Description |
|---|---|
| Constrained optimization | Adding fairness constraints (such as equal false positive rates across groups) to the loss function during training |
| Adversarial debiasing | Training two models simultaneously: a primary model that predicts the target outcome and an adversary that tries to predict the sensitive attribute from the primary model's outputs. The primary model is penalized when the adversary succeeds. |
| Regularization-based methods | Adding a penalty term to the loss function that measures the correlation between predictions and the sensitive attribute |
| Fair classification algorithms | Algorithms designed from the ground up to satisfy specific fairness constraints, such as the reduction approach by Agarwal et al. (2018) |
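As a concrete example, the reduction approach of Agarwal et al. (2018) listed above is implemented in the Fairlearn library (described in the toolkit section below). A hedged sketch on synthetic data, assuming a binary sensitive attribute and a scikit-learn base estimator:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

# Synthetic data: one feature correlated with the (hypothetical) sensitive attribute A.
rng = np.random.default_rng(3)
n = 1000
A = rng.integers(0, 2, size=n)
X = np.column_stack([A + rng.normal(size=n), rng.normal(size=n)])
y = (X[:, 0] + X[:, 1] + rng.normal(size=n) > 0.5).astype(int)

# The reduction approach repeatedly reweights the data and refits the base
# estimator until the chosen fairness constraint is approximately satisfied.
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(solver="liblinear"),
    constraints=DemographicParity(),   # EqualizedOdds() is another built-in constraint
)
mitigator.fit(X, y, sensitive_features=A)
y_pred = mitigator.predict(X)
print("Selection rate by group:", [float(y_pred[A == g].mean()) for g in (0, 1)])
```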
Post-processing techniques adjust a trained model's outputs to improve fairness without modifying the model itself.
| Technique | Description |
|---|---|
| Threshold adjustment | Setting different classification thresholds for different groups to equalize error rates |
| Equalized odds post-processing | Adjusting predictions using the method of Hardt et al. (2016) to satisfy equalized odds by finding optimal group-specific thresholds on the ROC curve |
| Calibration | Adjusting predicted probabilities so that they are well-calibrated within each group |
| Reject option classification | Giving favorable outcomes to disadvantaged group members and unfavorable outcomes to advantaged group members in cases where the model is uncertain (near the decision boundary) |
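The Hardt et al. (2016) method in the table above is likewise available in Fairlearn as ThresholdOptimizer. A hedged sketch on synthetic data, assuming a prefit scikit-learn classifier and a binary sensitive attribute:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.postprocessing import ThresholdOptimizer

# Synthetic data with a proxy for the (hypothetical) sensitive attribute A.
rng = np.random.default_rng(4)
n = 1000
A = rng.integers(0, 2, size=n)
X = np.column_stack([A + rng.normal(size=n), rng.normal(size=n)])
y = (X[:, 0] + X[:, 1] + rng.normal(size=n) > 0.5).astype(int)

base = LogisticRegression().fit(X, y)

# Post-processing: choose group-specific (possibly randomized) thresholds on the
# base model's scores so that true and false positive rates are equalized.
postproc = ThresholdOptimizer(
    estimator=base,
    constraints="equalized_odds",
    prefit=True,
    predict_method="predict_proba",
)
postproc.fit(X, y, sensitive_features=A)
y_adj = postproc.predict(X, sensitive_features=A)
print("Adjusted selection rate by group:", [float(y_adj[A == g].mean()) for g in (0, 1)])
```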
Each stage involves trade-offs. Pre-processing methods are model-agnostic but may discard useful information. In-processing methods can tightly integrate fairness into training but require modifications to standard algorithms. Post-processing methods are easy to apply but treat the model as a black box and may not address root causes of unfairness.
The Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) system, developed by Northpointe (now Equivant), is a risk assessment tool used in the U.S. criminal justice system to predict the likelihood that a defendant will reoffend. In 2016, ProPublica published an influential analysis of COMPAS scores for over 10,000 defendants in Broward County, Florida. The investigation found that Black defendants were nearly twice as likely as white defendants to be incorrectly classified as high risk (false positives), while white defendants were more likely to be incorrectly classified as low risk (false negatives).
Northpointe responded by pointing out that the overall accuracy rate (approximately 60%) was the same for both racial groups and that the tool satisfied predictive parity. This exchange highlighted a fundamental insight: when base rates differ between groups, it is mathematically impossible to simultaneously achieve equal false positive rates, equal false negative rates, and equal predictive values. The COMPAS debate became a defining example of the impossibility theorem in algorithmic fairness.
In 2019, multiple applicants reported that the Apple Card, issued by Goldman Sachs, offered significantly higher credit limits to men than to women, even when the women had higher credit scores and shared finances with their male partners. The New York State Department of Financial Services launched an investigation. Goldman Sachs stated that the algorithm did not use gender as an input. The case illustrated how gender-neutral inputs can still produce gender-correlated outputs through proxy variables and historical patterns in credit data.
Several open-source toolkits help practitioners detect and mitigate bias related to sensitive attributes.
| Toolkit | Developer | Key features |
|---|---|---|
| AI Fairness 360 (AIF360) | IBM Research | Over 70 fairness metrics, 12 bias mitigation algorithms covering pre-processing, in-processing, and post-processing stages |
| Fairlearn | Microsoft (now community-driven) | Fairness assessment dashboards, constraint-based mitigation algorithms, integration with scikit-learn |
| What-If Tool | Google (PAIR) | Visual interface for exploring model performance across groups, counterfactual analysis, no code required |
| Aequitas | University of Chicago | Bias audit toolkit for classification models, generates group-level fairness reports |
| Themis-ML | - | Fairness-aware machine learning library with discrimination-aware regularization and relabeling |
These tools typically require users to specify which features are sensitive attributes, the groups to compare, and the fairness metrics to evaluate. They then produce reports showing how model performance and outcomes differ across groups.
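For example, Fairlearn's MetricFrame computes a chosen set of metrics separately for each group and reports the gaps between them. A minimal sketch with toy predictions standing in for real model outputs:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, selection_rate

# Toy labels, predictions, and group membership for eight individuals.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 1])
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=group,
)
print(mf.by_group)      # metric values for each group
print(mf.difference())  # largest between-group gap, per metric
```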
Several unresolved problems make the handling of sensitive attributes an active area of research.
Defining group boundaries. Sensitive attributes like race and gender are socially constructed categories with fluid and context-dependent boundaries. Forcing individuals into discrete categories can erase within-group diversity and the experiences of multiracial, nonbinary, or otherwise non-conforming individuals.
Collecting sensitive attribute data. Measuring fairness requires knowing group membership, but collecting sensitive attribute data raises privacy concerns and may itself be restricted by law (for example, under the GDPR). This creates a paradox: you need the data to audit for bias, but collecting the data may violate data protection rules.
Unknown proxies. Even when known proxies are identified and removed, previously unrecognized proxies may remain. In high-dimensional datasets, the number of potential proxy combinations is vast, making exhaustive auditing impractical.
Dynamic and context-dependent fairness. What counts as a sensitive attribute can vary by jurisdiction, domain, and time. An attribute that is protected in employment decisions may not be protected in insurance underwriting. Fairness requirements may also evolve as social norms change.
Tension between fairness and accuracy. Enforcing fairness constraints often reduces overall model accuracy. The magnitude of this trade-off depends on the specific fairness definition, the degree of existing bias, and the structure of the data. Research by Corbett-Davies and Goel (2018) has argued that some fairness constraints can be imposed with minimal accuracy loss, while others require significant sacrifices.
Global variation in protected classes. Different countries protect different attributes and define discrimination differently. A model deployed internationally must navigate a patchwork of legal requirements that may conflict with one another.