A sensitive attribute (also called a protected attribute or protected characteristic) is any feature in a dataset that corresponds to a legally or ethically protected personal trait, such as race, gender, age, religion, disability status, or national origin. In machine learning and artificial intelligence, sensitive attributes are central to the study of algorithmic fairness because models that directly or indirectly rely on these features can produce discriminatory outcomes. The concept bridges law, ethics, and technical practice: anti-discrimination statutes define which attributes are protected, while fairness researchers develop methods to detect and reduce the influence of those attributes on automated decisions.
Imagine a teacher is picking students for a school play. The teacher should choose based on how well each student acts, not based on whether the student is a boy or a girl, what color skin they have, or how old they are. A "sensitive attribute" is one of those personal things about a person (like their gender or race) that should not affect decisions about them. When computers make decisions, they sometimes accidentally pay attention to these things, which can make the decision unfair. So engineers have to be really careful to make sure the computer treats everyone fairly, no matter what group they belong to.
A sensitive attribute is a variable in a dataset that identifies membership in a group protected by law or ethical norms. When an automated system uses such a variable, whether directly or through correlated features, to make decisions about individuals, the resulting outcomes may be discriminatory.
Sensitive attributes differ from ordinary features in two ways. First, anti-discrimination law restricts how they may influence decisions in domains like employment, lending, housing, and criminal justice. Second, the groups they define often have historical experiences of systemic disadvantage, so errors that correlate with group membership can compound existing inequalities.
The term "sensitive attribute" is more common in the AI ethics and fairness literature, while "protected attribute" or "protected characteristic" tends to appear in legal contexts. Both refer to the same concept.
The following table lists the most frequently cited sensitive attributes in both legal frameworks and machine learning fairness research.
| Attribute | Description | Typical legal basis (U.S.) | Notes |
|---|---|---|---|
| Race / Ethnicity | Racial or ethnic group membership | Civil Rights Act of 1964, Title VII | One of the most studied attributes in bias research |
| Sex / Gender | Biological sex, gender identity, sexual orientation | Title VII, Title IX | Includes pregnancy, gender identity, and intersex status |
| Age | Chronological age or age group | Age Discrimination in Employment Act (ADEA) | Protects individuals 40 and older in employment |
| Disability | Physical or mental impairment | Americans with Disabilities Act (ADA) | Includes both visible and invisible disabilities |
| Religion | Religious belief or practice | Title VII, First Amendment | Includes lack of religious belief |
| National origin | Country of birth or ancestry | Title VII | Often correlated with language and immigration status |
| Skin color | Complexion independent of racial classification | Title VII | Legally distinct from race |
| Marital / Family status | Marriage, partnership, or parental status | Fair Housing Act | Relevant in housing and lending decisions |
| Genetic information | DNA, family medical history | Genetic Information Nondiscrimination Act (GINA) | Added to federal protection in 2008 |
| Veteran status | Military service history | Vietnam Era Veterans' Readjustment Assistance Act (VEVRAA) | Relevant in federal contracting and employment |
| Socioeconomic status | Income level, education, class | Varies by jurisdiction | Not uniformly protected under federal law but frequently studied |
Multiple legal regimes around the world restrict how sensitive attributes may be used in decision-making. These laws predate AI systems but apply to algorithmic decisions just as they apply to human decisions.
U.S. anti-discrimination law is organized around specific protected classes rather than a single omnibus statute.
The Equal Employment Opportunity Commission (EEOC) has published guidance stating that employers using AI tools in hiring remain liable for discriminatory outcomes, even when a third-party vendor supplies the algorithm. The four-fifths rule (or 80% rule) serves as a common rule of thumb: if the selection rate for a protected group falls below 80% of the rate for the group with the highest selection rate, the practice is generally treated as evidence of disparate impact.
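The check itself is a simple comparison of selection rates. A minimal sketch in Python; the column names and toy figures below are illustrative, not drawn from any real case:

```python
import pandas as pd

# Hypothetical hiring data: one row per applicant, with the group label
# and a binary "selected" outcome. Column names are illustrative.
df = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
    "selected": [1,   1,   1,   0,   1,   1,   0,   0,   0,   1],
})

# Selection rate per group: P(selected = 1 | group)
rates = df.groupby("group")["selected"].mean()

# Compare every group's rate to the group with the highest selection rate.
impact_ratios = rates / rates.max()

# The four-fifths rule flags groups whose ratio falls below 0.8.
print(rates)
print(impact_ratios)
print("Potential adverse impact:", list(impact_ratios[impact_ratios < 0.8].index))
```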
The EU takes a different approach, combining data protection regulation with AI-specific rules. The General Data Protection Regulation (GDPR) designates "special categories of personal data" in Article 9, including racial or ethnic origin, religious or philosophical beliefs, health data, sexual orientation, and genetic and biometric data, whose processing is prohibited by default, while the EU AI Act imposes additional obligations on high-risk AI systems such as those used in employment and credit decisions.
In Canada, the Canadian Human Rights Act protects attributes including race, national or ethnic origin, sex, sexual orientation, age, marital status, family status, disability, and pardoned criminal convictions. The United Kingdom's Equality Act 2010 identifies nine protected characteristics: age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, and sexual orientation. Australia's anti-discrimination laws protect race, sex, disability, and age at the federal level, with additional protections at the state level.
Removing a sensitive attribute from a dataset does not guarantee that a model will produce fair outcomes. Other features in the dataset may be statistically correlated with the removed attribute and serve as proxy variables, allowing the model to reconstruct the protected information indirectly.
Proxy variables emerge from the structure of social and economic life. Residential segregation means that zip codes correlate strongly with race. Occupation and industry correlate with gender. First names and surnames correlate with ethnicity. Educational institution names can signal socioeconomic background, gender (in the case of single-sex colleges), or religious affiliation.
Deep learning models are especially adept at discovering subtle proxy relationships because they can learn complex nonlinear interactions among features. A model may combine several individually weak proxies to reconstruct a sensitive attribute with high accuracy, even when no single feature is obviously discriminatory.
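A common audit for proxies is to train a secondary model to predict the sensitive attribute from the remaining features: accuracy well above chance indicates that the information survives removal. A minimal sketch on synthetic data; the feature names are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic example: the sensitive attribute is never given to the model,
# but "zip_income" and "occupation" (hypothetical features) are correlated
# with it and act as proxies.
n = 2000
sensitive = rng.integers(0, 2, size=n)              # hidden group label
zip_income = sensitive * 1.5 + rng.normal(size=n)   # strong proxy
occupation = sensitive * 0.8 + rng.normal(size=n)   # weaker proxy
noise_feat = rng.normal(size=n)                     # unrelated feature
X_without_attribute = np.column_stack([zip_income, occupation, noise_feat])

# Proxy audit: how well can the sensitive attribute be reconstructed from
# the remaining features alone? Accuracy far above 0.5 signals that
# removing the attribute did not remove the information.
auditor = LogisticRegression()
scores = cross_val_score(auditor, X_without_attribute, sensitive, cv=5)
print(f"Sensitive attribute recoverable with accuracy ~{scores.mean():.2f}")
```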
Redlining in lending. The term "redlining" originally referred to the practice of drawing red lines on maps around predominantly Black neighborhoods and refusing mortgage loans to residents within those areas. Although explicit racial redlining was outlawed by the Fair Housing Act of 1968, modern algorithmic lending systems can reproduce similar patterns by weighting zip codes, neighborhood characteristics, and local property values, all of which correlate with race. The Consumer Financial Protection Bureau (CFPB) has taken enforcement actions against lenders whose algorithmic systems produced disparate impacts through geographic proxies.
Amazon's hiring tool. In 2018, Reuters reported that Amazon had developed an AI-powered resume screening tool trained on ten years of hiring data. Because the technology industry workforce skews heavily male, the model learned to penalize resumes containing the word "women's" (as in "women's rugby team") and the names of women's colleges. Although gender was not an explicit input, textual proxies encoded gender information. Amazon abandoned the tool after repeated attempts to correct the bias failed.
Healthcare resource allocation. A widely cited 2019 study by Obermeyer et al., published in Science, found that a healthcare algorithm used by major U.S. hospitals to allocate care management resources was systematically biased against Black patients. The algorithm used healthcare spending as a proxy for health needs. Because Black patients historically faced barriers to accessing care and thus had lower spending levels, the algorithm assigned them lower risk scores than equally sick white patients. At a given risk score, Black patients were considerably sicker than white patients.
Kidney function estimation. For decades, the estimated glomerular filtration rate (eGFR) formula used in clinical medicine included a race-based correction factor that reported higher kidney function values for Black patients. This made Black patients appear healthier and delayed their access to specialist referrals and transplant waitlists. In 2021, the National Kidney Foundation and the American Society of Nephrology recommended removing race from the eGFR calculation. Studies have shown that this change made approximately 300,000 additional Black Americans eligible for nephrology referral and roughly 31,000 more eligible for transplant evaluation.
The approach of simply deleting sensitive attributes from a dataset before model training is known as fairness through unawareness. Research has repeatedly demonstrated that this approach is insufficient: proxy variables let a model reconstruct the removed attribute indirectly, historical bias remains encoded in the labels and remaining features, and without the attribute it becomes difficult or impossible to measure and audit disparities in outcomes.
Researchers have proposed numerous formal definitions of fairness, each specifying a different mathematical relationship between model predictions and sensitive attributes. These definitions often conflict with one another, and satisfying one may violate another.
Group fairness criteria compare aggregate outcomes across groups defined by a sensitive attribute.
| Criterion | Also known as | Mathematical condition | Interpretation |
|---|---|---|---|
| Demographic parity | Statistical parity, independence | P(Y_hat = 1 \| A = a) = P(Y_hat = 1 \| A = b) | Positive prediction rates are equal across groups |
| Equalized odds | Separation | P(Y_hat = 1 \| Y = y, A = a) = P(Y_hat = 1 \| Y = y, A = b) for all y | True positive and false positive rates are equal across groups |
| Equal opportunity | - | P(Y_hat = 1 \| Y = 1, A = a) = P(Y_hat = 1 \| Y = 1, A = b) | True positive rates are equal across groups |
| Predictive parity | Sufficiency | P(Y = 1 \| Y_hat = 1, A = a) = P(Y = 1 \| Y_hat = 1, A = b) | Positive predictive values are equal across groups |
| Calibration | - | P(Y = 1 \| S = s, A = a) = P(Y = 1 \| S = s, A = b) for all s | At each score level, actual positive rates are equal across groups |
A foundational result in the fairness literature, sometimes called the impossibility theorem (Kleinberg et al., 2016; Chouldechova, 2017), establishes that calibration and equalized odds cannot both be satisfied by an imperfect classifier unless the base rates of the positive outcome are identical across groups; more generally, any two of independence, separation, and sufficiency are mutually exclusive except in degenerate cases. In practice, base rates almost always differ, so practitioners must choose which fairness criterion best fits their application.
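The group criteria in the table above can be estimated directly from a model's predictions. A minimal sketch on toy data; the column names and values are illustrative:

```python
import pandas as pd

# Toy predictions for two groups; y is the true label, y_hat the prediction.
df = pd.DataFrame({
    "group": ["a"] * 6 + ["b"] * 6,
    "y":     [1, 1, 0, 0, 1, 0,   1, 0, 0, 1, 0, 0],
    "y_hat": [1, 1, 1, 0, 0, 0,   1, 0, 0, 0, 0, 1],
})

def group_rates(g):
    tp = ((g.y == 1) & (g.y_hat == 1)).sum()
    fp = ((g.y == 0) & (g.y_hat == 1)).sum()
    return pd.Series({
        "selection_rate": g.y_hat.mean(),          # demographic parity
        "tpr": tp / max((g.y == 1).sum(), 1),      # equal opportunity
        "fpr": fp / max((g.y == 0).sum(), 1),      # equalized odds (with tpr)
        "ppv": tp / max((g.y_hat == 1).sum(), 1),  # predictive parity
    })

by_group = df.groupby("group")[["y", "y_hat"]].apply(group_rates)
print(by_group)
print("Demographic parity difference:",
      by_group.selection_rate.max() - by_group.selection_rate.min())
```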
Individual fairness requires that similar individuals receive similar predictions, regardless of their group membership. The idea was formalized by Dwork et al. (2012), who proposed that a fair classifier should be a Lipschitz mapping: individuals who are "close" in a task-relevant similarity metric should receive predictions that are "close" as well. The challenge lies in defining the similarity metric, which is itself a value-laden choice.
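A rough empirical probe of this property is to compare each individual's prediction with those of their nearest neighbours under some feature-space metric. The sketch below uses a hypothetical representation and random scores purely to illustrate the mechanics; in practice the similarity metric itself must be designed deliberately, as noted above:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical task-relevant representation and model scores for 200 individuals.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))      # stands in for the similarity metric's feature space
scores = rng.uniform(size=200)     # stands in for model scores

nn = NearestNeighbors(n_neighbors=6).fit(X)   # each point plus its 5 nearest neighbours
_, idx = nn.kneighbors(X)

# Average absolute score gap between each individual and their neighbours;
# large values indicate that similar people receive dissimilar predictions.
mean_gap = np.abs(scores[idx[:, 1:]] - scores[:, None]).mean()
print(f"Mean neighbour score gap: {mean_gap:.3f}")
```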
Counterfactual fairness, introduced by Kusner et al. (2017), uses causal reasoning to define fairness at the individual level. A prediction is counterfactually fair if it would remain the same in a hypothetical world where the individual's sensitive attribute had been different, but everything else that is not causally downstream of the sensitive attribute had remained the same. Implementing counterfactual fairness requires specifying a causal model (typically a directed acyclic graph) that captures how the sensitive attribute influences other variables.
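One heavily simplified approximation, assuming an additive linear causal model, is to regress each feature on the sensitive attribute and train only on the residuals, which under that assumption do not change when the attribute is counterfactually flipped. The sketch below uses synthetic data and is not the full counterfactual procedure of Kusner et al.:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(2)
n = 1000
a = rng.integers(0, 2, size=n).astype(float)        # sensitive attribute
x1 = 2.0 * a + rng.normal(size=n)                   # causally affected by a
x2 = rng.normal(size=n)                             # independent of a
y = (x1 + x2 + rng.normal(size=n) > 1).astype(int)  # outcome
X = np.column_stack([x1, x2])

# Regress each feature on A and keep the residuals; under an additive linear
# causal model, these residuals are invariant to counterfactual changes in A.
A_col = a.reshape(-1, 1)
residuals = X - LinearRegression().fit(A_col, X).predict(A_col)

clf = LogisticRegression().fit(residuals, y)
print("Trained on residualized features; coefficients:", clf.coef_)
```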
Real-world individuals belong to multiple protected groups simultaneously. A person may be, for example, a Black woman, an elderly person with a disability, or a young immigrant. The concept of intersectionality, introduced by legal scholar Kimberlé Crenshaw in 1989, highlights that discrimination experienced at the intersection of multiple identities can differ qualitatively from discrimination along any single axis.
In machine learning, ensuring fairness for each sensitive attribute independently does not guarantee fairness for intersectional subgroups. This problem is known as fairness gerrymandering: a model can appear fair with respect to race and fair with respect to gender while still discriminating against a specific race-gender subgroup (for example, Black women).
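A toy illustration of the phenomenon: selection rates can be identical across races and identical across genders while differing sharply at the intersection. The data below is constructed purely to show the arithmetic:

```python
import pandas as pd

# Marginally "fair", intersectionally unfair: black men and white women are
# selected, black women and white men are not.
df = pd.DataFrame({
    "race":   ["black", "black", "white", "white"],
    "gender": ["woman", "man",   "woman", "man"],
    "y_hat":  [0,        1,       1,       0],
})

print(df.groupby("race")["y_hat"].mean())              # 0.5 for both races
print(df.groupby("gender")["y_hat"].mean())            # 0.5 for both genders
print(df.groupby(["race", "gender"])["y_hat"].mean())  # subgroup rates of 0.0 and 1.0
```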
Technical approaches to intersectional fairness include auditing over the combinatorially many subgroups defined by intersections of attributes (the subgroup fairness framework of Kearns et al., 2018), multicalibration, which requires scores to be calibrated on every identifiable subpopulation (Hébert-Johnson et al., 2018), and directly evaluating group fairness metrics on explicit intersectional subgroups when sample sizes allow.
A practical tension exists between the goals of intersectionality and statistical reliability. As subgroups become smaller, estimates of fairness metrics become noisier, making it harder to detect and correct bias for the most marginalized groups.
Methods for reducing the influence of sensitive attributes on model outcomes are typically organized by when in the machine learning pipeline they are applied.
Pre-processing techniques modify the training data before it is fed to a model.
| Technique | Description |
|---|---|
| Resampling | Oversampling underrepresented groups or undersampling overrepresented groups to balance the dataset |
| Reweighting | Assigning higher weights to training examples from disadvantaged groups so that the model pays more attention to them |
| Label correction | Identifying and correcting labels that reflect historical bias rather than ground truth |
| Fair representation learning | Transforming features into a latent space that retains predictive information while removing information about the sensitive attribute |
| Suppression | Removing the sensitive attribute and its known proxies from the dataset (limited effectiveness due to unknown proxies) |
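As an example of reweighting, the scheme of Kamiran and Calders assigns each (group, label) combination the weight P(group) · P(label) / P(group, label), so that group membership and the label look statistically independent once the weights are applied. A minimal sketch on toy data:

```python
import pandas as pd

# Toy training data: group "a" receives positive labels more often than group "b".
df = pd.DataFrame({
    "group": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "y":     [1,   1,   1,   0,   1,   0,   0,   0],
})

p_group = df["group"].value_counts(normalize=True)
p_label = df["y"].value_counts(normalize=True)
p_joint = df.groupby(["group", "y"]).size() / len(df)

# weight(group, y) = P(group) * P(y) / P(group, y)
weights = df.apply(
    lambda row: p_group[row["group"]] * p_label[row["y"]]
                / p_joint[(row["group"], row["y"])],
    axis=1,
)
print(df.assign(weight=weights))
# These weights can then be passed to most estimators via sample_weight.
```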
In-processing techniques incorporate fairness constraints into the model training process itself.
| Technique | Description |
|---|---|
| Constrained optimization | Adding fairness constraints (such as equal false positive rates across groups) to the loss function during training |
| Adversarial debiasing | Training two models simultaneously: a primary model that predicts the target outcome and an adversary that tries to predict the sensitive attribute from the primary model's outputs. The primary model is penalized when the adversary succeeds. |
| Regularization-based methods | Adding a penalty term to the loss function that measures the correlation between predictions and the sensitive attribute |
| Fair classification algorithms | Algorithms designed from the ground up to satisfy specific fairness constraints, such as the reduction approach by Agarwal et al. (2018) |
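As a concrete example, the reduction approach of Agarwal et al. (2018) listed above is implemented in the Fairlearn library (described in the toolkit section below). A hedged sketch on synthetic data, assuming a binary sensitive attribute and a scikit-learn base estimator:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

# Synthetic data: one feature correlated with the (hypothetical) sensitive attribute A.
rng = np.random.default_rng(3)
n = 1000
A = rng.integers(0, 2, size=n)
X = np.column_stack([A + rng.normal(size=n), rng.normal(size=n)])
y = (X[:, 0] + X[:, 1] + rng.normal(size=n) > 0.5).astype(int)

# The reduction approach repeatedly reweights the data and refits the base
# estimator until the chosen fairness constraint is approximately satisfied.
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(solver="liblinear"),
    constraints=DemographicParity(),   # EqualizedOdds() is another built-in constraint
)
mitigator.fit(X, y, sensitive_features=A)
y_pred = mitigator.predict(X)
print("Selection rate by group:", [float(y_pred[A == g].mean()) for g in (0, 1)])
```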
Post-processing techniques adjust a trained model's outputs to improve fairness without modifying the model itself.
| Technique | Description |
|---|---|
| Threshold adjustment | Setting different classification thresholds for different groups to equalize error rates |
| Equalized odds post-processing | Adjusting predictions using the method of Hardt et al. (2016) to satisfy equalized odds by finding optimal group-specific thresholds on the ROC curve |
| Calibration | Adjusting predicted probabilities so that they are well-calibrated within each group |
| Reject option classification | Giving favorable outcomes to disadvantaged group members and unfavorable outcomes to advantaged group members in cases where the model is uncertain (near the decision boundary) |
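The Hardt et al. (2016) method in the table above is likewise available in Fairlearn as ThresholdOptimizer. A hedged sketch on synthetic data, assuming a prefit scikit-learn classifier and a binary sensitive attribute:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.postprocessing import ThresholdOptimizer

# Synthetic data with a proxy for the (hypothetical) sensitive attribute A.
rng = np.random.default_rng(4)
n = 1000
A = rng.integers(0, 2, size=n)
X = np.column_stack([A + rng.normal(size=n), rng.normal(size=n)])
y = (X[:, 0] + X[:, 1] + rng.normal(size=n) > 0.5).astype(int)

base = LogisticRegression().fit(X, y)

# Post-processing: choose group-specific (possibly randomized) thresholds on the
# base model's scores so that true and false positive rates are equalized.
postproc = ThresholdOptimizer(
    estimator=base,
    constraints="equalized_odds",
    prefit=True,
    predict_method="predict_proba",
)
postproc.fit(X, y, sensitive_features=A)
y_adj = postproc.predict(X, sensitive_features=A)
print("Adjusted selection rate by group:", [float(y_adj[A == g].mean()) for g in (0, 1)])
```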
Each stage involves trade-offs. Pre-processing methods are model-agnostic but may discard useful information. In-processing methods can tightly integrate fairness into training but require modifications to standard algorithms. Post-processing methods are easy to apply but treat the model as a black box and may not address root causes of unfairness.
The Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) system, developed by Northpointe (now Equivant), is a risk assessment tool used in the U.S. criminal justice system to predict the likelihood that a defendant will reoffend. In 2016, ProPublica published an influential analysis of COMPAS scores for over 10,000 defendants in Broward County, Florida. The investigation found that Black defendants were nearly twice as likely as white defendants to be incorrectly classified as high risk (false positives), while white defendants were more likely to be incorrectly classified as low risk (false negatives).
Northpointe responded by pointing out that the overall accuracy rate (approximately 60%) was the same for both racial groups and that the tool satisfied predictive parity. This exchange highlighted a fundamental insight: when base rates differ between groups, it is mathematically impossible to simultaneously achieve equal false positive rates, equal false negative rates, and equal predictive values. The COMPAS debate became a defining example of the impossibility theorem in algorithmic fairness.
In 2019, multiple applicants reported that the Apple Card, issued by Goldman Sachs, offered significantly higher credit limits to men than to women, even when the women had higher credit scores and shared finances with their male partners. The New York State Department of Financial Services launched an investigation. Goldman Sachs stated that the algorithm did not use gender as an input. The case illustrated how gender-neutral inputs can still produce gender-correlated outputs through proxy variables and historical patterns in credit data.
Several open-source toolkits help practitioners detect and mitigate bias related to sensitive attributes.
| Toolkit | Developer | Key features |
|---|---|---|
| AI Fairness 360 (AIF360) | IBM Research | Over 70 fairness metrics, 12 bias mitigation algorithms covering pre-processing, in-processing, and post-processing stages |
| Fairlearn | Microsoft (now community-driven) | Fairness assessment dashboards, constraint-based mitigation algorithms, integration with scikit-learn |
| What-If Tool | Google (PAIR) | Visual interface for exploring model performance across groups, counterfactual analysis, no code required |
| Aequitas | University of Chicago | Bias audit toolkit for classification models, generates group-level fairness reports |
| Themis-ML | - | Fairness-aware machine learning library with discrimination-aware regularization and relabeling |
These tools typically require users to specify which features are sensitive attributes, the groups to compare, and the fairness metrics to evaluate. They then produce reports showing how model performance and outcomes differ across groups.
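For example, Fairlearn's MetricFrame computes a chosen set of metrics separately for each group and reports the gaps between them. A minimal sketch with toy predictions standing in for real model outputs:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, selection_rate

# Toy labels, predictions, and group membership for eight individuals.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 1])
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=group,
)
print(mf.by_group)      # metric values for each group
print(mf.difference())  # largest between-group gap, per metric
```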
Several unresolved problems make the handling of sensitive attributes an active area of research.
Defining group boundaries. Sensitive attributes like race and gender are socially constructed categories with fluid and context-dependent boundaries. Forcing individuals into discrete categories can erase within-group diversity and the experiences of multiracial, nonbinary, or otherwise non-conforming individuals.
Collecting sensitive attribute data. Measuring fairness requires knowing group membership, but collecting sensitive attribute data raises privacy concerns and may itself be restricted by law (for example, under the GDPR). This creates a paradox: you need the data to audit for bias, but collecting the data may violate data protection rules.
Unknown proxies. Even when known proxies are identified and removed, previously unrecognized proxies may remain. In high-dimensional datasets, the number of potential proxy combinations is vast, making exhaustive auditing impractical.
Dynamic and context-dependent fairness. What counts as a sensitive attribute can vary by jurisdiction, domain, and time. An attribute that is protected in employment decisions may not be protected in insurance underwriting. Fairness requirements may also evolve as social norms change.
Tension between fairness and accuracy. Enforcing fairness constraints often reduces overall model accuracy. The magnitude of this trade-off depends on the specific fairness definition, the degree of existing bias, and the structure of the data. Research by Corbett-Davies and Goel (2018) has argued that some fairness constraints can be imposed with minimal accuracy loss, while others require significant sacrifices.
Global variation in protected classes. Different countries protect different attributes and define discrimination differently. A model deployed internationally must navigate a patchwork of legal requirements that may conflict with one another.