Sensitive Attribute
Last reviewed
Sources
19 citations
Review status
Source-backed
Revision
v3 · 4,601 words
A sensitive attribute (also called a protected attribute or protected characteristic) is any feature in a dataset that corresponds to a legally or ethically protected personal trait, such as race, sex or gender, age, religion, disability status, or national origin, and with respect to which a model must not discriminate. In machine learning and artificial intelligence, sensitive attributes are the axis along which algorithmic fairness is defined: nearly every group-fairness metric is computed by partitioning a population into groups by the sensitive attribute and comparing outcomes across those groups. The concept bridges law, ethics, and technical practice. Anti-discrimination statutes (in the United States, Title VII of the Civil Rights Act, the Equal Credit Opportunity Act, and the Fair Housing Act) define which attributes are protected, while fairness researchers develop methods to detect and reduce the influence of those attributes on automated decisions. A central, counterintuitive finding is that simply deleting the sensitive attribute does not make a model fair, because other features in the data act as proxies that reconstruct it.[1][10]
Imagine a teacher is picking students for a school play. The teacher should choose based on how well each student acts, not based on whether the student is a boy or a girl, what color skin they have, or how old they are. A "sensitive attribute" is one of those personal things about a person (like their gender or race) that should not affect decisions about them. When computers make decisions, they sometimes accidentally pay attention to these things, which can make the decision unfair. So engineers have to be really careful to make sure the computer treats everyone fairly, no matter what group they belong to.
A sensitive attribute is a variable in a dataset that identifies membership in a group protected by law or ethical norms. When an automated system uses such a variable, whether directly or through correlated features, to make decisions about individuals, the resulting outcomes may be discriminatory and may expose the operator to legal liability.
Sensitive attributes differ from ordinary features in two ways. First, anti-discrimination law restricts how they may influence decisions in domains like employment, lending, housing, and criminal justice. Second, the groups they define often have historical experiences of systemic disadvantage, so errors that correlate with group membership can compound existing inequalities and amplify AI bias.
The term "sensitive attribute" is more common in the AI ethics and fairness literature, while "protected attribute" or "protected characteristic" tends to appear in legal contexts. Both refer to the same concept. In the foundational legal analysis of the topic, Solon Barocas and Andrew Selbst argue that addressing discrimination by data-driven systems "will require more than best efforts to stamp out prejudice and bias; it will require a wholesale reexamination of the meanings of 'discrimination' and 'fairness.'"[1]
The following table lists the most frequently cited sensitive attributes in both legal frameworks and machine learning fairness research.
| Attribute | Description | Typical legal basis (U.S.) | Notes |
|---|---|---|---|
| Race / Ethnicity | Racial or ethnic group membership | Civil Rights Act of 1964, Title VII | One of the most studied attributes in bias research |
| Sex / Gender | Biological sex, gender identity, sexual orientation | Title VII, Title IX | Includes pregnancy, gender identity, and intersex status |
| Age | Chronological age or age group | Age Discrimination in Employment Act (ADEA) | Protects individuals 40 and older in employment |
| Disability | Physical or mental impairment | Americans with Disabilities Act (ADA) | Includes both visible and invisible disabilities |
| Religion | Religious belief or practice | Title VII, First Amendment | Includes lack of religious belief |
| National origin | Country of birth or ancestry | Title VII | Often correlated with language and immigration status |
| Skin color | Complexion independent of racial classification | Title VII | Legally distinct from race |
| Marital / Family status | Marriage, partnership, or parental status | Fair Housing Act | Relevant in housing and lending decisions |
| Genetic information | DNA, family medical history | Genetic Information Nondiscrimination Act (GINA) | Added to federal protection in 2008 |
| Veteran status | Military service history | Vietnam Era Veterans' Readjustment Assistance Act (VEVRAA) | Relevant in federal contracting and employment |
| Socioeconomic status | Income level, education, class | Varies by jurisdiction | Not uniformly protected under federal law but frequently studied |
Multiple legal regimes around the world restrict how sensitive attributes may be used in decision-making. These laws predate AI systems but apply to algorithmic decisions just as they apply to human decisions.
U.S. anti-discrimination law is organized around specific protected classes rather than a single omnibus statute.
The Equal Employment Opportunity Commission (EEOC) has published guidance stating that employers using AI tools in hiring remain liable for discriminatory outcomes, even when a third-party vendor supplies the algorithm.[15] The four-fifths rule (or 80% rule) serves as a common screening threshold: under the Uniform Guidelines on Employee Selection Procedures, a selection rate for any race, sex, or ethnic group that is less than four-fifths (80%) of the rate for the group with the highest selection rate is generally treated by federal enforcement agencies as evidence of adverse (disparate) impact.[15]
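The four-fifths comparison described above is simple arithmetic, and a minimal sketch may make it concrete. The function names and the applicant counts below are hypothetical, chosen only for illustration; the rule is a screening heuristic, not a legal determination.

```python
def selection_rates(outcomes):
    """Map each group to its selection rate.

    `outcomes` maps a group label to a (selected, applicants) pair.
    """
    return {g: sel / total for g, (sel, total) in outcomes.items()}


def four_fifths_check(outcomes, threshold=0.8):
    """For each group, return (impact ratio, flagged).

    The impact ratio is the group's selection rate divided by the highest
    group's rate; a group is flagged when the ratio is below the threshold.
    """
    rates = selection_rates(outcomes)
    best = max(rates.values())
    return {g: (r / best, r / best < threshold) for g, r in rates.items()}


# Hypothetical applicant counts, invented purely for illustration.
example = {"group_a": (48, 80), "group_b": (12, 40)}
result = four_fifths_check(example)
for group, (ratio, flagged) in result.items():
    print(f"{group}: impact ratio {ratio:.2f}, flagged={flagged}")
```

Here group_b is selected at half the rate of group_a, so its impact ratio falls below 0.8 and it is flagged for closer review.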
The EU takes a different approach, combining data protection regulation with AI-specific rules.
Canada's Canadian Human Rights Act protects attributes including race, national or ethnic origin, sex, sexual orientation, age, marital status, family status, disability, and pardoned criminal convictions. The United Kingdom's Equality Act 2010 identifies nine protected characteristics in Section 4: age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, and sexual orientation.[17] Australia's anti-discrimination laws protect race, sex, disability, and age at the federal level, with additional protections at the state level.
Removing a sensitive attribute from a dataset does not guarantee that a model will produce fair outcomes. Other features in the dataset may be statistically correlated with the removed attribute and serve as proxy variables, allowing the model to reconstruct the protected information indirectly. This is the core mechanism behind the failure of so-called fairness through unawareness.[1][10]
Proxy variables emerge from the structure of social and economic life. Residential segregation means that zip codes correlate strongly with race. Occupation and industry correlate with gender. First names and surnames correlate with ethnicity. Educational institution names can signal socioeconomic background, gender (in the case of single-sex colleges), or religious affiliation.
Deep learning models are especially adept at discovering subtle proxy relationships because they can learn complex nonlinear interactions among features. A model may combine several individually weak proxies to reconstruct a sensitive attribute with high accuracy, even when no single feature is obviously discriminatory. This redundant-encoding problem is what makes fairness constraints and active auditing, rather than mere attribute deletion, the standard response.
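One rough way to screen a single feature for proxy behavior is to ask how well it predicts the sensitive attribute on its own. The sketch below uses a majority-vote rule within each feature value and compares it with always guessing the largest group. The function name and the records are hypothetical, and the accuracy is measured in-sample, so it is an optimistic estimate.

```python
from collections import Counter, defaultdict


def proxy_strength(feature_values, sensitive_values):
    """Return (proxy accuracy, baseline accuracy).

    Proxy accuracy: guess the sensitive attribute by majority vote within
    each feature value. Baseline: always guess the overall majority group.
    """
    by_value = defaultdict(Counter)
    for f, a in zip(feature_values, sensitive_values):
        by_value[f][a] += 1
    n = len(sensitive_values)
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    baseline = Counter(sensitive_values).most_common(1)[0][1] / n
    return correct / n, baseline


# Hypothetical (zip_code, group) records, invented for illustration.
records = (
    [("10001", "a")] * 9 + [("10001", "b")] * 1
    + [("10002", "a")] * 2 + [("10002", "b")] * 8
)
zips = [z for z, _ in records]
groups = [g for _, g in records]
accuracy, baseline = proxy_strength(zips, groups)
print(f"proxy accuracy {accuracy:.2f} vs baseline {baseline:.2f}")
```

A large gap between the two numbers suggests the feature carries group information. A check of this kind looks at one feature at a time, so it would not catch the combinations of individually weak proxies described above; for those, a common approach is to train a classifier on all features to predict the sensitive attribute.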
Redlining in lending. The term "redlining" originally referred to the practice of drawing red lines on maps around predominantly Black neighborhoods and refusing mortgage loans to residents within those areas. Although explicit racial redlining was outlawed by the Fair Housing Act of 1968, modern algorithmic lending systems can reproduce similar patterns by weighting zip codes, neighborhood characteristics, and local property values, all of which correlate with race. The Consumer Financial Protection Bureau (CFPB) has taken enforcement actions against lenders whose algorithmic systems produced disparate impacts through geographic proxies.
Amazon's hiring tool. In 2018, Reuters reported that Amazon had developed an AI-powered resume screening tool trained on ten years of hiring data.[8] Because the technology industry workforce skews heavily male, the model learned to penalize resumes containing the word "women's" (as in "women's rugby team") and downgrade graduates of two all-women's colleges, while favoring verbs more common on men's resumes. Although gender was not an explicit input, textual proxies encoded gender information. Amazon abandoned the tool after concluding it could not guarantee the algorithm would be gender-neutral; the company said the tool was never used to evaluate candidates in production.[8]
Healthcare resource allocation. A widely cited 2019 study by Obermeyer et al., published in Science, found that a healthcare algorithm used by major U.S. hospitals to allocate care management resources was systematically biased against Black patients.[7] The algorithm used predicted healthcare cost as a proxy for health need. Because less money is historically spent on Black patients at a given level of illness (an average of roughly $1,800 less per year for a patient with the same health status), the algorithm assigned Black patients lower risk scores than equally sick white patients.[7] The authors calculated that correcting the bias would increase the share of Black patients flagged for additional help from 17.7% to 46.5%.[7]
Kidney function estimation. For decades, the estimated glomerular filtration rate (eGFR) formula used in clinical medicine included a race-based correction factor that reported higher kidney function values for Black patients. This made Black patients appear healthier and delayed their access to specialist referrals and transplant waitlists. In 2021, a joint task force of the National Kidney Foundation and the American Society of Nephrology recommended removing race from the eGFR calculation.[18] Studies estimated that the change would make roughly 300,000 additional Black Americans eligible for nephrology referral and about 31,000 more eligible for transplant evaluation and waitlist inclusion.[18]
The approach of simply deleting sensitive attributes from a dataset before model training is known as fairness through unawareness. Research has repeatedly demonstrated that this approach is insufficient for several reasons:[1][10]
Researchers have proposed numerous formal definitions of fairness, each specifying a different mathematical relationship between model predictions and sensitive attributes. These definitions often conflict with one another, and satisfying one may violate another. Crucially, every group-fairness definition below is stated as a constraint over the groups A = a, A = b that the sensitive attribute defines.
Group fairness criteria compare aggregate outcomes across groups defined by a sensitive attribute.
| Criterion | Also known as | Mathematical condition | Interpretation |
|---|---|---|---|
| Demographic parity | Statistical parity, independence | P(Y_hat = 1 \| A = a) = P(Y_hat = 1 \| A = b) | Positive prediction rates are equal across groups |
| Equalized odds | Separation | P(Y_hat = 1 \| Y = y, A = a) = P(Y_hat = 1 \| Y = y, A = b) for all y | True positive and false positive rates are equal across groups |
| Equal opportunity | - | P(Y_hat = 1 \| Y = 1, A = a) = P(Y_hat = 1 \| Y = 1, A = b) | True positive rates are equal across groups |
| Predictive parity | Sufficiency | P(Y = 1 \| Y_hat = 1, A = a) = P(Y = 1 \| Y_hat = 1, A = b) | Positive predictive values are equal across groups |
| Calibration | - | P(Y = 1 \| S = s, A = a) = P(Y = 1 \| S = s, A = b) for all s | At each score level, actual positive rates are equal across groups |
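The criteria in the table above reduce to comparing a handful of per-group rates. The following is a minimal sketch for binary labels and predictions; the function name and the toy data are hypothetical, and the code assumes every group contains positives, negatives, and at least one positive prediction, so no denominator is zero.

```python
def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, TPR, FPR, and PPV for binary labels."""
    stats = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        tp = sum(1 for i in idx if y_pred[i] == 1 and y_true[i] == 1)
        fp = sum(1 for i in idx if y_pred[i] == 1 and y_true[i] == 0)
        fn = sum(1 for i in idx if y_pred[i] == 0 and y_true[i] == 1)
        tn = sum(1 for i in idx if y_pred[i] == 0 and y_true[i] == 0)
        stats[g] = {
            "selection_rate": (tp + fp) / len(idx),  # demographic parity
            "tpr": tp / (tp + fn),                   # equal opportunity
            "fpr": fp / (fp + tn),                   # with tpr: equalized odds
            "ppv": tp / (tp + fp),                   # predictive parity
        }
    return stats


# Hypothetical toy data: four individuals in each of two groups.
groups = ["a"] * 4 + ["b"] * 4
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
stats = group_rates(y_true, y_pred, groups)
for g, s in stats.items():
    print(g, s)
```

In this toy data both groups have a selection rate of 0.5, so demographic parity holds, yet the true positive and false positive rates differ, so equalized odds does not. The example shows why the choice of criterion matters.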
A foundational result in the fairness literature, the impossibility theorem proved by Kleinberg, Mullainathan, and Raghavan (2016) and independently by Chouldechova (2017), establishes that calibration, balanced false positive rates, and balanced false negative rates cannot all be satisfied simultaneously unless the base rates of the positive outcome are identical across groups or the classifier is perfect.[5] In practice, base rates almost always differ, so practitioners must choose which fairness criterion best fits their application.
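The conflict can be seen in the predictive-parity form of the result. Writing p for a group's base rate, the definition of positive predictive value, PPV = p·TPR / (p·TPR + (1 − p)·FPR), can be solved for the false positive rate. The sketch below does so with hypothetical numbers; it illustrates the identity and is not a reproduction of either paper's proof.

```python
def implied_fpr(base_rate, ppv, tpr):
    """False positive rate forced by a base rate, PPV, and TPR.

    Obtained by solving PPV = p*TPR / (p*TPR + (1 - p)*FPR) for FPR.
    """
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * tpr


# Hypothetical values: both groups share PPV and TPR, but base rates differ.
fpr_a = implied_fpr(base_rate=0.5, ppv=0.6, tpr=0.7)
fpr_b = implied_fpr(base_rate=0.3, ppv=0.6, tpr=0.7)
print(f"group a FPR: {fpr_a:.3f}, group b FPR: {fpr_b:.3f}")
```

With equal PPV and equal true positive rates, the two groups are forced to different false positive rates as soon as their base rates differ.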
Individual fairness requires that similar individuals receive similar predictions, regardless of their group membership. The idea was formalized by Dwork et al. (2012), whose central principle is that "any two individuals who are similar with respect to a particular task should be treated similarly."[2] Technically, the paper proposes that a fair classifier should be a Lipschitz mapping: individuals who are close in a task-relevant similarity metric should receive predictions that are close as well. The challenge lies in defining the similarity metric, which is itself a value-laden choice.
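The Lipschitz condition can be checked pair by pair once a similarity metric is fixed. The sketch below assumes a hypothetical metric based on a single "skill" value and two hypothetical scoring functions; in practice the choice of metric is the hard part, as noted above.

```python
from itertools import combinations


def lipschitz_violations(points, predict, distance, constant=1.0):
    """Return index pairs (i, j) whose prediction gap exceeds
    constant * distance(points[i], points[j])."""
    violations = []
    for i, j in combinations(range(len(points)), 2):
        gap = abs(predict(points[i]) - predict(points[j]))
        if gap > constant * distance(points[i], points[j]) + 1e-12:
            violations.append((i, j))
    return violations


# Hypothetical task-relevant metric: difference in skill only.
def distance(x, y):
    return abs(x["skill"] - y["skill"])


def smooth_score(x):
    return 0.5 * x["skill"]


def group_dependent_score(x):
    # Adds a bonus for group "a", which the metric treats as irrelevant.
    return 0.5 * x["skill"] + (0.3 if x["group"] == "a" else 0.0)


people = [
    {"skill": 0.80, "group": "a"},
    {"skill": 0.82, "group": "b"},
    {"skill": 0.10, "group": "b"},
]
ok = lipschitz_violations(people, smooth_score, distance)
bad = lipschitz_violations(people, group_dependent_score, distance)
print("smooth score violations:", ok)
print("group-dependent score violations:", bad)
```

The first two people are nearly identical under the metric, so the group-dependent score, which separates them by 0.29, violates the condition for that pair.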
Counterfactual fairness, introduced by Kusner et al. (2017), uses causal reasoning to define fairness at the individual level.[3] A prediction is counterfactually fair if it would remain the same in a hypothetical world where the individual's sensitive attribute had been different, but everything else that is not causally downstream of the sensitive attribute had remained the same. Implementing counterfactual fairness requires specifying a causal model (typically a directed acyclic graph) that captures how the sensitive attribute influences other variables.
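A toy linear causal model makes the definition concrete. The sketch assumes a hypothetical structural equation X = ALPHA·A + U, where A is the sensitive attribute and U is a latent factor independent of A; the coefficient and the observed values are invented. A counterfactual is computed by recovering U from the observation and then recomputing X with A changed.

```python
ALPHA = 2.0  # assumed causal effect of A on X in this toy model


def abduct_noise(x, a):
    """Recover the latent U from observed (x, a) under X = ALPHA*A + U."""
    return x - ALPHA * a


def counterfactual_x(x, a, a_new):
    """Value X would have taken had A been a_new, holding U fixed."""
    return ALPHA * a_new + abduct_noise(x, a)


def predict_from_x(x):
    # Uses X, which is causally downstream of A.
    return 0.5 * x


def predict_from_u(x, a):
    # Uses only the recovered U, which is not downstream of A.
    return 0.5 * abduct_noise(x, a)


x_obs, a_obs = 3.0, 1
x_cf = counterfactual_x(x_obs, a_obs, a_new=0)
gap_x = predict_from_x(x_obs) - predict_from_x(x_cf)
gap_u = predict_from_u(x_obs, a_obs) - predict_from_u(x_cf, 0)
print(f"counterfactual X: {x_cf}, gap using X: {gap_x}, gap using U: {gap_u}")
```

The predictor built on X changes when A is flipped, while the predictor built on U does not, so only the second is counterfactually fair under this assumed model. The conclusion depends entirely on the causal model being correct.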
Real-world individuals belong to multiple protected groups simultaneously. A person may be, for example, a Black woman, an elderly person with a disability, or a young immigrant. The concept of intersectionality, introduced by legal scholar Kimberlé Crenshaw in 1989, highlights that discrimination experienced at the intersection of multiple identities can differ qualitatively from discrimination along any single axis.[9]
In machine learning, ensuring fairness for each sensitive attribute independently does not guarantee fairness for intersectional subgroups. This problem is known as fairness gerrymandering: a model can appear fair with respect to race and fair with respect to gender while still discriminating against a specific race-gender subgroup (for example, Black women).
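A small numerical example shows how marginal parity can hide subgroup disparity. The counts below are hypothetical and deliberately extreme; the helper name is also an assumption made for this sketch.

```python
from collections import defaultdict

# Hypothetical (selected, total) counts per subgroup, invented for illustration.
counts = {
    ("black", "woman"): (2, 10),
    ("black", "man"): (8, 10),
    ("white", "woman"): (8, 10),
    ("white", "man"): (2, 10),
}


def marginal_rates(counts, axis):
    """Selection rate per value of one attribute (axis 0 = race, 1 = gender)."""
    selected, total = defaultdict(int), defaultdict(int)
    for key, (s, t) in counts.items():
        selected[key[axis]] += s
        total[key[axis]] += t
    return {k: selected[k] / total[k] for k in selected}


by_race = marginal_rates(counts, 0)
by_gender = marginal_rates(counts, 1)
by_subgroup = {k: s / t for k, (s, t) in counts.items()}
print("by race:", by_race)
print("by gender:", by_gender)
print("by subgroup:", by_subgroup)
```

Selection rates are 0.5 for every race and every gender taken separately, yet they range from 0.2 to 0.8 across the four race-gender subgroups. An audit that checks each attribute in isolation would miss this.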
Technical approaches to intersectional fairness include:
A practical tension exists between the goals of intersectionality and statistical reliability. As subgroups become smaller, estimates of fairness metrics become noisier, making it harder to detect and correct bias for the most marginalized groups.
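The loss of precision can be quantified with the standard error of a proportion, sqrt(p(1 − p)/n). The sketch below assumes a simple binomial model and a hypothetical observed rate of 0.30; the subgroup sizes are illustrative.

```python
import math


def rate_standard_error(rate, n):
    """Standard error of an observed proportion under a binomial model."""
    return math.sqrt(rate * (1 - rate) / n)


# Hypothetical subgroup sizes, from a large group down to a small intersection.
for n in (10_000, 400, 25):
    se = rate_standard_error(0.3, n)
    print(f"n={n:>6}: 0.30 +/- {1.96 * se:.3f} (approximate 95% interval)")
```

Because the error shrinks only with the square root of the subgroup size, a subgroup of 25 people yields an interval many times wider than one of 10,000, which can be wider than the disparities an audit is trying to detect.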
Methods for reducing the influence of sensitive attributes on model outcomes are typically organized by when in the machine learning pipeline they are applied.
Pre-processing techniques modify the training data before it is fed to a model.
| Technique | Description |
|---|---|
| Resampling | Oversampling underrepresented groups or undersampling overrepresented groups to balance the dataset |
| Reweighting | Assigning higher weights to training examples from disadvantaged groups so that the model pays more attention to them |
| Label correction | Identifying and correcting labels that reflect historical bias rather than ground truth |
| Fair representation learning | Transforming features into a latent space that retains predictive information while removing information about the sensitive attribute |
| Suppression | Removing the sensitive attribute and its known proxies from the dataset (limited effectiveness due to unknown proxies) |
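As an illustration of the reweighting row, one common scheme assigns each (group, label) combination the weight P(group)·P(label) / P(group, label), so that group and label look statistically independent in the weighted data. The sketch below is one such variant, not the only way to reweight; the function name and data are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight per (group, label) cell: P(group) * P(label) / P(group, label).

    Cells that are rarer than independence would predict get weights above 1.
    """
    n = len(labels)
    group_count = Counter(groups)
    label_count = Counter(labels)
    cell_count = Counter(zip(groups, labels))
    return {
        (g, y): (group_count[g] / n) * (label_count[y] / n) / (c / n)
        for (g, y), c in cell_count.items()
    }


# Hypothetical data: group "a" has mostly positive labels, group "b" mostly negative.
groups = ["a"] * 6 + ["b"] * 4
labels = [1, 1, 1, 1, 0, 0] + [1, 0, 0, 0]
weights = reweighing_weights(groups, labels)
for cell, w in sorted(weights.items()):
    print(cell, round(w, 3))
```

In this example positive examples from group b receive weight 2.0 and positive examples from group a receive 0.75, after which the weighted positive rate is 0.5 in both groups. The weights would then be passed to a learner that accepts per-example weights.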
In-processing techniques incorporate fairness constraints into the model training process itself.
| Technique | Description |
|---|---|
| Constrained optimization | Adding fairness constraints (such as equal false positive rates across groups) to the loss function during training |
| Adversarial debiasing | Training two models simultaneously: a primary model that predicts the target outcome and an adversary that tries to predict the sensitive attribute from the primary model's outputs. The primary model is penalized when the adversary succeeds. |
| Regularization-based methods | Adding a penalty term to the loss function that measures the correlation between predictions and the sensitive attribute |
| Fair classification algorithms | Algorithms designed from the ground up to satisfy specific fairness constraints, such as the reduction approach by Agarwal et al. (2018)[11] |
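The regularization-based row can be sketched as an ordinary loss plus a penalty on the covariance between predicted scores and the sensitive attribute. The function below is a minimal illustration under stated assumptions: a binary sensitive attribute coded 0/1, scores strictly between 0 and 1, and a hypothetical penalty weight `lam`. It computes the objective only and does not perform training.

```python
import math


def penalized_loss(y_true, scores, sensitive, lam=1.0):
    """Mean log loss plus lam * |cov(scores, sensitive)|.

    `sensitive` is assumed to be coded 0/1; `scores` are probabilities in (0, 1).
    """
    n = len(y_true)
    log_loss = -sum(
        y * math.log(s) + (1 - y) * math.log(1 - s)
        for y, s in zip(y_true, scores)
    ) / n
    mean_s = sum(scores) / n
    mean_a = sum(sensitive) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(scores, sensitive)) / n
    return log_loss + lam * abs(cov)


# Hypothetical data: one score vector tracks the attribute, the other does not.
y = [1, 0, 1, 0]
a = [1, 1, 0, 0]
dependent = [0.9, 0.6, 0.4, 0.1]
independent = [0.8, 0.2, 0.8, 0.2]
penalty_dependent = penalized_loss(y, dependent, a, 1.0) - penalized_loss(y, dependent, a, 0.0)
penalty_independent = penalized_loss(y, independent, a, 1.0) - penalized_loss(y, independent, a, 0.0)
print(f"penalty (dependent): {penalty_dependent:.3f}")
print(f"penalty (independent): {penalty_independent:.3f}")
```

Scores that rise with the attribute incur a penalty, while scores uncorrelated with it do not. Covariance captures only linear dependence, so a zero penalty does not by itself guarantee demographic parity.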
Post-processing techniques adjust a trained model's outputs to improve fairness without modifying the model itself.
| Technique | Description |
|---|---|
| Threshold adjustment | Setting different classification thresholds for different groups to equalize error rates |
| Equalized odds post-processing | Adjusting predictions using the method of Hardt et al. (2016) to satisfy equalized odds by finding optimal group-specific thresholds on the ROC curve[4] |
| Calibration | Adjusting predicted probabilities so that they are well-calibrated within each group |
| Reject option classification | Giving favorable outcomes to disadvantaged group members and unfavorable outcomes to advantaged group members in cases where the model is uncertain (near the decision boundary) |
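Threshold adjustment can be sketched by searching, for each group, for the cutoff that reaches a shared target true positive rate. The function names and scores below are hypothetical. This simplified version equalizes only the true positive rate (equal opportunity); satisfying full equalized odds, as in the method of Hardt et al., generally requires additional steps such as randomizing between thresholds.

```python
def tpr_at(threshold, scores, labels):
    """True positive rate when predicting 1 for score >= threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)


def threshold_for_tpr(scores, labels, target_tpr):
    """Highest observed score that, used as a cutoff, reaches target_tpr."""
    for t in sorted(set(scores), reverse=True):
        if tpr_at(t, scores, labels) >= target_tpr:
            return t
    raise ValueError("target_tpr must be at most 1")


# Hypothetical scores; group b's positives score systematically lower.
data = {
    "a": ([0.9, 0.8, 0.7, 0.4, 0.3, 0.2], [1, 1, 1, 0, 1, 0]),
    "b": ([0.7, 0.6, 0.5, 0.4, 0.2, 0.1], [1, 1, 0, 1, 1, 0]),
}
shared = {g: tpr_at(0.5, s, y) for g, (s, y) in data.items()}
thresholds = {g: threshold_for_tpr(s, y, 0.75) for g, (s, y) in data.items()}
adjusted = {g: tpr_at(thresholds[g], *data[g]) for g in data}
print("TPR at shared cutoff 0.5:", shared)
print("group thresholds:", thresholds)
print("TPR after adjustment:", adjusted)
```

A single cutoff of 0.5 gives true positive rates of 0.75 and 0.5, while group-specific cutoffs of 0.7 and 0.4 bring both to 0.75. Using different thresholds per group requires access to the sensitive attribute at decision time, which may itself raise legal questions in some domains.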
Each stage involves trade-offs. Pre-processing methods are model-agnostic but may discard useful information. In-processing methods can tightly integrate fairness into training but require modifications to standard algorithms. Post-processing methods are easy to apply but treat the model as a black box and may not address root causes of unfairness.
The Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) system, developed by Northpointe (now Equivant), is a risk assessment tool used in the U.S. criminal justice system to predict the likelihood that a defendant will reoffend. In 2016, ProPublica published an influential analysis of COMPAS scores for more than 7,000 defendants arrested in Broward County, Florida, in 2013 and 2014.[6] The investigation found that Black defendants were nearly twice as likely as white defendants to be incorrectly classified as high risk (false positives), while white defendants were more likely to be incorrectly classified as low risk (false negatives).[6]
Northpointe responded by pointing out that the overall accuracy rate (approximately 60%) was the same for both racial groups and that the tool satisfied predictive parity. This exchange highlighted a fundamental insight: when base rates differ between groups and the classifier is imperfect, it is mathematically impossible to simultaneously achieve equal false positive rates, equal false negative rates, and equal predictive values.[5] The COMPAS debate became a defining example of the impossibility theorem in algorithmic fairness.
In November 2019, multiple applicants reported that the Apple Card, issued by Goldman Sachs, offered significantly higher credit limits to men than to women, even in cases where the women had higher credit scores and shared finances with their male partners. The New York State Department of Financial Services (NYDFS) opened an investigation. In its March 2021 report, after analyzing underwriting data for approximately 400,000 New York applicants, the Department found no violation of fair lending law, concluding that applications from women and men with similar credit characteristics generally received similar outcomes, though it criticized a lack of transparency in credit decisions.[19] The case illustrated both how gender-neutral inputs can raise discrimination concerns through proxy variables and historical patterns in credit data, and how difficult it is to prove disparate treatment without access to the sensitive attribute.
Several open-source toolkits help practitioners detect and mitigate bias related to sensitive attributes.
| Toolkit | Developer | Key features |
|---|---|---|
| AI Fairness 360 (AIF360) | IBM Research | Over 70 fairness metrics and bias mitigation algorithms covering pre-processing, in-processing, and post-processing stages[14] |
| Fairlearn | Microsoft (now community-driven) | Fairness assessment dashboards, constraint-based mitigation algorithms, integration with scikit-learn |
| What-If Tool | Google | Visual interface for exploring model performance across groups, counterfactual analysis, no code required |
| Aequitas | University of Chicago | Bias audit toolkit for classification models, generates group-level fairness reports |
| Themis-ML | - | Fairness-aware machine learning library with discrimination-aware regularization and relabeling |
These tools typically require users to specify which features are sensitive attributes, the groups to compare, and the fairness metrics to evaluate. They then produce reports showing how model performance and outcomes differ across groups. IBM reports that AI Fairness 360 supports more than 70 metrics for measuring bias across datasets and model predictions.[14]
Several unresolved problems make the handling of sensitive attributes an active area of research.
Defining group boundaries. Sensitive attributes like race and gender are socially constructed categories with fluid and context-dependent boundaries. Forcing individuals into discrete categories can erase within-group diversity and the experiences of multiracial, nonbinary, or otherwise non-conforming individuals.
Collecting sensitive attribute data. Measuring fairness requires knowing group membership, but collecting sensitive attribute data raises privacy concerns and may itself be restricted by law (for example, under the GDPR's Article 9 prohibition on processing special categories of data).[16] This creates a paradox: auditing for bias requires the data, but collecting the data may violate data protection rules. This tension, the need to measure the very attribute one must not discriminate on, is one of the defining problems of fairness in practice.
Unknown proxies. Even when known proxies are identified and removed, previously unrecognized proxies may remain. In high-dimensional datasets, the number of potential proxy combinations is vast, making exhaustive auditing impractical.
Dynamic and context-dependent fairness. What counts as a sensitive attribute can vary by jurisdiction, domain, and time. An attribute that is protected in employment decisions may not be protected in insurance underwriting. Fairness requirements may also evolve as social norms change.
Tension between fairness and accuracy. Enforcing fairness constraints often reduces overall model accuracy. The magnitude of this trade-off depends on the specific fairness definition, the degree of existing bias, and the structure of the data. Research by Corbett-Davies and Goel (2018) has argued that some fairness constraints can be imposed with minimal accuracy loss, while others require significant sacrifices.[10]
Global variation in protected classes. Different countries protect different attributes and define discrimination differently. A model deployed internationally must navigate a patchwork of legal requirements that may conflict with one another.