Proxy variables for sensitive attributes are features in a machine learning dataset that are correlated with a protected characteristic (such as race, gender, age, religion, or disability) and therefore leak information about that characteristic to a model, even when the protected attribute itself is not used as input. Because supervised learners pick up any signal that improves predictive accuracy, a model trained on data that contains proxies can reproduce or amplify the same patterns of discrimination that the omitted attribute would have produced if it had been included directly. The presence of proxies is the central reason that simply deleting a sensitive column from a training set, an approach called fairness through unawareness, does not by itself prevent biased outcomes.
The study of proxy variables sits at the intersection of statistics, civil rights law, and algorithmic fairness. It became a public concern after a series of investigations into automated systems used in hiring, lending, criminal justice, and healthcare showed that ostensibly neutral models were producing systematically different outcomes for legally protected groups. Researchers and regulators have responded with formal definitions of fairness, statistical tests for proxy detection, mitigation algorithms at every stage of the modeling pipeline, and documentation standards for datasets and models.
A proxy is any observed variable X that carries information about a sensitive attribute S, in the sense that the conditional distribution P(S | X) differs meaningfully from the marginal distribution P(S). Equivalent ways of saying the same thing include: X has nonzero mutual information with S, X is predictive of S above the base rate, or X is statistically dependent on S. Proxies vary in strength. A near-perfect proxy lets a model reconstruct S almost exactly from X. A weak proxy carries only a small amount of information but, combined with other weak proxies, can still allow accurate reconstruction. Modern high-dimensional data, including text, images, click logs, and graph features, almost always contains weak proxies for common sensitive attributes.
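As a toy illustration of how several individually weak proxies can combine into a strong one, consider the following sketch. The data are purely synthetic and the 0.3 shift, feature count, and variable names are arbitrary assumptions, not measurements from any real dataset.

```python
# Synthetic demonstration: each feature is only weakly shifted by S,
# but together the features allow S to be predicted far above the base rate.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 10_000
S = rng.integers(0, 2, n)                        # sensitive attribute (never a model input)
X = rng.normal(0, 1, (n, 10)) + 0.3 * S[:, None] # ten weakly S-correlated features

single = cross_val_score(LogisticRegression(), X[:, :1], S, cv=5).mean()
combined = cross_val_score(LogisticRegression(), X, S, cv=5).mean()
print(f"accuracy predicting S from one weak proxy:   {single:.2f}")
print(f"accuracy predicting S from ten weak proxies: {combined:.2f}")
```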
The distinction between a sensitive attribute and a proxy is partly legal and partly statistical. The sensitive attribute is the characteristic that anti-discrimination law protects. The proxy is a feature that the model treats as ordinary input but that effectively encodes the protected characteristic. A model can engage in disparate treatment by acting on a proxy in the same way it would act on the protected attribute, and it can produce disparate impact by relying on a proxy whose distribution differs across protected groups, regardless of intent.
Which attributes count as sensitive depends on jurisdiction and context. The relevant categories under United States federal law include the following.
| Statute | Protected characteristics | Domain |
|---|---|---|
| Civil Rights Act of 1964, Title VII | Race, color, religion, sex, national origin | Employment |
| Age Discrimination in Employment Act (1967) | Age (40 and older) | Employment |
| Americans with Disabilities Act (1990) | Disability | Employment, public services |
| Equal Credit Opportunity Act (1974) | Race, color, religion, national origin, sex, marital status, age, receipt of public assistance | Credit |
| Fair Housing Act (1968, amended 1988) | Race, color, religion, sex, national origin, familial status, disability | Housing |
| Genetic Information Nondiscrimination Act (2008) | Genetic information | Employment, health insurance |
The European Union's General Data Protection Regulation (GDPR) defines special categories of personal data in Article 9, including racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data used for unique identification, health data, and data concerning sex life or sexual orientation. Processing these categories is generally prohibited unless one of the exceptions in Article 9(2) applies. The EU AI Act, which entered into force in 2024, classifies AI systems used in employment, credit scoring, education admissions, law enforcement, and migration as high-risk; providers of such systems must examine training, validation, and testing data sets for possible bias, and certain deployers must conduct a fundamental rights impact assessment before putting the system into use.
Many other regimes overlap with these. Some U.S. states protect additional categories such as sexual orientation, gender identity, source of income, or familial status. Brazil's LGPD, Canada's PIPEDA, and the United Kingdom's Equality Act 2010 each define their own lists. The practical effect is that what counts as a proxy depends on what the relevant law treats as sensitive in the specific use case.
The following table catalogs proxies that recur in the empirical literature.
| Proxy | Sensitive attribute | Mechanism |
|---|---|---|
| ZIP code or postal code | Race, ethnicity, income | Historical residential segregation, redlining |
| Given name and surname | Race, ethnicity, gender, national origin | Naming conventions correlate with demographics |
| School attended | Race, class | Segregated school districts, selective admissions |
| Resume keywords (clubs, sports, vocabulary) | Gender, race | Differential socialization and self-presentation |
| Browser, device, screen resolution | Income | Hardware purchase ability, OS market share by demographic |
| Time and location of activity | Occupation, class, religion | Shift work, commuting patterns, observance of religious holidays |
| Friend graph or contact list | Race, religion, sexual orientation | Homophily in social networks |
| Photo metadata, EXIF tags | Income, geography | Camera model, location stamps |
| Healthcare costs | Race, access to care | Differential access produces lower spending for equally sick patients |
| Word embeddings of free text | Gender, race | Representation geometry encodes demographic associations |
| Pretrial detention status | Race, income | Bail systems with disparate effects |
ZIP code as a proxy for race is the canonical example in the U.S. context. Beginning in the 1930s, the Home Owners' Loan Corporation produced "residential security" maps that color-coded neighborhoods according to perceived investment risk; predominantly Black neighborhoods were outlined in red and effectively cut off from federally backed mortgage credit. Richard Rothstein's 2017 book The Color of Law documents how federal, state, and local governments enforced residential segregation throughout the twentieth century. The result is that ZIP codes in many U.S. metropolitan areas remain strongly correlated with race decades after explicit racial covenants were outlawed by the Fair Housing Act, and any model that uses ZIP code as a feature is, in effect, partly using race.
Bertrand and Mullainathan's 2004 field experiment Are Emily and Greg More Employable than Lakisha and Jamal? demonstrated the same dynamic for given names. The authors sent roughly 5,000 fictitious resumes to 1,300 job ads in Boston and Chicago, randomly assigning either stereotypically white or stereotypically Black names. Resumes with white names received about 50 percent more callbacks for interviews, and the return to higher resume quality was larger for white-named applicants. Names function as a strong proxy for race in this setting, and the discrimination was consistent across industry, occupation, and employer size.
Reuters reported in October 2018 that Amazon had built an internal resume-screening model trained on a decade of hiring data, scrapped the project after engineers found that it systematically downgraded resumes that contained the word "women's" (as in "women's chess club captain" or "women's college"), and penalized graduates of two unnamed all-women's colleges. The model had no explicit gender feature, but the historical training data reflected a male-dominated technical workforce, and the algorithm picked up gendered vocabulary as a proxy. Attempts to debias the lists of penalized terms could not guarantee that the system would not discover other proxies, and Amazon ultimately disbanded the team.
In May 2016, ProPublica reporters Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner published Machine Bias, an investigation of COMPAS, a recidivism prediction tool sold by Northpointe (now Equivant) and used in pretrial and sentencing decisions in several U.S. jurisdictions. ProPublica's analysis of more than 7,000 risk scores from Broward County, Florida, found that Black defendants who did not go on to re-offend within two years were almost twice as likely as white defendants to be labeled high risk, while white defendants who did re-offend were more likely than Black defendants to be labeled low risk. Northpointe responded that the tool was calibrated equally across race: at any given score, the probability of re-offense was similar for Black and white defendants. As Chouldechova (2017) and Kleinberg, Mullainathan, and Raghavan (2017) later proved formally, both sides were correct given their definitions, and the two definitions cannot be jointly satisfied when base rates differ across groups. The case became the standard example of how proxies and base-rate differences make statistical fairness genuinely hard rather than merely a matter of careful engineering.
In November 2019, software developer David Heinemeier Hansson tweeted that the Apple Card, issued by Goldman Sachs, had given him a credit limit twenty times higher than his wife's, despite shared assets and her higher credit score. The complaint went viral, drew similar accounts from other couples (including Steve Wozniak), and prompted an investigation by the New York Department of Financial Services. The DFS report issued in March 2021 did not find a violation of fair lending laws, but the superintendent noted that the case showed how nontransparent algorithmic credit models can erode public trust and that consumer credit laws need updating. The episode is often cited as an example of how a model can produce a gendered outcome through proxies even when sex is excluded from the inputs, and how difficult it is for regulators to investigate proprietary algorithms.
Obermeyer, Powers, Vogeli, and Mullainathan published Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations in Science in October 2019. They studied a commercial population health tool used by U.S. health systems to identify patients for extra care management. The model used predicted future healthcare costs as a stand-in for medical need. Because Black patients with the same level of illness historically generated lower spending (due to access barriers, mistrust, and other factors), the algorithm assigned Black patients lower risk scores at every level of measured illness. At a given algorithmic risk score, Black patients had 26.3 percent more chronic conditions than white patients. The authors estimated that correcting the bias would more than double the share of Black patients automatically identified for extra care, from 17.7 to 46.5 percent. Cost was a proxy for need, and the proxy was systematically worse for one group. Working with the manufacturer, the researchers showed that switching to direct measures of health reduced the disparity by about 84 percent.
In a 2015 study using an automated experimentation tool called AdFisher, Carnegie Mellon researchers Amit Datta, Michael Carl Tschantz, and Anupam Datta created 1,000 simulated browsing agents, half labeled male and half labeled female in Google's ad settings. After visiting a fixed set of employment websites, the male-labeled agents were shown ads for high-paying executive coaching services much more often than the female-labeled agents. The study did not establish whether the disparity came from advertiser targeting choices, Google's optimization, or upstream training data, but it showed that an opaque ad-targeting system was producing gendered outcomes that would be illegal in many employment contexts.
U.S. anti-discrimination doctrine recognizes two routes to a finding of discrimination. Disparate treatment is intentional differential treatment of individuals because of a protected characteristic. Disparate impact, established in Griggs v. Duke Power Co. (1971), refers to a facially neutral practice that disproportionately harms a protected group and is not justified by business necessity. Proxy variables sit awkwardly between the two doctrines. A model that uses a proxy without intent might produce disparate impact without disparate treatment. A model whose designers chose a proxy specifically to track race or sex can constitute disparate treatment by proxy, a form of intentional discrimination disguised as a neutral feature. Courts and agencies have addressed the latter, for example in housing and lending guidance from the Department of Housing and Urban Development and the Consumer Financial Protection Bureau.
The naive approach of removing the protected attribute from the training data, sometimes called fairness through unawareness, assumes that a model that never sees S cannot discriminate on S. Dwork and colleagues argued in Fairness Through Awareness (2012) that this is false in any realistic setting because proxies make S effectively recoverable from the remaining features. Hardt's later work and the broader literature reached the same conclusion: meaningful fairness in practice requires using S during evaluation, and often during training, rather than hiding it.
Researchers have proposed many group-level fairness criteria. The three most discussed are summarized below.
| Criterion | Definition | Also known as |
|---|---|---|
| Demographic parity | P(Y_hat = 1 \| S = a) is equal across groups a | Statistical parity, independence |
| Equalized odds | True positive rate and false positive rate are equal across groups | Separation; formalized by Hardt, Price, and Srebro (2016) |
| Predictive parity | P(Y = 1 \| Y_hat = 1, S = a) is equal across groups a | Sufficiency, outcome test |
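The group-level quantities behind each criterion can be computed directly from predictions. Below is a minimal numpy sketch on synthetic binary arrays; all data and names are invented for illustration, and real evaluations should add confidence intervals and handle empty subgroups.

```python
# Compute per-group selection rate (demographic parity), TPR/FPR (equalized odds),
# and PPV (predictive parity) from binary labels y, predictions y_hat, and groups s.
import numpy as np

def group_rates(y, y_hat, s):
    out = {}
    for g in np.unique(s):
        m = s == g
        out[g] = {
            "selection_rate": y_hat[m].mean(),          # demographic parity
            "tpr": y_hat[m][y[m] == 1].mean(),          # equalized odds (true positive rate)
            "fpr": y_hat[m][y[m] == 0].mean(),          # equalized odds (false positive rate)
            "ppv": y[m][y_hat[m] == 1].mean(),          # predictive parity
        }
    return out

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)
y_hat = rng.integers(0, 2, 1000)   # stand-in for a trained model's decisions
s = rng.integers(0, 2, 1000)       # group membership, used for evaluation only
print(group_rates(y, y_hat, s))
```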
Chouldechova (2017) and Kleinberg, Mullainathan, and Raghavan (2017) independently proved that calibration and equalized odds (or, equivalently in the binary case, balance in error rates) cannot both hold across groups when the base rates of the outcome differ. This is a genuine impossibility result, not a matter of better engineering. It means that any system that meets one fairness criterion in a setting with unequal base rates will violate another, and the choice between them is normative.
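One way to see the tension concretely, following Chouldechova (2017): within a group whose outcome prevalence is p, the false positive rate, positive predictive value, and false negative rate of any binary classifier are tied together by the identity

$$\mathrm{FPR} \;=\; \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot\bigl(1-\mathrm{FNR}\bigr)$$

If two groups have different prevalences p and the classifier achieves equal PPV in both (the predictive parity criterion above) and equal FNR, the identity forces the false positive rates to differ, so equalized odds must fail.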
Kusner, Loftus, Russell, and Silva introduced Counterfactual Fairness in 2017, defining a model as fair toward an individual if its prediction would have been the same in the counterfactual world where the individual's protected attribute had been different. Kilbertus and colleagues' Avoiding Discrimination Through Causal Reasoning (2017) proposed a related causal-graph-based framework distinguishing legitimate from illegitimate paths between S and the prediction. Both approaches require explicit assumptions about the data-generating process, which are hard to verify, but they offer a more principled treatment of proxies than purely observational criteria.
Because proxies can be subtle, practitioners use a small toolkit of statistical and adversarial methods to flag them.
Mutual information and conditional entropy quantify how much information a feature carries about S. Features with high mutual information are likely strong proxies. Pairwise correlations, regression coefficients, and chi-squared tests serve the same purpose for simpler cases.
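A rough sketch of flagging candidate proxies by estimated mutual information with S follows. The file and column names are illustrative assumptions, and the snippet assumes a clean tabular dataset without missing values.

```python
# Rank features by estimated mutual information with the sensitive attribute.
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

df = pd.read_csv("applications.csv")           # hypothetical tabular dataset
S = df.pop("sensitive_attribute")              # e.g. self-reported race or sex
X = pd.get_dummies(df, drop_first=True)        # one-hot encode categorical columns

mi = mutual_info_classif(X, S, discrete_features="auto", random_state=0)
report = pd.Series(mi, index=X.columns).sort_values(ascending=False)
print(report.head(10))                         # highest-MI features are candidate proxies
```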
Predictive auditing trains a separate model to predict S from the remaining features. If S can be predicted with high accuracy from the input set, then strong proxies are present. This procedure scales to large feature sets and works for unstructured inputs such as text and images.
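A minimal predictive-auditing sketch, under the assumptions that the data is tabular, S is binary, and the file and column names below are stand-ins:

```python
# Train an auxiliary model to predict the sensitive attribute S from the features
# the production model actually sees, and report held-out AUC.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("training_data.csv")                       # hypothetical dataset
S = df.pop("sensitive_attribute")                           # assumed binary here
X = pd.get_dummies(df.drop(columns=["label"]), drop_first=True)

auditor = GradientBoostingClassifier(random_state=0)
auc = cross_val_score(auditor, X, S, cv=5, scoring="roc_auc").mean()
print(f"AUC for recovering S from the model's inputs: {auc:.2f}")
# AUC near 0.5: little proxy signal. AUC near 1.0: S is effectively encoded in X.
```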
Adversarial probing extends this idea inside the model. An auxiliary network attempts to recover S from intermediate representations. If it succeeds, proxies are encoded in the representation, and the model can in principle act on them.
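A sketch of the probing step, assuming PyTorch and a frozen encoder that stands in for an intermediate slice of a trained model (here the encoder and data are random, so the measured accuracy only illustrates the procedure, not a real finding):

```python
# Train a linear probe to recover S from frozen intermediate representations.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 16))  # frozen model slice
probe = nn.Linear(16, 2)                      # tries to recover binary S from representations

X = torch.randn(1000, 20)
S = torch.randint(0, 2, (1000,))

with torch.no_grad():
    Z = encoder(X)                            # representations from the frozen model

opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(probe(Z), S)
    loss.backward()
    opt.step()

with torch.no_grad():
    acc = (probe(Z).argmax(1) == S).float().mean()
print(f"probe accuracy: {acc:.2f}")           # well above S's base rate => proxies are encoded
```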
Causal auditing inspects the assumed graph of relationships between features, S, and the outcome. Paths from S to the prediction that pass through legitimate explanatory variables (for example, S to education to job performance, where education is a legitimate qualification) may be acceptable, while direct paths or paths through non-explanatory proxies are not.
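A small sketch of path inspection on an assumed causal graph follows; the graph below is a toy assumption built with networkx, not a validated causal model, and the choice of "legitimate" mediators is itself a normative judgment.

```python
# Enumerate paths from S to the prediction and flag those that bypass
# legitimate explanatory variables.
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([
    ("S", "education"), ("education", "job_performance"), ("job_performance", "prediction"),
    ("education", "prediction"),
    ("S", "zip_code"), ("zip_code", "prediction"),        # proxy path
])
legitimate = {"education", "job_performance"}

for path in nx.all_simple_paths(g, "S", "prediction"):
    mediators = set(path[1:-1])
    status = "legitimate" if mediators and mediators <= legitimate else "suspect"
    print(status, " -> ".join(path))
```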
No single test is decisive. Strong proxies can hide in interactions, in nonlinear combinations of weak features, or in latent representations learned by deep models. Detection is an ongoing audit rather than a one-time check.
Fairness mitigation methods are usually grouped by where they intervene in the modeling pipeline.
| Stage | Method | Representative reference |
|---|---|---|
| Pre-processing | Reweighting and resampling | Kamiran and Calders (2012) |
| Pre-processing | Learning fair representations | Zemel, Wu, Swersky, Pitassi, Dwork (2013) |
| Pre-processing | Disparate impact remover | Feldman, Friedler, Moeller, Scheidegger, Venkatasubramanian (2015) |
| In-processing | Prejudice remover regularizer | Kamishima, Akaho, Asoh, Sakuma (2012) |
| In-processing | Fairness constraints in convex optimization | Zafar, Valera, Rodriguez, Gummadi (2017) |
| In-processing | Adversarial debiasing | Zhang, Lemoine, Mitchell (2018) |
| Post-processing | Equalized odds threshold adjustment | Hardt, Price, Srebro (2016) |
| Post-processing | Calibrated equalized odds | Pleiss, Raghavan, Wu, Kleinberg, Weinberger (2017) |
| Post-processing | Reject option classification | Kamiran, Karim, Zhang (2012) |
Pre-processing methods change the training data so that a downstream model trained on it produces fairer predictions, often by reweighting samples or learning a representation in which S cannot be predicted. In-processing methods modify the loss function or the optimization problem, for example by adding a regularizer that penalizes correlation between predictions and S, or by training the model jointly with an adversary that tries to recover S from its outputs. Post-processing methods take a fixed model and adjust its predictions or thresholds per group to satisfy a chosen fairness criterion.
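As a concrete illustration of the pre-processing row in the table above, here is a minimal reweighting sketch in the spirit of Kamiran and Calders (2012): each (group, label) cell receives the weight that would make S and Y statistically independent in the weighted data. It assumes every cell is non-empty, and the variable names are hypothetical.

```python
# Compute per-example weights w(s, y) = P(S=s) * P(Y=y) / P(S=s, Y=y).
import numpy as np

def reweighing_weights(s, y):
    s, y = np.asarray(s), np.asarray(y)
    w = np.empty(len(y))
    for g in np.unique(s):
        for label in np.unique(y):
            cell = (s == g) & (y == label)
            expected = (s == g).mean() * (y == label).mean()   # P(S=g) * P(Y=label)
            observed = cell.mean()                             # P(S=g, Y=label), assumed > 0
            w[cell] = expected / observed
    return w

# The weights can be passed to most scikit-learn estimators via sample_weight, e.g.
# LogisticRegression().fit(X, y, sample_weight=reweighing_weights(s, y))
```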
Each approach has tradeoffs. Pre-processing is portable across downstream models but may degrade utility because the downstream model never sees the raw data. In-processing can deliver better fairness-utility tradeoffs but requires retraining and access to S during training. Post-processing is the simplest to deploy but requires using S at decision time, which may be legally restricted in domains such as U.S. consumer credit, where the Equal Credit Opportunity Act forbids using sex or race in the lending decision itself.
Several open source libraries package these algorithms with diagnostics and dashboards.
AIF360 (AI Fairness 360), released by IBM Research in 2018 with an accompanying paper by Bellamy and colleagues, includes more than seventy fairness metrics and a dozen mitigation algorithms covering pre-, in-, and post-processing.
Fairlearn, originally developed at Microsoft and described in a 2020 white paper led by Sarah Bird, focuses on group fairness assessment and mitigation in classification and regression, with an interactive dashboard and integrations for scikit-learn pipelines. The project moved to an independent open source governance model in 2021.
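A hedged sketch of the kind of subgroup assessment Fairlearn supports is shown below. The metric names exist in recent Fairlearn releases, but treat the snippet as illustrative of the workflow rather than the project's canonical example; the data here is synthetic.

```python
# Disaggregate accuracy and TPR by group, and compute the demographic parity gap.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score
from fairlearn.metrics import MetricFrame, demographic_parity_difference

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)
y_pred = rng.integers(0, 2, 500)          # stand-in for a trained model's decisions
group = rng.choice(["a", "b"], 500)       # sensitive attribute used for evaluation only

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "tpr": recall_score},
    y_true=y_true, y_pred=y_pred, sensitive_features=group,
)
print(mf.by_group)       # per-group accuracy and true positive rate
print(mf.difference())   # largest between-group gap for each metric
print(demographic_parity_difference(y_true, y_pred, sensitive_features=group))
```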
Google's What-If Tool and Fairness Indicators support interactive slicing of model performance by demographic subgroup and integrate with TensorFlow Model Analysis.
Aequitas, released in 2018 by the University of Chicago's Center for Data Science and Public Policy under Pedro Saleiro and colleagues, targets policymakers and auditors and emphasizes group bias and disparity reports.
Proxy mitigation faces several genuine limits.
First, no algorithm can remove sensitive information from data in which proxies are very predictive. If a feature set lets a separate model predict S with 98 percent accuracy, then any downstream classifier with sufficient capacity can also recover S, and removing or reweighting features will only force the model to rely on subtler combinations of the same signal.
Second, fairness criteria conflict. Calibration, equalized odds, and demographic parity cannot in general all be satisfied simultaneously when group base rates differ. Choosing a criterion is a policy decision, not a technical one, and different stakeholders will reasonably prefer different choices.
Third, group fairness is not the same as individual fairness. A model can satisfy demographic parity at the group level while still producing arbitrary, idiosyncratic decisions for individuals.
Fourth, statistical fairness is not the same as causal fairness. A model that satisfies equalized odds may still discriminate through pathways that a causal analysis would judge illegitimate, and conversely a counterfactually fair model may fail group statistics.
Fifth, the legal and the statistical can pull in opposite directions. The Equal Credit Opportunity Act and similar laws restrict the use of sensitive attributes in decisions, while many of the most effective fairness algorithms require those attributes either at training time, at decision time, or both. Lawyers, regulators, and engineers continue to negotiate this tension, and the answers vary by jurisdiction and product.
Sixth, fairness can trade off with accuracy, though the size of the tradeoff depends on the dataset and the criterion. In some cases, a model can be made fairer without losing meaningful accuracy. In other cases, particularly when the training labels themselves reflect historical bias, a meaningful gain in group fairness requires accepting lower aggregate predictive accuracy on those biased labels.
Large language models complicate the proxy story in two ways. First, their training data contains rich correlational structure between linguistic features and demographics, so the models pick up associations between, for example, names, dialects, occupations, and stereotypes. Bender, Gebru, McMillan-Major, and Shmitchell argued in On the Dangers of Stochastic Parrots (2021) that the size and opacity of modern LLMs make it hard to audit which associations they have absorbed, and that the costs of fixing biased outputs after deployment are high. Second, LLMs are increasingly used as components in pipelines for screening resumes, generating clinical summaries, or making recommendations, where their proxy-laden outputs feed into downstream decisions that are subject to anti-discrimination law.
Generative models also raise new privacy attacks. Synthetic data generated to protect individual privacy can still leak attributes about the source population, including sensitive ones, through proxies in the generated samples. Membership inference and attribute inference attacks exploit precisely the kind of proxy structure that fair-ML researchers try to suppress.
The regulatory landscape has tightened. The EU AI Act requires high-risk AI providers to examine training data for bias and to test systems for discriminatory outcomes. New York City's Local Law 144, in effect since 2023, requires employers using automated employment decision tools to commission an independent bias audit and disclose the results. The Colorado AI Act of 2024 imposes similar obligations on developers and deployers of high-risk systems. In the United States, federal agencies including the EEOC, CFPB, FTC, and DOJ issued a joint statement in April 2023 affirming that existing anti-discrimination laws apply to AI-driven systems.
A reasonable proxy-aware development checklist looks something like the following.
Document the dataset using a Datasheet for Datasets (Gebru et al., 2018). Datasheets describe the dataset's motivation, composition, collection process, recommended uses, and known limitations, and they make proxy structure easier to spot during downstream review.
Document the model using Model Cards (Mitchell et al., 2019). A model card reports performance disaggregated by demographic group, intended uses, and known failure modes.
Audit for proxies before training using mutual information, predictive auditing, or causal analysis. Disclose strong proxies in the model card.
Choose a fairness criterion in consultation with domain experts and legal counsel, recognizing that the choice is normative.
Evaluate fairness with the same rigor as accuracy, including subgroup metrics, intersectional slices, and confidence intervals.
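Intersectional slices can be produced with ordinary group-by operations; the pandas sketch below uses synthetic data and invented column names purely to illustrate the shape of the report.

```python
# Disaggregate the selection rate by pairs of attributes rather than one at a time.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000
results = pd.DataFrame({
    "decision": rng.integers(0, 2, n),         # binary model decisions
    "race": rng.choice(["a", "b", "c"], n),    # evaluation-only group labels
    "sex": rng.choice(["f", "m"], n),
})
slices = results.groupby(["race", "sex"])["decision"].agg(["mean", "count"])
print(slices)   # small-count cells deserve confidence intervals before drawing conclusions
```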
Monitor in production. Distribution shift can introduce new proxies that did not exist at training time, and feedback loops (for example, in predictive policing) can amplify bias over time.
Consider bias bounty programs, in which external researchers are paid to find unexpected fairness failures, modeled on security bug bounties. Twitter ran an early image-cropping bias bounty in 2021, and similar programs have followed at OpenAI and elsewhere.