Differential privacy

Differential privacy is a mathematical framework that provides strong, formally provable privacy guarantees for statistical analyses or machine-learning models trained on sensitive data. It was introduced in 2006 by Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith at Microsoft Research, in the Theory of Cryptography Conference paper "Calibrating Noise to Sensitivity in Private Data Analysis," and refined the same year in Dwork's ICALP paper "Differential Privacy." It has since become the dominant formal privacy notion across academia, industry, and government, with deployments at Apple, Google, Microsoft, Meta, LinkedIn, Uber, and the United States Census Bureau.

Formally, a randomised algorithm M satisfies (epsilon, delta)-differential privacy if for all neighbouring datasets D and D' that differ in a single record, and for all output sets S, Pr[M(D) is in S] is at most e^epsilon times Pr[M(D') is in S], plus delta. The parameter epsilon (the privacy budget) bounds how much the output distribution can shift when one person's data is added or removed; delta is a small slack term that allows the bound to fail with low probability. The setup is deliberately worst-case: no assumption about what the adversary already knows, will learn later, or holds as side data.

Why differential privacy matters

Before differential privacy, most data-release techniques fell into the category of anonymisation: stripping identifiers, generalising attributes, or aggregating records. Latanya Sweeney's 1997 work showed that 87% of the U.S. population could be uniquely identified by ZIP code, birth date, and sex, which she used to re-identify the medical records of Massachusetts governor William Weld. Her 2002 paper introduced k-anonymity as a formal model, but subsequent work showed that even k-anonymous releases were vulnerable to homogeneity and background-knowledge attacks. The 2006 release of AOL search logs and Arvind Narayanan and Vitaly Shmatikov's 2008 de-anonymisation of the Netflix Prize dataset confirmed that ad hoc anonymisation does not survive contact with auxiliary information.

Differential privacy reframed the problem. Instead of trying to reason about what an attacker might or might not know, it asked a different question: can we guarantee that the output of an algorithm is essentially the same whether or not any single person participates? If yes, no analyst, regardless of their auxiliary knowledge, can learn much about that person from the output.

Four properties make the framework useful in practice.

Mathematically rigorous guarantees. Once a mechanism is shown to satisfy (epsilon, delta)-DP, the bound holds against any adversary, even one with unlimited computational power and arbitrary side information.
Quantifiable privacy-utility tradeoff. The single parameter epsilon turns vague intuitions about "strong" or "weak" privacy into a number that engineers, lawyers, and policymakers can argue about.
Post-processing immunity. If the output of an (epsilon, delta)-DP mechanism is fed into any subsequent function, the result still satisfies the same bound. There is no way to undo the guarantee by clever analysis.
Predictable composition. Running multiple DP queries adds up in a controlled way; modern accountants such as Renyi DP give tighter bounds.

These properties are why a statistical agency can publish thousands of tables from a single sensitive data set with a documented privacy budget, and why a phone vendor can collect telemetry from a billion devices without seeing what any individual user typed.

Foundational concepts

Neighbouring datasets

The definition of "neighbouring" determines what the framework actually protects. Two main conventions appear in the literature:

Add/remove neighbours: D' is obtained from D by adding or removing one record. This is sometimes called "unbounded" DP.
Substitute neighbours: D' is obtained from D by replacing one record with another. This is sometimes called "bounded" DP.

The two conventions differ by a factor of two in epsilon, and most papers state which they use. Either way, the protection is at the level of a single record, which usually corresponds to a single person, though group-level extensions exist.

Privacy budget (epsilon)

Epsilon is the central knob. Smaller epsilon means stronger privacy and more noise; larger epsilon means weaker privacy and tighter answers. The Dwork and Roth 2014 monograph notes that there is no universal correct value, but typical industry deployments fall in the range 0.1 to 10. Apple's macOS Sierra implementation reportedly used per-event epsilons of 1, 2, or 4 depending on the use case; the U.S. Census Bureau's 2020 redistricting product used a total budget of 19.61. Smaller is better in principle, but small enough often makes the data useless.

Delta

Delta is the probability of "catastrophic" privacy failure: the chance that the e^epsilon multiplicative bound simply does not hold. The conventional wisdom from Dwork and Roth is that delta should be much smaller than 1/n, where n is the size of the dataset, because a delta of, say, 1/n would allow the mechanism to publish a single random record verbatim with non-negligible probability. Pure DP (epsilon-DP) sets delta to zero.

Sensitivity

Sensitivity is the maximum amount that a function f can change its output when one record changes. For a counting query ("how many users opened the app yesterday?"), the sensitivity is one. For a sum query over bounded values, the sensitivity is the bound. For more complex queries, sensitivity may be hard to compute, which is why elastic sensitivity, smooth sensitivity, and related notions exist.

Composition

If you run k independent (epsilon, delta)-DP mechanisms on the same data, the basic composition theorem says the combined mechanism satisfies (k * epsilon, k * delta)-DP. Dwork, Rothblum, and Vadhan's 2010 advanced composition theorem improves this to roughly (epsilon * sqrt(2k * log(1/delta)), k * delta + delta')-DP for adaptive choices, an important sublinear scaling. Renyi DP, concentrated DP, and Gaussian DP each provide accounting languages where composition is exactly additive in a transformed quantity, then converted back to (epsilon, delta).

Mechanisms

The field has developed a small toolkit of building blocks that turn non-private algorithms into DP ones.

Mechanism	Original paper	Output type	Noise distribution	Notes
Laplace mechanism	Dwork, McSherry, Nissim, Smith 2006	Real-valued	Laplace(scale = sensitivity / epsilon)	Standard for ε-DP queries on numeric outputs
Gaussian mechanism	Dwork, Kenthapadi, McSherry, Mironov, Naor 2006	Real-valued	Gaussian(0, sigma^2)	Satisfies (ε, δ)-DP for sigma at least sensitivity * sqrt(2 ln(1.25/δ)) / ε
Exponential mechanism	McSherry & Talwar 2007	Discrete selection	Sample item with probability proportional to exp(ε * utility / (2 * sensitivity))	Used when output is not naturally numeric
Sparse vector technique	Dwork, Naor, Reingold, Rothblum, Vadhan 2009; Lyu, Su, Li 2017	Stream of yes/no answers	Threshold check on noisy queries	Pays privacy only for queries above a threshold
Randomised response	Warner 1965	Binary answer	Bernoulli flip	Predates DP but is the canonical local DP mechanism
PATE	Papernot, Abadi, Erlingsson, Goodfellow, Talwar 2017	Classifier outputs	Noisy aggregation of teacher votes	Used for semi-supervised learning on sensitive data

The Laplace mechanism is the workhorse: to answer a numeric query f with sensitivity Δf under ε-DP, sample a Laplace variable L with scale Δf/ε and release f(D) + L. The Gaussian mechanism does the same with Gaussian noise and (ε, δ)-DP, which composes more nicely under the moments accountant. The exponential mechanism extends the idea to discrete choices, picking the item with the best noisily perturbed utility score, and underlies private auctions, private median selection, and private best-arm identification.

Trust models

The original Dwork et al. 2006 definition assumed a trusted curator who holds raw data and releases noisy outputs. This is the central model. Local differential privacy moves noise to the user's device: each user perturbs their value before sending it to a server. The server only ever sees noisy data, but it must aggregate many noisy reports to estimate population statistics. Local DP is used by Google's RAPPOR (Erlingsson, Pihur, Korolova, CCS 2014) for Chrome telemetry and by Apple's iOS and macOS deployments since 2016.

Shuffle DP sits between the two. Each user sends a locally randomised report through a trusted shuffler that strips identifiers and permutes messages. Amplification by shuffling lets shuffle-DP get most of the utility of central DP without trusting a single curator. Bittau et al.'s 2017 ESA paper formalised the model.

Tighter accounting frameworks

Variant	Authors	Year	Key idea
Concentrated DP (mCDP)	Dwork, Rothblum	2016	Privacy loss has small mean and is subgaussian
Zero-concentrated DP (zCDP)	Bun, Steinke	2016	Bounds Renyi divergence by ρ * α for all α > 1
Renyi DP	Mironov	2017	Bounds Renyi divergence at order α; cleaner composition
Truncated CDP	Bun, Dwork, Rothblum, Steinke	2018	Robust extension of zCDP
f-DP / Gaussian DP	Dong, Roth, Su	2022 (J. Royal Stat. Soc. B)	Privacy as a hypothesis-testing tradeoff function

These frameworks are not new privacy notions in a moral sense; they are accounting languages that compose more cleanly than (ε, δ)-DP and convert back to it for reporting. Modern DP-SGD implementations almost always track privacy loss in Renyi DP or Gaussian DP and convert at the end.

Group privacy

The single-record neighbourhood definition naturally extends to groups: if a mechanism is (ε, δ)-DP, then for any two datasets differing in k records, the bound becomes (k * ε, k * e^((k-1)*ε) * δ). Group privacy matters when records correspond to households, families, or repeated measurements from the same individual, since one person may contribute multiple rows.

DP-SGD: differentially private deep learning

Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang's 2016 CCS paper "Deep Learning with Differential Privacy" gave the field its standard recipe for training neural networks with formal privacy guarantees. The algorithm, usually called DP-SGD, adds two modifications to ordinary stochastic gradient descent.

Per-example gradient clipping. For each example in the minibatch, compute its individual gradient and clip its L2 norm to a chosen value C. This bounds the sensitivity of the gradient sum to a single example by C.
Gaussian noise on the aggregated gradient. Add Gaussian noise with standard deviation sigma * C to the sum of clipped gradients before averaging and updating.

The paper's other contribution is the moments accountant, a tighter privacy-loss tracker than basic or advanced composition. The moments accountant tracks the log moment-generating function of the privacy loss random variable, which gives substantially better epsilons for the typical Gaussian-with-subsampling setup of DP-SGD. Modern implementations have largely replaced the moments accountant with Renyi DP accounting (Mironov 2017) or the GDP-based accountant of Dong, Roth, Su 2022, which are equivalent in spirit but easier to extend.

The practical cost of DP-SGD is real but manageable. Per-example gradient computation is often 2 to 3 times more expensive than batched gradient computation, noise reduces final accuracy at meaningful epsilons, and small underrepresented subgroups suffer disproportionately. Three open libraries dominate: Opacus (Meta, PyTorch, described in the 2021 paper "Opacus: User-Friendly Differential Privacy Library in PyTorch"), TensorFlow Privacy (Google), and JAX-Privacy (Google DeepMind). DP-SGD is now standard for federated learning, sensitive-data fine-tuning of foundation models, and any setting where membership-inference attacks are a real concern.

Real-world deployments

The last decade has seen differential privacy move from theory to production.

Year	Organisation	Application	Trust model
2008	OnTheMap (US Census LEHD)	Commuter origin-destination flows	Central
2014	Google	RAPPOR for Chrome telemetry, default homepage stats	Local
2016	Apple	QuickType, emoji prediction, Spotlight Lookup, Safari crash data	Local
2017	Microsoft	Windows telemetry, later Office Excel pivot tables	Central
2017	Uber	FLEX SQL with elastic sensitivity for internal analytics	Central
2017	LinkedIn	Marketing analytics, audience-engagement insights	Central
2019	Google	Mobility reports during COVID-19	Central
2020	Meta (Facebook)	Movement Range Maps for COVID-19, ad-targeting tools	Central
2020	U.S. Census Bureau	2020 Decennial Census Disclosure Avoidance System (DAS)	Central
2021+	Google	Gboard next-word prediction with federated learning + DP	Local + central

The U.S. Census deployment is the most consequential to date. The 2020 DAS replaces the swapping-based protection used in earlier decades with a TopDown algorithm that injects calibrated noise at every level of the geographic hierarchy. The Census Bureau announced in June 2021 a total privacy-loss budget of epsilon = 19.61 for the redistricting data product, split into 17.14 for the persons file and 2.47 for the housing-unit file, with an extremely small delta. The decision generated unusual public debate, including a lawsuit from Alabama and academic critiques about disproportionate accuracy losses for small rural and non-white populations.

Apple's deployment, described in the company's 2017 white paper "Learning with Privacy at Scale," added local DP to iOS 10 and macOS Sierra in 2016 for QuickType, emoji recommendations, Spotlight deep-link suggestions, Lookup hints in Notes, and Safari crash counts. The 2017 "Privacy Loss in Apple's Implementation of Differential Privacy on MacOS 10.12" paper by Tang et al. reverse-engineered the per-event epsilons (which Apple did not initially disclose) and reported values of 1, 2, and 4 across categories, with weekly resets that made the cumulative budget much larger than per-event values suggested.

Google's RAPPOR, deployed in Chrome from 2014, is the original mass-market local DP system. It collects browser-default-changing events and other low-sensitivity telemetry by having each client run a randomised-response protocol locally, then aggregates noisy reports server-side.

Use cases in AI

Differential privacy shows up across the AI stack.

Training models on sensitive data. Hospitals fine-tune medical models on patient records using DP-SGD, banks train fraud-detection models on transaction histories, and large labs experiment with DP fine-tuning of foundation models.
Synthetic data generation. DP generative models (DP-CTGAN, DP-GAN, DP diffusion) produce synthetic datasets that can be released without re-identification risk.
Federated learning. DP is layered on top of federated averaging so the central server cannot learn about individual users' updates. DP-FedAvg (McMahan, Ramage, Talwar, Zhang 2018) is the canonical reference.
Telemetry aggregation. Apple, Google, and Microsoft use DP to estimate population-level statistics about software use without holding raw user data.
Census and survey data release. The 2020 U.S. Census is the largest example; statistical agencies in Canada, the UK, and the EU are running pilots.

Tools and libraries

Actively maintained DP libraries as of 2026:

Library	Maintainer	Language	Primary use
Opacus	Meta	Python (PyTorch)	DP-SGD, per-sample gradients
TensorFlow Privacy	Google	Python (TensorFlow)	DP-SGD, PATE, accountants
JAX-Privacy	Google DeepMind	Python (JAX)	DP-SGD with fast vectorisation
OpenDP	Harvard / Microsoft	Rust core, Python and R bindings	General DP query release
Google Differential Privacy Libraries	Google	C++, Go, Java	Aggregation primitives
PipelineDP	Google / OpenMined	Python	Apache Beam DP analytics
Diffprivlib	IBM	Python	DP analogues of scikit-learn
Tumult Analytics	Tumult Labs	Python	Commercial DP analytics SDK
SmartNoise	Microsoft / OpenDP	Python	DP for SQL workloads

Libraries differ in trust model (central vs local), supported queries (counts, sums, ML), accounting language (basic, RDP, GDP), and degree of formal verification. OpenDP and SmartNoise place particular emphasis on auditable privacy proofs.

Privacy accounting in practice

The accounting story has tightened steadily over the years. The 2010 advanced composition theorem of Dwork, Rothblum, and Vadhan replaced linear k * epsilon scaling with a sqrt(k) factor, which alone made many large-scale deployments feasible. Abadi et al.'s 2016 moments accountant gave another substantial improvement for the Gaussian-with-subsampling setup of DP-SGD. Mironov's 2017 Renyi DP, Bun and Steinke's 2016 zCDP, and Dong, Roth, and Su's 2022 f-DP each push the bound closer to optimal. Modern libraries track privacy loss in one of these tighter frameworks during training and convert to (epsilon, delta) only for reporting.

Critiques and limitations

Differential privacy is the strongest privacy notion in widespread use, but it is not free of problems.

Picking epsilon is genuinely hard. The 2020 Wood et al. "Differential Privacy: A Primer for a Non-Technical Audience" essay in the Vanderbilt Journal of Entertainment and Technology Law devotes significant space to the question and arrives at no universal answer. Numbers like epsilon = 1 sound stringent but allow a multiplicative shift of e in posterior odds, more permissive than many policymakers assume. Communication is harder still: explaining to a Census respondent that their record is protected because no analyst's posterior odds can shift by more than a factor of e^19.61 is a tough sell.

Auditing is partially addressed but not solved. Black-box verification that an implementation satisfies its claimed (epsilon, delta)-DP bound is hard. The Auditing DP-SGD line of work gives empirical lower bounds on actual leakage by running membership-inference attacks under known training data, but the gap between provable upper bounds and empirical lower bounds is often large.

Reconstruction attacks remain a live research area. The Census Bureau's own 2018 reconstruction experiment showed that the 2010 swapping-based DAS allowed reconstruction of underlying records to a significant fraction, which was the empirical motivation for switching to DP for 2020.

Utility for small subgroups is the most painful tradeoff. The same noise that protects a record also dominates the signal for groups smaller than roughly 1/epsilon. The 2024 PNAS paper "The 2020 US Census Differential Privacy Method Introduces Disproportionate Discrepancies for Rural and Non-White Populations" documents larger relative errors in tracts with small rural and non-white populations than under prior methods. The choice of how to spread the privacy budget across geographies is ultimately a policy decision.

Finally, the proof depends on the neighbourhood definition. A guarantee against add-or-remove neighbours says nothing directly about an attacker who already has many of the records and wants to learn about a specific one. Group privacy and user-level DP give stronger guarantees but at a corresponding cost in epsilon.

Connection with other privacy techniques

Technique	Type of guarantee	Relationship to DP
k-anonymity (Sweeney 2002)	Syntactic	Weaker; vulnerable to linkage and homogeneity attacks
l-diversity (Machanavajjhala et al. 2007)	Syntactic	Strengthens k-anonymity but still no formal guarantee
Secure multi-party computation	Computational, no leakage from computation	Orthogonal; can compute DP outputs without trusting a curator
Fully homomorphic encryption	Computational, no leakage from computation	Orthogonal; allows computing on encrypted data
Federated learning	Decentralised training	Pairs naturally with DP; FL alone does not give formal guarantees
Trusted execution environments (SGX, TDX)	Hardware-rooted confidentiality	Often used as a substitute curator for shuffle and central DP

The key conceptual difference is that DP makes a statistical statement about output distributions, while MPC and FHE make computational statements about what an adversary can extract from encrypted data. The two families compose: a federated learning pipeline can use MPC to aggregate model updates and DP to ensure the aggregated update itself does not leak about any single client.

Regulatory context

DP has gradually entered regulatory discourse, though it is rarely codified.

The European Data Protection Board and earlier Article 29 Working Party Opinion 05/2014 discuss DP as a candidate technique under GDPR anonymisation rules without endorsing a specific epsilon.
HIPAA's Privacy Rule allows expert-determination de-identification, under which DP-based releases can qualify, but DP is not in the Safe Harbor list.
The U.S. Census Bureau's 2020 endorsement of DP is the largest formal government adoption to date.
The California CCPA/CPRA, Virginia's VCDPA, and similar state laws do not mandate DP, though several reference "de-identified" data in ways DP can satisfy.

No major regulator has yet specified a particular epsilon as the threshold for legal anonymisation, which is probably the hardest open policy question in the field.

Differential privacy

Why differential privacy matters

Foundational concepts

Neighbouring datasets

Privacy budget (epsilon)

Delta

Sensitivity

Composition

Mechanisms

Variants and refinements

Trust models

Tighter accounting frameworks

Group privacy

DP-SGD: differentially private deep learning

Real-world deployments

Use cases in AI

Tools and libraries

Privacy accounting in practice

Critiques and limitations

Connection with other privacy techniques

Regulatory context

References

Improve this article

Related Articles

Federated Learning

General Data Protection Regulation (GDPR)

A/B Testing

Generalized Linear Model

L1 Loss

L2 Loss

Differential privacy

Why differential privacy matters

Foundational concepts

Neighbouring datasets

Privacy budget (epsilon)

Delta

Sensitivity

Composition

Mechanisms

Variants and refinements

Trust models

Tighter accounting frameworks

Group privacy

DP-SGD: differentially private deep learning

Real-world deployments

Use cases in AI

Tools and libraries

Privacy accounting in practice

Critiques and limitations

Connection with other privacy techniques

Regulatory context

References

Related Articles

Federated Learning

General Data Protection Regulation (GDPR)

A/B Testing

Generalized Linear Model

L1 Loss

L2 Loss