Differential privacy
Last reviewed
Apr 30, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,038 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Apr 30, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,038 words
Add missing citations, update stale details, or suggest a clearer explanation.
Differential privacy is a mathematical framework that provides strong, formally provable privacy guarantees for statistical analyses or machine-learning models trained on sensitive data. It was introduced in 2006 by Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith at Microsoft Research, in the Theory of Cryptography Conference paper "Calibrating Noise to Sensitivity in Private Data Analysis," and refined the same year in Dwork's ICALP paper "Differential Privacy." It has since become the dominant formal privacy notion across academia, industry, and government, with deployments at Apple, Google, Microsoft, Meta, LinkedIn, Uber, and the United States Census Bureau.
Formally, a randomised algorithm M satisfies (epsilon, delta)-differential privacy if for all neighbouring datasets D and D' that differ in a single record, and for all output sets S, Pr[M(D) is in S] is at most e^epsilon times Pr[M(D') is in S], plus delta. The parameter epsilon (the privacy budget) bounds how much the output distribution can shift when one person's data is added or removed; delta is a small slack term that allows the bound to fail with low probability. The setup is deliberately worst-case: no assumption about what the adversary already knows, will learn later, or holds as side data.
Before differential privacy, most data-release techniques fell into the category of anonymisation: stripping identifiers, generalising attributes, or aggregating records. Latanya Sweeney's 1997 work showed that 87% of the U.S. population could be uniquely identified by ZIP code, birth date, and sex, which she used to re-identify the medical records of Massachusetts governor William Weld. Her 2002 paper introduced k-anonymity as a formal model, but subsequent work showed that even k-anonymous releases were vulnerable to homogeneity and background-knowledge attacks. The 2006 release of AOL search logs and Arvind Narayanan and Vitaly Shmatikov's 2008 de-anonymisation of the Netflix Prize dataset confirmed that ad hoc anonymisation does not survive contact with auxiliary information.
Differential privacy reframed the problem. Instead of trying to reason about what an attacker might or might not know, it asked a different question: can we guarantee that the output of an algorithm is essentially the same whether or not any single person participates? If yes, no analyst, regardless of their auxiliary knowledge, can learn much about that person from the output.
Four properties make the framework useful in practice.
These properties are why a statistical agency can publish thousands of tables from a single sensitive data set with a documented privacy budget, and why a phone vendor can collect telemetry from a billion devices without seeing what any individual user typed.
The definition of "neighbouring" determines what the framework actually protects. Two main conventions appear in the literature:
The two conventions differ by a factor of two in epsilon, and most papers state which they use. Either way, the protection is at the level of a single record, which usually corresponds to a single person, though group-level extensions exist.
Epsilon is the central knob. Smaller epsilon means stronger privacy and more noise; larger epsilon means weaker privacy and tighter answers. The Dwork and Roth 2014 monograph notes that there is no universal correct value, but typical industry deployments fall in the range 0.1 to 10. Apple's macOS Sierra implementation reportedly used per-event epsilons of 1, 2, or 4 depending on the use case; the U.S. Census Bureau's 2020 redistricting product used a total budget of 19.61. Smaller is better in principle, but small enough often makes the data useless.
Delta is the probability of "catastrophic" privacy failure: the chance that the e^epsilon multiplicative bound simply does not hold. The conventional wisdom from Dwork and Roth is that delta should be much smaller than 1/n, where n is the size of the dataset, because a delta of, say, 1/n would allow the mechanism to publish a single random record verbatim with non-negligible probability. Pure DP (epsilon-DP) sets delta to zero.
Sensitivity is the maximum amount that a function f can change its output when one record changes. For a counting query ("how many users opened the app yesterday?"), the sensitivity is one. For a sum query over bounded values, the sensitivity is the bound. For more complex queries, sensitivity may be hard to compute, which is why elastic sensitivity, smooth sensitivity, and related notions exist.
If you run k independent (epsilon, delta)-DP mechanisms on the same data, the basic composition theorem says the combined mechanism satisfies (k * epsilon, k * delta)-DP. Dwork, Rothblum, and Vadhan's 2010 advanced composition theorem improves this to roughly (epsilon * sqrt(2k * log(1/delta)), k * delta + delta')-DP for adaptive choices, an important sublinear scaling. Renyi DP, concentrated DP, and Gaussian DP each provide accounting languages where composition is exactly additive in a transformed quantity, then converted back to (epsilon, delta).
The field has developed a small toolkit of building blocks that turn non-private algorithms into DP ones.
| Mechanism | Original paper | Output type | Noise distribution | Notes |
|---|---|---|---|---|
| Laplace mechanism | Dwork, McSherry, Nissim, Smith 2006 | Real-valued | Laplace(scale = sensitivity / epsilon) | Standard for ε-DP queries on numeric outputs |
| Gaussian mechanism | Dwork, Kenthapadi, McSherry, Mironov, Naor 2006 | Real-valued | Gaussian(0, sigma^2) | Satisfies (ε, δ)-DP for sigma at least sensitivity * sqrt(2 ln(1.25/δ)) / ε |
| Exponential mechanism | McSherry & Talwar 2007 | Discrete selection | Sample item with probability proportional to exp(ε * utility / (2 * sensitivity)) | Used when output is not naturally numeric |
| Sparse vector technique | Dwork, Naor, Reingold, Rothblum, Vadhan 2009; Lyu, Su, Li 2017 | Stream of yes/no answers | Threshold check on noisy queries | Pays privacy only for queries above a threshold |
| Randomised response | Warner 1965 | Binary answer | Bernoulli flip | Predates DP but is the canonical local DP mechanism |
| PATE | Papernot, Abadi, Erlingsson, Goodfellow, Talwar 2017 | Classifier outputs | Noisy aggregation of teacher votes | Used for semi-supervised learning on sensitive data |
The Laplace mechanism is the workhorse: to answer a numeric query f with sensitivity Δf under ε-DP, sample a Laplace variable L with scale Δf/ε and release f(D) + L. The Gaussian mechanism does the same with Gaussian noise and (ε, δ)-DP, which composes more nicely under the moments accountant. The exponential mechanism extends the idea to discrete choices, picking the item with the best noisily perturbed utility score, and underlies private auctions, private median selection, and private best-arm identification.
The original Dwork et al. 2006 definition assumed a trusted curator who holds raw data and releases noisy outputs. This is the central model. Local differential privacy moves noise to the user's device: each user perturbs their value before sending it to a server. The server only ever sees noisy data, but it must aggregate many noisy reports to estimate population statistics. Local DP is used by Google's RAPPOR (Erlingsson, Pihur, Korolova, CCS 2014) for Chrome telemetry and by Apple's iOS and macOS deployments since 2016.
Shuffle DP sits between the two. Each user sends a locally randomised report through a trusted shuffler that strips identifiers and permutes messages. Amplification by shuffling lets shuffle-DP get most of the utility of central DP without trusting a single curator. Bittau et al.'s 2017 ESA paper formalised the model.
| Variant | Authors | Year | Key idea |
|---|---|---|---|
| Concentrated DP (mCDP) | Dwork, Rothblum | 2016 | Privacy loss has small mean and is subgaussian |
| Zero-concentrated DP (zCDP) | Bun, Steinke | 2016 | Bounds Renyi divergence by ρ * α for all α > 1 |
| Renyi DP | Mironov | 2017 | Bounds Renyi divergence at order α; cleaner composition |
| Truncated CDP | Bun, Dwork, Rothblum, Steinke | 2018 | Robust extension of zCDP |
| f-DP / Gaussian DP | Dong, Roth, Su | 2022 (J. Royal Stat. Soc. B) | Privacy as a hypothesis-testing tradeoff function |
These frameworks are not new privacy notions in a moral sense; they are accounting languages that compose more cleanly than (ε, δ)-DP and convert back to it for reporting. Modern DP-SGD implementations almost always track privacy loss in Renyi DP or Gaussian DP and convert at the end.
The single-record neighbourhood definition naturally extends to groups: if a mechanism is (ε, δ)-DP, then for any two datasets differing in k records, the bound becomes (k * ε, k * e^((k-1)*ε) * δ). Group privacy matters when records correspond to households, families, or repeated measurements from the same individual, since one person may contribute multiple rows.
Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang's 2016 CCS paper "Deep Learning with Differential Privacy" gave the field its standard recipe for training neural networks with formal privacy guarantees. The algorithm, usually called DP-SGD, adds two modifications to ordinary stochastic gradient descent.
The paper's other contribution is the moments accountant, a tighter privacy-loss tracker than basic or advanced composition. The moments accountant tracks the log moment-generating function of the privacy loss random variable, which gives substantially better epsilons for the typical Gaussian-with-subsampling setup of DP-SGD. Modern implementations have largely replaced the moments accountant with Renyi DP accounting (Mironov 2017) or the GDP-based accountant of Dong, Roth, Su 2022, which are equivalent in spirit but easier to extend.
The practical cost of DP-SGD is real but manageable. Per-example gradient computation is often 2 to 3 times more expensive than batched gradient computation, noise reduces final accuracy at meaningful epsilons, and small underrepresented subgroups suffer disproportionately. Three open libraries dominate: Opacus (Meta, PyTorch, described in the 2021 paper "Opacus: User-Friendly Differential Privacy Library in PyTorch"), TensorFlow Privacy (Google), and JAX-Privacy (Google DeepMind). DP-SGD is now standard for federated learning, sensitive-data fine-tuning of foundation models, and any setting where membership-inference attacks are a real concern.
The last decade has seen differential privacy move from theory to production.
| Year | Organisation | Application | Trust model |
|---|---|---|---|
| 2008 | OnTheMap (US Census LEHD) | Commuter origin-destination flows | Central |
| 2014 | RAPPOR for Chrome telemetry, default homepage stats | Local | |
| 2016 | Apple | QuickType, emoji prediction, Spotlight Lookup, Safari crash data | Local |
| 2017 | Microsoft | Windows telemetry, later Office Excel pivot tables | Central |
| 2017 | Uber | FLEX SQL with elastic sensitivity for internal analytics | Central |
| 2017 | Marketing analytics, audience-engagement insights | Central | |
| 2019 | Mobility reports during COVID-19 | Central | |
| 2020 | Meta (Facebook) | Movement Range Maps for COVID-19, ad-targeting tools | Central |
| 2020 | U.S. Census Bureau | 2020 Decennial Census Disclosure Avoidance System (DAS) | Central |
| 2021+ | Gboard next-word prediction with federated learning + DP | Local + central |
The U.S. Census deployment is the most consequential to date. The 2020 DAS replaces the swapping-based protection used in earlier decades with a TopDown algorithm that injects calibrated noise at every level of the geographic hierarchy. The Census Bureau announced in June 2021 a total privacy-loss budget of epsilon = 19.61 for the redistricting data product, split into 17.14 for the persons file and 2.47 for the housing-unit file, with an extremely small delta. The decision generated unusual public debate, including a lawsuit from Alabama and academic critiques about disproportionate accuracy losses for small rural and non-white populations.
Apple's deployment, described in the company's 2017 white paper "Learning with Privacy at Scale," added local DP to iOS 10 and macOS Sierra in 2016 for QuickType, emoji recommendations, Spotlight deep-link suggestions, Lookup hints in Notes, and Safari crash counts. The 2017 "Privacy Loss in Apple's Implementation of Differential Privacy on MacOS 10.12" paper by Tang et al. reverse-engineered the per-event epsilons (which Apple did not initially disclose) and reported values of 1, 2, and 4 across categories, with weekly resets that made the cumulative budget much larger than per-event values suggested.
Google's RAPPOR, deployed in Chrome from 2014, is the original mass-market local DP system. It collects browser-default-changing events and other low-sensitivity telemetry by having each client run a randomised-response protocol locally, then aggregates noisy reports server-side.
Differential privacy shows up across the AI stack.
Actively maintained DP libraries as of 2026:
| Library | Maintainer | Language | Primary use |
|---|---|---|---|
| Opacus | Meta | Python (PyTorch) | DP-SGD, per-sample gradients |
| TensorFlow Privacy | Python (TensorFlow) | DP-SGD, PATE, accountants | |
| JAX-Privacy | Google DeepMind | Python (JAX) | DP-SGD with fast vectorisation |
| OpenDP | Harvard / Microsoft | Rust core, Python and R bindings | General DP query release |
| Google Differential Privacy Libraries | C++, Go, Java | Aggregation primitives | |
| PipelineDP | Google / OpenMined | Python | Apache Beam DP analytics |
| Diffprivlib | IBM | Python | DP analogues of scikit-learn |
| Tumult Analytics | Tumult Labs | Python | Commercial DP analytics SDK |
| SmartNoise | Microsoft / OpenDP | Python | DP for SQL workloads |
Libraries differ in trust model (central vs local), supported queries (counts, sums, ML), accounting language (basic, RDP, GDP), and degree of formal verification. OpenDP and SmartNoise place particular emphasis on auditable privacy proofs.
The accounting story has tightened steadily over the years. The 2010 advanced composition theorem of Dwork, Rothblum, and Vadhan replaced linear k * epsilon scaling with a sqrt(k) factor, which alone made many large-scale deployments feasible. Abadi et al.'s 2016 moments accountant gave another substantial improvement for the Gaussian-with-subsampling setup of DP-SGD. Mironov's 2017 Renyi DP, Bun and Steinke's 2016 zCDP, and Dong, Roth, and Su's 2022 f-DP each push the bound closer to optimal. Modern libraries track privacy loss in one of these tighter frameworks during training and convert to (epsilon, delta) only for reporting.
Differential privacy is the strongest privacy notion in widespread use, but it is not free of problems.
Picking epsilon is genuinely hard. The 2020 Wood et al. "Differential Privacy: A Primer for a Non-Technical Audience" essay in the Vanderbilt Journal of Entertainment and Technology Law devotes significant space to the question and arrives at no universal answer. Numbers like epsilon = 1 sound stringent but allow a multiplicative shift of e in posterior odds, more permissive than many policymakers assume. Communication is harder still: explaining to a Census respondent that their record is protected because no analyst's posterior odds can shift by more than a factor of e^19.61 is a tough sell.
Auditing is partially addressed but not solved. Black-box verification that an implementation satisfies its claimed (epsilon, delta)-DP bound is hard. The Auditing DP-SGD line of work gives empirical lower bounds on actual leakage by running membership-inference attacks under known training data, but the gap between provable upper bounds and empirical lower bounds is often large.
Reconstruction attacks remain a live research area. The Census Bureau's own 2018 reconstruction experiment showed that the 2010 swapping-based DAS allowed reconstruction of underlying records to a significant fraction, which was the empirical motivation for switching to DP for 2020.
Utility for small subgroups is the most painful tradeoff. The same noise that protects a record also dominates the signal for groups smaller than roughly 1/epsilon. The 2024 PNAS paper "The 2020 US Census Differential Privacy Method Introduces Disproportionate Discrepancies for Rural and Non-White Populations" documents larger relative errors in tracts with small rural and non-white populations than under prior methods. The choice of how to spread the privacy budget across geographies is ultimately a policy decision.
Finally, the proof depends on the neighbourhood definition. A guarantee against add-or-remove neighbours says nothing directly about an attacker who already has many of the records and wants to learn about a specific one. Group privacy and user-level DP give stronger guarantees but at a corresponding cost in epsilon.
| Technique | Type of guarantee | Relationship to DP |
|---|---|---|
| k-anonymity (Sweeney 2002) | Syntactic | Weaker; vulnerable to linkage and homogeneity attacks |
| l-diversity (Machanavajjhala et al. 2007) | Syntactic | Strengthens k-anonymity but still no formal guarantee |
| Secure multi-party computation | Computational, no leakage from computation | Orthogonal; can compute DP outputs without trusting a curator |
| Fully homomorphic encryption | Computational, no leakage from computation | Orthogonal; allows computing on encrypted data |
| Federated learning | Decentralised training | Pairs naturally with DP; FL alone does not give formal guarantees |
| Trusted execution environments (SGX, TDX) | Hardware-rooted confidentiality | Often used as a substitute curator for shuffle and central DP |
The key conceptual difference is that DP makes a statistical statement about output distributions, while MPC and FHE make computational statements about what an adversary can extract from encrypted data. The two families compose: a federated learning pipeline can use MPC to aggregate model updates and DP to ensure the aggregated update itself does not leak about any single client.
DP has gradually entered regulatory discourse, though it is rarely codified.
No major regulator has yet specified a particular epsilon as the threshold for legal anonymisation, which is probably the hardest open policy question in the field.