
CONCEPT Cited by 1 source

Differential privacy

Definition

Differential privacy (DP) is a mathematical guarantee that the output of a computation is statistically insensitive to whether any single individual's data was included or not. Formally:

A randomised algorithm M is (ε, δ)-differentially private if, for any two datasets D and D' differing in exactly one record, and for any set of outputs S:

$$\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S] + \delta$$

In plain terms: an observer of the output cannot tell whether any specific individual's record was in the input with confidence beyond what ε and δ allow, even with arbitrary auxiliary knowledge. Setting δ = 0 gives pure ε-DP.
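To make the inequality concrete, the sketch below checks it for binary randomized response. This is a textbook ε-DP mechanism that the source does not mention. It is used here only because its output probabilities can be verified by hand. The function names are illustrative, not from any library.

```python
import math

def randomized_response_probs(epsilon: float) -> tuple[float, float]:
    """Binary randomized response: report the true bit w.p. e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return p_truth, 1.0 - p_truth

def worst_case_ratio(epsilon: float) -> float:
    """Max of Pr[M(D) = s] / Pr[M(D') = s] over neighbouring inputs and outputs s.

    The neighbouring "datasets" are a single record holding bit 0 vs bit 1. For a
    discrete mechanism with delta = 0, checking singleton outputs is enough:
    any set S is a union of singletons, so the bound carries over by summation.
    """
    p_truth, p_lie = randomized_response_probs(epsilon)
    ratios = []
    for true_d, true_d_prime in ((0, 1), (1, 0)):
        for s in (0, 1):
            pr_d = p_truth if s == true_d else p_lie
            pr_d_prime = p_truth if s == true_d_prime else p_lie
            ratios.append(pr_d / pr_d_prime)
    return max(ratios)

epsilon = 1.0
print(worst_case_ratio(epsilon), math.exp(epsilon))  # equal up to float rounding
```

The worst-case ratio equals e^ε exactly. This mechanism therefore satisfies the definition with δ = 0, and its privacy budget is tight: no smaller ε would hold.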

The term was coined by Cynthia Dwork (2006). DP is the foundation of statistically rigorous privacy guarantees, distinct from k-anonymity, l-diversity and t-closeness, which are heuristic.

Mechanism — noise injection

DP is delivered operationally by adding calibrated random noise to the output of a query or aggregate. The noise scale is calibrated to the query's sensitivity, which is the maximum change in the output when a single record is added or removed. Two common mechanisms:

  • Laplace mechanism: add Laplace noise with scale Δ/ε, where Δ is the query's ℓ₁ sensitivity. Gives pure ε-DP.
  • Gaussian mechanism: add Gaussian noise whose standard deviation is calibrated to the query's ℓ₂ sensitivity. Gives (ε, δ)-DP rather than pure ε-DP, but composes better in practice for ML workloads.
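The two mechanisms above can be sketched as follows. For the Gaussian case, this assumes the classical calibration σ = Δ₂·√(2 ln(1.25/δ))/ε from Dwork and Roth's textbook, which is only valid for 0 < ε < 1. The source does not say which calibration Google uses. Python's `random` module is not a secure noise source, and naive floating-point sampling is known to leak information. Treat this as an illustration only; the function names are hypothetical.

```python
import math
import random

def laplace_mechanism(true_value: float, l1_sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release true_value + Laplace(0, l1_sensitivity / epsilon) noise: pure eps-DP."""
    scale = l1_sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise

def gaussian_sigma(l2_sensitivity: float, epsilon: float, delta: float) -> float:
    """Classical Gaussian-mechanism calibration; the bound only holds for 0 < epsilon < 1."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("classical calibration assumes 0 < epsilon < 1")
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def gaussian_mechanism(true_value: float, l2_sensitivity: float, epsilon: float,
                       delta: float, rng: random.Random) -> float:
    """Release true_value + N(0, sigma^2) noise: (eps, delta)-DP."""
    return true_value + rng.gauss(0.0, gaussian_sigma(l2_sensitivity, epsilon, delta))

rng = random.Random(0)
true_count = 1234  # hypothetical counting query: one record changes it by at most 1
print(laplace_mechanism(true_count, 1.0, 0.5, rng))
print(gaussian_mechanism(true_count, 1.0, 0.5, 1e-5, rng))
```

For a counting query, Δ = 1 under both norms. The choice between the two mechanisms is therefore driven by the privacy notion (pure vs approximate DP) and by how well each composes, not by sensitivity.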

For aggregation in federated analytics, DP noise is added before the aggregate is released. It is typically composed with the secure-aggregation protocol so that the noise itself is generated in a distributed way, and no single party knows the exact noise drawn.

Privacy budget

ε (epsilon) is the privacy budget — smaller ε = stronger privacy = more noise. Production deployments must manage cumulative budget consumption across queries:

  • Each query consumes some of the budget.
  • Composition theorems bound how privacy decays under multiple queries.
  • Once the budget for a record is exhausted, no further query can include that record (or the privacy guarantee no longer holds).

This is why production DP systems are hard to engineer: naive query patterns exhaust the budget quickly.
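A minimal budget accountant makes the bookkeeping concrete. It assumes a single global budget and uses basic sequential composition, under which running mechanisms with (ε₁, δ₁), …, (ε_k, δ_k) yields (Σεᵢ, Σδᵢ)-DP. This is the loosest correct bound. Advanced-composition and Rényi-DP accountants are tighter, and per-record budgets are finer-grained. `PrivacyBudget` and `BudgetExhausted` are hypothetical names.

```python
class BudgetExhausted(Exception):
    """Raised when a query would exceed the remaining privacy budget."""

class PrivacyBudget:
    """Hypothetical global accountant using basic sequential composition."""

    def __init__(self, epsilon_total: float, delta_total: float = 0.0):
        self.epsilon_total = epsilon_total
        self.delta_total = delta_total
        self.epsilon_spent = 0.0
        self.delta_spent = 0.0

    def charge(self, epsilon: float, delta: float = 0.0) -> None:
        """Reserve budget for one query, or refuse it if the total would be exceeded."""
        tol = 1e-12  # absorb float rounding from repeated additions
        if (self.epsilon_spent + epsilon > self.epsilon_total + tol
                or self.delta_spent + delta > self.delta_total + tol):
            raise BudgetExhausted(
                f"spent eps={self.epsilon_spent:.3f}, query asks for {epsilon}")
        # Basic composition: epsilons and deltas simply add up.
        self.epsilon_spent += epsilon
        self.delta_spent += delta

budget = PrivacyBudget(epsilon_total=1.0)
answered = 0
try:
    while True:
        budget.charge(0.1)  # e.g. one Laplace count at eps = 0.1
        answered += 1
except BudgetExhausted:
    pass
print(answered)  # 10
```

Under this accounting, a budget of ε = 1 buys only ten queries at ε = 0.1 each. This illustrates how quickly a naive query pattern uses up the budget.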

Where it fits in the federated-analytics defense stack

In the 2026-05-27 Google zero-trust-aggregation architecture, DP plays a specific, output-side role:

  1. Per-device data is encrypted with secure aggregation, so the server cannot decrypt any individual contribution.
  2. A trusted execution environment (TEE) adds attested-binary transparency.
  3. DP noise is already in place at the unmasking step. When the committee reveals key hints to unlock the aggregate, the value it unlocks already includes DP noise.
  4. The server obtains Σxᵢ + noise, which leaks at most an ε-bounded amount about any individual xᵢ.

So DP is the last line of defense. Even if the cryptographic and TEE layers both fail and the aggregate becomes visible, each individual's influence on it remains bounded by ε.
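The property in step 4 can be illustrated with a toy model: the server only ever learns Σxᵢ plus noise. The model uses pairwise additive masking in the style of classic secure-aggregation protocols, with each client also adding a noise share. It is not the lattice-based one-shot protocol from the 2026-05-27 post. It has no committee, no key hints, no dropout handling and no real key agreement; masks drawn from one shared `random.Random` stand in for pairwise-agreed keys. The ring size, noise parameters and function names are hypothetical.

```python
import random

MODULUS = 2 ** 32  # hypothetical ring size for this toy protocol

def pairwise_masks(n_clients: int, rng: random.Random) -> list[int]:
    """One net mask per client; the masks sum to 0 mod MODULUS.

    For every pair i < j, client i adds a shared random r_ij and client j
    subtracts it. The masks cancel in the sum, while each masked submission
    on its own is uniformly distributed.
    """
    masks = [0] * n_clients
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            r = rng.randrange(MODULUS)
            masks[i] = (masks[i] + r) % MODULUS
            masks[j] = (masks[j] - r) % MODULUS
    return masks

def client_submission(x: int, noise_share: int, mask: int) -> int:
    """What a client uploads: its value plus its noise share, hidden under its mask."""
    return (x + noise_share + mask) % MODULUS

def server_unmask(submissions: list[int]) -> int:
    """Sum the submissions; the masks cancel, leaving sum(x) + sum(noise) mod MODULUS."""
    total = sum(submissions) % MODULUS
    # Decode as a signed integer so that a negative noise total is recovered correctly.
    return total - MODULUS if total >= MODULUS // 2 else total

rng = random.Random(7)
values = [3, 5, 0, 7, 2]  # hypothetical per-device counts
sigma_total = 2.0         # hypothetical target std-dev of the total noise
# Rounded Gaussian shares: illustrative only, not a calibrated discrete-DP mechanism.
noise = [round(rng.gauss(0.0, sigma_total / len(values) ** 0.5)) for _ in values]
masks = pairwise_masks(len(values), rng)
submissions = [client_submission(x, z, m) for x, z, m in zip(values, noise, masks)]
released = server_unmask(submissions)
print(released, sum(values))  # released = true sum + aggregate noise
```

The server never handles an un-noised value. Each upload looks uniformly random, and the only thing that survives unmasking is the noisy sum.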

Why DP composes with secure aggregation, not replaces it

Secure aggregation prevents the server from ever decrypting any individual contribution. DP prevents the aggregate from leaking information about individuals. They protect against different attacks:

| Attack | Mitigated by |
|---|---|
| Server reads a single client's contribution | Secure aggregation |
| Server reads only the sum, but the population is small or one client dominates | DP |
| Adversary observes the aggregate over time and triangulates individuals | DP (with budget management) |
| Server legitimately decrypts the sum and tries to reconstruct individuals from it | DP only; secure aggregation offers no protection here |

A federated-analytics system without DP is vulnerable to attacks that trivially recover individuals from the aggregate. For example, in a cohort of only 2 users, either user can subtract their own value from the sum to learn the other's exactly. A system without secure aggregation is vulnerable to a compromised server. Both are needed; neither replaces the other.
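The 2-user differencing attack can be simulated directly. The sketch assumes binary values, so the sum has sensitivity 1, and uses Laplace noise as the DP mechanism; the source does not specify one. `noisy_sum` and `differencing_attack_accuracy` are hypothetical names.

```python
import math
import random
from typing import Optional

def noisy_sum(values: list[int], epsilon: Optional[float], rng: random.Random) -> float:
    """Release sum(values), plus Laplace(0, 1/epsilon) noise unless epsilon is None."""
    total = float(sum(values))
    if epsilon is None:
        return total  # no DP: the exact sum is released
    scale = 1.0 / epsilon  # sensitivity 1 for 0/1 values
    return total + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def differencing_attack_accuracy(epsilon: Optional[float], trials: int,
                                 rng: random.Random) -> float:
    """Fraction of trials in which Alice correctly infers Bob's bit from a 2-user sum."""
    correct = 0
    for _ in range(trials):
        alice, bob = rng.randint(0, 1), rng.randint(0, 1)  # bob is the secret
        # Alice knows her own value, so she subtracts it from the released sum.
        estimate = noisy_sum([alice, bob], epsilon, rng) - alice
        guess = 1 if estimate >= 0.5 else 0
        correct += int(guess == bob)
    return correct / trials

rng = random.Random(3)
print(differencing_attack_accuracy(None, 10_000, rng))  # 1.0: exact sums always leak
print(differencing_attack_accuracy(0.5, 100_000, rng))  # about 0.61 with eps = 0.5
print(math.exp(0.5) / (1 + math.exp(0.5)))              # eps-DP ceiling on any guess
```

With a 50/50 prior, no attacker facing an ε-DP release can guess Bob's bit with probability above e^ε/(1 + e^ε), which is about 0.62 at ε = 0.5. The threshold guess against Laplace noise lands just below that ceiling.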

Internal DP vs external DP

  • External DP — noise is added at the central aggregator after decryption. Requires trusting the aggregator with the un-noised aggregate (it must add noise faithfully).
  • Internal DP ("distributed DP") — noise is added inside the cryptographic protocol by clients themselves (each adding a fraction of noise to their submission). Server never sees the un-noised aggregate. Stronger guarantee.
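The variance bookkeeping behind internal DP works as follows, assuming each of n clients adds an independent Gaussian share to its submission xᵢ. The source does not say which noise distribution Google's protocol uses.

$$
z_i \sim \mathcal{N}\!\left(0, \tfrac{\sigma^2}{n}\right) \text{ independently}
\;\Longrightarrow\;
Z = \sum_{i=1}^{n} z_i \sim \mathcal{N}(0, \sigma^2),
\qquad
\sum_{i=1}^{n} (x_i + z_i) = \sum_{i=1}^{n} x_i + Z
$$

The released aggregate therefore carries the same total noise as external DP with N(0, σ²), yet no party ever sees Σxᵢ without Z. If only m < n clients report, the total variance falls to mσ²/n. A deployment must account for such dropouts, for example by having each client add somewhat more than σ²/n.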

Google's secure-aggregation protocols have evolved toward internal DP — the Distributed Differential Privacy for Federated Learning work canonicalises this. The 2026-05-27 protocol composes internal DP with the new lattice-based one-shot aggregation.

Production deployments named on the wiki

| Deployment | DP composition |
|---|---|
| systems/pixel-recorder | TEE + DP-aggregation for AI insights |
| systems/android-safetycore | Crypto secure-aggregation + TEE + DP for classifier-effectiveness metrics |

Caveats

  • DP is a guarantee on the algorithm, not on individuals. Once ε budget is exhausted, the guarantee no longer holds for new queries. Budget management is the operational hard part.
  • ε / δ values are policy decisions, not technical defaults. Industry practice ranges from ε ≈ 0.1 (very strong) to ε ≈ 10 (very weak); there is no single right number. The 2026-05-27 post does not disclose the values used.
  • DP does not fix model bias or fairness issues. It bounds individual influence on the aggregate — orthogonal to whether the aggregate itself encodes demographic bias.
  • DP and utility are in direct tension. Stronger privacy (smaller ε) → more noise → noisier insights. Federated-analytics tuning is a privacy-vs-utility Pareto optimisation.
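To see what the quoted ε range means for utility, consider a counting query (sensitivity 1) under the Laplace mechanism. The Laplace choice is an assumption for illustration, since the post does not disclose its mechanism or parameters. The sketch uses two facts about Laplace(0, b) noise: the expected absolute error is b, and Pr[|noise| > t] = exp(−t/b). Function names are hypothetical.

```python
import math

def laplace_scale(l1_sensitivity: float, epsilon: float) -> float:
    """Laplace scale b = sensitivity / epsilon; also the expected absolute error E|noise|."""
    return l1_sensitivity / epsilon

def laplace_error_bound(l1_sensitivity: float, epsilon: float, confidence: float) -> float:
    """Error t with Pr[|noise| <= t] = confidence, from Pr[|noise| > t] = exp(-t / b)."""
    return laplace_scale(l1_sensitivity, epsilon) * math.log(1.0 / (1.0 - confidence))

for eps in (0.1, 1.0, 10.0):
    b = laplace_scale(1.0, eps)
    t95 = laplace_error_bound(1.0, eps, 0.95)
    print(f"eps={eps:>4}: mean |error| = {b:6.2f}, 95% of releases within ±{t95:6.2f}")
```

Moving from ε ≈ 10 to ε ≈ 0.1 multiplies the error by 100. A count that was accurate to about ±0.3 becomes accurate only to about ±30 at 95% confidence. This is the Pareto trade-off in concrete numbers.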
