
CONCEPT Cited by 1 source

Federated analytics

Definition

Federated analytics is the discipline of computing aggregated, anonymised insights about a population from data that lives only on user devices — without ever centralising the per-user contributions. The system observes collective trends (model accuracy across regions, classifier true-positive rate, feature usage distribution) while the underlying per-device data stays local. It is the read-only sibling of federated learning: where federated learning ships gradient updates to be aggregated, federated analytics ships outcome metadata / counters / sufficient statistics.
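As a rough illustration of "shipping sufficient statistics" rather than raw data, the sketch below has each device reduce its local values to a count, a sum, and a sum of squares, and the server derive a population mean and variance from the merged summaries. The names `LocalStats`, `summarise_on_device`, and `merge` are hypothetical and not taken from any Google system. In this sketch the server still sees each per-device summary in the clear; a private deployment would additionally hide individual reports behind secure aggregation and/or a TEE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalStats:
    """Sufficient statistics for mean/variance (hypothetical report type)."""
    count: int
    total: float
    total_sq: float


def summarise_on_device(values):
    """Runs on the device: only the three summary numbers are returned."""
    return LocalStats(len(values), sum(values), sum(v * v for v in values))


def merge(reports):
    """Runs on the server: adds up per-device summaries component-wise."""
    return LocalStats(
        sum(r.count for r in reports),
        sum(r.total for r in reports),
        sum(r.total_sq for r in reports),
    )


def population_mean_and_variance(agg):
    """Recover mean and (population) variance from the merged summary."""
    mean = agg.total / agg.count
    variance = agg.total_sq / agg.count - mean * mean
    return mean, variance


# Three simulated devices; the raw lists are never passed to the server code.
device_data = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
reports = [summarise_on_device(d) for d in device_data]
mean, variance = population_mean_and_variance(merge(reports))
print(mean, variance)
```

The merged summary is enough to answer the population-level question, which is why counters and sufficient statistics are a natural payload for this kind of system.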

The term was coined by Google in its canonical 2020 federated-analytics post. The newer 2026 post canonicalises the zero-trust architecture variant: federated analytics in which the confidentiality of contributions is enforced not just by data minimisation but by a composition of cryptographic and hardware-isolation mechanisms.

What "private" means in federated analytics

The term "private" bundles three distinct guarantees, each of which a production system has to deliver explicitly:

  1. Per-device-data confidentiality — Google never decrypts an individual contribution. Enforced via secure aggregation (cryptographic) and / or TEE (hardware).
  2. Output-privacy guarantee — even the aggregate must not leak per-individual information. Enforced via differential privacy noise applied before / during release of the aggregate.
  3. Identity-privacy — Google does not learn which devices contributed what, beyond cohort-level information.

A private federated-analytics system is one that has explicit, verifiable mechanisms for each of these — not just "we don't read the data."
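The output-privacy guarantee (item 2) can be illustrated with the textbook Laplace mechanism applied to a count. This is a minimal sketch and not the mechanism any deployment named here is confirmed to use. It assumes each device changes the count by at most 1 (sensitivity 1), and the helper names `laplace_noise` and `release_count` are hypothetical. It also uses Python's non-cryptographic `random` module and plain floating point, both of which a production system would need to replace.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count under epsilon-DP using the Laplace mechanism."""
    scale = sensitivity / epsilon  # smaller epsilon means more noise
    return true_count + laplace_noise(scale, rng)


rng = random.Random(0)  # seeded only to make the demo repeatable
true_count = 1000

# Simulate many independent releases to see the typical error size.
releases = [release_count(true_count, epsilon=0.5, rng=rng) for _ in range(20000)]
avg_release = sum(releases) / len(releases)
avg_abs_error = sum(abs(r - true_count) for r in releases) / len(releases)
print(avg_release, avg_abs_error)
```

With sensitivity 1 and epsilon 0.5 the noise scale is 2, so a single release is typically off by about 2 while remaining unbiased on average.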

Why it matters

Federated analytics is load-bearing for the class of products where:

  • The data cannot leave the device for trust / regulation reasons (e.g. health data, on-device safety classifier inputs, keyboard text in Gboard).
  • The system needs to evolve based on observed real-world behaviour (model drift, classifier failure modes, feature usage), and the only evolution signal is the aggregate.
  • The device fleet is heterogeneous — millions of devices with different data distributions, hardware constraints, and behaviour patterns. The inference about the fleet is what the team needs, not per-user data.

Without federated analytics, the team is stuck choosing between: (a) shipping blind (no production telemetry → no model improvement), or (b) centralising raw data (compromising privacy commitments).

Production deployments named on the wiki

| Deployment | Use case | Mechanism layers |
| --- | --- | --- |
| systems/pixel-recorder | AI-system insights on Pixel Recorder app | TEE + DP aggregation |
| systems/gboard | Keyboard model evolution | Federated learning + analytics |
| systems/android-safetycore | On-device safety-classifier effectiveness | Crypto secure-aggregation + TEE + DP (the two-layer defense) |

The 2026-05-27 Google post is the canonical wiki instance of the third-generation federated-analytics architecture. By comparison, the Pixel Recorder deployment is second-generation (TEE-backed DP, no cryptographic layer), and the original 2020 federated-analytics protocols are first-generation: cryptographic but multi-round, requiring devices to stay online for the entire aggregation window, which is a structural barrier to widespread deployment.

Generations of secure-aggregation in federated analytics

Per the 2026-05-27 Google post:

"Google has deployed two generations of secure aggregation protocols at scale (detailed in the initial blogpost and follow-up). However, its widespread use has been limited by the complexity in its requirement that user devices remain online in multiround protocols over extended periods of time."

  • Generation 1 (circa 2017): the original "Practical Secure Aggregation for Federated Learning on User-Held Data" protocol. Multi-round; requires devices to stay online for the full aggregation window.
  • Generation 2: the "Distributed Differential Privacy for Federated Learning" follow-up. Improves on DP composition, but is still multi-round.
  • Generation 3 (2026-05-27): a lattice-based, one-shot, single-message protocol with client-committee key shares. The single-message primitive eliminates the always-online requirement, opening federated analytics to drive-by devices (intermittently connected, phone-in-pocket, low-power-budget). It is composed with TEE + attestation for the two-layer defense architecture.
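The core idea behind masking-based secure aggregation can be shown with a toy example: every pair of clients shares a random mask, one adds it and the other subtracts it, so the masks cancel in the sum while each individual upload looks random. This is only an illustration of the general technique, not the lattice-based Generation 3 protocol and not a faithful copy of the earlier ones. The modulus, the function names, and the way masks are handed out are all assumptions. Real protocols derive masks through key agreement and need extra machinery to recover when a client drops out, which is one reason the earlier designs are multi-round.

```python
import random

MODULUS = 2**32  # all arithmetic is done modulo this value (assumed parameter)


def pairwise_masks(num_clients, rng):
    """masks[(i, j)] is the secret shared by clients i and j, with i < j."""
    return {
        (i, j): rng.randrange(MODULUS)
        for i in range(num_clients)
        for j in range(i + 1, num_clients)
    }


def mask_input(i, value, num_clients, masks):
    """Client i adds masks shared with higher-indexed peers, subtracts the rest."""
    masked = value
    for j in range(num_clients):
        if i < j:
            masked += masks[(i, j)]
        elif j < i:
            masked -= masks[(j, i)]
    return masked % MODULUS


rng = random.Random(42)  # toy randomness; not cryptographically secure
inputs = [3, 10, 7, 25]
masks = pairwise_masks(len(inputs), rng)

# The server only ever receives the masked uploads.
uploads = [mask_input(i, v, len(inputs), masks) for i, v in enumerate(inputs)]
aggregate = sum(uploads) % MODULUS
print(uploads, aggregate)
```

Each mask appears once with a plus sign and once with a minus sign, so the server recovers the sum of the inputs without seeing any single input. If one client never uploads, its masks do not cancel, which is the dropout problem a single-message design avoids.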

Relation to federated learning

| Property | Federated learning | Federated analytics |
| --- | --- | --- |
| What's aggregated | Gradient updates / model weights | Counters / outcomes / sufficient statistics |
| Goal | Improve a global model | Observe a population's behaviour |
| Round structure | Many rounds (one per training step) | One round (per analytics window) |
| Output | Updated model | Histogram / mean / distribution |
| Privacy mechanisms | Same: secure aggregation + DP | Same: secure aggregation + DP |
| 2026-05-27 paper applies to | Both (the underlying primitive is generic) | This post focuses on analytics |

Federated learning and federated analytics share the same underlying secure-aggregation cryptographic substrate — the 2026-05-27 protocol applies to both, but the post canonicalises the analytics direction first.

Caveats

  • Federated analytics is not anonymisation. Per-device data leaves the device only as encrypted ciphertext + key share; it never becomes anonymised plaintext on the server. "Anonymisation" in the marketing sense (drop names + IDs) is not what's happening — the math guarantee is stronger.
  • DP budget management is the operational hard part. The aggregate output is DP-noised, but every aggregation consumes from the per-user / per-cohort privacy budget. Production deployments have to account for cumulative budget consumption across analytics queries.
  • Committee composition is a trust-decomposition decision. Who serves on the committee, how often, and what collusion threshold breaks confidentiality: none of these has a natural default, and the 2026-05-27 post does not spell them out.
  • Output utility vs privacy is a calibration choice. Lower DP noise → higher-utility insights → larger privacy-budget spend (higher ε) per query → faster budget exhaustion. Operational tuning is non-trivial.
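The budget and calibration caveats above can be made concrete with a small accountant that uses basic sequential composition, where the epsilons of successive queries simply add up. This is a sketch under that assumption. Production systems often use tighter accounting methods, and the class and function names here (`PrivacyBudget`, `BudgetExceeded`, `laplace_scale`) are hypothetical.

```python
class BudgetExceeded(Exception):
    """Raised when a query would push spending past the total budget."""


class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def remaining(self):
        return self.total_epsilon - self.spent

    def charge(self, epsilon):
        """Record one query's epsilon, refusing it if the budget would overflow."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total_epsilon:
            raise BudgetExceeded(
                f"need {epsilon}, only {self.remaining()} remaining"
            )
        self.spent += epsilon


def laplace_scale(sensitivity, epsilon):
    """Noise scale of the Laplace mechanism: larger epsilon, less noise."""
    return sensitivity / epsilon


budget = PrivacyBudget(total_epsilon=1.0)
budget.charge(0.5)   # a low-noise query (scale 2.0) uses half the budget
budget.charge(0.25)  # a noisier query (scale 4.0) uses a quarter

try:
    budget.charge(0.5)  # would exceed the total, so it is refused
    third_query_allowed = True
except BudgetExceeded:
    third_query_allowed = False

print(budget.remaining(), third_query_allowed)
```

The trade-off is visible in the numbers: the query with epsilon 0.5 gets half the noise of the query with epsilon 0.25, but exhausts the budget twice as fast.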
