CONCEPT Cited by 1 source
Federated analytics
Definition
Federated analytics is the discipline of computing aggregated, anonymised insights about a population from data that lives only on user devices — without ever centralising the per-user contributions. The system observes collective trends (model accuracy across regions, classifier true-positive rate, feature usage distribution) while the underlying per-device data stays local. It is the read-only sibling of federated learning: where federated learning ships gradient updates to be aggregated, federated analytics ships outcome metadata / counters / sufficient statistics.
The term was coined by Google in its canonical 2020 federated-analytics post. The newer 2026 post describes the zero-trust architecture variant: federated analytics in which the confidentiality of contributions is enforced not just by data minimisation but by a composition of cryptographic and hardware-isolation mechanisms.
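The "sufficient statistics" idea in the definition can be illustrated with a minimal sketch. It assumes each device holds a list of numeric observations and ships only a (count, sum, sum of squares) triple; the names `Stats`, `local_stats`, and `population_mean_var` are hypothetical and do not come from the Google posts. The merge is done in plaintext here purely for illustration; in a private deployment it would run under secure aggregation and / or inside a TEE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    """Sufficient statistics for mean and variance."""
    n: int = 0        # number of observations
    s: float = 0.0    # sum of observations
    ss: float = 0.0   # sum of squared observations

    def merge(self, other: "Stats") -> "Stats":
        # Merging is plain addition, so the aggregator never needs raw values.
        return Stats(self.n + other.n, self.s + other.s, self.ss + other.ss)


def local_stats(values):
    """Runs on-device: summarise raw values without exporting them."""
    return Stats(len(values), sum(values), sum(v * v for v in values))


def population_mean_var(total):
    """Runs on the aggregate only: population mean and variance."""
    mean = total.s / total.n
    var = total.ss / total.n - mean * mean
    return mean, var


# Three hypothetical devices with different amounts of local data.
devices = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]

total = Stats()
for device_values in devices:
    total = total.merge(local_stats(device_values))

mean, var = population_mean_var(total)
print(total.n, mean, var)
```

Because the statistics combine by addition, the same shape of computation fits any aggregation scheme that can only compute sums, which is the operation secure aggregation provides.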
What "private" means in federated analytics
The term "private" loads three distinct guarantees that production systems have to deliver explicitly:
- Per-device-data confidentiality — Google never decrypts an individual contribution. Enforced via secure aggregation (cryptographic) and / or TEE (hardware).
- Output-privacy guarantee — even the aggregate must not leak per-individual information. Enforced via differential privacy noise applied before / during release of the aggregate.
- Identity-privacy — Google does not learn which devices contributed what, beyond cohort-level information.
A private federated-analytics system is one that has explicit, verifiable mechanisms for each of these — not just "we don't read the data."
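The output-privacy guarantee listed above can be sketched with the textbook Laplace mechanism applied to a single aggregate count. This is an illustration only, assuming each device changes the count by at most 1 (sensitivity 1); the function names are hypothetical, and the source does not say which DP mechanism the production systems use. Python's `random` module is not cryptographically secure, and naive floating-point Laplace sampling has known weaknesses, so a real release path would use a hardened sampler.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with epsilon-DP via the Laplace mechanism.

    Assumption: one device can change the count by at most `sensitivity`.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # seeded only to make the sketch repeatable
released = dp_count(true_count=1000, epsilon=0.5, rng=rng)
print(released)
```

The noise scale is `sensitivity / epsilon`, so a smaller epsilon (stronger privacy) yields a noisier released count.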
Why it matters
Federated analytics is load-bearing for the class of products where:
- The data cannot leave the device for trust / regulation reasons (e.g. health data, on-device safety classifier inputs, keyboard text in Gboard).
- The system needs to evolve based on observed real-world behaviour (model drift, classifier failure modes, feature usage), and the only evolution signal is the aggregate.
- The device fleet is heterogeneous — millions of devices with different data distributions, hardware constraints, and behaviour patterns. The inference about the fleet is what the team needs, not per-user data.
Without federated analytics, the team is stuck choosing between: (a) shipping blind (no production telemetry → no model improvement), or (b) centralising raw data (compromising privacy commitments).
Production deployments named on the wiki
| Deployment | Use case | Mechanism layers |
|---|---|---|
| systems/pixel-recorder | AI-system insights on Pixel Recorder app | TEE + DP aggregation |
| systems/gboard | Keyboard model evolution | Federated learning + analytics |
| systems/android-safetycore | On-device safety-classifier effectiveness | Crypto secure-aggregation + TEE + DP (the two-layer defense) |
The 2026-05-27 Google post is the canonical wiki instance of the third-generation federated-analytics architecture. The Pixel Recorder deployment is second-generation (TEE-backed DP, no cryptographic layer). The original 2020 federated-analytics protocols are first-generation: cryptographic but multi-round, requiring devices to stay online for the entire aggregation window, which is a structural barrier to widespread deployment.
Generations of secure-aggregation in federated analytics
Per the 2026-05-27 Google post:
"Google has deployed two generations of secure aggregation protocols at scale (detailed in the initial blogpost and follow-up). However, its widespread use has been limited by the complexity in its requirement that user devices remain online in multiround protocols over extended periods of time."
- Generation 1 (2017-ish): the original Practical Secure Aggregation for Federated Learning on User-Held Data protocol. Multi-round; requires devices online for the full aggregation window.
- Generation 2: the Distributed Differential Privacy for Federated Learning follow-up. Improves on DP composition, but still multi-round.
- Generation 3 (2026-05-27): a lattice-based, one-shot, single-message protocol with client-committee key shares. The single-message primitive eliminates the always-online requirement, opening federated analytics to drive-by devices (intermittently connected, phone-in-pocket, low-power-budget). Composed with TEE + attestation for the two-layer defense architecture.
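The core idea that secure aggregation relies on, that a server can learn a sum without seeing any individual contribution, can be shown with a toy pairwise-masking sketch in the spirit of the first-generation protocol. It is not the lattice-based one-shot protocol from the 2026-05-27 post: there is no key agreement, no dropout handling, and no committee, and the shared masks are generated in one place purely for illustration. All names (`pairwise_masks`, `mask_input`, `MODULUS`) are hypothetical.

```python
import secrets

MODULUS = 2**32  # all arithmetic is done modulo a fixed public value


def pairwise_masks(num_clients):
    """One shared random mask per client pair (i, j) with i < j.

    Stand-in for the pairwise key agreement a real protocol would run.
    """
    return {
        (i, j): secrets.randbelow(MODULUS)
        for i in range(num_clients)
        for j in range(i + 1, num_clients)
    }


def mask_input(i, value, num_clients, masks):
    """Client i adds masks shared with higher-indexed peers and subtracts
    masks shared with lower-indexed peers, so every mask cancels in the sum."""
    masked = value % MODULUS
    for j in range(num_clients):
        if j == i:
            continue
        if i < j:
            masked = (masked + masks[(i, j)]) % MODULUS
        else:
            masked = (masked - masks[(j, i)]) % MODULUS
    return masked


inputs = [3, 10, 7, 22]  # hypothetical per-device counters
masks = pairwise_masks(len(inputs))
uploads = [mask_input(i, v, len(inputs), masks) for i, v in enumerate(inputs)]

# The server only ever sees `uploads`; their sum equals the true total.
aggregate = sum(uploads) % MODULUS
print(aggregate)
```

The cancellation only works if every client's upload arrives, which is one way to see why mask-based multi-round designs need devices to stay online and why a single-message protocol removes a deployment barrier.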
Relation to federated learning
| Property | Federated learning | Federated analytics |
|---|---|---|
| What's aggregated | Gradient updates / model weights | Counters / outcomes / sufficient statistics |
| Goal | Improve a global model | Observe a population's behaviour |
| Round structure | Many rounds (one per training step) | One round (per analytics window) |
| Output | Updated model | Histogram / mean / distribution |
| Privacy mechanisms | Same: secure aggregation + DP | Same: secure aggregation + DP |
| Covered by the 2026-05-27 protocol | Yes (the underlying primitive is generic) | Yes (the post focuses on analytics) |
Federated learning and federated analytics share the same underlying secure-aggregation cryptographic substrate — the 2026-05-27 protocol applies to both, but the post canonicalises the analytics direction first.
Caveats
- Federated analytics is not anonymisation. Per-device data leaves the device only as ciphertext plus a key share; it never becomes anonymised plaintext on the server. "Anonymisation" in the marketing sense (drop names + IDs) is not what is happening; the mathematical guarantee is stronger.
- DP budget management is the operational hard part. The aggregate output is DP-noised, but every aggregation consumes from the per-user / per-cohort privacy budget. Production deployments have to account for cumulative budget consumption across analytics queries.
- Committee composition is a trust-decomposition decision. Who serves on the committee, how often, and what collusion threshold breaks confidentiality are choices with no natural defaults. The 2026-05-27 post does not spell them out.
- Output utility vs privacy is a calibration choice. Lower DP noise → higher-utility insights → larger privacy-budget spend per query and faster budget exhaustion. Operational tuning is non-trivial.
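The budget-management and calibration caveats above can be made concrete with a small accounting sketch. It assumes basic sequential composition, where the epsilons of successive queries simply add up; production accountants typically use tighter composition bounds, and the source does not describe the accounting actually used. The `PrivacyBudget` class and `laplace_scale` helper are hypothetical.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Reserve `epsilon` for one query, or refuse if it would overspend."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

    @property
    def remaining(self):
        return self.total - self.spent


def laplace_scale(epsilon, sensitivity=1.0):
    # Noise scale of a Laplace mechanism: lower epsilon means more noise.
    return sensitivity / epsilon


budget = PrivacyBudget(total_epsilon=1.0)
queries_answered = 0
try:
    while True:
        budget.charge(0.3)  # each analytics query spends epsilon = 0.3
        queries_answered += 1
except RuntimeError:
    pass  # further queries are refused once the budget is used up

print(queries_answered, round(budget.remaining, 3), laplace_scale(0.3))
```

Raising the per-query epsilon lowers the noise scale but reduces how many queries fit inside the same total budget, which is the utility versus privacy trade-off in operational form.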
Seen in
- sources/2026-05-27-google-private-analytics-via-zero-trust-aggregation — canonical wiki instance of the third-generation federated-analytics architecture; first named production target is Android System SafetyCore.
Related
- concepts/secure-aggregation — the cryptographic primitive
- concepts/differential-privacy — the output-privacy guarantee
- concepts/on-device-ml-inference — the upstream classifier whose effectiveness federated analytics measures
- concepts/data-minimization — generalising privacy principle
- concepts/end-to-end-encryption — sibling guarantee at the messaging layer
- concepts/trusted-execution-environment — composed protection layer
- patterns/cryptography-plus-tee-defense-in-depth — the architectural shape
- patterns/one-shot-secure-aggregation — protocol pattern
- patterns/client-committee-key-shares — key-distribution pattern
- systems/android-safetycore
- systems/google-confidential-federated-analytics
- systems/pixel-recorder
- systems/gboard