CONCEPT Cited by 1 source

Federated analytics¶

Definition¶

Federated analytics is the discipline of computing aggregated, anonymised insights about a population from data that lives only on user devices — without ever centralising the per-user contributions. The system observes collective trends (model accuracy across regions, classifier true-positive rate, feature usage distribution) while the underlying per-device data stays local. It is the read-only sibling of federated learning: where federated learning ships gradient updates to be aggregated, federated analytics ships outcome metadata / counters / sufficient statistics.
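
As a minimal sketch of the split described above, the following Python reduces raw events to per-bucket counters on each device and sums only those counters on the server. The bucket names and the functions `local_report` and `aggregate` are hypothetical, chosen for illustration. The sketch deliberately omits the confidentiality layers a private deployment would add, so here the server still sees each device's counters in the clear.

```python
from collections import Counter

# Hypothetical outcome buckets for an on-device classifier.
BUCKETS = ["true_positive", "false_positive", "false_negative"]


def local_report(events, buckets):
    """Runs on-device: reduce raw events to a fixed-length vector of counts."""
    counts = Counter(e for e in events if e in buckets)
    return [counts[b] for b in buckets]


def aggregate(reports):
    """Runs server-side: element-wise sum of the per-device count vectors."""
    return [sum(column) for column in zip(*reports)]


# Raw events never leave this list in a real system; only the reports do.
device_events = [
    ["true_positive", "true_positive", "false_positive"],
    ["false_negative", "true_positive"],
    ["true_positive"],
]

reports = [local_report(events, BUCKETS) for events in device_events]
fleet_histogram = dict(zip(BUCKETS, aggregate(reports)))
print(fleet_histogram)
```

The only thing crossing the device boundary is a short vector of counters, which is what makes the later protection layers (masking, noise) straightforward to apply.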

The term was coined by Google in its 2020 federated-analytics post. The newer 2026-05-27 post canonicalises the zero-trust architecture variant: federated analytics where confidentiality of contributions is enforced not just by data minimisation but by a composition of cryptographic and hardware-isolation mechanisms.

What "private" means in federated analytics¶

The term "private" bundles three distinct guarantees that production systems have to deliver explicitly:

Per-device-data confidentiality — Google never decrypts an individual contribution. Enforced via secure aggregation (cryptographic) and / or TEE (hardware).
Output-privacy guarantee — even the aggregate must not leak per-individual information. Enforced via differential privacy noise applied before / during release of the aggregate.
Identity-privacy — Google does not learn which devices contributed what, beyond cohort-level information.

A private federated-analytics system is one that has explicit, verifiable mechanisms for each of these — not just "we don't read the data."
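
To make the first guarantee concrete, here is a toy version of the pairwise-masking idea used in classic secure aggregation: every pair of devices shares a random mask that one adds and the other subtracts, so individual uploads look random while the masks cancel in the sum. This is a sketch under loud assumptions: the string-seeded `random.Random` stands in for a key agreement between each pair, there is no handling of devices that drop out, and it is not the protocol from either Google post.

```python
import random

MODULUS = 2**32  # all arithmetic is done modulo this value


def pairwise_mask(i, j, session_seed):
    """Mask shared by devices i and j.

    ASSUMPTION: a seeded PRNG stands in for a secret the pair would derive
    through key agreement. This is not cryptographically secure.
    """
    lo, hi = min(i, j), max(i, j)
    return random.Random(f"{session_seed}:{lo}:{hi}").randrange(MODULUS)


def masked_value(i, value, n_devices, session_seed):
    """Runs on device i: add masks shared with higher ids, subtract the rest."""
    masked = value
    for j in range(n_devices):
        if j == i:
            continue
        mask = pairwise_mask(i, j, session_seed)
        masked += mask if i < j else -mask
    return masked % MODULUS


values = [3, 5, 7, 11]  # private per-device counters
uploads = [masked_value(i, v, len(values), "demo") for i, v in enumerate(values)]

# The server only ever handles `uploads`; each mask appears once with a plus
# sign and once with a minus sign, so the sum equals the true total.
total = sum(uploads) % MODULUS
print(total)
```

This covers only per-device confidentiality. Output privacy and identity privacy need separate mechanisms, which is why the three guarantees are listed individually.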

Why it matters¶

Federated analytics is load-bearing for the class of products where:

The data cannot leave the device for trust / regulation reasons (e.g. health data, on-device safety classifier inputs, keyboard text in Gboard).
The system needs to evolve based on observed real-world behaviour (model drift, classifier failure modes, feature usage), and the only evolution signal is the aggregate.
The device fleet is heterogeneous — millions of devices with different data distributions, hardware constraints, and behaviour patterns. The inference about the fleet is what the team needs, not per-user data.

Without federated analytics, the team is stuck choosing between: (a) shipping blind (no production telemetry → no model improvement), or (b) centralising raw data (compromising privacy commitments).

Production deployments named on the wiki¶

Deployment	Use case	Mechanism layers
systems/pixel-recorder	AI-system insights on Pixel Recorder app	TEE + DP aggregation
systems/gboard	Keyboard model evolution	Federated learning + analytics
systems/android-safetycore	On-device safety-classifier effectiveness	Crypto secure-aggregation + TEE + DP (the two-layer defense)

The 2026-05-27 Google post is the canonical wiki instance of the third-generation federated-analytics architecture. By comparison, the Pixel Recorder deployment is second-generation (TEE-backed DP, no cryptographic layer), and the original 2020 federated-analytics protocols are first-generation: cryptographic but multi-round, requiring devices to stay online for the entire aggregation window, which is a structural barrier to widespread deployment.

Generations of secure-aggregation in federated analytics¶

Per the 2026-05-27 Google post:

"Google has deployed two generations of secure aggregation protocols at scale (detailed in the initial blogpost and follow-up). However, its widespread use has been limited by the complexity in its requirement that user devices remain online in multiround protocols over extended periods of time."

Generation 1 (2017-ish): the original Practical Secure Aggregation for Federated Learning on User-Held Data protocol. Multi-round; requires devices online for the full aggregation window.
Generation 2: the Distributed Differential Privacy for Federated Learning follow-up. Improves on DP composition, but still multi-round.
Generation 3 (2026-05-27): lattice-based one-shot single-message protocol with client-committee key shares. The single-message primitive eliminates the always-online requirement, opening federated analytics to drive-by devices (intermittently connected, phone-in-pocket, low-power-budget). Composed with TEE + attestation for the two-layer defense architecture.
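
The single-message shape of the third generation can be illustrated with a toy that is explicitly not the lattice-based construction. In this sketch each client masks its value, splits the mask into additive shares for a committee, and sends everything in one message before going offline. The function names, the committee size, and the use of plain additive sharing are all assumptions made for illustration. In a real protocol each share would be encrypted to its committee member rather than handed over in the clear.

```python
import secrets

MODULUS = 2**32


def client_message(value, committee_size):
    """Runs once on the client, which can then go offline.

    Returns the masked value (for the server) and one additive share of the
    mask per committee member. ASSUMPTION: toy additive sharing, not lattices.
    """
    mask = secrets.randbelow(MODULUS)
    ciphertext = (value + mask) % MODULUS
    shares = [secrets.randbelow(MODULUS) for _ in range(committee_size - 1)]
    shares.append((mask - sum(shares)) % MODULUS)  # shares sum to the mask
    return ciphertext, shares


def committee_partial(shares_for_member):
    """Runs on one committee member: reveal only the sum of its shares."""
    return sum(shares_for_member) % MODULUS


def server_recover(ciphertexts, partials):
    """Runs on the server: remove the summed masks from the summed ciphertexts."""
    return (sum(ciphertexts) - sum(partials)) % MODULUS


COMMITTEE_SIZE = 3
values = [4, 0, 9, 2, 1]  # private per-device counters
messages = [client_message(v, COMMITTEE_SIZE) for v in values]

ciphertexts = [ciphertext for ciphertext, _ in messages]
partials = [
    committee_partial([shares[j] for _, shares in messages])
    for j in range(COMMITTEE_SIZE)
]
recovered = server_recover(ciphertexts, partials)
print(recovered)
```

In this toy, recovering any single client's value requires the server and every committee member to pool what they hold, which is one concrete example of the collusion-threshold choice a committee design has to make.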

Relation to federated learning¶

Property	Federated learning	Federated analytics
What's aggregated	Gradient updates / model weights	Counters / outcomes / sufficient statistics
Goal	Improve a global model	Observe a population's behaviour
Round structure	Many rounds (one per training step)	One round (per analytics window)
Output	Updated model	Histogram / mean / distribution
Privacy mechanisms	Same: secure aggregation + DP	Same: secure aggregation + DP
2026-05-27 protocol applies to	Both (the underlying primitive is generic)	This post focuses on analytics

Federated learning and federated analytics share the same underlying secure-aggregation cryptographic substrate — the 2026-05-27 protocol applies to both, but the post canonicalises the analytics direction first.

Caveats¶

Federated analytics is not anonymisation. Per-device data leaves the device only as encrypted ciphertext + key share; it never becomes anonymised plaintext on the server. "Anonymisation" in the marketing sense (drop names + IDs) is not what's happening — the math guarantee is stronger.
DP budget management is the operational hard part. The aggregate output is DP-noised, but every aggregation consumes from the per-user / per-cohort privacy budget. Production deployments have to account for cumulative budget consumption across analytics queries.
Committee composition is a trust-decomposition decision. Who serves on the committee, how often, and what collusion threshold breaks confidentiality are all choices with no natural defaults. The 2026-05-27 post doesn't spell them out.
Output utility vs privacy is a calibration choice. Lower DP noise → higher-utility insights → larger privacy cost (ε) per query and faster budget exhaustion. Operational tuning is non-trivial.
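
The budget and calibration caveats above can be sketched with the Laplace mechanism and basic sequential composition, where the ε values of successive queries simply add up. This is a minimal sketch under stated assumptions: `PrivacyBudget` and `noisy_count` are hypothetical names, the query is a count with sensitivity 1, and production systems may well use other noise distributions and tighter accounting methods than plain addition.

```python
import random


class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_count(true_count, epsilon, budget, rng, sensitivity=1.0):
    """Release a count with Laplace noise, charging epsilon to the budget."""
    budget.charge(epsilon)
    # Noise scale is sensitivity / epsilon: less noise costs more epsilon.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
budget = PrivacyBudget(total_epsilon=1.0)

first = noisy_count(120, epsilon=0.5, budget=budget, rng=rng)
second = noisy_count(120, epsilon=0.5, budget=budget, rng=rng)

try:
    noisy_count(120, epsilon=0.5, budget=budget, rng=rng)
    third_allowed = True
except RuntimeError:
    third_allowed = False  # the first two queries already used the budget

print(budget.spent, third_allowed)
```

Halving ε per query would allow four queries instead of two under the same total, at the price of doubling the noise scale on each answer.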

Seen in¶

sources/2026-05-27-google-private-analytics-via-zero-trust-aggregation — canonical wiki instance of the third-generation federated-analytics architecture; first named production target is Android System SafetyCore.

concepts/secure-aggregation — the cryptographic primitive
concepts/differential-privacy — the output-privacy guarantee
concepts/on-device-ml-inference — the upstream classifier whose effectiveness federated analytics measures
concepts/data-minimization — generalising privacy principle
concepts/end-to-end-encryption — sibling guarantee at the messaging layer
concepts/trusted-execution-environment — composed protection layer
patterns/cryptography-plus-tee-defense-in-depth — the architectural shape
patterns/one-shot-secure-aggregation — protocol pattern
patterns/client-committee-key-shares — key-distribution pattern
systems/android-safetycore
systems/google-confidential-federated-analytics
systems/pixel-recorder
systems/gboard