
TEE-for-private-AI-inference

Pattern

Run large-model server-side inference for private user content inside a Trusted Execution Environment (TEE) — typically a Confidential Virtual Machine (CVM) + Confidential-Compute-mode GPU — whose binary digest the client attests against a published transparency log, such that session keys are released only if attestation succeeds. Compose with Oblivious HTTP + a third-party relay + anonymous credentials so that the provider cannot steer a specific user's requests to a specific host (concepts/non-targetability). Operate the service statelessly with forward-secure ephemeral keys and minimised request inputs so that a later compromise cannot recover past sessions.

The net effect: the server-side compute step happens on plaintext, yet the E2EE invariant is preserved. No one except the user's device and the attested TEE ever sees the plaintext: not the provider, not the relay, not the CDN, not the operator.
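The attestation-gated key release can be sketched in a few lines. Everything below is illustrative: a real client verifies a signed hardware attestation report via the CPU/GPU vendor's SDK and speaks HPKE/RA-TLS, whereas this sketch shows only the control flow the pattern requires, namely check the attested digest against the published ledger, and derive an ephemeral session key only on success.

```python
import hashlib
import hmac
import os

# Illustrative only: a real client verifies a *signed* hardware attestation
# report, not a bare digest, and the ledger is a third-party append-only log.
LEDGER = {  # published transparency log of acceptable TEE binary digests
    hashlib.sha256(b"inference-binary-v1").hexdigest(),
    hashlib.sha256(b"inference-binary-v2").hexdigest(),
}

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """Minimal HKDF (RFC 5869) used to derive the ephemeral session key."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def open_session(attested_digest: str, shared_secret: bytes) -> bytes:
    """Release a session key only if the attested digest is in the ledger."""
    if attested_digest not in LEDGER:
        raise PermissionError("attestation failed: digest not in transparency log")
    # Fresh random salt per session: forward-secure, nothing to recover later.
    return hkdf_sha256(shared_secret, salt=os.urandom(32), info=b"tee-session-v1")

key = open_session(hashlib.sha256(b"inference-binary-v2").hexdigest(), os.urandom(32))
assert len(key) == 32
```

The important property is the ordering: the key derivation is unreachable unless the ledger check passes, so an unattested (or tampered) binary never receives material that can decrypt user content.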

Components

  1. Client-side crypto + attestation verifier — mints the request, fetches the gateway HPKE key, verifies the server's attestation against the published ledger, derives the ephemeral session key.
  2. TEE cluster — the CVM+GPU fleet running the inference binary. Remote-shell-free, containerised, hardened.
  3. Third-party OHTTP relay — hides client IP from the provider gateway (patterns/third-party-ohttp-relay-for-unlinkability).
  4. Anonymous credential service — authenticates the client-class without identifying the user.
  5. Transparency log — third-party-operated append-only ledger of acceptable TEE binary digests (patterns/publish-binary-digest-ledger).
  6. Attestation-gated session (patterns/attestation-before-session-key-release), realised as RA-TLS.
  7. Log-filtering egress — allow-listed telemetry path out of the TEE, so observability does not leak content.
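Component 7 can reduce to a field allow-list applied at the TEE egress. A minimal sketch with illustrative field names; a real deployment would also enforce this structurally, e.g. via a dedicated egress proxy rather than in-process code:

```python
# Illustrative egress filter: only allow-listed, content-free telemetry
# fields may leave the TEE; everything else is silently dropped.
ALLOWED_FIELDS = {"request_id", "model_version", "latency_ms", "status_code"}

def filter_egress(event: dict) -> dict:
    """Strip any field not on the allow-list before it leaves the TEE."""
    return {k: v for k, v in event.items() if k in ALLOWED_FIELDS}

event = {
    "request_id": "r-123",
    "latency_ms": 842,
    "status_code": 200,
    "prompt": "summarise this thread...",  # user content: must never egress
}
assert filter_egress(event) == {
    "request_id": "r-123", "latency_ms": 842, "status_code": 200,
}
```

An allow-list (rather than a deny-list) is the safer default here: a new field added to telemetry stays inside the TEE until someone explicitly argues it is content-free.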

When to use

  • Preserve E2EE across a server-side compute step. You have an E2EE messaging substrate and want to add AI features without breaking the invariant.
  • On-device inference is infeasible. Model size or latency makes device-only serving impractical, but you are unwilling to send plaintext to a normal server.
  • The threat model includes insiders + supply chain. You cannot rely on operational policy alone to keep plaintext away from the operator.
  • Users must be able to verify the guarantee. Policy-only claims are insufficient; you need mechanism + external audit.

When NOT to use

  • The workload is not privacy-sensitive. TEEs add operational complexity + attestation infrastructure + transparency-log operations + vendor coupling; don't pay that cost for non-sensitive content.
  • You need broad observability on content. TEE + log-filtering deliberately limits what you can see; if your ops model depends on full content introspection, this pattern breaks it.
  • You cannot get multi-party independence. The pattern depends on the OHTTP relay, transparency ledger, and attestation verifier being operated by parties independent of the service operator. If you can't get that, non-targetability + verifiability degrade.
  • Your model + weights must remain confidential from the user. TEEs protect the user's content from the operator; they don't protect the operator's model from the user. That's a separate problem.

Canonical wiki instance

WhatsApp Private Processing (Meta, 2025-04-30) is the canonical instance on the wiki. Initial user-facing use: message summarisation + writing suggestions over WhatsApp's E2EE messaging. Full component stack described in the source:

"This confidential computing infrastructure, built on top of a Trusted Execution Environment (TEE), will make it possible for people to direct AI to process their requests — like summarizing unread WhatsApp threads or getting writing suggestions — in our secure and private cloud environment."

Caveats

  • Side-channel residual risk. TEEs have a history of speculative-execution, timing, and power side-channel findings. The pattern reduces risk; it doesn't eliminate it.
  • Vendor trust. The CPU + GPU vendors are inside the TCB. A backdoor or microcode-level compromise of the vendor's attestation root is catastrophic.
  • Operational complexity. Multi-party operational coordination (relay operator, ledger operator, CDN operator) is non-trivial; poorly-run operators undermine the guarantee.
  • Model quality + latency trade-off. TEEs and confidential-GPU modes have historically incurred performance overhead vs. plain server-side inference.
  • Abuse detection tension. Standard abuse signals rely on content access; the pattern forces abuse detection to shift to metadata + host monitoring outside the TEE.

Seen in
