TEE-for-private-AI-inference¶
Pattern¶
Run large-model server-side inference for private user content inside a Trusted Execution Environment (TEE) — typically a Confidential Virtual Machine (CVM) + Confidential-Compute-mode GPU — whose binary digest the client verifies against a published transparency log, with session keys released only if attestation succeeds. Compose with Oblivious HTTP + a third-party relay + anonymous credentials so that the provider cannot steer a specific user onto a specific host (concepts/non-targetability). Operate the service statelessly, with forward-secure ephemeral keys and minimised request inputs, so that a later compromise cannot recover past sessions.
The net effect: the server-side compute step happens on plaintext, yet the E2EE invariant is preserved. No one except the user's device and the attested TEE ever sees that plaintext: not the provider, not the relay, not the CDN, not the operator.
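The attestation-gated key release can be sketched client-side as: check the attested binary digest against the ledger, and only then derive an ephemeral session key. A minimal stdlib-only sketch, assuming a hypothetical local ledger copy and an HKDF-style HMAC-SHA256 derivation (all names, digest values, and labels here are illustrative, not from the source):

```python
import hashlib
import hmac
import os

# Hypothetical local copy of the third-party transparency ledger: a set of
# acceptable TEE binary digests (in practice fetched and proof-checked
# against the append-only log).
LEDGER = {hashlib.sha256(b"inference-binary-v1.2").hexdigest()}

def attestation_ok(reported_digest: str) -> bool:
    """Accept the TEE only if its attested binary digest is in the ledger."""
    return reported_digest in LEDGER

def derive_session_key(shared_secret: bytes, transcript: bytes) -> bytes:
    """HKDF-style extract-then-expand (HMAC-SHA256), binding the ephemeral
    session key to the attested handshake transcript."""
    prk = hmac.new(b"tee-session-salt", shared_secret, hashlib.sha256).digest()
    return hmac.new(prk, transcript + b"\x01", hashlib.sha256).digest()

# The shared secret stands in for the output of an RA-TLS key exchange;
# the key is derived only after attestation succeeds.
reported = hashlib.sha256(b"inference-binary-v1.2").hexdigest()
assert attestation_ok(reported)
session_key = derive_session_key(os.urandom(32), b"ra-tls-transcript")
```

Because the key is ephemeral and discarded after the session, a host compromised later has nothing with which to decrypt earlier traffic — the forward-security property the Pattern section relies on.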
Components¶
- Client-side crypto + attestation verifier — mints the request, fetches the gateway HPKE key, verifies the server's attestation against the published ledger, derives the ephemeral session key.
- TEE cluster — the CVM+GPU fleet running the inference binary. Remote-shell-free, containerised, hardened.
- Third-party OHTTP relay — hides client IP from the provider gateway (patterns/third-party-ohttp-relay-for-unlinkability).
- Anonymous credential service — authenticates the client-class without identifying the user.
- Transparency log — third-party-operated append-only ledger of acceptable TEE binary digests (patterns/publish-binary-digest-ledger).
- Attestation-gated session — patterns/attestation-before-session-key-release realised as RA-TLS.
- Log-filtering egress — allow-listed telemetry path out of the TEE, so observability does not leak content.
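The log-filtering egress component above can be made concrete as an allow-list filter at the TEE boundary. A minimal sketch, assuming illustrative field names (`request_id`, `latency_ms`, etc.) rather than any real telemetry schema:

```python
# Illustrative allow-list: only content-free operational fields may leave
# the TEE; anything else (prompts, completions, identifiers) is dropped.
ALLOWED_EGRESS_FIELDS = {"request_id", "latency_ms", "model_version", "status"}

def filter_egress(record: dict) -> dict:
    """Strip every field not explicitly allow-listed before the record
    crosses the TEE boundary into the observability pipeline."""
    return {k: v for k, v in record.items() if k in ALLOWED_EGRESS_FIELDS}

telemetry = filter_egress({
    "request_id": "r-123",
    "latency_ms": 842,
    "status": "ok",
    "prompt": "summarise my unread thread",  # must never leave the TEE
})
```

Allow-listing (rather than deny-listing content fields) is what makes the guarantee auditable: a reviewer only has to inspect the fixed set of permitted fields, not anticipate every way content might appear.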
When to use¶
- Preserve E2EE across a server-side compute step. You have an E2EE messaging substrate and want to add AI features without breaking the invariant.
- On-device inference is infeasible. Model size or latency makes device-only serving impractical, but you are unwilling to send plaintext to a normal server.
- The threat model includes insiders + supply chain. You cannot rely on operational policy alone to keep plaintext away from the operator.
- Users must be able to verify the guarantee. Policy-only claims are insufficient; you need mechanism + external audit.
When NOT to use¶
- The workload is not privacy-sensitive. TEEs add operational complexity + attestation infrastructure + transparency-log operations + vendor coupling; don't pay that cost for non-sensitive content.
- You need broad observability on content. TEE + log-filtering deliberately limits what you can see; if your ops model depends on full content introspection, this pattern breaks it.
- You cannot get multi-party independence. The pattern depends on the OHTTP relay, transparency ledger, and attestation verifier being operated by parties independent of the service operator. If you can't get that, non-targetability + verifiability degrade.
- Your model + weights must remain confidential from the user. TEEs protect the user's content from the operator; they don't protect the operator's model from the user. That's a separate problem.
Canonical wiki instance¶
WhatsApp Private Processing (Meta, 2025-04-30) is the canonical instance on the wiki. Initial user-facing uses: message summarisation + writing suggestions, layered over WhatsApp's E2EE. Full component stack described in the source:
"This confidential computing infrastructure, built on top of a Trusted Execution Environment (TEE), will make it possible for people to direct AI to process their requests — like summarizing unread WhatsApp threads or getting writing suggestions — in our secure and private cloud environment."
Caveats¶
- Side-channel residual risk. TEEs have a history of speculative-execution, timing, and power side-channel findings. The pattern reduces risk; it doesn't eliminate it.
- Vendor trust. The CPU + GPU vendors are inside the TCB. A backdoor or microcode-level compromise of the vendor's attestation root is catastrophic.
- Operational complexity. Multi-party operational coordination (relay operator, ledger operator, CDN operator) is non-trivial; poorly-run operators undermine the guarantee.
- Model quality + latency trade-off. TEEs and confidential-GPU modes have historically incurred performance overhead vs. plain server-side inference.
- Abuse detection tension. Standard abuse signals rely on content access; the pattern forces abuse detection to shift to metadata + host monitoring outside the TEE.
Seen in¶
- sources/2025-04-30-meta-building-private-processing-for-ai-tools-on-whatsapp — canonical wiki instance; explicitly stacks TEE + attestation + transparency log + OHTTP + anonymous credentials + stateless processing + forward security + data minimisation to preserve E2EE across private-AI-inference.
Related¶
- concepts/trusted-execution-environment, concepts/confidential-computing, concepts/remote-attestation, concepts/ra-tls, concepts/oblivious-http, concepts/anonymous-credential, concepts/non-targetability, concepts/stateless-processing, concepts/forward-security, concepts/verifiable-transparency-log, concepts/data-minimization, concepts/end-to-end-encryption.
- systems/cvm-confidential-virtual-machine, systems/whatsapp-private-processing.
- patterns/third-party-ohttp-relay-for-unlinkability, patterns/attestation-before-session-key-release, patterns/publish-binary-digest-ledger.