
TEE-for-private-AI-inference

Pattern

Run large-model server-side inference for private user content inside a Trusted Execution Environment (TEE) — typically a Confidential Virtual Machine (CVM) + Confidential-Compute-mode GPU — whose binary digest the client attests against a published transparency log, such that session keys are released only if attestation succeeds. Compose with Oblivious HTTP + a third-party relay + anonymous credentials so that the provider cannot steer a specific user's requests to a specific host (concepts/non-targetability). Operate the service statelessly with forward-secure ephemeral keys and minimised request inputs so that a later compromise cannot recover past sessions.

The net effect: the server-side compute step happens on plaintext, yet the E2EE invariant is preserved. No one except the user's device and the attested TEE ever sees the plaintext: not the provider, not the relay, not the CDN, not the operator.
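The attestation-gated key release can be sketched in a few lines. Everything below is illustrative: a real client verifies a signed hardware attestation report via the CPU/GPU vendor's SDK and speaks HPKE/RA-TLS, whereas this sketch shows only the control flow the pattern requires, namely check the attested digest against the published ledger, and derive an ephemeral session key only on success.

```python
import hashlib
import hmac
import os

# Illustrative only: a real client verifies a *signed* hardware attestation
# report, not a bare digest, and the ledger is a third-party append-only log.
LEDGER = {  # published transparency log of acceptable TEE binary digests
    hashlib.sha256(b"inference-binary-v1").hexdigest(),
    hashlib.sha256(b"inference-binary-v2").hexdigest(),
}

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """Minimal HKDF (RFC 5869) used to derive the ephemeral session key."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def open_session(attested_digest: str, shared_secret: bytes) -> bytes:
    """Release a session key only if the attested digest is in the ledger."""
    if attested_digest not in LEDGER:
        raise PermissionError("attestation failed: digest not in transparency log")
    # Fresh random salt per session: forward-secure, nothing to recover later.
    return hkdf_sha256(shared_secret, salt=os.urandom(32), info=b"tee-session-v1")

key = open_session(hashlib.sha256(b"inference-binary-v2").hexdigest(), os.urandom(32))
assert len(key) == 32
```

The important property is the ordering: the key derivation is unreachable unless the ledger check passes, so an unattested (or tampered) binary never receives material that can decrypt user content.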

Components

  1. Client-side crypto + attestation verifier — mints the request, fetches the gateway HPKE key, verifies the server's attestation against the published ledger, derives the ephemeral session key.
  2. TEE cluster — the CVM+GPU fleet running the inference binary. Remote-shell-free, containerised, hardened.
  3. Third-party OHTTP relay — hides client IP from the provider gateway (patterns/third-party-ohttp-relay-for-unlinkability).
  4. Anonymous credential service — authenticates the client-class without identifying the user.
  5. Transparency log — third-party-operated append-only ledger of acceptable TEE binary digests (patterns/publish-binary-digest-ledger).
  6. Attestation-gated session (patterns/attestation-before-session-key-release), realised as RA-TLS.
  7. Log-filtering egress — allow-listed telemetry path out of the TEE, so observability does not leak content.
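Component 7 can reduce to a field allow-list applied at the TEE egress. A minimal sketch with illustrative field names; a real deployment would also enforce this structurally, e.g. via a dedicated egress proxy rather than in-process code:

```python
# Illustrative egress filter: only allow-listed, content-free telemetry
# fields may leave the TEE; everything else is silently dropped.
ALLOWED_FIELDS = {"request_id", "model_version", "latency_ms", "status_code"}

def filter_egress(event: dict) -> dict:
    """Strip any field not on the allow-list before it leaves the TEE."""
    return {k: v for k, v in event.items() if k in ALLOWED_FIELDS}

event = {
    "request_id": "r-123",
    "latency_ms": 842,
    "status_code": 200,
    "prompt": "summarise this thread...",  # user content: must never egress
}
assert filter_egress(event) == {
    "request_id": "r-123", "latency_ms": 842, "status_code": 200,
}
```

An allow-list (rather than a deny-list) is the safer default here: a new field added to telemetry stays inside the TEE until someone explicitly argues it is content-free.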

When to use

  • Preserve E2EE across a server-side compute step. You have an E2EE messaging substrate and want to add AI features without breaking the invariant.
  • On-device inference is infeasible. Model size or latency makes device-only serving impractical, but you are unwilling to send plaintext to a normal server.
  • The threat model includes insiders + supply chain. You cannot rely on operational policy alone to keep plaintext away from the operator.
  • Users must be able to verify the guarantee. Policy-only claims are insufficient; you need mechanism + external audit.

When NOT to use

  • The workload is not privacy-sensitive. TEEs add operational complexity + attestation infrastructure + transparency-log operations + vendor coupling; don't pay that cost for non-sensitive content.
  • You need broad observability on content. TEE + log-filtering deliberately limits what you can see; if your ops model depends on full content introspection, this pattern breaks it.
  • You cannot get multi-party independence. The pattern depends on the OHTTP relay, transparency ledger, and attestation verifier being operated by parties independent of the service operator. If you can't get that, non-targetability + verifiability degrade.
  • Your model + weights must remain confidential from the user. TEEs protect the user's content from the operator; they don't protect the operator's model from the user. That's a separate problem.

Canonical wiki instance

WhatsApp Private Processing (Meta, 2025-04-30) is the canonical instance on the wiki. Initial user-facing use: message summarisation + writing suggestions over WhatsApp's E2EE messaging. Full component stack described in the source:

"This confidential computing infrastructure, built on top of a Trusted Execution Environment (TEE), will make it possible for people to direct AI to process their requests — like summarizing unread WhatsApp threads or getting writing suggestions — in our secure and private cloud environment."

Caveats

  • Side-channel residual risk. TEEs have a history of speculative-execution, timing, and power side-channel findings. The pattern reduces risk; it doesn't eliminate it.
  • Vendor trust. The CPU + GPU vendors are inside the TCB. A backdoor or microcode-level compromise of the vendor's attestation root is catastrophic.
  • Operational complexity. Multi-party operational coordination (relay operator, ledger operator, CDN operator) is non-trivial; poorly-run operators undermine the guarantee.
  • Model quality + latency trade-off. TEEs and confidential-GPU modes have historically incurred performance overhead vs. plain server-side inference.
  • Abuse detection tension. Standard abuse signals rely on content access; the pattern forces abuse detection to shift to metadata + host monitoring outside the TEE.

Seen in
