
PATTERN Cited by 1 source

External credential store with principal rewrite

Problem

You're building a managed multi-tenant product on top of a system whose authorization is static by design: typically a configuration file loaded at pod startup and updatable only by rewriting the file and waiting for the next refresh-interval tick, a boot-time RBAC list, or a compiled-in capability table. Examples: Vitess table-ACL, many MySQL / Postgres role systems, service-mesh sidecars with declarative authz configs.

Customers create and revoke credentials on demand. You cannot push per-tenant state into every data-plane pod without running into four structural failures:

  1. Refresh-interval latency — whatever refresh cadence you pick, it's incompatible with a "credentials work immediately" guarantee.
  2. Per-pod race conditions — during a rollout, the same credential can resolve to different roles on different pods.
  3. Pod-restart has no authoritative source — local-file ACL state is lost with the pod and there's no cluster-level store to reconstruct from.
  4. Per-customer state explosion — one static file per cluster can't encode per-tenant credentials; per-customer files multiply pod-local state by customer cardinality.

These four failure modes are enumerated explicitly in the PlanetScale Password Roles post.

Solution

Keep the data-plane authz invariant, and move all per-tenant credential dynamism to an ingress service backed by an external store.

  1. Fix the downstream authz config across every pod. Use a small number of synthetic principals — the PlanetScale case uses three: planetscale-reader, planetscale-writer, planetscale-admin — that cover the coarse-grained roles you expose to customers.
  2. Store per-tenant credentials in an external credential store, with schema roughly (display_name, role, password_hash_variants...). Creates / revokes / role-changes are just CRUD against this table.
  3. Terminate client connections at an ingress service (PlanetScale's user query frontend) that authenticates against the credential store.
  4. Rewrite the security principal on each connection from the customer's password-derived identity to the synthetic principal corresponding to the looked-up role.
  5. Forward to the data plane. The data plane sees only the synthetic principal and evaluates its invariant ACL / RBAC config normally.
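
The five steps above can be sketched as a small ingress-side routine. This is a minimal illustration under stated assumptions, not PlanetScale's implementation: the `CredentialStore` class, the `username` lookup key, the role names `reader` / `writer` / `admin`, and the choice of PBKDF2 as the password hash are all hypothetical stand-ins. Only the three synthetic principal names come from the source.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Optional

# Synthetic principals named in the text; the role keys are illustrative.
SYNTHETIC_PRINCIPALS = {
    "reader": "planetscale-reader",
    "writer": "planetscale-writer",
    "admin": "planetscale-admin",
}


@dataclass
class Credential:
    display_name: str
    role: str
    salt: bytes
    password_hash: bytes


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 is a stand-in; the source does not say which hash variants are stored.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class CredentialStore:
    """In-memory stand-in for the external credential store (hypothetical)."""

    def __init__(self):
        self._rows = {}  # username -> Credential

    def create(self, username: str, password: str, role: str, display_name: str) -> None:
        if role not in SYNTHETIC_PRINCIPALS:
            raise ValueError(f"unknown role: {role}")
        salt = os.urandom(16)
        self._rows[username] = Credential(
            display_name, role, salt, hash_password(password, salt)
        )

    def revoke(self, username: str) -> None:
        self._rows.pop(username, None)

    def lookup(self, username: str) -> Optional[Credential]:
        return self._rows.get(username)


def authenticate_and_rewrite(
    store: CredentialStore, username: str, password: str
) -> Optional[str]:
    """Return the synthetic principal to forward as, or None to reject."""
    cred = store.lookup(username)
    if cred is None:
        return None
    candidate = hash_password(password, cred.salt)
    if not hmac.compare_digest(candidate, cred.password_hash):
        return None
    # The principal rewrite: customer identity in, synthetic principal out.
    return SYNTHETIC_PRINCIPALS[cred.role]
```

In this sketch a create or revoke is a single write to the store, and the next call to `authenticate_and_rewrite` sees it; nothing on the data-plane side changes.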

Each of the four failure modes maps 1:1 to an architectural win: (1) → instant effect (create/revoke latency = credential-DB write), (2) → pod-independent state, (3) → restart-safe (fleet consults the external store), (4) → universal data-plane config (one file, identical everywhere, debuggable).
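
To make the "one file, identical everywhere" point concrete, here is a hedged sketch of what an invariant data-plane check could look like. The `STATIC_ACL` layout is only loosely modeled on a readers / writers / admins table-ACL shape. The operation-to-list mapping and the assumption that writers may also read and admins may also write are illustrative, not taken from the source.

```python
# Hypothetical fixed ACL: every pod ships exactly this content, and it never
# mentions a customer credential.
STATIC_ACL = {
    "readers": ["planetscale-reader", "planetscale-writer", "planetscale-admin"],
    "writers": ["planetscale-writer", "planetscale-admin"],
    "admins": ["planetscale-admin"],
}

# Which ACL list a statement type must appear in (illustrative mapping).
REQUIRED_LIST = {
    "select": "readers",
    "insert": "writers",
    "update": "writers",
    "delete": "writers",
    "alter": "admins",
}


def is_allowed(principal: str, operation: str) -> bool:
    """Evaluate the invariant ACL for a synthetic principal."""
    required = REQUIRED_LIST.get(operation)
    if required is None:
        return False  # unknown operations are denied by default
    return principal in STATIC_ACL[required]
```

Because the table contains only synthetic principals, creating or revoking a tenant credential never requires touching it.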

When to use

  • Building a managed multi-tenant product on top of a system with file-based or otherwise-static authorization you don't control or don't want to rebuild.
  • Tenants mint credentials faster than any file-refresh cadence you could stomach.
  • You need coarse-grained roles (typically no more than a handful) per tenant, not fine-grained per-row or per-column authz.
  • You can route all tenant traffic through an ingress — there is no path that reaches the data plane without transiting the rewrite layer.

When not to use

  • Fine-grained authz per tenant (row-level / column-level / attribute-level) can't be collapsed into a small set of synthetic principals without losing information. Use a different mechanism (e.g. pushdown authz, per-query rewrites, tenant-aware authz engines).
  • Tenants bring their own identity provider. If they expect the downstream system to see their real principals, the rewrite breaks that contract.
  • Ingress-bypass paths exist. If any client can reach the data plane directly, it skips the credential check entirely, and the downstream's static authz config, which knows only the synthetic principals, cannot tell tenants apart.
  • Downstream audit must see customer identity. The rewrite hides the real principal from downstream logs; audit joins need credential-DB plus ingress-telemetry correlation.
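
The audit join mentioned in the last bullet can be sketched as follows. This assumes the ingress emits one telemetry record per connection and that a shared `connection_id` reaches the downstream logs. Both are assumptions made for illustration, and every field and function name here is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IngressEvent:
    connection_id: str
    username: str              # customer credential that authenticated
    synthetic_principal: str   # what the ingress rewrote it to


@dataclass(frozen=True)
class DownstreamLogLine:
    connection_id: str
    principal: str             # only the synthetic principal is visible here
    statement: str


def attribute(
    downstream: list,
    ingress_events: list,
    display_names: dict,
) -> list:
    """Join downstream log lines back to the customer credential.

    Returns (statement, synthetic principal, display name or None) tuples;
    None marks a line with no matching ingress record.
    """
    by_conn = {e.connection_id: e for e in ingress_events}
    attributed = []
    for line in downstream:
        event: Optional[IngressEvent] = by_conn.get(line.connection_id)
        if event is None:
            attributed.append((line.statement, line.principal, None))
        else:
            name = display_names.get(event.username, event.username)
            attributed.append((line.statement, line.principal, name))
    return attributed
```

A line that comes back with `None` in the last position has no ingress record, which is worth investigating as a possible bypass path.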

Tradeoffs

  • Ingress becomes the bottleneck. The rewrite layer is a central proxy chokepoint: it sits on the hot path of every connection, so it has to be highly available and low-latency.
  • Credential store becomes a hard dependency of every connection. Its availability bounds the product's.
  • Session vs per-query rewrite tradeoff. If the rewritten principal is computed once at connection-open, in-flight connections keep their old role until the session ends: cheap and high-throughput, but slow to revoke. If it is re-checked on every query, revocation is instant, but each query pays a credential-DB round-trip.
  • Audit indirection. Mapping "who actually did this" back to a real customer principal requires an extra join.
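
The session vs per-query tradeoff above can be shown with a small sketch. The `Session` class and its `recheck_per_query` flag are hypothetical, and a plain dict mapping username to role stands in for the credential store. A real ingress would also verify the password, which is omitted here.

```python
PRINCIPALS = {
    "reader": "planetscale-reader",
    "writer": "planetscale-writer",
    "admin": "planetscale-admin",
}


class Session:
    """One client connection at the ingress (illustrative only)."""

    def __init__(self, store: dict, username: str, recheck_per_query: bool):
        role = store.get(username)
        if role is None:
            raise PermissionError("unknown or revoked credential")
        self._store = store
        self._username = username
        self._recheck = recheck_per_query
        # Resolved once at connection-open in both modes.
        self._principal = PRINCIPALS[role]

    def principal_for_query(self) -> str:
        if self._recheck:
            # Per-query mode: one store lookup per query, so revocations and
            # role changes apply to the very next statement.
            role = self._store.get(self._username)
            if role is None:
                raise PermissionError("credential revoked mid-session")
            self._principal = PRINCIPALS[role]
        # Session mode falls through and reuses the principal cached at open.
        return self._principal
```

With `recheck_per_query=False` a revoked credential keeps working until the connection closes; with `True` the next query fails, at the price of a store lookup on every statement.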

Seen in
