PATTERN Cited by 1 source

Isolated token service

Pattern

Carve the token-authority database, together with token signing and verification, out of the primary API cluster and onto isolated hardware with a deliberately narrow code surface. The token service does one job (mint, verify, and revoke tokens) and shares no runtime, dependencies, or deploy cycle with the general-purpose control plane.

Motivation

Two independent pressures justify the carve-out (Fly.io):

  1. Reliability. "Far and away the most common failure mode of an outage on our platform is 'deploys are broken', and those failures are usually caused by API instability. It would not be OK if 'deploys are broken' transitively meant 'deployed apps can't use security tokens.'"
  2. Security. Keep hazmat away from complicated code: "root secrets for Macaroon tokens are hazmat, and a basic rule of thumb in secure design is: keep hazmat away from complicated code." (Source: sources/2025-03-27-flyio-operationalizing-macaroons.)

Canonical implementation: tkdb

  • ~5,000 lines of Go in a single service. Everything unrelated to tokens lives elsewhere.
  • Isolated hardware in multiple regions (US, EU, AU).
  • Records encrypted with an injected secret — compromise of a host disk doesn't reveal records.
  • Narrow RPC surface (HTTP/Noise): verify, sign, revoke, revocation feed. No ad-hoc SQL, no admin endpoints.
  • Storage substrate: SQLite + LiteFS + Litestream — tiny, fast, PITR-recoverable.
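
The "records encrypted with an injected secret" bullet can be sketched as follows. The source does not say which cipher or injection mechanism tkdb uses, so everything here is an assumption chosen for illustration: AES-256-GCM from the Go standard library, a hypothetical `RECORD_KEY_HEX` environment variable, and the helper names `loadKey`, `sealRecord`, and `openRecord`.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"os"
)

// loadKey reads the injected secret from the environment (hypothetical
// variable name). The key is supplied at runtime and never written to disk.
func loadKey() ([]byte, error) {
	key, err := hex.DecodeString(os.Getenv("RECORD_KEY_HEX"))
	if err != nil {
		return nil, err
	}
	if len(key) != 32 {
		return nil, errors.New("RECORD_KEY_HEX must decode to 32 bytes")
	}
	return key, nil
}

// newGCM builds an AES-GCM AEAD from the key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealRecord encrypts a record; the stored blob is nonce || ciphertext.
func sealRecord(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// openRecord decrypts and authenticates a blob produced by sealRecord.
func openRecord(key, blob []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(blob) < gcm.NonceSize() {
		return nil, errors.New("record too short")
	}
	nonce, ct := blob[:gcm.NonceSize()], blob[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key, err := loadKey()
	if err != nil {
		// Demo fallback only: a real service should refuse to start instead.
		key = make([]byte, 32)
		if _, err := rand.Read(key); err != nil {
			panic(err)
		}
	}
	blob, err := sealRecord(key, []byte("token-record"))
	if err != nil {
		panic(err)
	}
	plain, err := openRecord(key, blob)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes stored, decrypted: %s\n", len(blob), plain)
}
```

Because only the sealed blob reaches storage, someone who copies the database file without the injected key gets ciphertext, which is the property the bullet describes.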

When it applies

  • Token authority (Macaroons, sessions, workload-identity issuers, OAuth providers).
  • Secret stores (Fly.io's Pet Semetary / Vault-equivalents).
  • Signing authorities (code-signing CA, release attestation).

Any service that holds root-level secret material on behalf of a larger platform.

Anti-pattern it replaces

Colocating the token authority inside the primary API cluster:

  • Every unrelated feature's vulnerabilities become a path to the root keys.
  • Deploy cadence is forced to match the API's (risky combined deploys).
  • Scaling profile is wrong: token auth is mostly-read; the API is mostly-write; mixing their substrates hurts both.

Culture caveat

Fly.io are self-described "allergic to microservices" — and still made the carve-out:

"As an engineering culture, we're allergic to 'microservices', and we flinched a bit at the prospect of adding a specific service just to manage tokens. But it's pulled its weight, and not added really any drama at all. We have at this point a second dedicated security service (Petsem)..."

Narrow-purpose security microservices get their own exception to microservices-averse defaults.

Seen in
