CONCEPT Cited by 1 source
Signing key rotation lifecycle
Definition
The signing-key rotation lifecycle is the ordered discipline for replacing a cryptographic signing key without invalidating tokens in flight and without requiring any coordination with downstream verifiers. The canonical sequence is generate → publish → grace → activate → retire → drop, and each transition is gated on a specific property of the prior state rather than on a fixed wall-clock schedule.
Zalando's customer-authentication identity provider implements this as a fully automated loop over its JWKS endpoint; the article (2025-01-20) is the canonical public documentation of the shape (Source: sources/2025-01-20-zalando-json-web-keys-jwk-rotating-cryptographic-keys-at-zalando).
The five (really six) phases
From the Zalando article — direct quote of the sequence:
"First, a new key pair is generated. We then publish the public key portion of this new pair on our JWK endpoint, making it available to our clients. To avoid any immediate disruptions, we incorporate a grace period, allowing clients ample time to fetch the latest set of JWKs — cache control headers matter! After this period, the new key is being elected as the new active signing key. The previous active key is being retired, meaning it's no longer used for signing new tokens, but its public key remains available on the JWK endpoint to ensure that previously issued tokens can still be verified. Finally, once a retired key surpasses the maximum lifetime of any token it might have signed, we remove its public key from the JWK endpoint."
Restated as a state machine:
| # | Phase | Private key | Public in JWKS | Used for signing | Used for verify |
|---|---|---|---|---|---|
| 1 | Generate | exists | no | no | no |
| 2 | Publish | exists | yes | no | no |
| 3 | Grace | exists | yes | no | no |
| 4 | Activate | exists | yes | yes | yes |
| 5 | Retire | exists | yes | no | yes |
| 6 | Drop | destroyed | no | no | no |
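The table above can be encoded as data. Doing so makes the JWKS contents and the current signer pure functions of each key's phase. This is a minimal Python sketch. `Phase`, `Flags`, `SigningKey`, `jwks_document`, and `current_signer` are hypothetical names, and `public_jwk` is a placeholder dict standing in for real key material.

```python
# Hypothetical sketch: the phase table as data. Names are illustrative only.
from dataclasses import dataclass
from enum import Enum
from typing import NamedTuple


class Phase(Enum):
    GENERATE = 1
    PUBLISH = 2
    GRACE = 3
    ACTIVE = 4
    RETIRED = 5
    DROPPED = 6


class Flags(NamedTuple):
    in_jwks: bool   # public half served from the JWKS endpoint
    signs: bool     # private half used to sign new tokens
    verifies: bool  # tokens signed by this key are still being verified


# One entry per table row.
FLAGS = {
    Phase.GENERATE: Flags(in_jwks=False, signs=False, verifies=False),
    Phase.PUBLISH: Flags(in_jwks=True, signs=False, verifies=False),
    Phase.GRACE: Flags(in_jwks=True, signs=False, verifies=False),
    Phase.ACTIVE: Flags(in_jwks=True, signs=True, verifies=True),
    Phase.RETIRED: Flags(in_jwks=True, signs=False, verifies=True),
    Phase.DROPPED: Flags(in_jwks=False, signs=False, verifies=False),
}


def advance(phase: Phase) -> Phase:
    """Move to the next phase; the lifecycle never skips a step."""
    if phase is Phase.DROPPED:
        raise ValueError("a dropped key has no further phase")
    return Phase(phase.value + 1)


@dataclass
class SigningKey:
    kid: str
    public_jwk: dict  # placeholder for the key's public JWK members
    phase: Phase = Phase.GENERATE


def jwks_document(keys: list) -> dict:
    """JWKS body: the public JWK of every key whose phase is 'public in JWKS'."""
    return {"keys": [dict(k.public_jwk, kid=k.kid) for k in keys if FLAGS[k.phase].in_jwks]}


def current_signer(keys: list) -> SigningKey:
    """Single-active-key model: exactly one key may be signing at a time."""
    signers = [k for k in keys if FLAGS[k.phase].signs]
    if len(signers) != 1:
        raise RuntimeError(f"expected one signing key, found {len(signers)}")
    return signers[0]


keys = [
    SigningKey("k0", {"kty": "EC"}, Phase.DROPPED),
    SigningKey("k1", {"kty": "EC"}, Phase.RETIRED),
    SigningKey("k2", {"kty": "EC"}, Phase.ACTIVE),
    SigningKey("k3", {"kty": "EC"}, Phase.GRACE),
]
print([jwk["kid"] for jwk in jwks_document(keys)["keys"]])  # ['k1', 'k2', 'k3']
print(current_signer(keys).kid)  # k2
```

The dropped key disappears from the JWKS. The retired and grace keys stay published, even though neither is signing.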
The two hard gates are between 3→4 and 5→6:
- 3→4 gate: grace period elapsed. The new public key must be visible to every client cache before it is used to sign any token. This gate is governed by the cache-control-aware grace period. It is not a fixed timer but an interval at least as long as the longest JWKS cache TTL. That guarantees every client honouring its cache headers has re-fetched the JWKS at least once since the key was published.
- 5→6 gate: retired-key-lifetime exceeded. The retired key's public part can only be removed once no outstanding token signed by it can still be valid. This is governed by the concepts/retirement-plus-lifespan-plus-buffer-formula — retirement time + max token lifespan + safety buffer.
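Both gates reduce to timestamp arithmetic. This is a minimal Python sketch. It assumes the IdP knows the longest JWKS cache TTL it advertises and the longest lifetime of any token it issues. `activation_time`, `drop_time`, the optional `margin`, and the example durations are hypothetical, not values from the Zalando article.

```python
# Hypothetical sketch of the two gates; names and durations are illustrative.
from datetime import datetime, timedelta, timezone


def activation_time(published_at: datetime, max_cache_ttl: timedelta,
                    margin: timedelta = timedelta(0)) -> datetime:
    """3->4 gate: earliest moment the new key may start signing.

    A client that fetched the JWKS just before publication keeps that stale
    copy for up to max_cache_ttl, so the grace period must last at least that long.
    """
    return published_at + max_cache_ttl + margin


def drop_time(retired_at: datetime, max_token_lifespan: timedelta,
              safety_buffer: timedelta) -> datetime:
    """5->6 gate: retirement time + max token lifespan + safety buffer."""
    return retired_at + max_token_lifespan + safety_buffer


# Illustrative numbers only: 1 h cache TTL, 1 h token lifespan, 15 min buffer.
utc = timezone.utc
published = datetime(2025, 1, 20, 10, 0, tzinfo=utc)
retired = activation_time(published, timedelta(hours=1))  # old key retires as new one activates
print(retired)  # 2025-01-20 11:00:00+00:00
print(drop_time(retired, timedelta(hours=1), timedelta(minutes=15)))  # 2025-01-20 12:15:00+00:00
```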
Why this ordering is non-negotiable
Any compression of the sequence breaks a verifier:
- Skip publish / grace (jump straight to activate). The IdP starts signing with a key whose public half is not yet in client caches. Verifiers receive a JWT with an unknown `kid`, fail signature verification, and return 401. Every token signed with the new key is rejected until its verifier refetches the JWKS.
- Skip retire (drop immediately on deactivation). The IdP stops signing with the old key and immediately removes its public half from the JWKS. Any token signed in the last `max_token_lifespan` window suddenly fails verification even though it hasn't expired.
- Skip activate (publish and drop without using). No security benefit; the old key keeps signing.
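A toy verifier can reproduce the first two failure modes. This sketch assumes a verifier that caches the set of JWKS `kid`s for a fixed TTL and rejects any token whose `kid` is missing from that cache. `CachingVerifier`, the TTL, and the timeline are hypothetical, and the actual signature check is omitted.

```python
# Hypothetical toy verifier; only the kid lookup is modelled.
from typing import Callable, Optional, Set


class CachingVerifier:
    """Caches the JWKS kids for `ttl` seconds and returns 401 on an unknown kid."""

    def __init__(self, fetch_kids: Callable[[], Set[str]], ttl: float):
        self._fetch_kids = fetch_kids
        self._ttl = ttl
        self._cached: Set[str] = set()
        self._fetched_at: Optional[float] = None

    def verify(self, kid: str, now: float) -> int:
        # Refetch only when the cached copy has expired, like a Cache-Control client.
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._cached = set(self._fetch_kids())
            self._fetched_at = now
        # Signature check omitted: these failures are purely about kid lookup.
        return 200 if kid in self._cached else 401


published = {"old"}  # kids currently served by the IdP's JWKS endpoint
verifier = CachingVerifier(lambda: set(published), ttl=300)
print(verifier.verify("old", now=0))    # 200, cache filled at t=0

# Skip publish/grace: "new" is published and used for signing at the same moment.
published.add("new")
print(verifier.verify("new", now=10))   # 401, cached copy from t=0 lacks "new"
print(verifier.verify("new", now=300))  # 200, cache expired and was refetched

# Skip retire: "old" leaves the JWKS while tokens it signed are still unexpired.
published.discard("old")
print(verifier.verify("old", now=600))  # 401 after the next refresh
```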
The full six-phase lifecycle is the minimum that preserves the two verifier-facing invariants:
- Every `kid` a verifier sees in a token was in the JWKS before the token was signed. (Enforced by the 2→3→4 sequence.)
- Every `kid` used to sign a still-valid token is still in the JWKS. (Enforced by the 4→5→6 sequence.)
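The two invariants can be checked mechanically against an audit log of key publications and token issuances. This is a minimal sketch, assuming timestamps in seconds and a known maximum client cache TTL. `KeyRecord`, `TokenRecord`, and `check_invariants` are hypothetical names. The sketch uses the cache-aware reading of the first invariant: the key must have been published at least one cache TTL before signing, which is what the grace period is sized to guarantee.

```python
# Hypothetical audit check for the two verifier-facing invariants.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KeyRecord:
    kid: str
    published_at: float
    removed_at: Optional[float] = None  # None while the key is still in the JWKS


@dataclass
class TokenRecord:
    kid: str
    signed_at: float
    expires_at: float


def check_invariants(keys: List[KeyRecord], tokens: List[TokenRecord],
                     max_cache_ttl: float) -> List[str]:
    """Return one message per violation of the two invariants."""
    by_kid = {k.kid: k for k in keys}
    violations = []
    for t in tokens:
        key = by_kid.get(t.kid)
        if key is None:
            violations.append(f"{t.kid}: signing key was never published")
            continue
        # Invariant 1: visible to every client cache before the token was signed.
        if t.signed_at < key.published_at + max_cache_ttl:
            violations.append(f"{t.kid}: token signed before grace period elapsed")
        # Invariant 2: still published until the token expires.
        if key.removed_at is not None and key.removed_at < t.expires_at:
            violations.append(f"{t.kid}: key dropped while token still valid")
    return violations


keys = [KeyRecord("k1", published_at=0, removed_at=5000), KeyRecord("k2", published_at=1000)]
tokens = [
    TokenRecord("k1", signed_at=400, expires_at=4000),   # respects both invariants
    TokenRecord("k2", signed_at=1100, expires_at=4700),  # signed only 100 s after publish
]
print(check_invariants(keys, tokens, max_cache_ttl=300))
# ['k2: token signed before grace period elapsed']
```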
Automation
Zalando's four principles (from the article):
- Automation — "New keys are generated and old keys are retired automatically, eliminating manual intervention and ensuring consistency."
- Scheduled Rotation — keys rotate on a regular cadence to minimise the window of vulnerability.
- Secure Key Management — private keys stored + managed per industry best practice.
- Seamless Rotation — transparent to clients, no access revocation or token invalidation during a planned rotation.
The first + fourth principles are what the lifecycle sequence operationalises. The whole point of the ordered phases is that a scheduled rotation is invisible to any downstream service. `kid` lookups transition smoothly from the old key to the new key on the normal cache-refresh cadence, and every token signed before or after the transition continues to verify.
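One way to picture the automated loop is as a periodic `tick` that advances keys through the gates. This is a minimal sketch, not Zalando's implementation. It assumes a single-active-key model and collapses generate, publish, and start-of-grace into one step. It keeps only `kid`s and timestamps, with no real key material, and every name and parameter is hypothetical.

```python
# Hypothetical rotation loop; not Zalando's implementation.
import itertools
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Key:
    kid: str
    state: str                          # "grace", "active" or "retired"
    published_at: float
    retired_at: Optional[float] = None


@dataclass
class Rotator:
    rotation_interval: float  # scheduled rotation: time between new keys
    grace: float              # at least the longest JWKS cache TTL
    max_token_lifespan: float
    buffer: float
    keys: List[Key] = field(default_factory=list)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def tick(self, now: float) -> None:
        # 5 -> 6: drop retired keys once retirement + lifespan + buffer has passed.
        self.keys = [k for k in self.keys if not (
            k.state == "retired"
            and now >= k.retired_at + self.max_token_lifespan + self.buffer)]
        pending = [k for k in self.keys if k.state == "grace"]
        if pending and now >= pending[0].published_at + self.grace:
            # 3 -> 4 and 4 -> 5: the old active key retires as the new one activates.
            for k in self.keys:
                if k.state == "active":
                    k.state, k.retired_at = "retired", now
            pending[0].state = "active"
        elif not pending and now >= max(
                (k.published_at for k in self.keys), default=float("-inf")) + self.rotation_interval:
            # 1 -> 2 -> 3: on schedule, generate a key and publish it straight into grace.
            self.keys.append(Key(f"k{next(self._ids)}", "grace", now))

    def jwks_kids(self) -> List[str]:
        return [k.kid for k in self.keys]  # grace, active and retired keys are all published

    def signing_kid(self) -> str:
        return next(k.kid for k in self.keys if k.state == "active")


r = Rotator(rotation_interval=1000, grace=300, max_token_lifespan=600, buffer=60,
            keys=[Key("k0", "active", published_at=0)])
for now in (1000, 1300, 1960):
    r.tick(now)
    print(now, r.jwks_kids(), r.signing_kid())
# 1000 ['k0', 'k1'] k0
# 1300 ['k0', 'k1'] k1
# 1960 ['k1'] k1
```

Within a single tick, a key can pass at most one of the gates: grace elapsing, or its drop time arriving. So a late or repeated run only delays a transition and never lets one be skipped.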
Boundary conditions
Emergency rotation. If a private key is suspected to be compromised, the scheduled lifecycle is the wrong shape: compromise requires immediate revocation, accepting the breakage of any outstanding tokens. Scheduled rotation is a preventive control (to minimise the window of exposure); emergency rotation is a reactive control with different tradeoffs (token invalidation is the desired outcome, not a regression).
Multiple concurrent active keys. Some IdPs rotate with overlap (two keys actively signing for a window) to support deployments where not all signing instances have yet picked up the new key. Zalando's article describes the strict single-active-key model; the lifecycle generalises to multi-active by having phases 4–5 overlap across two key generations.
See also
- concepts/jwk-json-web-key — the key-distribution substrate that makes phase 2 (publish) possible as an HTTP GET.
- concepts/cache-control-aware-grace-period — why phase 3 (grace) is measured in cache-TTLs, not clock-time.
- concepts/retirement-plus-lifespan-plus-buffer-formula — the formula that decides when phase 6 (drop) is safe.
- concepts/long-lived-key-risk — why shortening the window between generate and retire is a structural security property, not just hygiene.
- patterns/phased-automated-jwk-rotation — the automated system-level pattern that implements this lifecycle.