
CONCEPT Cited by 1 source

Signing key rotation lifecycle

Definition

The signing-key rotation lifecycle is the ordered discipline for replacing a cryptographic signing key without invalidating tokens in flight and without requiring any coordination with downstream verifiers. The canonical sequence is generate → publish → grace → activate → retire → drop, and each transition is gated on a specific property of the prior state rather than on a fixed wall-clock schedule.

Zalando's customer-authentication identity provider implements this as a fully automated loop over its JWKS endpoint; the article (2025-01-20) is the canonical public documentation of the shape (Source: sources/2025-01-20-zalando-json-web-keys-jwk-rotating-cryptographic-keys-at-zalando).

The five (really six) phases

From the Zalando article — direct quote of the sequence:

"First, a new key pair is generated. We then publish the public key portion of this new pair on our JWK endpoint, making it available to our clients. To avoid any immediate disruptions, we incorporate a grace period, allowing clients ample time to fetch the latest set of JWKs — cache control headers matter! After this period, the new key is being elected as the new active signing key. The previous active key is being retired, meaning it's no longer used for signing new tokens, but its public key remains available on the JWK endpoint to ensure that previously issued tokens can still be verified. Finally, once a retired key surpasses the maximum lifetime of any token it might have signed, we remove its public key from the JWK endpoint."

Restated as a state machine:

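| # | Phase | Private key | Public key in JWKS | Used for signing | Used for verification |
|---|---|---|---|---|---|
| 1 | Generate | exists | no | no | no |
| 2 | Publish | exists | yes | no | no |
| 3 | Grace | exists | yes | no | no |
| 4 | Activate | exists | yes | yes | yes |
| 5 | Retire | exists | yes | no | yes |
| 6 | Drop | destroyed | no | no | no |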
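As a minimal sketch, the table can be encoded as data plus a transition function that refuses to skip a phase. The names `Phase`, `KeyState`, `STATES` and `advance` are hypothetical and do not come from the Zalando article; the time-based gates are assumed to be checked elsewhere.

```python
from enum import Enum
from typing import NamedTuple


class Phase(Enum):
    GENERATE = 1
    PUBLISH = 2
    GRACE = 3
    ACTIVATE = 4
    RETIRE = 5
    DROP = 6


class KeyState(NamedTuple):
    private_key_exists: bool
    public_in_jwks: bool
    used_for_signing: bool
    used_for_verification: bool


# One row per phase, mirroring the table above.
STATES = {
    Phase.GENERATE: KeyState(True, False, False, False),
    Phase.PUBLISH:  KeyState(True, True, False, False),
    Phase.GRACE:    KeyState(True, True, False, False),
    Phase.ACTIVATE: KeyState(True, True, True, True),
    Phase.RETIRE:   KeyState(True, True, False, True),
    Phase.DROP:     KeyState(False, False, False, False),
}


def advance(current: Phase, target: Phase) -> Phase:
    """Return `target` if it is the immediate successor of `current`, else raise.

    Only the ordering is enforced here; the 3->4 and 5->6 time gates are
    assumed to be checked by the caller before calling this.
    """
    if target.value != current.value + 1:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

`advance(Phase.GRACE, Phase.ACTIVATE)` succeeds, while `advance(Phase.PUBLISH, Phase.ACTIVATE)` raises, which is exactly the "skip grace" compression the lifecycle forbids.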

The two hard gates are between 3→4 and 5→6:

  • 3→4 gate: grace period elapsed. The new public key must be visible in every client cache before it is used to sign any token. This gate is governed by the cache-control-aware grace period: not a fixed timer, but an interval at least as long as the JWKS cache TTL that clients honour, so that every well-behaved client has re-fetched JWKS at least once.
  • 5→6 gate: retired-key-lifetime exceeded. The retired key's public part can only be removed once no outstanding token signed by it can still be valid. This is governed by the retirement-plus-lifespan-plus-buffer formula (concepts/retirement-plus-lifespan-plus-buffer-formula): retirement time + max token lifespan + safety buffer.
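Both gates reduce to timestamp arithmetic. A minimal sketch, assuming the grace period is derived from the longest JWKS cache TTL clients honour plus a small margin; the function names and every number below are hypothetical, not Zalando's configuration.

```python
from datetime import datetime, timedelta


def earliest_activation(published_at: datetime,
                        max_jwks_cache_ttl: timedelta,
                        margin: timedelta) -> datetime:
    """3->4 gate: the new key may sign only after every cache that honours
    the advertised TTL has had the chance to refetch the JWKS."""
    return published_at + max_jwks_cache_ttl + margin


def earliest_drop(retired_at: datetime,
                  max_token_lifespan: timedelta,
                  safety_buffer: timedelta) -> datetime:
    """5->6 gate: retirement time + max token lifespan + safety buffer."""
    return retired_at + max_token_lifespan + safety_buffer


# Hypothetical numbers: a 1 h JWKS cache TTL, 24 h tokens, small margins.
activate_at = earliest_activation(datetime(2025, 1, 20, 0, 0),
                                  timedelta(hours=1), timedelta(minutes=15))
drop_at = earliest_drop(datetime(2025, 2, 20, 0, 0),
                        timedelta(hours=24), timedelta(hours=1))
print(activate_at)  # 2025-01-20 01:15:00
print(drop_at)      # 2025-02-21 01:00:00
```

The margin and buffer absorb clock skew and clients that refetch slightly late; how large they should be is a deployment decision.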

Why this ordering is non-negotiable

Compressing the sequence either breaks a verifier or defeats the purpose of rotating:

  • Skip publish / grace (jump straight to activate). The IdP starts signing with a key whose public half is not yet in client caches. Verifiers receive a JWT with an unknown kid, fail signature verification, and return 401. Every request carrying a newly signed token fails until that verifier refetches JWKS.
  • Skip retire (drop immediately on deactivation). The IdP stops signing and immediately removes the public key. Any token signed in the last max_token_lifespan window suddenly fails verification even though it hasn't expired.
  • Skip activate (publish and drop without using). No security benefit; the old key keeps signing.
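The first two failure modes can be reproduced with a toy verifier that caches the published kid set. This is a minimal sketch: `CachingVerifier` and `fetch_kids` are hypothetical names, time is plain seconds, and only the kid lookup is modelled (no signature, expiry or claim checks).

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class CachingVerifier:
    """Hypothetical verifier that caches the published kid set for `ttl` seconds."""
    fetch_kids: Callable[[], Set[str]]
    ttl: float
    _kids: Set[str] = field(default_factory=set, init=False)
    _fetched_at: float = field(default=float("-inf"), init=False)

    def accepts(self, token_kid: str, now: float) -> bool:
        # Refetch the JWKS only when the cached copy has expired.
        if now - self._fetched_at >= self.ttl:
            self._kids = set(self.fetch_kids())
            self._fetched_at = now
        return token_kid in self._kids


published = {"k1"}
verifier = CachingVerifier(fetch_kids=lambda: published, ttl=3600)
verifier.accepts("k1", now=0)  # warms the cache with {"k1"}

# Skipping publish/grace: k2 appears and signs tokens at the same moment.
published.add("k2")
print(verifier.accepts("k2", now=10))    # False: cache still holds only k1
print(verifier.accepts("k2", now=3600))  # True once the cache has expired
```

The same model shows the skip-retire failure: removing `k1` from `published` makes every still-valid `k1` token fail at the verifier's next refetch.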

The full six-phase lifecycle is the minimum that preserves the two verifier-facing invariants:

  1. Every kid a verifier sees in a token was in the JWKS before the token was signed. (Enforced by 2→3→4.)
  2. Every kid used to sign a still-valid token is still in the JWKS. (Enforced by 4→5→6.)
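Both invariants can be checked mechanically against an audit trail. A minimal sketch, assuming the IdP records when each kid was added to and removed from the JWKS endpoint and the issue and expiry time of every token it signs (hypothetical data shapes, plain seconds). Invariant 1 is checked in its cache-aware form, which is what the 3→4 gate actually protects.

```python
import math
from typing import Dict, List, Tuple

# kid -> (time added to the JWKS endpoint, time removed; math.inf if still published)
JwksHistory = Dict[str, Tuple[float, float]]
# (kid, issued_at, expires_at) for every token the IdP signed
TokenLog = List[Tuple[str, float, float]]


def check_invariants(history: JwksHistory, tokens: TokenLog,
                     cache_ttl: float) -> List[str]:
    """Return human-readable violations of the two verifier-facing invariants."""
    violations = []
    for kid, issued_at, expires_at in tokens:
        added, removed = history[kid]  # assumes every signing kid was logged
        # Invariant 1 (cache-aware form): kid visible to every cache before signing.
        if issued_at < added + cache_ttl:
            violations.append(f"{kid}@{issued_at}: signed before caches could see it")
        # Invariant 2: kid still published for the token's whole validity.
        if removed < expires_at:
            violations.append(f"{kid}@{issued_at}: removed before token expiry")
    return violations


history = {"k1": (-1000, 300), "k2": (100, math.inf)}
tokens = [("k1", 140, 240), ("k2", 160, 260),  # both fine
          ("k2", 120, 220),                    # k2 used inside its grace period
          ("k1", 250, 350)]                    # k1 removed too early
for problem in check_invariants(history, tokens, cache_ttl=50):
    print(problem)
# k2@120: signed before caches could see it
# k1@250: removed before token expiry
```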

Automation

Zalando's four principles (from the article):

  1. Automation: "New keys are generated and old keys are retired automatically, eliminating manual intervention and ensuring consistency."
  2. Scheduled Rotation: keys rotate on a regular cadence to minimise the window of vulnerability.
  3. Secure Key Management: private keys are stored and managed according to industry best practice.
  4. Seamless Rotation: rotation is transparent to clients, with no access revocation or token invalidation during a planned rotation.

The first + fourth principles are what the lifecycle sequence operationalises. The whole point of the ordered phases is that a scheduled rotation is invisible to any downstream service: kid lookups transition smoothly from the old key to the new key on the normal cache-refresh cadence, and every token signed before or after the transition continues to verify.
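The automated loop can be sketched as a periodic job that applies whichever transitions are due. This is a minimal illustration under stated assumptions, not Zalando's implementation: timestamps are plain seconds, generate and publish are collapsed into one step, key material and its storage are omitted (only kids are tracked), the `secrets`-based kid scheme is hypothetical, and the single-active-key model is assumed.

```python
import secrets
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Key:
    kid: str
    published_at: float                  # generate + publish happen together here
    activated_at: Optional[float] = None
    retired_at: Optional[float] = None


def rotation_tick(keys: List[Key], now: float, rotation_interval: float,
                  grace: float, max_token_lifespan: float,
                  buffer: float) -> List[Key]:
    """One run of a hypothetical scheduled-rotation job; mutates `keys` in place."""
    active = next((k for k in keys
                   if k.activated_at is not None and k.retired_at is None), None)
    pending = next((k for k in keys if k.activated_at is None), None)

    # Generate + publish a successor once the active key has served its interval.
    if pending is None and (active is None
                            or now - active.activated_at >= rotation_interval):
        pending = Key(kid=secrets.token_hex(8), published_at=now)
        keys.append(pending)

    # 3->4 gate: promote the successor after the grace period, retiring the old key.
    if pending is not None and now - pending.published_at >= grace:
        if active is not None:
            active.retired_at = now
        pending.activated_at = now

    # 5->6 gate: drop retired keys past retirement + max token lifespan + buffer.
    keys[:] = [k for k in keys
               if k.retired_at is None
               or now < k.retired_at + max_token_lifespan + buffer]
    return keys
```

In this model every key still in `keys` (pending, active or retired) is what the JWKS endpoint would serve; a scheduler would call `rotation_tick` every few minutes.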

Boundary conditions

Emergency rotation. If a private key is suspected to be compromised, the scheduled lifecycle is the wrong shape: compromise requires immediate revocation, accepting the breakage of any outstanding tokens. Scheduled rotation is a preventive control (to minimise the window of exposure); emergency rotation is a reactive control with different tradeoffs (token invalidation is the desired outcome, not a regression).
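For contrast, a minimal sketch of the reactive path, assuming the compromised key is the one currently signing and representing the JWKS as a hypothetical kid-to-phase map. There is no grace or retire phase; the breakage the scheduled lifecycle avoids is accepted on purpose.

```python
import secrets
from typing import Dict


def emergency_rotate(jwks: Dict[str, str], compromised_kid: str) -> str:
    """Hypothetical reactive rotation; `jwks` maps kid -> phase for published keys.

    The compromised key is removed at once, so tokens it signed stop verifying
    as soon as verifier caches refresh, and the replacement signs immediately.
    """
    jwks.pop(compromised_kid, None)   # revoke: removing the public key is the point
    new_kid = secrets.token_hex(8)    # hypothetical kid scheme; key generation omitted
    jwks[new_kid] = "active"          # no grace: unknown-kid rejections are accepted
    return new_kid


jwks = {"k1": "retired", "k2": "active"}
new_kid = emergency_rotate(jwks, "k2")
```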

Multiple concurrent active keys. Some IdPs rotate with overlap (two keys actively signing for a window) to support deployments where not all signing instances have yet picked up the new key. Zalando's article describes the strict single-active-key model; the lifecycle generalises to multi-active by having phases 4–5 overlap across two key generations.
