CONCEPT Cited by 1 source
Signing key rotation lifecycle
Definition
The signing-key rotation lifecycle is the ordered discipline for replacing a cryptographic signing key without invalidating tokens in flight and without requiring any coordination with downstream verifiers. The canonical sequence is generate → publish → grace → activate → retire → drop, and each transition is gated on a specific property of the prior state rather than on a fixed wall-clock schedule.
Zalando's customer-authentication identity provider implements this as a fully automated loop over its JWKS endpoint; the Zalando article (2025-01-20) is the canonical public description of this lifecycle (Source: ).
The five (really six) phases
From the Zalando article — direct quote of the sequence:
"First, a new key pair is generated. We then publish the public key portion of this new pair on our JWK endpoint, making it available to our clients. To avoid any immediate disruptions, we incorporate a grace period, allowing clients ample time to fetch the latest set of JWKs — cache control headers matter! After this period, the new key is being elected as the new active signing key. The previous active key is being retired, meaning it's no longer used for signing new tokens, but its public key remains available on the JWK endpoint to ensure that previously issued tokens can still be verified. Finally, once a retired key surpasses the maximum lifetime of any token it might have signed, we remove its public key from the JWK endpoint."
Restated as a state machine:
| # | Phase | Private key | Public in JWKS | Used for signing | Used for verify |
|---|---|---|---|---|---|
| 1 | Generate | exists | no | no | no |
| 2 | Publish | exists | yes | no | no |
| 3 | Grace | exists | yes | no | no |
| 4 | Activate | exists | yes | yes | yes |
| 5 | Retire | exists | yes | no | yes |
| 6 | Drop | destroy | no | no | no |
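The table above can be encoded directly as data. The following is a minimal sketch, not Zalando's implementation: the names `Phase`, `KeyProperties`, `PHASE_TABLE`, and `next_phase` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    GENERATE = 1
    PUBLISH = 2
    GRACE = 3
    ACTIVATE = 4
    RETIRE = 5
    DROP = 6


@dataclass(frozen=True)
class KeyProperties:
    private_key_exists: bool
    public_in_jwks: bool
    used_for_signing: bool
    used_for_verify: bool


# One entry per phase, mirroring the rows of the table above.
PHASE_TABLE = {
    Phase.GENERATE: KeyProperties(True, False, False, False),
    Phase.PUBLISH: KeyProperties(True, True, False, False),
    Phase.GRACE: KeyProperties(True, True, False, False),
    Phase.ACTIVATE: KeyProperties(True, True, True, True),
    Phase.RETIRE: KeyProperties(True, True, False, True),
    Phase.DROP: KeyProperties(False, False, False, False),
}


def next_phase(current: Phase) -> Phase:
    """Return the only legal successor: the lifecycle never skips a phase or goes back."""
    if current is Phase.DROP:
        raise ValueError("DROP is terminal")
    return Phase(current.value + 1)
```

Keeping the properties in one table makes it easy to assert, for example, that no phase signs with a key whose public half is absent from the JWKS.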
The two hard gates are between 3→4 and 5→6:
- 3→4 gate: grace period elapsed. The new public key must be visible to every client cache before it is used to sign any token. This gate is governed by the cache-control-aware grace period: not a fixed timer, but an interval long enough that any reasonable client has re-fetched the JWKS at least once within its cache TTL.
- 5→6 gate: retired-key-lifetime exceeded. The retired key's public part can only be removed once no outstanding token signed by it can still be valid. This is governed by the concepts/retirement-plus-lifespan-plus-buffer-formula — retirement time + max token lifespan + safety buffer.
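Both gates reduce to simple time arithmetic. The sketch below assumes the grace period is derived as the JWKS cache lifetime plus a margin, and applies the retirement + max token lifespan + safety buffer formula for the drop; the function and parameter names (`earliest_activation`, `earliest_drop`, `jwks_max_age`, `margin`, `safety_buffer`) and all durations are illustrative assumptions, not values from the article.

```python
from datetime import datetime, timedelta, timezone


def earliest_activation(published_at: datetime,
                        jwks_max_age: timedelta,
                        margin: timedelta) -> datetime:
    """3→4 gate: wait until every cache that honours max-age has refreshed."""
    return published_at + jwks_max_age + margin


def earliest_drop(retired_at: datetime,
                  max_token_lifespan: timedelta,
                  safety_buffer: timedelta) -> datetime:
    """5→6 gate: retirement time + max token lifespan + safety buffer."""
    return retired_at + max_token_lifespan + safety_buffer


# Illustrative values only.
t0 = datetime(2025, 1, 20, 12, 0, tzinfo=timezone.utc)
activate_at = earliest_activation(t0, timedelta(hours=1), timedelta(minutes=15))
drop_at = earliest_drop(t0, timedelta(hours=24), timedelta(hours=1))
```

With these inputs, activation is allowed from 13:15 UTC on the publish day, and a key retired at 12:00 UTC may be dropped from 13:00 UTC the following day.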
Why this ordering is non-negotiable
Any compression of the sequence breaks a verifier:
- Skip publish / grace (jump straight to activate). The IdP starts signing with a key whose public half is not yet in client caches. Verifiers receive a JWT with an unknown `kid`, fail signature verification, and return 401. Every active session breaks until every client refetches JWKS.
- Skip retire (drop immediately on deactivation). The IdP stops signing and immediately removes the public key. Any token signed in the last `max_token_lifespan` window suddenly fails verification even though it hasn't expired.
- Skip activate (publish and drop without using). No security benefit; the old key keeps signing.
The full six-phase lifecycle is the minimum that preserves the two verifier-facing invariants:
- Every `kid` a verifier sees in a token was in the JWKS before the token was signed. (Enforced by 2→3→4.)
- Every `kid` used to sign a still-valid token is still in the JWKS. (Enforced by 4→5→6 gate.)
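The two invariants can be checked mechanically against a key's recorded timeline. This is a hedged sketch: `KeyTimeline` and `check_invariants` are hypothetical names, timestamps are plain seconds, and the first check uses the stronger cache-aware form (publish time plus cache TTL) rather than bare publish time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyTimeline:
    published_at: float     # public key entered the JWKS
    first_signed_at: float  # first token signed with this key (activation)
    last_signed_at: float   # last token signed with this key (retirement)
    dropped_at: float       # public key removed from the JWKS


def check_invariants(key: KeyTimeline, cache_ttl: float,
                     max_token_lifespan: float) -> list[str]:
    """Return a description of each violated invariant (empty list if none)."""
    problems = []
    # Invariant 1: the key was visible to every cache before it signed anything.
    if key.first_signed_at < key.published_at + cache_ttl:
        problems.append("signed before every cache could have refreshed")
    # Invariant 2: the key stays published while any token it signed can be valid.
    if key.dropped_at < key.last_signed_at + max_token_lifespan:
        problems.append("dropped while signed tokens could still be valid")
    return problems


ok = KeyTimeline(0, 4000, 100_000, 200_000)
skip_grace = KeyTimeline(0, 0, 100_000, 200_000)
skip_retire = KeyTimeline(0, 4000, 100_000, 100_000)
```

With a one-hour cache TTL and a 24-hour token lifespan, `ok` passes both checks, while `skip_grace` and `skip_retire` each trip exactly one of them.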
Automation
Zalando's four principles (from the article):
- Automation — "New keys are generated and old keys are retired automatically, eliminating manual intervention and ensuring consistency."
- Scheduled Rotation — keys rotate on a regular cadence to minimise the window of vulnerability.
- Secure Key Management — private keys stored + managed per industry best practice.
- Seamless Rotation — transparent to clients, no access revocation or token invalidation during a planned rotation.
The first + fourth principles are what the lifecycle sequence
operationalises. The whole point of the ordered phases is
that a scheduled rotation is invisible to any downstream
service: kid lookups transition smoothly from the old key to
the new key on the normal cache-refresh cadence, and every
token signed before or after the transition continues to
verify.
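From the verifier's side, the smooth transition can be illustrated with a toy TTL cache. This is a sketch under stated assumptions, not any real client library: `JwksCache`, the injected `fetch` callable, and the string stand-ins for public keys are all hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from typing import Callable, Dict, Optional


class JwksCache:
    """Toy verifier-side cache: refetches the key set only after its TTL expires."""

    def __init__(self, fetch: Callable[[], Dict[str, str]], ttl: float):
        self._fetch = fetch
        self._ttl = ttl
        self._keys: Dict[str, str] = {}
        self._fetched_at: Optional[float] = None

    def lookup(self, kid: str, now: float) -> Optional[str]:
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if expired:
            # Copy the fetched set so later changes at the IdP are not visible early.
            self._keys = dict(self._fetch())
            self._fetched_at = now
        return self._keys.get(kid)


published = {"old": "pk-old"}               # stand-in for the JWKS endpoint
cache = JwksCache(lambda: published, ttl=3600)

first = cache.lookup("old", now=0)          # initial fetch
published["new"] = "pk-new"                 # phase 2: new key published
during_grace = cache.lookup("new", now=200)   # cache still holds the old set
after_grace = cache.lookup("new", now=3700)   # TTL elapsed, cache refreshed
```

Here `during_grace` is `None`, which is why the IdP must not sign with the new key yet; `after_grace` resolves to the new key once the cache has refreshed on its normal cadence.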
Sibling instance: DNSSEC ZSK/KSK rotation
The same phase-3→phase-4 gate (new public key must be visible to every validator before it is used to sign) governs DNSSEC key rotation at the DNS-zone altitude. Each zone has two signing keys:
- Zone Signing Key (ZSK) — signs record sets (RRSIGs over RRsets). Rotated more often; rotation contained within the zone.
- Key Signing Key (KSK) — signs the ZSK. The parent zone's DS record is a hash of the KSK's public half; rotating the KSK requires coordinating with the parent registry (or with the root for TLD KSKs).
From the 2026-05-06 Cloudflare
DNSSEC .de outage post:
"During a key rotation, there is a critical window where the old key is being phased out and the new one phased in. If the signatures published in the zone are made with a key that resolvers cannot verify against the zone's published DNSKEY records, whether because the signing step failed, the timing was wrong, or the new key wasn't fully distributed yet, resolvers have no choice but to reject the responses and return SERVFAIL."
The 2026-05-05 DENIC .de incident is the canonical DNSSEC
instance of a phase-3→phase-4 gate violation: DENIC began
publishing RRSIGs signed with a key whose public half was not
verifiable against the published DNSKEYs. The same invariant
violation as "skip publish/grace and jump straight to activate"
in the JWT-altitude lifecycle, with a structurally larger blast
radius because of
DNSSEC's chain of trust:
every child zone under .de became unvalidatable simultaneously.
DENIC's own post-incident note (quoted in the Cloudflare
writeup):
"The outage is linked to a routine, scheduled key rollover. During this process, non-validatable signatures were generated and distributed. As a precautionary measure, future rollovers have been suspended until the exact technical causes have been identified."
The mitigation — Negative Trust Anchor — is the DNSSEC-specific analogue of an emergency token-invalidation: accept the breakage of verification rather than leave every client stranded. The boundary-condition "emergency rotation" section below captures the same dynamic cross-protocol.
Boundary conditions
Emergency rotation. If a private key is suspected to be compromised, the scheduled lifecycle is the wrong shape: compromise requires immediate revocation, accepting the breakage of any outstanding tokens. Scheduled rotation is a preventive control (to minimise the window of exposure); emergency rotation is a reactive control with different tradeoffs (token invalidation is the desired outcome, not a regression).
Multiple concurrent active keys. Some IdPs rotate with overlap (two keys actively signing for a window) to support deployments where not all signing instances have yet picked up the new key. Zalando's article describes the strict single-active-key model; the lifecycle generalises to multi-active by having phases 4–5 overlap across two key generations.
See also
- concepts/jwk-json-web-key — the key-distribution substrate that makes phase 2 (publish) possible as an HTTP GET.
- concepts/cache-control-aware-grace-period — why phase 3 (grace) is measured in cache-TTLs, not clock-time.
- concepts/retirement-plus-lifespan-plus-buffer-formula — the formula that decides when phase 6 (drop) is safe.
- concepts/long-lived-key-risk — why shortening the window between generate and retire is a structural security property, not just hygiene.
- patterns/phased-automated-jwk-rotation — the automated system-level pattern that implements this lifecycle.
Related
- concepts/jwk-json-web-key
- concepts/cache-control-aware-grace-period
- concepts/retirement-plus-lifespan-plus-buffer-formula
- concepts/long-lived-key-risk
- concepts/oidc-identity-federation
- patterns/phased-automated-jwk-rotation
- systems/zalando-oidc-identity-provider
- concepts/dnssec · concepts/dnssec-chain-of-trust — DNSSEC ZSK/KSK rotation is a sibling instance at the DNS-zone altitude.
- concepts/negative-trust-anchor — the DNSSEC analogue of emergency token invalidation when rotation misfires at a TLD.
- systems/denic — canonical 2026-05-05 failed-rotation at TLD altitude.