CONCEPT Cited by 1 source

Retirement + lifespan + buffer formula

Definition

The retirement + lifespan + buffer formula is the simple arithmetic rule that decides when a retired signing key's public half can be safely dropped from the JWKS endpoint: the drop time equals the time the key was retired (i.e. last used to sign), plus the maximum lifespan of any token it could have signed, plus a safety buffer.

drop_time  =  retirement_time  +  max_token_lifespan  +  safety_buffer

Zalando documents this explicitly in the 2025-01-20 article (Source: sources/2025-01-20-zalando-json-web-keys-jwk-rotating-cryptographic-keys-at-zalando):

"Finally, once a retired key surpasses the maximum lifetime of any token it might have signed, we remove its public key from the JWK endpoint. To determine when it's safe to remove a key, we need to know which key signed which token and how long those tokens are valid. Our JWTs include a key ID that tells us exactly which key was used to create them. We also control how long each token lasts before it expires. With this information, we can easily calculate when a key can be safely deleted. We simply take the time the key was retired, add the maximum token lifespan, and add a little extra time just to be safe. At that point, any token signed with that key will have expired, so it's safe to remove the key from our public list."

The formula is the counterpart of the cache-control-aware grace period at the other end of the rotation lifecycle: the grace period protects the publish → activate transition, and the formula protects the retire → drop transition.
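As a minimal sketch, the rule is plain date arithmetic. The function name and the sample values below (a 1-hour maximum token lifespan, a 10-minute buffer) are illustrative assumptions, not Zalando's actual configuration.

```python
from datetime import datetime, timedelta, timezone


def drop_time(retirement_time: datetime,
              max_token_lifespan: timedelta,
              safety_buffer: timedelta) -> datetime:
    """Earliest moment the retired key's public half may leave the JWKS."""
    return retirement_time + max_token_lifespan + safety_buffer


# Illustrative values only: key retired on 1 March 2025 at 00:00 UTC,
# tokens live at most 1 hour, 10-minute buffer.
retired_at = datetime(2025, 3, 1, 0, 0, tzinfo=timezone.utc)
safe_to_drop = drop_time(retired_at, timedelta(hours=1), timedelta(minutes=10))
print(safe_to_drop.isoformat())  # 2025-03-01T01:10:00+00:00
```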

Why each term

retirement_time — when the key stopped signing

The moment the IdP promoted a new active key and the old key entered the retired phase. From this instant on, no new JWT is signed with this key; only previously issued tokens remain outstanding.

max_token_lifespan — the longest any outstanding token can live

"We also control how long each token lasts before it expires." This is the publisher-controlled bound: the IdP sets exp - iat on issuance, so it knows the ceiling. If the IdP issues access tokens that live at most 1 hour and refresh tokens that live at most 30 days, and the key signed both, the relevant lifespan is the maximum (30 days) — because the formula must protect the longest-lived possible token.

In practice, this term is why JWT access tokens are typically short-lived (minutes to hours rather than days): a long max_token_lifespan forces a correspondingly long retirement hold, extending the period during which the retired key remains a possible compromise vector.
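A small sketch of how the term could be derived when one key signs several token types. The policy dictionary is hypothetical; the 1-hour and 30-day figures are the illustrative ones used above.

```python
from datetime import timedelta

# Hypothetical issuance policy: maximum lifetime per token type signed by the key.
token_lifetimes = {
    "access_token": timedelta(hours=1),
    "refresh_token": timedelta(days=30),
}

# The formula needs the ceiling across every token type the key may have signed.
max_token_lifespan = max(token_lifetimes.values())
print(max_token_lifespan)  # 30 days, 0:00:00
```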

safety_buffer — slack for clock skew, clock drift, in-flight requests

"...and add a little extra time just to be safe." The buffer absorbs:

  • Clock skew between the IdP and verifiers: a verifier whose clock runs behind the IdP's, or which applies a leeway when checking exp, still accepts a token for a short while after the IdP considers it expired, so the key must stay published slightly past the nominal expiry.
  • In-flight network delays: a token presented just before its exp may still be verified by a downstream service moments after the nominal expiry, while the request is passing through the system.
  • Implementation variance — refresh-token logic, request retries, offline-then-sync clients all produce edge cases where a token signed just before retirement might be presented at an unusual time.

The buffer is the "if you're not sure, wait another hour" parameter. Typically small relative to max_token_lifespan.
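One way to size the buffer, sketched under assumptions: sum the worst-case clock skew, any leeway verifiers apply when checking exp, and a flat margin. All three values below are hypothetical placeholders; the source only says "a little extra time".

```python
from datetime import timedelta

max_clock_skew = timedelta(seconds=30)   # assumed worst-case IdP/verifier clock difference
verifier_leeway = timedelta(seconds=60)  # assumed leeway verifiers apply when checking exp
extra_margin = timedelta(hours=1)        # the "wait another hour" slack

safety_buffer = max_clock_skew + verifier_leeway + extra_margin
print(safety_buffer)  # 1:01:30
```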

Why this is trivially computable at Zalando scale

Two design choices make the formula a calculation, not a measurement:

  1. JWTs carry a kid header parameter. Every JWT header names the key that signed it, so the IdP (and any verifier) can answer "which key signed this" without inspecting state.
  2. The IdP controls token lifespans. exp - iat is set at issuance; there is no token floating around longer than the IdP's own issuance policy allows.

Combined, these make "when is it safe to drop key K?" a pure function of the IdP's own configuration, computable at retirement_time without polling verifiers, without statistics on token usage, without coordination. "We can easily calculate."
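To illustrate the first design choice, the sketch below reads kid from the header of a compact-serialised JWT using only the standard library. It does not verify the signature; the helper name kid_of and the demo token are made up for the example.

```python
import base64
import json


def kid_of(jwt_compact: str) -> str:
    """Return the kid header parameter of a compact-serialised JWT (no verification)."""
    header_b64 = jwt_compact.split(".")[0]
    padded = header_b64 + "=" * (-len(header_b64) % 4)  # restore stripped base64url padding
    header = json.loads(base64.urlsafe_b64decode(padded))
    return header["kid"]


# Build a dummy token header for the demo; the payload and signature are placeholders.
demo_header = base64.urlsafe_b64encode(
    json.dumps({"alg": "ES256", "kid": "key-2025-03"}).encode()
).rstrip(b"=").decode()
print(kid_of(f"{demo_header}.e30.sig"))  # key-2025-03
```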

What this formula is not

Not a risk-based decision. The formula is not about whether the retired key is likely to still verify valid tokens; it is about whether that is possible. Even if the IdP is confident that no outstanding tokens remain in practice, dropping the key before drop_time risks orphaning any token that is still outstanding (a stale client, an offline device that syncs late, other unusual client behaviour).

Not a rotation-cadence calculation. The formula governs when a retired key can be dropped; it does not govern how often rotation happens. Those are independent axes: a hypothetical deployment might rotate weekly but hold each retired key for just over 30 days (a 30-day max token lifespan, set by its refresh tokens, plus the safety buffer), meaning several generations of retired keys are always live in the JWKS.

Not sufficient without the complementary grace period. Both gates are required — publish-to-activate grace on one side, retire-to-drop formula on the other. Skipping either breaks the lifecycle (see concepts/signing-key-rotation-lifecycle).

Consequences for JWKS endpoint design

Because retired keys must remain published until drop_time, the JWKS endpoint at any moment advertises:

  • The active signing key.
  • The new key in its grace window (between publish and activate).
  • Any retired keys still within their retention window.
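The published set can be sketched as a filter over key records. The KeyRecord type, its field names, and the dates are assumptions made for this example; a real IdP would track more state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class KeyRecord:
    kid: str
    published_at: datetime
    retired_at: Optional[datetime] = None  # None while pending or active


def published_kids(keys, now, max_token_lifespan, safety_buffer):
    """Kids that belong in the JWKS at `now`: pending, active, or retired but not yet droppable."""
    hold = max_token_lifespan + safety_buffer
    return [
        k.kid for k in keys
        if k.published_at <= now and (k.retired_at is None or now < k.retired_at + hold)
    ]


def utc(day: int) -> datetime:
    return datetime(2025, 1, day, tzinfo=timezone.utc)


keys = [
    KeyRecord("k1", utc(1), retired_at=utc(8)),   # droppable from 16 Jan
    KeyRecord("k2", utc(7), retired_at=utc(15)),  # droppable from 23 Jan
    KeyRecord("k3", utc(14)),                     # currently active
]
print(published_kids(keys, utc(20), timedelta(days=7), timedelta(days=1)))  # ['k2', 'k3']
```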

For a deployment rotating weekly with a 30-day max token lifespan, the steady-state JWKS holds 4-5 retired keys plus the active key and, during a grace window, one pending key. This is normal and expected: client libraries select the verification key by kid, so a set of this size has a negligible effect on verification latency.

JWKS size bound. retention_window ÷ rotation_interval ≈ number of retired keys in the steady-state JWKS, on top of the active key. With rotation every week and 30-day retention, about 4-5 retired keys; with rotation every day and 30-day retention, about 30. Client libraries tolerate this range trivially; sets an order of magnitude larger (e.g. rotating every hour with 30-day retention, about 720 keys) would start to matter.
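A rough upper bound on the JWKS size can be sketched as follows, assuming evenly spaced rotations: the retired keys still inside their hold period (hold divided by rotation interval, rounded up), plus the active key, plus any key in its grace window. The function name and the 1-hour buffer are illustrative assumptions.

```python
import math
from datetime import timedelta


def steady_state_jwks_size(rotation_interval, max_token_lifespan, safety_buffer, pending_keys=1):
    """Upper bound on published keys, assuming evenly spaced rotations."""
    hold = max_token_lifespan + safety_buffer
    retired = math.ceil(hold / rotation_interval)  # retired keys still inside their hold
    return retired + 1 + pending_keys  # plus the active key and any key in its grace window


lifespan, buffer = timedelta(days=30), timedelta(hours=1)
print(steady_state_jwks_size(timedelta(weeks=1), lifespan, buffer))  # 7
print(steady_state_jwks_size(timedelta(days=1), lifespan, buffer))   # 33
print(steady_state_jwks_size(timedelta(hours=1), lifespan, buffer))  # 723
```

Because this counts the pending key and rounds up, it sits slightly above the typical steady-state count.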

Generalisation

The formula shape — "how long do I need to keep this thing around" = "when did I stop using it" + "max lifetime of things that depend on it" + "slack" — generalises to any rotating-dependency problem:

  • Revoked TLS certs — OCSP / CRL retention until every possible relying-party cache has expired.
  • Feature flags — how long to keep a deprecated flag code path until every client on an old version is upgraded past the flag.
  • Schema versions — how long backward-compatibility code must remain until every stored artifact has been re-written to the new schema.

Each case maps to the same three terms: when did I stop producing new items using the old thing, how long can any produced item live, how much slack do I want.
