CONCEPT

Authorization decision caching

What it is

Authorization decision caching stores the Allow / Deny outcome of an authorization check for later reuse, so subsequent requests with the same inputs skip the policy-engine round trip. Most production fine-grained-authorization architectures depend on it: AVP / Cedar / OPA evaluation is "millisecond-level" on its own, but authorization sits on the hot path of every API call, and sub-millisecond latency is only reachable via caching.

Typical two-level shape

The canonical shape, used by Convera:

  1. API-Gateway-level authorizer-decision cache. The IAM policy returned by the Lambda authorizer is cached, keyed by token (or token + route), for a configurable TTL. Subsequent requests presenting the same token hit the cache: no authorizer invocation, no AVP call, no Cognito validation.
  2. Application-level token cache. The client app caches the Cognito JWT itself so repeat logins don't re-hit Cognito.

Together: "sub-millisecond response times while reducing operational costs and maintaining security controls." (Source: sources/2026-02-05-aws-convera-verified-permissions-fine-grained-authorization)
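Level 1 above is a minimal sketch in Python of what the authorizer hands back for API Gateway to cache; the function name and principal IDs are illustrative assumptions, but the response shape is the one API Gateway keys on the token for `authorizerResultTtlInSeconds`.

```python
# Sketch of a Lambda authorizer response that API Gateway can cache.
# API Gateway keys the cached result on the identity source (the token)
# and reuses it for the configured TTL without re-invoking the authorizer.

def build_authorizer_response(principal_id: str, effect: str, resource_arn: str) -> dict:
    """Build the IAM policy document that API Gateway caches per token."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,           # "Allow" or "Deny"
                "Resource": resource_arn,   # routes the cached policy covers
            }],
        },
    }
```

One design consequence of token-keyed caching: if the returned Resource is the single method ARN that triggered the call, a cache hit on a different route is denied, so cache-friendly authorizers often return a broader Resource covering every route the principal may invoke.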

The cache-key design

Cache key granularity determines what you're actually caching:

  • Token-keyed. One entry per token. Coarse. Decision changes only when the token changes. Used by Convera.
  • (Token, route)-keyed. Per-endpoint decision. Finer. Needed when the same token might have different decisions on different endpoints and you want each evaluated independently.
  • (Principal, resource, action)-keyed. Finest. Requires the authorizer to extract resource identifiers from the request URL / body before looking up. Most invalidation work, most fidelity.
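The three granularities above can be sketched as key-construction helpers; a minimal illustration with assumed key formats, hashing raw tokens so bearer credentials never sit in the cache verbatim:

```python
import hashlib

def _token_digest(token: str) -> str:
    # Never store the raw bearer token as a cache key; hash it first.
    return hashlib.sha256(token.encode()).hexdigest()[:16]

def token_key(token: str) -> str:
    """Coarsest: one entry per token."""
    return f"authz:{_token_digest(token)}"

def token_route_key(token: str, route: str) -> str:
    """Finer: per-endpoint decision for the same token."""
    return f"authz:{_token_digest(token)}:{route}"

def pra_key(principal: str, resource: str, action: str) -> str:
    """Finest: requires extracting resource IDs from the request first."""
    return f"authz:{principal}:{resource}:{action}"
```

The finer the key, the more entries a single policy change can invalidate, which is why the finest granularity carries the most invalidation work.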

Cache-invalidation becomes policy-propagation floor

A subtle but load-bearing property: the cache TTL becomes the floor on guaranteed policy propagation, i.e. the worst-case delay between a policy change and its enforcement everywhere. If TTL = 5m and infosec just tightened a policy, requests may be served under the old policy for up to 5m, until every cached decision expires.

This interacts with concepts/token-enrichment: enriched attributes are pinned for the token lifetime, and cached decisions are pinned for the cache TTL. Layered caches → layered staleness.
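The layered-staleness point is just addition; a worked example with assumed TTLs (the specific values are illustrative, not Convera's):

```python
# Worst-case staleness of an enriched attribute: the change only takes
# effect after the token carrying the old attribute expires AND the last
# decision cached from that old token expires.
TOKEN_TTL_S = 3600      # Cognito JWT lifetime (assumed value)
DECISION_CACHE_TTL_S = 300   # authorizer-cache TTL (assumed value)

worst_case_staleness_s = TOKEN_TTL_S + DECISION_CACHE_TTL_S
# 3600 + 300 = 3900 s, i.e. 65 minutes of possible enforcement lag
```

Each caching layer adds its TTL to the propagation bound, which is why tightening only the decision-cache TTL cannot get you below the token lifetime.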

What must NOT be cached

  • Decisions that depend on time-bound context. "Allow only during business hours" — the decision is wrong 10 minutes after the boundary. Either don't cache these, or include the time bucket in the cache key.
  • Decisions that depend on live counters. "Allow only under the user's daily transaction limit" — stale cache under-reports usage.
  • Decisions shortly after a known policy change. If the invalidation signal exists, flush on change; if not, accept TTL staleness.
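The "include the time bucket in the cache key" option for time-bound decisions can be sketched as follows; the key format and default bucket size are assumptions:

```python
import time

def time_bucketed_key(token_digest: str, route: str,
                      bucket_s: int = 300, now=None) -> str:
    """Fold the current time bucket into the cache key so time-bound
    decisions ("allow only during business hours") expire at bucket
    boundaries instead of serving a stale Allow past the cutoff."""
    t = int(now if now is not None else time.time())
    return f"authz:{token_digest}:{route}:{t // bucket_s}"
```

The bucket size bounds how long a decision can outlive its time window: with 300-second buckets, a "business hours" Allow is at most 5 minutes stale, at the cost of a guaranteed cache miss every 5 minutes.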

Caveats

  • Security ≠ latency. The cache is a latency tool, not a security tool. A cache that keeps serving "Allow" past a revocation is a security incident. TTLs must be tuned against policy-change frequency and the severity of late propagation.
  • Cache-miss behavior. On miss, fall through to full evaluation. Sizing a cache-miss spike (e.g., after a mass token refresh) is capacity planning for the authorizer.
  • Multi-tenant isolation. Cache keys in a multi-tenant system must include the tenant dimension; a cross-tenant cache hit is a data leak.
