Authorization decision caching¶
What it is¶
Authorization decision caching stores the Allow/Deny outcome of an authorization check for later reuse, so subsequent requests with the same inputs skip the policy-engine round trip. Most production fine-grained-authorization architectures depend on it: Amazon Verified Permissions (AVP), Cedar, or OPA evaluation is "millisecond-level" by itself, but authorization sits on the hot path of every API call, and sub-millisecond latency is only reachable via caching.
Typical two-level shape¶
The canonical shape, used by Convera:
- API-Gateway-level authorizer-decision cache. Given a token, the IAM policy returned by the Lambda authorizer is cached, keyed by the token (or token + route), for a configurable TTL. Subsequent requests from the same principal with the same token hit the cache: no authorizer invocation, no AVP call, no Cognito validation.
- Application-level token cache. The client app caches the Cognito JWT itself so repeat logins don't re-hit Cognito.
Together: "sub-millisecond response times while reducing operational costs and maintaining security controls." (Source: sources/2026-02-05-aws-convera-verified-permissions-fine-grained-authorization)
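The gateway-level half of this shape can be sketched as a TTL-bounded, token-keyed cache in front of a full evaluation call. A minimal sketch, assuming a `evaluate` callable standing in for the AVP round trip (names and structure are illustrative, not API Gateway's internals):

```python
import time


class DecisionCache:
    """Token-keyed authorization decision cache with a fixed TTL (sketch)."""

    def __init__(self, ttl_seconds: float, evaluate):
        self.ttl = ttl_seconds
        self.evaluate = evaluate  # fallthrough: full policy evaluation (e.g. an AVP call)
        self._entries = {}        # token -> (decision, expiry)

    def check(self, token: str) -> str:
        now = time.monotonic()
        entry = self._entries.get(token)
        if entry is not None and entry[1] > now:
            return entry[0]                     # hit: no policy-engine round trip
        decision = self.evaluate(token)         # miss: full evaluation, then cache
        self._entries[token] = (decision, now + self.ttl)
        return decision
```

A second call with the same token inside the TTL returns the cached decision without invoking `evaluate` at all, which is exactly where the latency win comes from.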
The cache-key design¶
Cache key granularity determines what you're actually caching:
- Token-keyed. One entry per token. Coarse. Decision changes when the token changes. Used in Convera.
- (Token, route)-keyed. Per-endpoint decision. Finer. Needed when the same token might have different decisions on different endpoints and you want each evaluated independently.
- (Principal, resource, action)-keyed. Finest. Requires the authorizer to extract resource identifiers from the request URL or body before the lookup. Most invalidation work, most fidelity.
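The three granularities above are just different key compositions. A sketch, hashing the joined parts (an assumption for illustration, not the API Gateway key format) so raw tokens never become cache keys:

```python
import hashlib


def cache_key(*parts: str) -> str:
    """Join the chosen dimensions and hash them into a fixed-size key."""
    return hashlib.sha256("|".join(parts).encode()).hexdigest()


token = "example-jwt"  # hypothetical Cognito JWT string

coarse = cache_key(token)                                 # token-keyed
per_route = cache_key(token, "GET /payments")             # (token, route)-keyed
finest = cache_key("user:alice", "payment:42", "view")    # (principal, resource, action)-keyed
```

Each extra dimension shrinks the blast radius of a stale entry and raises the miss rate; the key constructor is where that trade-off is made explicit.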
Cache-invalidation becomes policy-propagation floor¶
A subtle but load-bearing property: the cache TTL becomes the minimum time between a policy change and enforcement. If the TTL is 5 minutes and infosec has just tightened a policy, it may take up to 5 minutes before every cached decision reflects the new policy.
This interacts with concepts/token-enrichment: enriched attributes are pinned for the token lifetime, and cached decisions are pinned for the cache TTL. Layered caches → layered staleness.
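The layered-staleness arithmetic can be made concrete. Assuming attributes are baked into the token at issuance and decisions are cached per token, the worst-case window between a change and enforcement is the sum of the two lifetimes (a sketch; the TTL values are examples, not Convera's settings):

```python
def worst_case_staleness(decision_ttl_s: int, token_ttl_s: int = 0) -> int:
    """Worst case: an attribute changes just after a token is issued, the stale
    token lives for token_ttl_s, and a decision evaluated at the last moment of
    the token's life is then cached for another decision_ttl_s."""
    return token_ttl_s + decision_ttl_s


# e.g. a 1-hour Cognito token plus a 5-minute authorizer cache:
worst_case_staleness(decision_ttl_s=300, token_ttl_s=3600)  # 3900 s = 65 min
```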
What must NOT be cached¶
- Decisions that depend on time-bound context. "Allow only during business hours" — the decision is wrong 10 minutes after the boundary. Either don't cache these, or include the time bucket in the cache key.
- Decisions that depend on live counters. "Allow only under the user's daily transaction limit" — stale cache under-reports usage.
- Decisions shortly after a known policy change. If the invalidation signal exists, flush on change; if not, accept TTL staleness.
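For the time-bound case above, one option is to fold a coarse time bucket into the cache key, so entries naturally stop matching at bucket boundaries. A sketch, with an illustrative 10-minute bucket:

```python
def time_bucketed_key(token: str, now_s: float, bucket_s: int = 600) -> str:
    """Key that changes every bucket_s seconds, bounding how long a
    time-dependent decision (e.g. business-hours-only) can be served stale."""
    return f"{token}|t{int(now_s // bucket_s)}"
```

The bucket size is the staleness bound at the rule's boundary: with 600-second buckets, a business-hours Allow can outlive the cutoff by at most 10 minutes.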
Caveats¶
- Security ≠ latency. The cache is a latency tool, not a security tool. A cache that caches "Allow" past a revocation is a security incident. TTLs must be tuned against the policy-change frequency and the severity of late propagation.
- Cache-miss behavior. On miss, fall through to full evaluation. Sizing a cache-miss spike (e.g., after a mass token refresh) is capacity planning for the authorizer.
- Multi-tenant isolation. Cache keys in a multi-tenant system must include the tenant dimension; a cross-tenant cache hit is a data-leakage bug.
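The tenant dimension is easiest to enforce by routing every key through one constructor that refuses to build a key without it (a sketch; the identifiers are hypothetical):

```python
def tenant_scoped_key(tenant_id: str, *parts: str) -> str:
    """Prefix every cache key with the tenant so an entry written for one
    tenant can never satisfy a lookup from another."""
    if not tenant_id:
        raise ValueError("cache keys must carry a tenant dimension")
    return ":".join((tenant_id, *parts))
```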
Seen in¶
- sources/2026-02-05-aws-convera-verified-permissions-fine-grained-authorization — Convera's two-level cache (API Gateway authorizer-decision + application-level Cognito token) delivers submillisecond latency over AVP's millisecond-level decisions.
Related¶
- concepts/fine-grained-authorization — the per-request overhead caching exists to absorb.
- concepts/token-enrichment — the complementary hot-path optimization; both push cost off the per-request path.
- systems/amazon-api-gateway — built-in authorizer-decision cache.
- patterns/lambda-authorizer — the compute the cache fronts.