
Cached Lookup with Short TTL

Shape

Cache the result of an expensive or availability-coupled lookup (DB query, RPC, DNS, auth token validation) in local in-process memory with a short TTL — seconds, not minutes. Every request first checks the cache; on hit, skip the downstream call entirely; on miss or TTL-expiry, do the lookup and repopulate.

request → cache hit?  yes → use cached value
                      no  → downstream call → cache result with TTL
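
A minimal sketch of the shape, in Lua (the runtime used by the canonical instance below); the cache table, TTL constant, and fetch callback are illustrative names, not from the source:

  -- key -> { value = ..., expires = ... }; one table per process.
  local cache = {}
  local TTL_SECONDS = 30

  -- `fetch` is the expensive downstream call (DB query, RPC, DNS, ...).
  local function cached_lookup(key, fetch)
    local entry = cache[key]
    if entry and entry.expires > os.time() then
      return entry.value                   -- hit: skip the downstream call
    end
    local value = fetch(key)               -- miss or TTL expiry: go downstream
    cache[key] = { value = value, expires = os.time() + TTL_SECONDS }
    return value
  end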

The TTL is tuned so that:

  • Hot lookups hit the cache the overwhelming majority of the time (= dominant bandwidth offload + latency reduction).
  • Data freshness is acceptable — users tolerate "data up to TTL seconds stale".
  • Downstream outages shorter than TTL are invisible to cache-hit traffic.

When to use

  • Lookup is on a hot code path (per-request or near-per-request).
  • Lookup target is a soft-freshness store — the latest value matters within seconds, not milliseconds.
  • Downstream has latency variability or transient unavailability that you want to insulate callers from.
  • Cache memory cost is bounded (key cardinality × value size fits in process memory).

When NOT to use

  • Strict freshness required (auth session revocation, inventory on a transactional path).
  • Unbounded key cardinality — the cache evicts under load and you get pathological miss rates.
  • Stale reads have security / correctness implications that the TTL-bounded window can't tolerate.

Design decisions

  • TTL length — trade freshness vs. dependency-insulation window. 30 s is a common sweet spot for "absorb blips, stay fresh". Sub-second TTLs are performance caches, not availability caches. Multi-minute TTLs are usually a freshness mistake.
  • Cache scope — per-process (each nginx worker has its own) vs. per-host (shared-memory zone across workers). Per-host gives better hit ratios; per-process is simpler.
  • Miss behaviour — block all misses on the downstream call (thundering herd on TTL expiry), or serve stale and refresh asynchronously (stale-while-revalidate); see the sketch after this list.
  • Negative caching — cache negative results (not-found, errors) too? Usually with a much shorter TTL, to avoid "dependency came back but we're still serving errors"; also shown in the sketch below.
  • Invalidation — short TTL is itself the invalidation mechanism. If you need faster, you need explicit purge — at which point re-evaluate whether this pattern fits.
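
The miss-behaviour and negative-caching decisions compose naturally. A sketch under assumed names — `spawn` stands in for whatever async primitive the runtime provides (ngx.timer.at in ngx_lua), and the TTL values are illustrative:

  local TTL_OK, TTL_ERR = 30, 2   -- negative entries expire much sooner
  local cache = {}

  local function lookup(key, fetch, spawn)
    local now = os.time()
    local e = cache[key]
    if e and e.expires > now then
      return e.value, e.err       -- fresh hit: a cached value or a cached error
    end
    if e then
      -- Stale entry: serve it now, refresh in the background. Only one
      -- caller triggers the refresh, so TTL expiry causes no thundering herd.
      if not e.refreshing then
        e.refreshing = true
        spawn(function()
          local ok, result = pcall(fetch, key)
          cache[key] = ok
            and { value = result, expires = os.time() + TTL_OK }
            or  { err = result,  expires = os.time() + TTL_ERR }
        end)
      end
      return e.value, e.err
    end
    -- Cold miss: nothing stale to serve, so block on the downstream call.
    local ok, result = pcall(fetch, key)
    cache[key] = ok
      and { value = result, expires = now + TTL_OK }
      or  { err = result,  expires = now + TTL_ERR }
    return cache[key].value, cache[key].err
  end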

Canonical wiki instance

The 2015 rearchitecture of GitHub Pages. On every request, the ngx_lua router needs the hostname → fileserver mapping from MySQL. To reduce DB load and tolerate MySQL blips, routing lookups are cached in nginx's shared memory zones for 30 seconds on each pages-fe node. Per the source: "We also use ngx_lua's shared memory zones to cache routing lookups on the pages-fe node for 30 seconds to reduce load on our MySQL infrastructure and also allow us to tolerate blips a little better." Source: sources/2025-09-02-github-rearchitecting-github-pages.

The 30 s TTL is the operating point at which: (a) new Pages sites publish within 30 s (tolerable vs. the pre-2015 30-minute cron regen), (b) MySQL blips under 30 s are invisible to Pages routing, and (c) the per-node memory footprint stays manageable at routing-table cardinality.
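
A sketch of that routing lookup in ngx_lua; the source confirms the mechanism (shared memory zones) and the 30 s TTL, while the zone name, size, and MySQL helper below are illustrative:

  -- Assumes nginx.conf declares a zone shared across workers on the host
  -- (name and size are illustrative):  lua_shared_dict routing_cache 16m;
  local cache = ngx.shared.routing_cache

  local function route(host)
    local fileserver = cache:get(host)
    if fileserver then
      return fileserver                      -- hit: no MySQL round-trip
    end
    fileserver = query_mysql_for_fileserver(host)  -- hypothetical helper
    if fileserver then
      cache:set(host, fileserver, 30)        -- 30 s TTL, per the source
    end
    return fileserver
  end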

Trade-offs vs. alternatives

  • vs. no cache — wins bandwidth + latency + outage tolerance; costs freshness within TTL + cache memory.
  • vs. push-based invalidation — simpler (no invalidation fan-out path); loses immediate propagation.
  • vs. long-TTL cache — tighter freshness floor; a narrower outage-tolerance window.
  • vs. CDN edge cache (patterns/cdn-in-front-for-availability-fallback) — complementary, not competing. CDN handles full-origin outage; short-TTL handles transient downstream flakiness.