

Rate-limited cache

A rate-limited cache is a lookup cache in front of an expensive or rate-sensitive backend, with an explicit cap on lookup rate per key (not just entry lifetime). The shape is:

  • On lookup hit: return cached value.
  • On lookup miss: if this key has been looked up recently (within a sliding window), suppress the backend call and either return the last known value, an error, or nothing. Only if the key hasn't been looked up recently do we actually call the backend.
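The lookup flow above can be sketched as a small class. This is a minimal illustration, not any particular library's API; the names (`RateLimitedCache`, `lookup`, `window_seconds`) are invented for the example, and the "suppress" branch returns nothing rather than an error:

```python
import time

class RateLimitedCache:
    """Sketch: lookup cache with a per-key cap on backend calls."""

    def __init__(self, fetch, window_seconds=5.0):
        self.fetch = fetch              # expensive backend call: key -> value or None
        self.window = window_seconds    # per-key suppression window
        self.values = {}                # key -> cached value
        self.last_attempt = {}          # key -> timestamp of last backend call

    def lookup(self, key):
        if key in self.values:
            return self.values[key]     # hit: return cached value
        now = time.monotonic()
        last = self.last_attempt.get(key)
        if last is not None and now - last < self.window:
            return None                 # looked up recently: suppress the backend call
        self.last_attempt[key] = now    # record the attempt before fetching
        value = self.fetch(key)
        if value is not None:
            self.values[key] = value
        return value
```

Note that `last_attempt` is recorded even when the backend returns nothing, which is exactly what distinguishes this from a plain TTL cache: an unknown key costs at most one backend call per window, no matter how often it is looked up.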

This is distinct from a plain TTL cache:

  • A plain TTL cache caps staleness per entry. Miss ⇒ fetch, always.
  • A rate-limited cache caps lookup rate per key. Miss + recent attempt ⇒ don't fetch, which is useful when retries from the upstream are frequent, or when the backend is fragile under concurrent requests for the same key.

Canonical instance — Fly.io JIT WireGuard gateway

A WireGuard gateway implementing JIT peer provisioning sniffs handshake-initiation packets and looks up the initiator's pubkey in a local cache, then falls through to the Fly API on miss:

"We keep a rate-limited cache in SQLite, and when we see new peers, we'll make an internal HTTP API request to fetch the matching peer information and install it." (Source: sources/2024-03-12-flyio-jit-wireguard-peers)

The rate-limit is load-bearing because WireGuard retransmits handshake initiations when the responder hasn't yet replied, so a single new client sends multiple handshake-initiation packets before its peer is installed, and without the rate-limit each of those would drive an API call. The cache absorbs the retry storm; one API call per new peer instead of one per retry is the design property.

Common shapes

  • Sliding window per key. Track "last lookup timestamp per key", skip backend call if within window.
  • Short negative-entry cache. Store "I don't know yet" explicitly with a short TTL so concurrent lookups for the same unknown key don't stampede.
  • Single-flight. At most one in-flight backend fetch per key; other lookups for the same key during the fetch block on its result.

Fly.io's SQLite-backed cache combines these in practice — SQLite gives them persistence across process restarts and transactional semantics for the rate-limit window bookkeeping.
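The transactional window bookkeeping might look something like the following. This is a hypothetical sketch, not Fly.io's actual schema or code: the table name, columns, and `should_fetch` helper are all invented for illustration. The point is that the read-check-claim sequence runs in one transaction, so two concurrent lookups for the same key can't both win the window check:

```python
import sqlite3
import time

def should_fetch(db, key, window_seconds=5.0):
    """Return True iff this key hasn't been attempted within the window,
    atomically claiming the window slot if so."""
    now = time.time()
    with db:  # one transaction: read the window and claim it atomically
        row = db.execute(
            "SELECT last_attempt FROM lookups WHERE key = ?", (key,)
        ).fetchone()
        if row is not None and now - row[0] < window_seconds:
            return False  # attempted recently: suppress the backend call
        db.execute(
            "INSERT INTO lookups(key, last_attempt) VALUES(?, ?) "
            "ON CONFLICT(key) DO UPDATE SET last_attempt = excluded.last_attempt",
            (key, now),
        )
        return True

db = sqlite3.connect(":memory:")  # a file path here gives persistence across restarts
db.execute("CREATE TABLE lookups (key TEXT PRIMARY KEY, last_attempt REAL)")
```

Because the window state lives in the database rather than in memory, a process restart doesn't reset the rate limit, and `with db:` gives the check-then-claim step transactional semantics.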

A rate-limited cache is one of the canonical anti-thundering-herd primitives. The thundering herd is the pathology: N clients miss the cache simultaneously, all fetch from the backend, backend collapses or API rate-limit trips. Rate-limited + single-flight caching bounds the fetch rate per key regardless of arrival rate.
