CONCEPT Cited by 2 sources
DNS resolver caching
Recursive DNS resolvers (such as Unbound) cache the results of DNS queries with TTL-bounded lifetimes so repeated lookups for the same name don't hit the upstream on every call. Caching applies both to successful answers (positive caching) and to NXDOMAIN / SERVFAIL-style failures (negative caching, usually with shorter TTLs) to avoid repeatedly hammering a nameserver that just failed.
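The positive and negative caching described above can be illustrated with a minimal sketch. It is not how Unbound is implemented; the `TtlCache` class, the `upstream` callable (returning an `(answer, ttl)` pair, or `None` on failure) and the 30-second `NEGATIVE_TTL` are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Optional

NEGATIVE_TTL = 30.0  # assumed lifetime for cached failures, in seconds


@dataclass
class Entry:
    answer: Optional[str]  # None marks a cached failure (negative entry)
    expires_at: float


class TtlCache:
    """Hypothetical TTL-bounded cache with positive and negative entries."""

    def __init__(self, upstream, clock=time.monotonic):
        self.upstream = upstream      # callable: name -> (answer, ttl) or None
        self.clock = clock
        self.entries = {}
        self.upstream_calls = 0

    def resolve(self, name):
        now = self.clock()
        entry = self.entries.get(name)
        if entry is not None and now < entry.expires_at:
            return entry.answer       # cache hit (positive or negative)

        self.upstream_calls += 1
        result = self.upstream(name)
        if result is None:
            # Negative caching: remember the failure, but only briefly.
            self.entries[name] = Entry(None, now + NEGATIVE_TTL)
            return None

        answer, ttl = result
        # Positive caching: lifetime comes from the upstream's TTL.
        self.entries[name] = Entry(answer, now + ttl)
        return answer
```

Repeated calls to `resolve()` for the same name within the TTL reach `upstream` once; a failing name is retried only after `NEGATIVE_TTL` has elapsed.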
Design choices
- Per-host local resolver. Running a caching resolver on every application host (instead of only on central DNS servers) provides a first-line cache that filters out the repeated-same-name load before it reaches the shared infrastructure. Canonical Stripe deployment shape from the 2024-12-12 source.
- Cache-size and TTL tuning. TTLs come from the authoritative nameserver's response; resolvers can also apply min-ttl / max-ttl (cache-min-ttl / cache-max-ttl in Unbound) to clamp them and force a refresh cadence, and cache-max-negative-ttl to cap how long failures stay cached.
- Cache effectiveness is workload-dependent. A reverse-DNS-heavy workload over a large IP set (e.g. every unique IP in a day's web-access logs) is effectively cache-cold: each lookup is a different name, so the cache hit rate stays near zero no matter the cache size.
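The workload dependence in the last bullet can be made concrete with a small simulation. This is a sketch under simplifying assumptions: the cache is a plain LRU, TTL expiry is ignored, and the `hit_rate` helper and the synthetic `hot` / `cold` name lists are hypothetical.

```python
import random
from collections import OrderedDict


def hit_rate(lookups, cache_size):
    """Fraction of lookups answered from an LRU cache of the given size."""
    cache = OrderedDict()
    hits = 0
    for name in lookups:
        if name in cache:
            hits += 1
            cache.move_to_end(name)        # mark as most recently used
        else:
            cache[name] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(lookups) if lookups else 0.0


rng = random.Random(0)

# Forward-lookup style workload: many requests over a small set of names.
hot = [f"svc-{rng.randrange(50)}.example" for _ in range(10_000)]

# Reverse-lookup style workload: every name appears exactly once.
cold = [f"ptr-{i}.example" for i in range(10_000)]

print("hot workload, 100-entry cache:", hit_rate(hot, 100))
print("cold workload, 1M-entry cache:", hit_rate(cold, 1_000_000))
```

In this model, the all-unique workload never hits the cache, even when the cache is far larger than the workload, while the repeated-name workload is served almost entirely from a 100-entry cache.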
Serve-stale (RFC 8767)
RFC 8767 introduces a third cache state between fresh and expired: stale but servable. When the TTL has elapsed and the upstream fetch fails (timeout, SERVFAIL, unverifiable DNSSEC signatures), the resolver may continue serving the expired record rather than returning an error. The client gets a bounded-stale answer with zero extra latency; the resolver retries upstream in the background until either recovery or a configured staleness ceiling is reached.
This is the DNS-resolver realisation of the general fail-stale failure-mode default. See patterns/serve-stale-over-servfail for the full pattern-altitude canonicalisation. The 2026-05-05 DENIC .de DNSSEC break made the value explicit: serve-stale kept the NOERROR rate of 1.1.1.1 (systems/cloudflare-1-1-1-1-resolver) stable for ~3 hours while upstream was broken, buying time for Cloudflare to deploy a Negative Trust Anchor equivalent. From Cloudflare's writeup:
"When upstream resolution fails, a resolver may continue serving expired cached records rather than returning an error. This significantly cushions the impact of an upstream outage, buying time for operators to respond."
— sources/2026-05-06-cloudflare-when-dnssec-goes-wrong-de-tld-outage
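The three cache states described in this section (fresh, stale but servable, expired) can be sketched as follows. This is a simplified, synchronous illustration rather than any resolver's actual implementation: the `ServeStaleCache` class, the `UpstreamError` exception and the one-day `MAX_STALE` ceiling are assumptions, and the background retry is replaced by a retry on each request.

```python
import time
from dataclasses import dataclass

MAX_STALE = 86_400.0  # assumed staleness ceiling, in seconds


class UpstreamError(Exception):
    """Stands in for a timeout, SERVFAIL, or DNSSEC validation failure."""


@dataclass
class Record:
    answer: str
    expires_at: float


class ServeStaleCache:
    """Hypothetical cache that serves expired records when upstream fails."""

    def __init__(self, upstream, clock=time.monotonic):
        self.upstream = upstream  # callable: name -> (answer, ttl), may raise
        self.clock = clock
        self.records = {}

    def state(self, name):
        """Return 'missing', 'fresh', 'stale', or 'expired' for a name."""
        rec = self.records.get(name)
        if rec is None:
            return "missing"
        now = self.clock()
        if now < rec.expires_at:
            return "fresh"
        if now < rec.expires_at + MAX_STALE:
            return "stale"        # TTL elapsed, but still servable
        return "expired"          # past the staleness ceiling

    def resolve(self, name):
        if self.state(name) == "fresh":
            return self.records[name].answer
        try:
            answer, ttl = self.upstream(name)
        except UpstreamError:
            if self.state(name) == "stale":
                # Serve-stale: prefer an old answer over an error.
                return self.records[name].answer
            raise
        self.records[name] = Record(answer, self.clock() + ttl)
        return answer
```

Once upstream recovers, the next `resolve()` call overwrites the stale record with a fresh one; once the ceiling passes without recovery, the error is surfaced to the client.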
Seen in
- Cloudflare, "When DNSSEC goes wrong: how we responded to the .de TLD outage" (2026-05-06). Canonical wiki production instance of serve-stale (RFC 8767) at the public-resolver altitude. Stated verbatim: "That's 'serve stale' at work … The result is visible in the graph below … Without stale-served responses, the NOERROR rate drops steadily from 19:30 onward. These represent queries that users received good answers for only because their record was still in cache." The 3-hour cushion absorbed the initial impact of the DENIC .de DNSSEC break while NTA mitigation was prepared. (Source: sources/2026-05-06-cloudflare-when-dnssec-goes-wrong-de-tld-outage.)
- Stripe, "The secret life of DNS packets" (2024-12-12). Stripe runs Unbound on every host as a local caching tier above the central DNS server cluster. During the incident, the Hadoop job's reverse-lookup workload was cache-cold (each IP unique), so caching provided no mitigation and the load landed on the central cluster and then the VPC resolver.
Related
- concepts/dns-reverse-lookup-ptr
- systems/unbound
- concepts/fail-stale — the general failure-mode-default pattern serve-stale specialises to DNS-resolver altitude.
- concepts/stale-while-revalidate-cache — the HTTP-cache sibling (RFC 5861); same posture, different protocol altitude.
- patterns/serve-stale-over-servfail — the pattern canonicalisation.
- concepts/dnssec — serve-stale works transparently with DNSSEC because cached records carry their signatures.