CONCEPT Cited by 1 source
DNS request amplification via retries¶
When DNS queries are routed through multiple layers of resolvers (e.g. application → local per-host resolver → central cluster resolver → upstream authoritative), each layer applies its own timeout-and-retry logic without coordinating with the others. If the deepest upstream slows down or drops queries, every layer retries on its own schedule, and the query volume arriving at the upstream grows by roughly the product of the per-layer retry counts.
Mechanism¶
- The client is configured to retry each failed query 5×.
- The on-host resolver (e.g. Unbound) has its own retry timeout based on smoothed RTT; when the upstream is slow, it retries independently of the client.
- Central cluster resolver does the same to its upstream.
- Net multiplier is approximately the product of per-layer retry counts, not the sum.
Because the layers compound, a failing upstream receives more traffic, not less: the exact opposite of the load shedding the failure should trigger.
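The compounding described above can be illustrated with a small back-of-the-envelope model. This is a sketch under stated assumptions, not a measurement: only the 5× client figure comes from this page, while the two resolver attempt counts (2 and 2) and the function names are hypothetical. The "expected" variant further assumes that every upstream query fails independently with a fixed probability and that a layer stops retrying as soon as it gets an answer, which real resolvers with overlapping timeouts do not strictly obey.

```python
from math import prod


def worst_case_amplification(attempts_per_layer):
    """Upstream queries per original lookup if every attempt at every layer fails.

    attempts_per_layer is ordered from the client down to the layer closest
    to the upstream, e.g. (client, host resolver, cluster resolver).
    """
    return prod(attempts_per_layer)


def expected_attempts(max_attempts, p_fail):
    """Expected attempts when each one fails independently with p_fail.

    Attempt k+1 is only made if the first k attempts all failed, so the
    expectation is the sum of p_fail**k for k in 0..max_attempts-1.
    """
    return sum(p_fail ** k for k in range(max_attempts))


def expected_amplification(attempts_per_layer, p_upstream_fail):
    """Expected upstream queries per original lookup under the toy model."""
    p = p_upstream_fail
    total = 1.0
    # Walk from the layer closest to the upstream back up to the client.
    for max_attempts in reversed(attempts_per_layer):
        total *= expected_attempts(max_attempts, p)
        # A layer fails towards its caller only if all its attempts failed.
        p = p ** max_attempts
    return total


# Hypothetical configuration: 5 client attempts (from the text),
# 2 attempts at each resolver layer (assumed for illustration).
layers = (5, 2, 2)

print(worst_case_amplification(layers))      # 20: the product, not the sum (9)
print(expected_amplification(layers, 0.0))   # 1.0: healthy upstream, no amplification
print(expected_amplification(layers, 1.0))   # 20.0: total failure reaches the worst case
```

Under this model the multiplier stays close to 1 while the upstream is healthy and climbs towards the full product as the failure probability approaches 1, which is why the extra load arrives exactly when the upstream can least absorb it.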
Seen in¶
- Stripe: The secret life of DNS packets (2024-12-12). Stripe measured an average ~7× amplification of the underlying query volume during saturation events: client retries (5×) compounded with local and cluster-level resolver retries against a slow VPC resolver. This amplification is what took the system from "VPC resolver is a bit slow" to "VPC resolver packet-rate cap is saturated and all DNS fails."
Mitigation¶
- Distribute resolver load so the upstream isn't saturated in the first place. See patterns/distribute-dns-load-to-host-resolver.
- Separate forwarding rules for fast-path (private) and slow-path (public) zones so the smoothed-RTT retry timeout for the fast path isn't poisoned by slow-path latency.
- Tune retry counts and timeouts down. Default per-layer retry counts are often overly generous for a given workload.
- Observe the amplification. Packet-counter-based rate metrics (see patterns/iptables-packet-counter-for-rate-metric) expose the outbound packet rate even when query-count metrics look normal.
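As a rough illustration of the last point, the sketch below turns periodic samples of a cumulative packet counter into a packets-per-second rate and compares it with the query rate the clients report. The function names and sample values are hypothetical, and how the counter is collected (for example from a firewall rule counter) is deliberately left out.

```python
def packet_rates(samples):
    """Convert (timestamp_seconds, cumulative_packets) samples into rates.

    Returns a list of (timestamp, packets_per_second) for each consecutive
    pair of samples. Pairs where the counter went backwards (a reset) or
    where time did not advance are skipped rather than guessed at.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or c1 < c0:
            continue
        rates.append((t1, (c1 - c0) / elapsed))
    return rates


def amplification_ratio(upstream_packet_rate, client_query_rate):
    """Outbound packets to the upstream per client-visible query."""
    if client_query_rate <= 0:
        raise ValueError("client_query_rate must be positive")
    return upstream_packet_rate / client_query_rate


# Synthetic samples: outbound DNS packet counter read every 10 seconds.
samples = [(0, 1_000), (10, 2_000), (20, 9_000)]
client_qps = 100.0  # assumed steady query rate reported by the application

for timestamp, rate in packet_rates(samples):
    print(timestamp, rate, amplification_ratio(rate, client_qps))
```

In this synthetic series the client query rate is flat, yet the outbound packet rate jumps between the second and third samples, which is the signature a query-count metric alone would miss.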
Related¶
- concepts/vpc-resolver-packet-rate-limit
- concepts/dns-servfail-response
- systems/unbound
- concepts/thundering-herd — adjacent amplification class at a different altitude (many clients retry a single shared resource simultaneously).