CONCEPT Cited by 1 source
DNS SERVFAIL response
SERVFAIL (RCODE 2) is the generic "server failure" response code that a DNS server returns when an error occurs during resolution, such as an upstream timeout, a policy rejection, an internal failure, or an unreachable authoritative nameserver. It tells the client "something went wrong" and nothing more; the error is opaque.
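To make the opacity concrete: the response code is a 4-bit field in the low bits of the DNS header's flags word (RFC 1035, section 4.1.1), so a client that receives SERVFAIL learns only that this field equals 2. The Python sketch below decodes that field from a raw response message. It is a minimal illustration, not any resolver's actual code, and the names RCODE_NAMES, parse_rcode and is_servfail are our own.

```python
import struct

# Base RCODE values from RFC 1035 section 4.1.1 and RFC 2136.
RCODE_NAMES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def parse_rcode(message: bytes) -> int:
    """Return the 4-bit RCODE from the fixed 12-byte DNS header."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    # Header layout: ID (16 bits), flags (16 bits), then four 16-bit counts.
    _msg_id, flags = struct.unpack("!HH", message[:4])
    # RCODE occupies the low 4 bits of the flags word.
    return flags & 0x000F


def is_servfail(message: bytes) -> bool:
    """True if the response carries RCODE 2 (SERVFAIL)."""
    return parse_rcode(message) == 2
```

For example, a header whose flags word is 0x8182 (QR, RD and RA set) decodes to SERVFAIL. Nothing in those four bits says whether the cause was an upstream timeout, a policy rejection, or an unreachable nameserver, which is why the investigation has to move to other signals.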
Investigation implication
Because SERVFAIL is a black-box error code, it cannot serve as the bottom-of-stack signal in a root-cause investigation. The engineer has to pivot from "we're seeing SERVFAIL" to other signals:
- Resolver internal metrics. Unbound's request-list depth metric shows whether queries are queuing locally.
- Packet-rate observation. A packet-level rate metric (see patterns/iptables-packet-counter-for-rate-metric) can reveal whether the outbound side is hitting a cap.
- Manual reproduction via dig. Running the failing query by hand against each resolver in the chain localises which layer is slow or failing.
- Request-list dump. Running unbound-control dump_requestlist shows the pending queries and what each one is waiting on.
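The manual-reproduction step can also be scripted. The Python sketch below is a rough stand-in for running dig @server name by hand against each resolver in the chain: it sends a single A query over UDP and reports either the RCODE the server answered with or a timeout. It assumes plain UDP on port 53 with no EDNS, no TCP fallback and no retries, and the helper names (build_query, classify, probe) are hypothetical.

```python
import random
import socket
import struct

RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
               3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name: str, qid: int, qtype: int = 1) -> bytes:
    """Build a minimal DNS query (class IN, RD bit set) for `name`."""
    # Header: ID, flags 0x0100 (RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid DNS label: {label!r}")
        qname += bytes([len(raw)]) + raw  # length-prefixed label
    qname += b"\x00"  # root label terminates the name
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def classify(response: bytes, qid: int) -> str:
    """Map a raw response to an RCODE name, or flag it as unusable."""
    if len(response) < 12:
        return "MALFORMED"
    resp_id, flags = struct.unpack("!HH", response[:4])
    if resp_id != qid:
        return "ID_MISMATCH"
    rcode = flags & 0x000F
    return RCODE_NAMES.get(rcode, f"RCODE{rcode}")


def probe(server: str, name: str, timeout: float = 2.0, port: int = 53) -> str:
    """Send one query to `server`; return an RCODE name or "TIMEOUT"."""
    qid = random.randrange(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid), (server, port))
        try:
            response, _addr = sock.recvfrom(4096)
        except socket.timeout:
            # One attempt only: a slow hop is reported, not retried.
            return "TIMEOUT"
    return classify(response, qid)
```

As a usage sketch with placeholder addresses standing in for a real chain (say, a local Unbound instance and its upstream), comparing probe("127.0.0.1", "example.com") with probe("10.0.0.2", "example.com") shows which hop first returns SERVFAIL or stops answering. Sending one attempt per hop is a deliberate choice, so that the probe does not add retry traffic to the layer it is measuring.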
Seen in
- Stripe: The secret life of DNS packets (2024-12-12). Stripe's initial alert signal was a small percentage of internal requests returning SERVFAIL during hourly spikes. The post explicitly calls out the opacity: "SERVFAIL is a generic response that DNS servers return when an error occurs, but it doesn't tell us much about what caused the error." The investigation pivoted to Unbound's request-list depth metric to localise the problem.
Related¶
- concepts/dns-request-amplification-via-retries
- concepts/request-queue-depth-metric
- concepts/redundant-error-signalling: the inverse design lesson. If your failure signal is opaque, complement it with independent observability signals rather than relying on retry-and-escalate.