Skip to content

CONCEPT Cited by 1 source

DNS SERVFAIL response

SERVFAIL (RCODE 2) is the generic "server failure" response code a DNS server returns when some error occurred during resolution: upstream timeout, policy rejection, internal failure, or an unreachable authoritative nameserver. It tells the client "something went wrong" with no further information โ€” the error is opaque.

Investigation implication

Because SERVFAIL is a black-box errno, it cannot be the bottom-of-stack signal for root-cause investigation. The engineer must pivot from "we're seeing SERVFAIL" to other signals:

  • Resolver internal metrics. Unbound's request-list depth metric shows whether queries are queuing locally.
  • Packet-rate observation. A packet-level rate metric (see patterns/iptables-packet-counter-for-rate-metric) can reveal whether the outbound side is hitting a cap.
  • Manual reproduction via dig. Running the failing query by hand against each resolver in the chain localises which layer is slow or failing.
  • Request-list dump. unbound-control dump_requestlist shows pending queries and what they're waiting on.

Seen in

  • Stripe โ€” The secret life of DNS packets (2024-12-12). Stripe's initial alert signal was a small-percentage SERVFAIL rate for internal requests during hourly spikes. The post explicitly calls out the opacity: "SERVFAIL is a generic response that DNS servers return when an error occurs, but it doesn't tell us much about what caused the error." The investigation pivoted to Unbound's request-list depth metric to localise the problem.
Last updated ยท 470 distilled / 1,213 read