CONCEPT Cited by 1 source

HTTP/3 probing gap

Definition

The HTTP/3 probing gap is the observability failure class that appears when an organization rolls out HTTP/3 on its edge: existing client-side black-box probers, both commercial SaaS observability products and in-house tools like Prometheus Blackbox Exporter, are TCP-shaped. Because HTTP/3 runs on QUIC, which in turn runs over UDP, probers built around TCP sockets, TCP connect timings, and TLS-over-TCP handshakes cannot speak the new protocol.

The result: a service is serving HTTP/3 traffic to millions of clients but the monitoring stack is blind to it.

Canonical wiki datum

Slack's 2026-03-31 post is the canonical instance (Source: sources/2026-03-31-slack-from-custom-to-open-scalable-network-probing-and-http3-readiness). Verbatim framing:

"Since HTTP/3 is built on top of the QUIC transport protocol, it uses UDP instead of the traditional TCP. This fundamental shift to a new transport meant that existing monitoring tools and SaaS solutions were not capable of probing our new HTTP/3 endpoints for metrics."

"None of the SaaS observability tools we investigated supported HTTP/3 probing out of the box. Our internal Prometheus Blackbox Exporter (BBE), a cornerstone of our monitoring, didn't have native support for QUIC."

"Without the ability to probe hundreds of thousands of HTTP/3 endpoints in our new infrastructure, we couldn't get the client-side visibility we needed to monitor regressions to HTTP/2 or accurate round trip measurements."

Why it happens

Prober implementations bake TCP assumptions in deeply:

  • Socket API: net.Dial("tcp", …) for TCP versus a UDP socket (e.g. net.ListenUDP) driven by a QUIC library.
  • Handshake model: HTTP/1.1 and HTTP/2 layer TLS on top of TCP; QUIC, which carries HTTP/3, fuses the transport and TLS 1.3 handshakes into one.
  • Connection-pooling primitives: HTTP/1.1 and HTTP/2 clients pool TCP connections keyed by (host, port, protocol); an HTTP/3 client multiplexes streams within a QUIC connection.
  • Metric shape: the phase label on HTTP/1.1 and HTTP/2 probes (DNS / connect / TLS / processing / transfer) partly collapses in QUIC, because the connect and TLS phases are fused into a single handshake.

Retrofitting a TCP-shaped prober to also speak QUIC requires pulling in a QUIC library (e.g. systems/quic-go) and wiring its http3.Transport into the existing http.Client abstraction:

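// HTTP/3 transport from quic-go (package github.com/quic-go/quic-go/http3).
http3Transport := &http3.Transport{
    TLSClientConfig: tlsConfig,      // TLS settings for the probe
    QUICConfig:      &quic.Config{}, // library defaults for QUIC
}
// The existing http.Client abstraction is kept; only the transport changes.
client = &http.Client{Transport: http3Transport}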

Consequences if unresolved

  • Silent regressions to HTTP/2. An edge issue that causes clients to fall back from HTTP/3 to HTTP/2 is invisible, because the HTTP/2 probes still pass.
  • Inaccurate RTT metrics. HTTP/3's RTT profile differs from HTTP/2's (fused handshake, 0-RTT resumption). Without HTTP/3-native probes you cannot measure round-trip time on the new protocol at all, so RTT dashboards used for latency SLOs reflect only the TCP path.
  • Safe-rollout gating breaks. Without client-side HTTP/3 metrics you cannot tell whether the new transport is healthier than, worse than, or the same as the old one, which can leave HTTP/3 rollouts stalled at small traffic percentages.

This is the driving case for the concepts/observability-before-migration discipline: close the probing gap before the transport migration proceeds, not during or after.
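The silent-regression case can be made concrete with a small sketch that classifies a target from two probe results, one per protocol. The type, constant, and function names here are hypothetical; the point is only that the "h3_regression" state cannot be distinguished from "healthy" when the HTTP/3 input is missing.

```go
package main

import "fmt"

// protocolHealth is a hypothetical label summarising two probe results.
type protocolHealth string

const (
	healthy      protocolHealth = "healthy"
	h3Regression protocolHealth = "h3_regression" // clients would likely fall back to HTTP/2
	h2Failure    protocolHealth = "h2_failure"
	down         protocolHealth = "down"
)

// classify combines the outcome of an HTTP/2 probe and an HTTP/3 probe
// against the same target.
func classify(h2OK, h3OK bool) protocolHealth {
	switch {
	case h2OK && h3OK:
		return healthy
	case h2OK && !h3OK:
		return h3Regression
	case !h2OK && h3OK:
		return h2Failure
	default:
		return down
	}
}

func main() {
	for _, c := range []struct{ h2, h3 bool }{
		{true, true}, {true, false}, {false, true}, {false, false},
	} {
		fmt.Printf("h2=%v h3=%v -> %s\n", c.h2, c.h3, classify(c.h2, c.h3))
	}
}
```

With only TCP-based probes, the h3OK input does not exist, so the second case collapses into the first and the regression goes unnoticed.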

Generalisation

The HTTP/3 probing gap is a specific instance of the broader transport-migration observability gap pattern:

  • Migrations that keep the application semantics but change the transport or encoding underneath (TCP→QUIC; HTTP→gRPC; JSON→Protobuf) tend to invalidate transport-aware probes / parsers / traces.
  • Migrations that keep the same transport but change addressing or routing (IPv4→IPv6; single-region→multi-region anycast) tend to invalidate address-aware probes / routing tests.

The general remediation: add a new probe dimension before turning on the new path, not after.

Remediation shape at Slack

Slack's remediation (described in the 2026-03-31 post):

  1. Choose a QUIC client library — Slack picked systems/quic-go for its wide adoption across Go OSS and first-class HTTP client support.
  2. Add a new BBE module http3 following existing BBE configuration patterns.
  3. Open-source the new module upstream in Prometheus Blackbox Exporter (see CONFIGURATION.md L196–L200).
  4. Ship the in-house integration in parallel with the upstream PR so Slack isn't blocked on merge timeline — the patterns/upstream-contribution-parallel-to-in-house-integration instance.
  5. Unify HTTP/1.1 + HTTP/2 + HTTP/3 metrics in systems/grafana for side-by-side comparison — the "single pane of glass" payoff.
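The side-by-side comparison in the last step depends on every probe result carrying a protocol dimension. A minimal sketch of that idea follows, rendering samples in the Prometheus text exposition format. The metric name example_probe_duration_seconds and the labels target and http_version are hypothetical and are not the names Blackbox Exporter or Slack use.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// sample is one probe measurement tagged with the HTTP version it used.
type sample struct {
	Target  string
	Version string // e.g. "1.1", "2", "3"
	Seconds float64
}

// render returns one exposition-format line per sample, ordered by target and
// then by version so that protocols for the same target sit next to each other.
func render(samples []sample) []string {
	sorted := append([]sample(nil), samples...) // copy so the input is not reordered
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Target != sorted[j].Target {
			return sorted[i].Target < sorted[j].Target
		}
		return sorted[i].Version < sorted[j].Version
	})
	lines := make([]string, 0, len(sorted))
	for _, s := range sorted {
		lines = append(lines, fmt.Sprintf(
			"example_probe_duration_seconds{target=%q,http_version=%q} %s",
			s.Target, s.Version, strconv.FormatFloat(s.Seconds, 'g', -1, 64)))
	}
	return lines
}

func main() {
	// Illustrative values only, not measurements.
	for _, line := range render([]sample{
		{Target: "edge.example.test", Version: "3", Seconds: 0.25},
		{Target: "edge.example.test", Version: "2", Seconds: 0.5},
	}) {
		fmt.Println(line)
	}
}
```

Because the version is a label rather than a separate metric, a dashboard query can group or compare by it without knowing in advance which protocols exist.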

Seen in
