Skip to content

SYSTEM Cited by 1 source

pingora-origin

pingora-origin is one of Cloudflare's Pingora-based proxy services, responsible for the final hop of a (non-cached) user request leaving Cloudflare's infrastructure on its way to the customer's own origin server. Named unimaginatively after its job ("internally we call the request's destination server its 'origin', so our service has the … name of pingora-origin").

Responsibilities

  • Outbound request forwarding from the Cloudflare edge to the customer origin.
  • Stripping Cloudflare-internal headers — headers used inside the edge fleet for routing / measurement / optimization must not leak to the customer origin.
  • Routing / optimization / measurement logic that applies specifically to the edge → origin segment.

Scale (2024)

  • ≈35 M requests/sec globally leaving pingora-origin.
  • 40,000 saturated CPU cores-per-second of compute.
  • Every per-request function runs in the hot path at this rate — 1 % of CPU = 400 cores. Non-obvious helpers become worthwhile optimization targets.

Performance posture

  • Profiling via stack-trace sampling (see concepts/stack-trace-sampling-profiling) on production instances — CPU share of each function estimated as sample-containment-%.
  • Criterion (systems/criterion-rust) for per-function nanosecond-resolution microbenchmarks with regression tracking.
  • Microbench predictions and production sampling are cross-checked on each optimization — the trie-hard rollout matched predicted CPU within ~0.1 %.

Canonical optimization (2024-09-10)

The clear_internal_headers function — a pleasant-looking per-request loop that removed 100+ internal headers — was 1.71 % of pingora-origin CPU (≈680 cores). Two successive changes brought it to 0.43 %:

  1. Flip lookup direction (patterns/flip-lookup-direction) from iterate-internal-list → iterate-request-headers against a HashMap<HeaderName>. 3.65 µs → 1.53 µs (2.39×), actual CPU 1.71 % → 0.82 %.
  2. Custom systems/trie-hard crate replaced the HashMap: 1.53 µs → 0.93 µs; actual CPU 0.82 % → 0.34 % (post-deploy, July 2024).

Net saving: 1.28 % of pingora-origin CPU = ~550 cores on a 40,000-core baseline. The case study is now the canonical wiki instance of patterns/measurement-driven-micro-optimization.

Seen in

Last updated · 200 distilled / 1,178 read