CONCEPT Cited by 1 source

Distribution quality vs p99 tail

Definition

Distribution quality is a measurement-philosophy axis for latency / performance metrics: do you optimise to maximise the share of observations in a "fast / instant" bucket (i.e. move the bottom of the distribution), or do you optimise to minimise the worst-case tail percentiles (p90 / p99 / p99.9) (i.e. compress the top of the distribution)?

Both are legitimate goals, but they pull engineering effort in different directions and select for different work:

  • Tail-control posture — invest in eliminating outliers: hedged requests, request-cancellation, GC tuning, slow-path audits, cold-start mitigations.
  • Distribution-quality posture — invest in making the common case faster: caching, prefetching, code splitting, rendering-from-memory, background revalidation.

The two are not mutually exclusive and most mature systems do some of each, but the explicit choice of which to centre as the OKR / north-star metric materially shapes architecture.

Canonical instance: GitHub Issues issues#show (2026-05-14)

GitHub Engineering's issues#show perf rewrite made this an explicit, named transition: "Historically, we dedicated significant effort to tracking the p90 and p99 of the HPC and minimizing the worst tail of the distribution. While this work remains important, it does not inherently ensure that the product feels fast for the majority of users. It is possible to enhance the p99 of the HPC while still leaving the median experience feeling sluggish. For this initiative, we shifted focus toward distribution quality: how many navigations land in our fast and instant buckets across the whole population? The goal is not just fewer terrible outliers. It's to make speed the default path for the majority of sessions." (Source: sources/2026-05-14-github-from-latency-to-instant-modernizing-github-issues-navigation-performance)

The architectural consequence shows up in the post-rewrite HPC percentile shifts on the full issues#show traffic distribution:

Percentile   Pre         Post       Delta
P10          ~600 ms     70 ms      -88 %
P25          ~800 ms     120 ms     -85 %
P50          ~1200 ms    700 ms     -42 %
P75          1800 ms     1400 ms    -22 %
P90          2400 ms     2100 ms    -12.5 %

The bottom of the distribution moves dramatically more than the top — "P10 and P25 compressed dramatically because cached and preheated navigations now dominate that part of the distribution. The median improved meaningfully but is still shaped by cold-start traffic. And the upper tail, while better, reflects the hard-navigation paths where JavaScript boot and client rendering are now the bottleneck — exactly the area we are targeting next." This is what investments in caching + preheating + service-worker shells produce when the metric you optimise for is bucket share rather than worst-case percentile.
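
As a quick arithmetic check on the Delta column, the relative change at each percentile can be recomputed from the pre and post values. This is a minimal sketch: the dictionary simply restates the table's numbers (the "pre" values for P10, P25 and P50 are approximate in the source), and the helper name `relative_delta_pct` is illustrative, not something from the GitHub post.

```python
# Pre/post HPC values in milliseconds, copied from the table.
# The "pre" values for P10, P25 and P50 are approximate in the source.
hpc_ms = {
    "P10": (600, 70),
    "P25": (800, 120),
    "P50": (1200, 700),
    "P75": (1800, 1400),
    "P90": (2400, 2100),
}


def relative_delta_pct(pre: float, post: float) -> float:
    """Signed percentage change from pre to post (negative means faster)."""
    return (post - pre) / pre * 100.0


deltas = {name: relative_delta_pct(pre, post) for name, (pre, post) in hpc_ms.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:.1f} %")
```

Rounded to the table's precision, the computed values agree with the Delta column, and the size of the improvement shrinks steadily from P10 to P90, which is the shape the quoted passage describes.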

Why "minimising p99" can leave the median sluggish

A subtle but important observation in the GitHub framing: it is mathematically possible to compress the p99 of a distribution without moving its median or lower quantiles at all, if the intervention applies only to the slowest paths. "It is possible to enhance the p99 of the HPC while still leaving the median experience feeling sluggish."

The classic mechanism: hedged requests, retries, and timeouts often cap the worst outcomes (p99 / p99.9) without doing anything for the typical request, which never hits the slow path in the first place. If the typical request takes 1.2 s, trimming a 5 s tail to 2 s improves the p99 but leaves the median at 1.2 s.
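
The 1.2 s / 5 s / 2 s example can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions, not a model of GitHub's traffic: the 97 % / 3 % mix is invented, the tail intervention is modelled as a plain cap at 2 s rather than as real hedging or timeouts, and `percentile` is a simple nearest-rank helper written for this example.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p % of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic traffic (assumed mix): 97 % typical requests at 1.2 s,
# 3 % slow-path requests at 5 s.
baseline_ms = [1200] * 970 + [5000] * 30

# Model a tail-only intervention as a hard 2 s cap on the slow path.
# This stands in for hedging / timeouts; it does not implement either.
TAIL_CAP_MS = 2000
capped_ms = [min(x, TAIL_CAP_MS) for x in baseline_ms]

for label, samples in (("baseline", baseline_ms), ("tail capped", capped_ms)):
    print(f"{label}: p50={percentile(samples, 50)} ms, p99={percentile(samples, 99)} ms")
```

In this toy distribution the cap only touches requests that were already on the slow path, so the p99 drops while the p50 is the same number before and after.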

This is why distribution quality and tail control are genuinely different optimisation targets — and why the choice of which to centre is a measurement-philosophy decision, not just a metric choice.

When to centre each posture

Situation                                                                   Likely correct posture
Most user sessions feel slow                                                Distribution quality
Most sessions feel fine, but rare ones break flow / cause complaints        Tail control
New product surface with no perf history                                    Distribution quality (set the floor first)
Mature product with known stability problems                                Tail control (smooth the failure modes)
SLO-bound service with hard p99 contract                                    Tail control (contract demands it)
Latency-perceived UX where median latency is the user's lived experience    Distribution quality

GitHub Issues fell into the most-sessions-feel-slow bucket because the dominant navigation path was also the slowest (57.6 % hard navigations at HPC ~2.05 s); compressing the p99 would have left that 57.6 % of navigations feeling exactly as sluggish as before.

Relationship to bucket-share metrics like HPC

HPC is consumed through this lens — not as a single number but as a share of navigations in instant / fast / slow buckets. "How many navigations land in our fast and instant buckets across the whole population?" This is the metric-level realisation of the distribution-quality posture: the metric is reported as bucket shares, not percentiles, so optimisations that move bucket shares are directly visible in the metric.

A system that reports only p99 cannot distinguish "the p99 was fixed by hedging" from "the median was fixed by caching" — a system that reports bucket shares can.
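
A small sketch of what bucket-share reporting looks like next to a p99 readout. Everything numeric here is an assumption made for illustration: the source names instant / fast / slow buckets but gives no thresholds, so `INSTANT_MS` and `FAST_MS` are hypothetical, and the three synthetic traffic mixes are invented to contrast a tail-only fix with a caching-style fix.

```python
import math
from collections import Counter

# Hypothetical thresholds in milliseconds; the source names the buckets
# but does not state where their boundaries lie.
INSTANT_MS = 200
FAST_MS = 1000


def bucket(latency_ms):
    """Classify one latency sample into instant / fast / slow."""
    if latency_ms < INSTANT_MS:
        return "instant"
    if latency_ms < FAST_MS:
        return "fast"
    return "slow"


def bucket_shares(samples):
    """Fraction of samples falling in each bucket."""
    counts = Counter(bucket(x) for x in samples)
    return {name: counts[name] / len(samples) for name in ("instant", "fast", "slow")}


def percentile(samples, p):
    """Nearest-rank percentile."""
    ordered = sorted(samples)
    return ordered[max(1, math.ceil(p * len(ordered) / 100)) - 1]


# Synthetic scenarios (assumed numbers, 1000 navigations each).
baseline = [1200] * 970 + [5000] * 30
tail_fix = [min(x, 2000) for x in baseline]           # slow path capped at 2 s
cache_fix = [100] * 400 + [1200] * 570 + [5000] * 30  # 40 % now served from cache

for label, samples in (("baseline", baseline), ("tail fix", tail_fix), ("cache fix", cache_fix)):
    print(label, bucket_shares(samples), "p99 =", percentile(samples, 99), "ms")
```

With these invented numbers the tail fix changes the p99 and leaves every bucket share where it was, while the cache fix moves 40 % of navigations into the instant bucket and leaves the p99 untouched. Each readout registers only one of the two interventions, which is why the choice of headline metric steers which kind of work gets credit.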

Tail-control's continuing role

GitHub explicitly says tail work "remains important" — they aren't abandoning p99 minimisation, they're shifting where the optimisation effort goes first. The post's stated next step ("the upper tail, while better, reflects the hard-navigation paths where JavaScript boot and client rendering are now the bottleneck — exactly the area we are targeting next") is a return to tail control once the floor has been raised.

The cleanest way to read the philosophy shift is sequencing: distribution-quality work first to lift the bottom, then tail-control work to compress the top.
