CONCEPT Cited by 1 source

Distribution quality vs p99 tail

Definition

Distribution quality names one side of a measurement-philosophy axis for latency and performance metrics. Do you optimise to maximise the share of observations in a "fast / instant" bucket (moving the bottom of the distribution), or to minimise the worst-case tail percentiles such as p90, p99, and p99.9 (compressing the top of the distribution)?

Both are legitimate goals, but they pull engineering effort in different directions and select for different work:

Tail-control posture — invest in eliminating outliers: hedged requests, request-cancellation, GC tuning, slow-path audits, cold-start mitigations.
Distribution-quality posture — invest in making the common case faster: caching, prefetching, code splitting, rendering-from-memory, background revalidation.
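
As a rough illustration of the first tail-control technique listed above, here is a minimal hedged-request sketch with cancellation of the losing attempt, written with Python's asyncio. The names `hedged_call` and `flaky_backend`, and the 50 ms hedge delay, are hypothetical and chosen only for the example; nothing here is taken from GitHub's implementation.

```python
import asyncio


async def hedged_call(make_request, hedge_after_s):
    """Start one request; if it is still running after hedge_after_s,
    race a second copy against it and return whichever finishes first."""
    primary = asyncio.create_task(make_request())
    done, _ = await asyncio.wait({primary}, timeout=hedge_after_s)
    if done:
        # Common case: the first attempt was fast, so no hedge is sent.
        return primary.result()

    backup = asyncio.create_task(make_request())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # request cancellation: stop paying for the loser
    return next(iter(done)).result()


async def demo():
    attempts = []

    async def flaky_backend():
        # Hypothetical backend: the first attempt is slow, later ones are fast.
        attempts.append(None)
        attempt = len(attempts)
        await asyncio.sleep(0.5 if attempt == 1 else 0.01)
        return f"attempt-{attempt}"

    result = await hedged_call(flaky_backend, hedge_after_s=0.05)
    return result, len(attempts)


result, attempt_count = asyncio.run(demo())
print(result, attempt_count)
```

Note that the hedge only fires for requests slower than the hedge delay, so a request that was already typical is untouched. That is the sense in which this kind of work targets outliers rather than the common case.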

The two are not mutually exclusive and most mature systems do some of each, but the explicit choice of which to centre as the OKR / north-star metric materially shapes architecture.

Canonical instance: GitHub Issues `issues#show` (2026-05-14)

GitHub Engineering's issues#show perf rewrite made this an explicit, named transition: "Historically, we dedicated significant effort to tracking the p90 and p99 of the HPC and minimizing the worst tail of the distribution. While this work remains important, it does not inherently ensure that the product feels fast for the majority of users. It is possible to enhance the p99 of the HPC while still leaving the median experience feeling sluggish. For this initiative, we shifted focus toward distribution quality: how many navigations land in our fast and instant buckets across the whole population? The goal is not just fewer terrible outliers. It's to make speed the default path for the majority of sessions." (Source: sources/2026-05-14-github-from-latency-to-instant-modernizing-github-issues-navigation-performance)

The architectural consequence shows up in the post-rewrite percentile shifts of HPC (Highest Priority Content) across the full issues#show traffic distribution:

Percentile	Pre-rewrite HPC	Post-rewrite HPC	Delta
P10	~600 ms	70 ms	-88 %
P25	~800 ms	120 ms	-85 %
P50	~1200 ms	700 ms	-42 %
P75	1800 ms	1400 ms	-22 %
P90	2400 ms	2100 ms	-12.5 %

The bottom of the distribution moves dramatically more than the top — "P10 and P25 compressed dramatically because cached and preheated navigations now dominate that part of the distribution. The median improved meaningfully but is still shaped by cold-start traffic. And the upper tail, while better, reflects the hard-navigation paths where JavaScript boot and client rendering are now the bottleneck — exactly the area we are targeting next." This is what investments in caching + preheating + service-worker shells produce when the metric you optimise for is bucket share rather than worst-case percentile.
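
The Delta column can be re-derived from the Pre and Post values in the table above. The short check below treats the "~" entries as exact for the sake of the arithmetic, which is an assumption; the source gives them only as approximations.

```python
# (percentile, pre_ms, post_ms) copied from the table above.
# The "~" values are approximate in the source and are used as-is here.
rows = [
    ("P10", 600, 70),
    ("P25", 800, 120),
    ("P50", 1200, 700),
    ("P75", 1800, 1400),
    ("P90", 2400, 2100),
]


def relative_change(pre_ms, post_ms):
    """Signed percentage change from pre to post (negative means faster)."""
    return (post_ms - pre_ms) / pre_ms * 100.0


deltas = {name: relative_change(pre, post) for name, pre, post in rows}
saved_ms = {name: pre - post for name, pre, post in rows}

for name in deltas:
    print(f"{name}: {deltas[name]:+.1f} %  ({saved_ms[name]} ms saved)")
```

The relative change shrinks steadily from P10 to P90, and the absolute saving at P10 (530 ms) is also larger than at P90 (300 ms), so the asymmetry is not just an artefact of dividing by a smaller baseline.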

Why "minimising p99" can leave the median sluggish

A subtle but important observation in the GitHub framing: it is mathematically possible to compress the p99 of a distribution without moving its median or lower quantiles at all, if the intervention applies only to the slowest paths. "It is possible to enhance the p99 of the HPC while still leaving the median experience feeling sluggish."

The classic mechanism: hedged requests, retries, and timeouts often cap the worst outcomes (p99 / p99.9) without doing anything for the typical request, which never hits the slow path in the first place. If the typical request takes 1.2 s, trimming a 5 s tail to 2 s improves the p99 but doesn't move the median.
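
The numbers in the paragraph above can be turned into a toy check. The population below (98 requests at 1.2 s and 2 at 5 s) is synthetic and invented for illustration, and the cap stands in for whatever hedging or timeout mechanism trims the slow path. Percentiles use the nearest-rank definition.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample such that at least
    q percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic population in seconds: a 1.2 s common case plus a 5 s slow path.
before = [1.2] * 98 + [5.0] * 2

# Tail-only intervention: slow outcomes are capped at 2 s; nothing else changes.
after = [min(latency, 2.0) for latency in before]

for label, samples in (("before", before), ("after", after)):
    print(label, "p50 =", percentile(samples, 50), "p99 =", percentile(samples, 99))
```

In this toy population the p99 drops from 5.0 s to 2.0 s while the p50 stays at 1.2 s, because the intervention never touches a request that was already on the common path.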

This is why distribution quality and tail control are genuinely different optimisation targets — and why the choice of which to centre is a measurement-philosophy decision, not just a metric choice.

When to centre each posture

Situation	Likely correct posture
Most user sessions feel slow	Distribution quality
Most sessions feel fine, but rare ones break flow / cause complaints	Tail control
New product surface with no perf history	Distribution quality (set the floor first)
Mature product with known stability problems	Tail control (smooth the failure modes)
SLO-bound service with hard p99 contract	Tail control (contract demands it)
Latency-perceived UX where median latency is the user's lived experience	Distribution quality

GitHub Issues fell into the most-sessions-feel-slow row because the dominant navigation path was also the slowest (57.6 % of navigations were hard navigations, at an HPC of ~2.05 s); compressing the p99 alone would have left those 57.6 % of navigations feeling exactly as sluggish as before.

HPC is consumed through this lens — not as a single number but as a share of navigations in instant / fast / slow buckets. "How many navigations land in our fast and instant buckets across the whole population?" This is the metric-level realisation of the distribution-quality posture: the metric is reported as bucket shares, not percentiles, so optimisations that move bucket shares are directly visible in the metric.

A system that reports only p99 cannot distinguish "the p99 was fixed by hedging" from "the median was fixed by caching" — a system that reports bucket shares can.
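
A small sketch of the two reporting styles side by side. The bucket thresholds (200 ms for "instant", 1000 ms for "fast") are hypothetical, since the source does not state GitHub's actual cut-offs, and the three populations are synthetic.

```python
import math

# Hypothetical bucket thresholds in milliseconds (not GitHub's real cut-offs).
INSTANT_MAX_MS = 200
FAST_MAX_MS = 1000


def p99(samples_ms):
    """Nearest-rank 99th percentile."""
    ordered = sorted(samples_ms)
    return ordered[math.ceil(99 * len(ordered) / 100) - 1]


def bucket_shares(samples_ms):
    """Fraction of navigations landing in each latency bucket."""
    counts = {"instant": 0, "fast": 0, "slow": 0}
    for ms in samples_ms:
        if ms <= INSTANT_MAX_MS:
            counts["instant"] += 1
        elif ms <= FAST_MAX_MS:
            counts["fast"] += 1
        else:
            counts["slow"] += 1
    return {name: count / len(samples_ms) for name, count in counts.items()}


# Synthetic populations of 100 navigations each.
baseline = [1200] * 98 + [5000] * 2
hedged = [min(ms, 2000) for ms in baseline]      # tail-only fix
cached = [100] * 40 + [1200] * 58 + [5000] * 2   # 40 navigations served from cache

for label, samples in (("baseline", baseline), ("hedged", hedged), ("cached", cached)):
    print(label, "p99 =", p99(samples), "shares =", bucket_shares(samples))
```

In this toy data the caching change does not register in the p99 at all (it stays at 5000 ms), while the hedging change does not register in the bucket shares (every navigation is still "slow"). A dashboard that shows only one of the two views therefore misses one kind of improvement entirely.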

Tail-control's continuing role

GitHub explicitly says tail work "remains important" — they aren't abandoning p99 minimisation, they're shifting where the optimisation effort goes first. The post's stated next step ("the upper tail, while better, reflects the hard-navigation paths where JavaScript boot and client rendering are now the bottleneck — exactly the area we are targeting next") is a return to tail control once the floor has been raised.

The cleanest way to read the philosophy shift is sequencing: distribution-quality work first to lift the bottom, then tail-control work to compress the top.

Seen in

sources/2026-05-14-github-from-latency-to-instant-modernizing-github-issues-navigation-performance — canonical wiki instance. GitHub's issues#show perf rewrite team explicitly names the philosophy shift, ties it to bucket-share consumption of HPC, and produces the asymmetric P10–P25 vs P75–P90 compression that this posture predicts.

concepts/tail-latency-at-scale — the canonical tail-control posture; this concept is its measurement-philosophy counterpart.
concepts/highest-priority-content-hpc — the metric GitHub uses, consumed through the bucket-share lens that realises distribution-quality measurement.
concepts/user-perceived-latency — perceived-latency framing aligns with distribution quality because users experience the median, not the p99.
concepts/core-web-vitals — Web Vitals' threshold-based good/needs-improvement/poor classification is itself a bucket-share consumption shape.
systems/github-issues-show — canonical wiki instance.