Metric-granularity mismatch¶
Metric-granularity mismatch is the observability failure mode where a dashboard or integration surfaces a metric at the wrong aggregation level for the question the operator is asking — most commonly, per-leaf (per-shard, per-worker, per-partition) timing masquerading as end-to-end user-visible latency.
In fan-out systems the gap can be enormous: a user query that fans out to N workers has latency max(worker latencies) + a coordinator tax, but the average worker latency looks impressively small. Reading the leaf metric as if it were the top-level one understates the real number by the fan-out ratio.
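A quick simulation makes the gap concrete. The numbers below are illustrative (an ~8 ms exponential per-worker latency and a small fixed coordinator tax are assumptions, not measurements from the source): the per-leaf average stays near 8 ms while the end-to-end figure is dominated by the maximum over 500 workers.

```python
import random

def fanout_latency(n_workers, coordinator_tax_ms=5.0):
    """One fan-out query: end-to-end latency is the slowest worker
    plus a fixed coordinator overhead (illustrative numbers)."""
    workers = [random.expovariate(1 / 8.0) for _ in range(n_workers)]  # ~8 ms mean
    return workers, max(workers) + coordinator_tax_ms

random.seed(0)
all_workers, totals = [], []
for _ in range(1000):
    w, total = fanout_latency(n_workers=500)
    all_workers.extend(w)
    totals.append(total)

avg_leaf = sum(all_workers) / len(all_workers)    # ~8 ms: the misleading metric
avg_total = sum(totals) / len(totals)             # max-of-500: several times larger
```

Reading avg_leaf as user-visible latency is exactly the mismatch: both numbers are "average query latency", at different granularities.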
Canonical instance (Figma, 2026)¶
Figma's DataDog integration reported an "average OpenSearch query" of 8 ms while the service's p99 was ~1 s. The 8 ms was the per-shard query time between coordinator and worker nodes; for Figma's configuration, up to ~500 per-shard queries fanned out per user query, so coordinator-view latency was ~150 ms avg / 200–400 ms p99 / 40 ms min — with min > DataDog's reported "max", which was the red flag.
Key observability nuance: OpenSearch does not emit overall query time at all in its metrics or logs. The only overall-latency field is the took value in the search API response body. Figma's fix was to parse took out of every search response and publish it as a custom metric.
(Source: sources/2026-04-21-figma-the-search-for-speed-in-figma)
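A minimal sketch of that fix, assuming only that the search response is the standard OpenSearch JSON body (which carries took in milliseconds) and that some metrics client exposes a callable to emit a value; the metric name and the emit callable here are hypothetical stand-ins, not Figma's actual code.

```python
def record_overall_latency(search_response: dict, emit):
    """Pull the overall query time (took, in ms) out of an OpenSearch
    search response body and publish it via `emit`, any
    callable(metric_name, value_ms) such as a StatsD histogram."""
    took_ms = search_response.get("took")
    if took_ms is not None:
        emit("opensearch.query.took_ms", took_ms)  # hypothetical metric name
    return took_ms

# Usage, with a list standing in for a real metrics client:
samples = []
record_overall_latency({"took": 142, "timed_out": False}, lambda name, v: samples.append((name, v)))
```

The point is simply that the ground-truth number lives in the response body, so the only place to capture it is the code path that already holds every response.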
Tell-tale signs¶
- Reported "max" is lower than observed "min". Your timing wrapper and their integration can't both be right.
- Averages look amazing, p99 looks terrible, and the ratio is suspiciously close to the fan-out width.
- Vendor integration surfaces internal-system telemetry verbatim rather than normalising to API-call granularity.
- Integration-level metrics and response-body-level fields disagree: OpenSearch took, gRPC trailers, Postgres pg_stat_statements.total_exec_time vs application spans.
How to avoid / detect it¶
- Sanity-check with two independent vantage points. Wrap API calls at the client boundary and compare. If the client says 150 ms and the "built-in" metric says 8 ms, the built-in is not measuring what you think.
- Read the docs for what each metric definition actually covers — "average query latency" in a fan-out system is usually "average worker latency."
- Publish a ground-truth latency metric yourself (from the response-body field that quotes overall time, or from a wrapping span). Don't rely on vendor-default dashboards for capacity planning.
- For fan-out systems, capture the fan-out width alongside the per-leaf latency — then at least the mis-scaled number is recoverable from the pair.
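On the last point: once you have the (per-leaf latency, fan-out width) pair, an order-of-magnitude recovery is possible even without the ground-truth metric. The sketch below assumes roughly exponential per-leaf latencies, for which the expected maximum of N draws is the mean times the N-th harmonic number; this is a crude model, not the source's method, but it turns a misleading 8 ms into the right ballpark.

```python
def estimate_end_to_end_ms(per_leaf_avg_ms: float, fanout_width: int) -> float:
    """Rough recovery of end-to-end latency from the per-leaf average:
    for exponential leaves, E[max of N] = mean * H_N (harmonic number)."""
    harmonic = sum(1.0 / k for k in range(1, fanout_width + 1))
    return per_leaf_avg_ms * harmonic

# Figma-style numbers: 8 ms per shard, ~500-way fan-out.
estimate = estimate_end_to_end_ms(8.0, 500)  # ~54 ms, far closer to the ~150 ms coordinator view than 8 ms is
```

The estimate still undershoots the measured coordinator average (queueing and the coordinator tax are not modeled), but it is recoverable from the pair, which the bare 8 ms is not.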
Adjacent concepts¶
- concepts/tail-latency-at-scale — why max-of-N dominates even when averages are fine. The granularity mismatch makes that math invisible.
- concepts/queueing-theory — per-layer queue vs end-to-end queue: same class of mistake at the observability layer.
- concepts/observability — broader umbrella.
Seen in¶
- sources/2026-04-21-figma-the-search-for-speed-in-figma — OpenSearch 8 ms (per-shard avg) vs 150 ms (coordinator avg) vs 1 s (API p99); hidden for months until the min-above-their-max signal tripped. Fix: publish took as a custom metric.