
CONCEPT Cited by 1 source

Aggregate tag attribution

The capability to attribute aggregate query statistics (total runtime, row counts, execution count, cumulative resource consumption) to individual tag values, rather than attaching tags only to the per-occurrence sample stream. Contrast with tag-counter-only aggregation, where a single aggregate message carries a histogram of tag values ("this pattern ran 3× with controller=users, 5× with controller=sessions") but cannot split the total runtime between the two.

Why the split matters

Without per-tag attribution of aggregate statistics, users can answer "how often did this pattern run?" but cannot answer the operationally important question "what fraction of the total time is attributable to each tag value?" The latter is the basis for cost-attribution, noisy-neighbour identification, and feature-rollout regression-hunting. Per-occurrence tag capture alone doesn't help: the notable-query tail (slow, heavy, or failing queries) is by construction unrepresentative of the workload's average-case behaviour, so total-runtime attribution can't be computed from it.

Implementation shape: per-unique-combination aggregates

Canonical wiki implementation (Source: sources/2026-04-21-planetscale-enhanced-tagging-in-postgres-query-insights): emit one aggregate message per distinct combination of tag key-value pairs observed for a query pattern in the aggregation window. Five executions of the same query pattern with two distinct controller values produce two aggregate messages, one per controller value, each carrying the total time / count / rows-read attributable to that value.
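A minimal sketch of the per-unique-combination shape, under stated assumptions: the Aggregate fields, the aggregate_by_tag_combination helper, the query text, and the timing numbers are all hypothetical illustrations, not PlanetScale's actual message format. It groups the executions of one query pattern by their full set of tag key-value pairs and sums the statistics per group, which is what makes the per-tag share of total time computable.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Aggregate:
    # Hypothetical field names; the real message schema is not given in the source.
    count: int = 0
    total_time_ms: float = 0.0
    rows_read: int = 0


def aggregate_by_tag_combination(executions):
    """Sum stats per (pattern, tag combination) within one aggregation window."""
    aggregates = defaultdict(Aggregate)
    for pattern, tags, time_ms, rows in executions:
        # Sort tag pairs so the key does not depend on dict ordering.
        key = (pattern, tuple(sorted(tags.items())))
        agg = aggregates[key]
        agg.count += 1
        agg.total_time_ms += time_ms
        agg.rows_read += rows
    return dict(aggregates)


PATTERN = "SELECT * FROM accounts WHERE id = ?"  # illustrative pattern
executions = [
    (PATTERN, {"controller": "users"}, 12.0, 1),
    (PATTERN, {"controller": "users"}, 8.0, 1),
    (PATTERN, {"controller": "users"}, 10.0, 1),
    (PATTERN, {"controller": "sessions"}, 40.0, 5),
    (PATTERN, {"controller": "sessions"}, 60.0, 5),
]

# Five executions with two distinct controller values give two messages.
messages = aggregate_by_tag_combination(executions)
window_total = sum(m.total_time_ms for m in messages.values())
for (pattern, tags), m in messages.items():
    share = m.total_time_ms / window_total
    print(dict(tags), m.count, m.total_time_ms, m.rows_read, f"{share:.0%}")
```

Because each message carries its own totals, the fraction of runtime per tag value is a simple division over the window total; no per-execution data has to be retained.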

The alternative — one aggregate message per query pattern with a tag-histogram field — is cheaper to emit and store but cannot answer the attribution question: "the summary data for one tag is permanently combined with the data from all tags."
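To illustrate why the histogram shape loses the attribution, the sketch below (with a hypothetical aggregate_with_histogram helper and made-up timings) builds one summary per pattern from pooled totals plus tag counts. Two workloads with opposite runtime distributions across controllers collapse into identical summaries, so the split cannot be recovered afterwards.

```python
from collections import Counter


def aggregate_with_histogram(executions):
    """One summary per query pattern: pooled totals plus a count per tag pair."""
    tag_counts = Counter()
    total_time_ms = 0.0
    for tags, time_ms in executions:
        total_time_ms += time_ms
        tag_counts.update(tags.items())  # counts occurrences only, not time
    return {
        "count": len(executions),
        "total_time_ms": total_time_ms,
        "tag_counts": dict(tag_counts),
    }


users, sessions = {"controller": "users"}, {"controller": "sessions"}

# Workload A: the sessions controller accounts for most of the runtime.
workload_a = [(users, 10.0)] * 3 + [(sessions, 35.0)] * 2
# Workload B: the users controller accounts for most of the runtime.
workload_b = [(users, 30.0)] * 3 + [(sessions, 5.0)] * 2

summary_a = aggregate_with_histogram(workload_a)
summary_b = aggregate_with_histogram(workload_b)

# Both summaries report 5 executions, 100 ms total, and a 3/2 tag histogram.
print(summary_a == summary_b)
```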

The cost: combinatorial tag explosion

Per-unique-combination aggregation comes with a serious failure mode: if any tag key is high-cardinality (e.g. request_id), the aggregate stream degenerates to one message per query execution, and nothing is aggregated at all. Even low-cardinality tags compose multiplicatively: six tags with 10 values each yield 10⁶ potential combinations per pattern. This is what makes tag-cardinality collapse load-bearing rather than optional: without it, the attribution capability is architecturally unsound at scale.
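The arithmetic can be checked with a short sketch. The helper names and the synthetic workload of 1,000 executions are assumptions for illustration only. The first function gives the multiplicative upper bound on combinations; the second counts how many aggregate messages a window would actually produce, with and without a per-request tag.

```python
from math import prod


def potential_combinations(values_per_tag):
    """Upper bound on distinct tag combinations: the product of cardinalities."""
    return prod(values_per_tag)


def distinct_combinations(executions):
    """Number of aggregate messages one pattern would produce in a window."""
    return len({tuple(sorted(tags.items())) for tags in executions})


# Six tags with 10 possible values each.
six_low_cardinality_tags = [10] * 6
print(potential_combinations(six_low_cardinality_tags))

# 1,000 executions tagged only with one of two controllers...
controllers = ["users", "sessions"]
low = [{"controller": controllers[i % 2]} for i in range(1000)]
# ...versus the same executions with a unique request_id added to each.
high = [{**tags, "request_id": str(i)} for i, tags in enumerate(low)]

print(distinct_combinations(low), distinct_combinations(high))
```

With only the controller tag, the window collapses to one message per controller value; once a unique request_id is attached, every execution forms its own combination and the message count equals the execution count.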

Relationship to notable-query tagging

Per-occurrence tagging on the notable-query tail (pre-existing at PlanetScale before this release) is a compatible but orthogonal surface: it attributes individual slow, heavy, or errored occurrences to their tags, which is useful for debugging a specific incident. Aggregate tag attribution answers a different class of question: "across the entire workload (cheap queries too), where did the time go?"

Seen in
