CONCEPT
Tag cardinality collapse¶
The technique of replacing specific tag values with a
single placeholder value (typically *) at emission time
to bound the number of distinct aggregate records produced
by a per-unique-combination aggregation pipeline. A lossy
compression primitive for observability pipelines that would
otherwise emit one message per execution when a tag is
high-cardinality.
Shape¶
For a tag key identified as problematic (by whatever
policy — see concepts/per-pattern-tag-cardinality and
concepts/per-interval-tag-combination-limit), the
emission path stops using specific values like
request_id=abc123, request_id=def456, … and instead
uses request_id=* for every execution. Aggregate messages
that would have been distinct (one per unique value) now
merge into a single message under the collapsed key.
"When a tag (or set of tags) would result in sending too much telemetry data, we collapse that tag by replacing specific values (like
request_id="a"andrequest_id="b") with a value that indicates it has been removed:request_id=*. This lets us more aggressively merge aggregates and reduce the total number of messages sent, while ensuring that we're capturing 100% of the summary data."(Source: sources/2026-04-21-planetscale-enhanced-tagging-in-postgres-query-insights)
What the trade-off is¶
- Preserved: aggregate totals — total query count, total runtime, total rows read — are unchanged by collapse. Every execution is still counted; just the per-value breakdown is lost.
- Lost: per-value attribution under the collapsed key. The user can no longer answer "what fraction of time came from request_id=abc123?" because that value is now indistinguishable from every other collapsed value.
- Observable: the collapse is recorded on the emitted aggregate; the UI can display "X% of tag values are unknown for this key" so users know when a breakdown is incomplete.
Why it must be dynamic, not static¶
A static policy ("never capture tag values for key
request_id") fails on two axes:
- It's brittle to naming conventions — one team might use request_id, another req-id, another client-correlation-id; the policy has to enumerate all of them.
- It misses the opposite case: a tag that's named high-cardinality but is in practice low-cardinality. source_location (file + line number) is nominally high-cardinality globally but is usually 1-2 values per query pattern — a static policy that collapses it would lose useful attribution.
Dynamic collapse — based on observed cardinality at emission time — handles both cases without manual enumeration.
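A dynamic policy keys off observed cardinality rather than the tag's name. A minimal sketch (threshold and all names are illustrative assumptions, not from the source):

```python
def keys_to_collapse(observed_values, threshold=20):
    """Decide per key from observed cardinality, not the key's name.
    request_id collapses because it *is* high-cardinality here;
    source_location survives because it is low-cardinality in practice."""
    return {key for key, values in observed_values.items()
            if len(values) > threshold}

observed_values = {
    "request_id": {f"id-{i}" for i in range(100)},   # 100 distinct values
    "source_location": {"app.py:42", "app.py:97"},   # 2 distinct values
}
assert keys_to_collapse(observed_values) == {"request_id"}
```

No enumeration of naming conventions is needed: a req-id or client-correlation-id tag would trip the same threshold on its observed values alone.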
Two canonical triggers¶
Canonical wiki decomposition (Source: same as above):
- Cardinality of a specific tag key on a specific query pattern exceeds a threshold (e.g. > 20 distinct values on a pattern, over a rolling window). Catches inherently high-cardinality keys like request_id. Canonicalised as concepts/per-pattern-tag-cardinality.
- Number of unique tag-combinations on a specific query pattern in a specific emission interval exceeds a threshold (e.g. > 50 combinations per 15s). Catches explosions from combining individually-low-cardinality tags. Canonicalised as concepts/per-interval-tag-combination-limit.
Both triggers feed the same collapse primitive; they differ in what they observe (one key vs combination count) and when they act (1h sticky window vs per-interval).
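The two triggers feeding one primitive can be sketched as a single policy object. This is a simplified model of the decomposition above (thresholds, the choice to collapse every key when the combination limit trips, and all names are assumptions for illustration):

```python
import time

class CollapsePolicy:
    """Trigger 1: per-pattern per-key cardinality, sticky for a window.
    Trigger 2: per-interval unique-combination count.
    Both return keys to feed the same collapse primitive."""

    def __init__(self, key_limit=20, combo_limit=50, sticky_s=3600):
        self.key_limit = key_limit
        self.combo_limit = combo_limit
        self.sticky_s = sticky_s
        self.seen_values = {}      # (pattern, key) -> set of observed values
        self.sticky_until = {}     # (pattern, key) -> collapse expiry time
        self.interval_combos = {}  # pattern -> tag combinations this interval

    def collapsed_keys(self, pattern, tags, now=None):
        now = time.time() if now is None else now
        out = set()
        for key, value in tags.items():
            vals = self.seen_values.setdefault((pattern, key), set())
            vals.add(value)
            if len(vals) > self.key_limit:            # trigger 1 trips
                self.sticky_until[(pattern, key)] = now + self.sticky_s
            if self.sticky_until.get((pattern, key), 0) > now:
                out.add(key)                          # still in sticky window
        combos = self.interval_combos.setdefault(pattern, set())
        combos.add(tuple(sorted(tags.items())))
        if len(combos) > self.combo_limit:            # trigger 2 trips
            out |= set(tags)  # crude: collapse all keys to stem the explosion
        return out
```

A 21st distinct request_id on one pattern trips trigger 1 and keeps the key collapsed for the sticky window; trigger 2 resets with each emission interval (interval bookkeeping is omitted here for brevity).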
Generalisation beyond query tags¶
The technique generalises to any observability pipeline that
does per-unique-combination aggregation: Prometheus
histogram labels, OpenTelemetry attribute reduction,
structured-log field rollup. The specific collapse primitive
(replace with *, emit collapse flag) and the triggering
policy (per-key cardinality or per-combination count) are
the reusable parts.
Seen in¶
- sources/2026-04-21-planetscale-enhanced-tagging-in-postgres-query-insights — Canonical disclosure. Two-trigger collapse is the load-bearing mechanism that makes aggregate-stream tag attribution scale.