Tag cardinality collapse

The technique of replacing specific tag values with a single placeholder value (typically *) at emission time, bounding the number of distinct aggregate records a per-unique-combination aggregation pipeline produces. A lossy-compression primitive for observability pipelines that would otherwise emit one aggregate message per execution when a tag is high-cardinality (e.g. a unique request_id per execution).

Shape

For a tag key identified as problematic (by whatever policy — see concepts/per-pattern-tag-cardinality and concepts/per-interval-tag-combination-limit), the emission path stops using specific values like request_id=abc123, request_id=def456, … and instead uses request_id=* for every execution. Aggregate messages that would have been distinct (one per unique value) now merge into a single message under the collapsed key.
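A minimal sketch of the primitive. All names here (aggregate, Execution tuples, COLLAPSED) are illustrative, not from the source; the point is that collapsing a key before grouping makes previously distinct aggregates merge:

```python
from collections import defaultdict

COLLAPSED = "*"  # placeholder written in place of every specific value

def aggregate(executions, collapsed_keys):
    """Merge per-execution records into aggregates keyed by their tag sets.

    executions: iterable of (tags: dict, runtime_ms: float)
    collapsed_keys: tag keys whose specific values are replaced with "*"
    """
    totals = defaultdict(lambda: {"count": 0, "runtime_ms": 0.0})
    for tags, runtime_ms in executions:
        # Grouping key: sorted (key, value) pairs, with collapsed keys masked.
        key = tuple(sorted(
            (k, COLLAPSED if k in collapsed_keys else v) for k, v in tags.items()
        ))
        totals[key]["count"] += 1
        totals[key]["runtime_ms"] += runtime_ms
    return dict(totals)

execs = [
    ({"route": "/checkout", "request_id": "abc123"}, 12.0),
    ({"route": "/checkout", "request_id": "def456"}, 8.0),
]
# Without collapse: two distinct aggregates, one per request_id value.
# With request_id collapsed: a single merged aggregate.
print(aggregate(execs, collapsed_keys={"request_id"}))
# → {(('request_id', '*'), ('route', '/checkout')): {'count': 2, 'runtime_ms': 20.0}}
```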

"When a tag (or set of tags) would result in sending too much telemetry data, we collapse that tag by replacing specific values (like request_id="a" and request_id="b") with a value that indicates it has been removed: request_id=*. This lets us more aggressively merge aggregates and reduce the total number of messages sent, while ensuring that we're capturing 100% of the summary data."

(Source: sources/2026-04-21-planetscale-enhanced-tagging-in-postgres-query-insights)

What the trade-off is

  • Preserved: aggregate totals — total query count, total runtime, total rows read — are unchanged by collapse. Every execution is still counted; just the per-value breakdown is lost.
  • Lost: per-value attribution under the collapsed key. The user can no longer answer "what fraction of time came from request_id=abc123?" because that value is now indistinguishable from every other collapsed value.
  • Observable: the collapse is recorded on the emitted aggregate; the UI can display "X% of tag values are unknown for this key" so users know when a breakdown is incomplete.

Why it must be dynamic, not static

A static policy ("never capture tag values for key request_id") fails on two axes:

  1. It's brittle to naming conventions — one team might use request_id, another req-id, another client-correlation-id; the policy has to enumerate all of them.
  2. It misses the opposite case: a tag whose name suggests high cardinality but which is low-cardinality in practice. source_location (file + line number) is nominally high-cardinality globally but usually takes only 1-2 values per query pattern — a static policy that collapses it would lose useful attribution.

Dynamic collapse — based on observed cardinality at emission time — handles both cases without manual enumeration.
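The dynamic policy can be sketched as a decision over observed cardinality rather than key names. The function name and the per-key value sets are illustrative; the 20-value threshold follows the source's example:

```python
def keys_to_collapse(observed_values, threshold=20):
    """Decide per key from observed cardinality, never from the key's name.

    observed_values: {tag_key: set of distinct values seen in the window}
    threshold: illustrative; the source cites > 20 distinct values per pattern.
    """
    return {k for k, vals in observed_values.items() if len(vals) > threshold}

observed = {
    "request_id": {f"id-{i}" for i in range(500)},  # inherently high-cardinality
    "source_location": {"app/checkout.rb:42"},      # nominally risky, 1 value in practice
}
# No naming convention to enumerate: request_id collapses, source_location survives.
print(keys_to_collapse(observed))  # → {'request_id'}
```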

Two canonical triggers

Canonical wiki decomposition (Source: same as above):

  • Cardinality of a specific tag key on a specific query pattern exceeds a threshold (e.g. > 20 distinct values on a pattern, over a rolling window). Catches inherently high-cardinality keys like request_id. Canonicalised as concepts/per-pattern-tag-cardinality.
  • Number of unique tag-combinations on a specific query pattern in a specific emission interval exceeds a threshold (e.g. > 50 combinations per 15s). Catches explosions from combining individually-low-cardinality tags. Canonicalised as concepts/per-interval-tag-combination-limit.

Both triggers feed the same collapse primitive; they differ in what they observe (a single key's cardinality vs the count of unique tag combinations) and when they act (a 1h sticky window vs per emission interval).
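How the two triggers might feed one decision, sketched under stated assumptions: the thresholds (20 values, 50 combinations) follow the source's examples, but which key gets collapsed when the combination limit trips is unspecified there — collapsing the highest-cardinality remaining key is an assumption:

```python
def collapse_decision(per_key_values, combo_count,
                      key_threshold=20, combo_threshold=50):
    """Return the set of tag keys to collapse before emission.

    per_key_values: {key: distinct values seen for this pattern in the window}
    combo_count: unique tag combinations seen for this pattern this interval
    """
    # Trigger 1: per-pattern key cardinality (sticky over the rolling window).
    collapsed = {k for k, vals in per_key_values.items() if len(vals) > key_threshold}
    # Trigger 2: per-interval combination explosion. ASSUMPTION: collapse the
    # highest-cardinality remaining key; the source does not specify the choice.
    if combo_count > combo_threshold:
        remaining = {k: v for k, v in per_key_values.items() if k not in collapsed}
        if remaining:
            collapsed.add(max(remaining, key=lambda k: len(remaining[k])))
    return collapsed

# Trigger 1 alone: request_id exceeds the key threshold.
print(collapse_decision(
    {"request_id": set(map(str, range(30))), "route": {"/a", "/b"}},
    combo_count=10,
))  # → {'request_id'}
```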

Generalisation beyond query tags

The technique generalises to any observability pipeline that does per-unique-combination aggregation: Prometheus histogram labels, OpenTelemetry attribute reduction, structured-log field rollup. The specific collapse primitive (replace with *, emit collapse flag) and the triggering policy (per-key cardinality or per-combination count) are the reusable parts.
