High-cardinality aggregation overload¶
Definition¶
A high-cardinality aggregation overload is the failure mode where an aggregation query (`terms`, `cardinality`, `significant_terms`) is executed against a field whose distinct-value cardinality is orders of magnitude larger than the aggregation machinery assumes, producing CPU and memory consumption that is pathological per query even at low request rates. The archetype in Elasticsearch is a `terms` aggregation over a unique-identifier field (SKU, UUID, user ID) — the aggregation has to materialise a bucket per distinct value, so the per-shard and coordinator memory and CPU cost grows with the number of distinct values in the matched document set.
The canonical wiki anchor is sources/2025-12-16-zalando-the-day-our-own-queries-dosed-us-inside-zalando-search, in which Zalando's Search & Browse team diagnose the incident's query shape:
"These queries were not typical user queries. They were faceting queries that were requesting huge aggregations on very high cardinality fields, specifically on the SKU, which is a unique product ID. These types of queries are extremely resource-intensive, as they require Elasticsearch to process and aggregate a vast amount of data. Also, they aren't making any sense from a business perspective, as faceting on unique identifiers does not provide any meaningful insights."
The Elasticsearch execution path¶
A facet / `terms` aggregation traverses a scatter-gather shape:
- Coordinator receives the search request.
- Scatter: request is fanned out to all relevant shard copies. Shard copy selection uses Adaptive Replica Selection (ARS), which picks "the 'best' shard copy based on past response times and the node's search thread-pool queue size."
- Per-shard work runs on the `search` thread pool: scan the matching docset, build the local `terms` aggregation result (a `(term, count)` multi-map sized by the cardinality of the matched terms).
- Partial reduction in batches on the coordinator — ES does not wait for all shards to finish at once; it reduces as shards return to cap coordinator memory footprint.
- Final reduction on the coordinator yields the top-N terms returned to the client.
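To make the reduce step concrete, here is a toy sketch of batched partial reduction; this is pure illustration under assumed shapes, not Elasticsearch's actual internals:

```python
from collections import Counter

# Toy model of the coordinator's batched partial reduce: merge
# per-shard (term -> count) maps in fixed-size batches, so raw
# shard responses are discarded as batches are folded in rather
# than buffered all at once. The batch size is an illustrative
# assumption; ES's real reduce (controlled in part by the
# batched_reduce_size request parameter) is more involved.
def partial_reduce(shard_results, batch_size=5, top_n=10):
    acc = Counter()
    batch = []
    for shard_map in shard_results:
        batch.append(shard_map)
        if len(batch) == batch_size:
            for m in batch:
                acc.update(m)
            batch.clear()
    for m in batch:  # fold in any trailing partial batch
        acc.update(m)
    # Note: acc itself still grows with term cardinality -- which
    # is exactly the high-cardinality failure mode on the coordinator.
    return acc.most_common(top_n)
```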
Soft guardrails that ship with Elasticsearch:
- `search.max_buckets` (cluster setting) — caps "an unbounded number of aggregation buckets" per request. Zalando named this as a cluster-wide runbook to apply "to limit the size of aggregations on the whole cluster at once."
- `max_result_window` (index setting) — caps the result-set size a single request can pull; not the aggregation bucket count, but the adjacent guardrail.
- ARS re-weights shard selection away from hot nodes, but cannot re-weight away from pathological queries — once the coordinator admits them, ARS is powerless.
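The bucket cap is a dynamic setting. A minimal sketch of applying it via the cluster settings API, with the host and the 20,000 cap as illustrative assumptions rather than Zalando's runbook values:

```python
import requests

# Sketch: set search.max_buckets cluster-wide via the dynamic
# cluster settings API. Host and cap value are assumptions for
# illustration; tune the cap to your own workload.
resp = requests.put(
    "http://localhost:9200/_cluster/settings",
    json={"persistent": {"search.max_buckets": 20000}},
)
resp.raise_for_status()
print(resp.json())
```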
Why terms on a unique-ID field is specifically bad¶
`terms` aggregations on an almost-unique field produce one bucket per matched document in the worst case. For a filter selecting N documents from a catalog with SKU cardinality S, the per-shard memory cost is O(min(N, S)). For a retail catalog, S is hundreds of millions of SKUs.
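For concreteness, a sketch of the pathological request shape; the index, filter, field names, and host are illustrative assumptions:

```python
import requests

# Sketch of the pathological shape: a terms aggregation on an
# (almost) unique-identifier field. Each shard must build a
# bucket per distinct SKU it matches -- O(min(N, S)) buckets.
pathological = {
    "size": 0,
    "query": {"term": {"brand": "acme"}},  # any filter selecting N docs
    "aggs": {
        "sku_facet": {
            "terms": {"field": "sku", "size": 10000}
        }
    },
}
requests.post("http://localhost:9200/catalog/_search", json=pathological)
```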
The structural issue: facets are meant for low-cardinality categorical fields (brand: O(10³), size: O(10²), colour: O(10), price-bucket: O(10)) where the top-N buckets are a stable, cacheable summary. SKU facets violate this assumption so thoroughly that "they aren't making any sense from a business perspective" — the result is structurally useless even when it succeeds, which is why the incident's trigger was a bug (a maintenance workload accidentally faceting on SKU), not a business feature.
Thread-pool interaction on ES 8.12+¶
| Pool | Role | High-cardinality terms impact |
|---|---|---|
| `search` | Per-shard query + aggregation execution | Pinned by pathological facet queries. When queue fills, ES starts rejecting requests. |
| `search_coordination` | Coordinator-level merging + final reduction | Loaded by partial-reduction batches; secondary pressure point. |
| `search_worker` (ES 8.12+) | Parallel collectors for intra-shard work slicing | Not used for high-cardinality `terms` — those execute single-threaded per shard on `search`. |
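A quick way to watch these pools under load is the `_cat/thread_pool` API. A sketch, with the host as an illustrative assumption:

```python
import requests

# Sketch: watch active / queued / rejected counts on the pools
# named above using the standard _cat/thread_pool API.
resp = requests.get(
    "http://localhost:9200/_cat/thread_pool/"
    "search,search_coordination,search_worker",
    params={"v": "true", "h": "node_name,name,active,queue,rejected"},
)
print(resp.text)
```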
Zalando's verbatim note:
"Our incident, however, was driven by high-cardinality terms aggregations, which are not executed with those parallel collectors; they simply ran as very heavy work on the `search` pool, consuming a lot of CPU and memory. A small number of such pathological facet queries was enough to keep the cluster 'hot' and to starve normal traffic, which is exactly what a DoS looks like in practice."
The `search_worker` parallelisation that mitigates some expensive aggregations does not help here, which is part of why the failure is structurally invisible to volume-based monitoring — query-shape-specific dynamics can produce cluster starvation at volumes the cluster would otherwise absorb.
Why caching doesn't save you¶
Elasticsearch coordinator nodes cache aggregation results, and the presentation layers (Catalog API, NER query builder) cache popular filter combinations. Both assume a bounded, reusable set of facet query shapes. A pathological client issuing facet queries with novel filter / SKU combinations defeats every cache layer — each query is a unique cache miss, forcing ES to do the full scatter-gather from cold.
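A minimal sketch of why novel shapes defeat such a cache, assuming (as Elasticsearch's request cache effectively does) that entries are keyed on the canonicalised request body:

```python
import hashlib
import json

# Toy model: a result cache keyed on the canonicalised request
# body. Any novel filter / aggregation combination yields a new
# key, so a client generating novel shapes never hits the cache.
def cache_key(request_body: dict) -> str:
    canonical = json.dumps(request_body, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

base = {"aggs": {"facet": {"terms": {"field": "sku"}}}}
k1 = cache_key({**base, "query": {"term": {"sku": "A-1"}}})
k2 = cache_key({**base, "query": {"term": {"sku": "A-2"}}})
assert k1 != k2  # every pathological query is a cold miss
```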
Defence primitives¶
- `search.max_buckets` cluster-wide — patterns/cluster-wide-aggregation-guardrail.
- Application-side query inspection — reject aggregations on fields above a cardinality threshold at the query-builder layer, before the query ever reaches ES (see the sketch after this list). See patterns/application-side-query-limit-with-dynamic-threshold.
- Per-client attribution of aggregation queries via `X-Opaque-Id`, so pathological callers are identifiable in the slow-query log.
- Separate thread pool or separate cluster for known-heavy aggregation workloads — e.g. business users' analytical faceting hits a warehouse, not the user-facing search cluster.
- Blacklist high-cardinality fields from aggregation entirely at index-mapping time (`"doc_values": false` or `"eager_global_ordinals": false`) for fields that should never be facet targets.
Seen in¶
- sources/2025-12-16-zalando-the-day-our-own-queries-dosed-us-inside-zalando-search — canonical instance. An internal Zalando application sent 20–100 req/s of `terms` aggregations on the SKU field to an Elasticsearch cluster serving two of the largest markets. The per-query cost was sufficient that even this low volume starved the `search` thread pool; the cluster's queue overflowed and began rejecting normal traffic, producing "search is slow" and "filters are broken" customer-facing failures. Root cause surfaced via a systems/lightstep notebook that detected the caller at 50× baseline fan-out.
Related¶
- concepts/self-inflicted-dos — the outage class this mechanism is the engine of
- concepts/scatter-gather-query — the execution shape facets inherit
- concepts/adaptive-replica-selection-elasticsearch — the shard-selection primitive that doesn't save you here
- concepts/tail-latency-spike-during-queueing — the symptom once the thread pool queue grows
- patterns/cluster-wide-aggregation-guardrail — `search.max_buckets`
- patterns/application-side-query-limit-with-dynamic-threshold — pre-submit cost cap
- systems/elasticsearch