High-cardinality aggregation overload¶
Definition¶
A high-cardinality aggregation overload is the failure mode where an aggregation query (`terms`, `cardinality`, `significant_terms`) is executed against a field whose distinct-value cardinality is orders of magnitude larger than the aggregation machinery assumes, producing CPU and memory consumption that is pathological per query even at low request rates. The archetype in Elasticsearch is a `terms` aggregation over a unique-identifier field (SKU, UUID, user ID) — the aggregation has to materialise a bucket per distinct value, so the per-shard and coordinator memory and CPU cost grows with the number of distinct values in the matched document set.
The canonical wiki anchor is sources/2025-12-16-zalando-the-day-our-own-queries-dosed-us-inside-zalando-search, in which Zalando's Search & Browse team diagnose the incident's query shape:
"These queries were not typical user queries. They were faceting queries that were requesting huge aggregations on very high cardinality fields, specifically on the SKU, which is a unique product ID. These types of queries are extremely resource-intensive, as they require Elasticsearch to process and aggregate a vast amount of data. Also, they aren't making any sense from a business perspective, as faceting on unique identifiers does not provide any meaningful insights."
The Elasticsearch execution path¶
A facet / `terms` aggregation traverses a scatter-gather shape:
- Coordinator receives the search request.
- Scatter: request is fanned out to all relevant shard copies. Shard copy selection uses Adaptive Replica Selection (ARS), which picks "the 'best' shard copy based on past response times and the node's search thread-pool queue size."
- Per-shard work runs on the `search` thread pool: scan the matching docset, build the local `terms` aggregation result (a `(term, count)` multi-map sized by the cardinality of the matched terms).
- Partial reduction in batches on the coordinator — ES does not wait for all shards to finish at once; it reduces as shards return to cap coordinator memory footprint.
- Final reduction on the coordinator yields the top-N terms returned to the client.
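To make the reduce step concrete, here is a toy sketch of batched partial reduction; this is pure illustration under assumed shapes, not Elasticsearch's actual internals:

```python
from collections import Counter

# Toy model of the coordinator's batched partial reduce: merge
# per-shard (term -> count) maps in fixed-size batches, so raw
# shard responses are discarded as batches are folded in rather
# than buffered all at once. The batch size is an illustrative
# assumption; ES's real reduce (controlled in part by the
# batched_reduce_size request parameter) is more involved.
def partial_reduce(shard_results, batch_size=5, top_n=10):
    acc = Counter()
    batch = []
    for shard_map in shard_results:
        batch.append(shard_map)
        if len(batch) == batch_size:
            for m in batch:
                acc.update(m)
            batch.clear()
    for m in batch:  # fold in any trailing partial batch
        acc.update(m)
    # Note: acc itself still grows with term cardinality -- which
    # is exactly the high-cardinality failure mode on the coordinator.
    return acc.most_common(top_n)
```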
Soft guardrails that ship with Elasticsearch:
- `search.max_buckets` (cluster setting) — caps "an unbounded number of aggregation buckets" per request. Zalando named this as a cluster-wide runbook to apply "to limit the size of aggregations on the whole cluster at once."
- `max_result_window` (index setting) — caps the result-set size a single request can pull; not the aggregation bucket count, but the adjacent guardrail.
- ARS re-weights shard selection away from hot nodes, but cannot re-weight away from pathological queries — once the coordinator admits them, ARS is powerless.
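The bucket cap is a dynamic setting. A minimal sketch of applying it via the cluster settings API, with the host and the 20,000 cap as illustrative assumptions rather than Zalando's runbook values:

```python
import requests

# Sketch: set search.max_buckets cluster-wide via the dynamic
# cluster settings API. Host and cap value are assumptions for
# illustration; tune the cap to your own workload.
resp = requests.put(
    "http://localhost:9200/_cluster/settings",
    json={"persistent": {"search.max_buckets": 20000}},
)
resp.raise_for_status()
print(resp.json())
```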
Why terms on a unique-ID field is specifically bad¶
`terms` aggregations on an almost-unique field produce one bucket per matched document in the worst case. For a filter selecting N documents from a catalog with SKU cardinality S, the per-shard memory cost is O(min(N, S)). For a retail catalog, S is hundreds of millions of SKUs.
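For concreteness, a sketch of the pathological request shape; the index, filter, field names, and host are illustrative assumptions:

```python
import requests

# Sketch of the pathological shape: a terms aggregation on an
# (almost) unique-identifier field. Each shard must build a
# bucket per distinct SKU it matches -- O(min(N, S)) buckets.
pathological = {
    "size": 0,
    "query": {"term": {"brand": "acme"}},  # any filter selecting N docs
    "aggs": {
        "sku_facet": {
            "terms": {"field": "sku", "size": 10000}
        }
    },
}
requests.post("http://localhost:9200/catalog/_search", json=pathological)
```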
The structural issue: facets are meant for low-cardinality categorical fields (brand: O(10³), size: O(10²), colour: O(10), price-bucket: O(10)) where the top-N buckets are a stable, cacheable summary. SKU facets violate this assumption so thoroughly that "they aren't making any sense from a business perspective" — the result is structurally useless even when it succeeds, which is why the incident's trigger was a bug (a maintenance workload accidentally faceting on SKU), not a business feature.
Thread-pool interaction on ES 8.12+¶
| Pool | Role | High-cardinality terms impact |
|---|---|---|
| `search` | Per-shard query + aggregation execution | Pinned by pathological facet queries. When queue fills, ES starts rejecting requests. |
| `search_coordination` | Coordinator-level merging + final reduction | Loaded by partial-reduction batches; secondary pressure point. |
| `search_worker` (ES 8.12+) | Parallel collectors for intra-shard work slicing | Not used for high-cardinality `terms` — those execute single-threaded per shard on `search`. |
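A quick way to watch these pools under load is the `_cat/thread_pool` API. A sketch, with the host as an illustrative assumption:

```python
import requests

# Sketch: watch active / queued / rejected counts on the pools
# named above using the standard _cat/thread_pool API.
resp = requests.get(
    "http://localhost:9200/_cat/thread_pool/"
    "search,search_coordination,search_worker",
    params={"v": "true", "h": "node_name,name,active,queue,rejected"},
)
print(resp.text)
```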
Zalando's verbatim note:
"Our incident, however, was driven by high-cardinality terms aggregations, which are not executed with those parallel collectors; they simply ran as very heavy work on the `search` pool, consuming a lot of CPU and memory. A small number of such pathological facet queries was enough to keep the cluster 'hot' and to starve normal traffic, which is exactly what a DoS looks like in practice."
The `search_worker` parallelisation that mitigates some expensive aggregations does not help here, which is part of why the failure is structurally invisible to volume-based monitoring — query-shape-specific dynamics can produce cluster starvation at volumes the cluster would otherwise absorb.
Why caching doesn't save you¶
Elasticsearch coordinator nodes cache aggregation results, and the presentation layers (Catalog API, NER query builder) cache popular filter combinations. Both assume a bounded, reusable set of facet query shapes. A pathological client issuing facet queries with novel filter / SKU combinations defeats every cache layer — each query is a unique cache miss, forcing ES to do the full scatter-gather from cold.
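A minimal sketch of why novel shapes defeat such a cache, assuming (as Elasticsearch's request cache effectively does) that entries are keyed on the canonicalised request body:

```python
import hashlib
import json

# Toy model: a result cache keyed on the canonicalised request
# body. Any novel filter / aggregation combination yields a new
# key, so a client generating novel shapes never hits the cache.
def cache_key(request_body: dict) -> str:
    canonical = json.dumps(request_body, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

base = {"aggs": {"facet": {"terms": {"field": "sku"}}}}
k1 = cache_key({**base, "query": {"term": {"sku": "A-1"}}})
k2 = cache_key({**base, "query": {"term": {"sku": "A-2"}}})
assert k1 != k2  # every pathological query is a cold miss
```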
Defence primitives¶
- `search.max_buckets` cluster-wide — patterns/cluster-wide-aggregation-guardrail.
- Application-side query inspection — reject aggregations on fields above a cardinality threshold at the query-builder layer, before the query ever reaches ES (see the sketch after this list). See patterns/application-side-query-limit-with-dynamic-threshold.
- Per-client attribution of aggregation queries via `X-Opaque-Id`, so pathological callers are identifiable in the slow-query log.
- Separate thread pool or separate cluster for known-heavy aggregation workloads — e.g. business users' analytical faceting hits a warehouse, not the user-facing search cluster.
- Blacklist high-cardinality fields from aggregation entirely at index-mapping time (`"doc_values": false` or `"eager_global_ordinals": false`) for fields that should never be facet targets.
Seen in¶
- sources/2025-12-16-zalando-the-day-our-own-queries-dosed-us-inside-zalando-search — canonical instance. An internal Zalando application sent 20–100 req/s of `terms` aggregations on the SKU field to an Elasticsearch cluster serving two of the largest markets. The per-query cost was sufficient that even this low volume starved the `search` thread pool; the cluster's queue overflowed and began rejecting normal traffic, producing "search is slow" and "filters are broken" customer-facing failures. Root cause surfaced via a systems/lightstep notebook that detected the caller at 50× baseline fan-out.
Related¶
- concepts/self-inflicted-dos — the outage class this mechanism is the engine of
- concepts/scatter-gather-query — the execution shape facets inherit
- concepts/adaptive-replica-selection-elasticsearch — the shard-selection primitive that doesn't save you here
- concepts/tail-latency-spike-during-queueing — the symptom once the thread pool queue grows
- patterns/cluster-wide-aggregation-guardrail — `search.max_buckets`
- patterns/application-side-query-limit-with-dynamic-threshold — pre-submit cost cap
- systems/elasticsearch