PATTERN
Application-side query limit with dynamic threshold¶
The pattern¶
Application-side query limit with dynamic threshold is an admission-control discipline where the query-builder / client-library layer — upstream of the shared backend — inspects each query's shape and rejects (or caps) those that would be unduly expensive, before the query ever reaches the backend. The thresholds are runtime-tunable (via config service, feature flag, or hot-reloadable rules file) so operators can tighten them during incidents and loosen them when capacity recovers.
The defining properties:
- App-side, not backend-side. The check runs in the caller's process, not in the shared backend's admission code. This protects the backend from ever seeing the pathological query, avoiding the case where the backend's own admission control is already too busy to run.
- Shape-based cost inspection, not volume. The decision criterion is "how expensive would this query be?" — field cardinality, aggregation bucket count, missing selective predicates, wildcard-leading LIKE — not QPS.
- Dynamic threshold. Operators can change the cap without a code deploy. This is load-bearing — fixed thresholds are either too loose (don't stop pathological traffic) or too tight (reject legitimate business queries), and the right number depends on cluster capacity at the moment (see the sketch below).
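A minimal sketch of what this looks like in the query-builder layer, in Python; the names (Thresholds, QueryAdmission, the agg_buckets field) are illustrative, not from the source:

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Caps consulted on every query; values are illustrative."""
    max_buckets: int = 10_000          # aggregation bucket cap
    max_result_window: int = 10_000    # from + size cap


class QueryRejected(Exception):
    """Raised locally, before the query ever reaches the shared backend."""


class QueryAdmission:
    def __init__(self, thresholds: Thresholds):
        # Rebound atomically on config reload, so a tightened cap takes
        # effect on the next query without a deploy.
        self.thresholds = thresholds

    def check(self, query: dict) -> None:
        t = self.thresholds
        if query.get("agg_buckets", 0) > t.max_buckets:
            raise QueryRejected(
                f"aggregation would create {query['agg_buckets']} buckets "
                f"(cap {t.max_buckets})")
        window = query.get("from", 0) + query.get("size", 10)
        if window > t.max_result_window:
            raise QueryRejected(
                f"result window {window} exceeds cap {t.max_result_window}")
```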
The canonical wiki anchor is Zalando's Search & Browse team's 2025-12-16 follow-up list:
"We introduced application-side query limiting with dynamically adjustable thresholds, to prevent queries that would try to scan or aggregate too much data." (Source: sources/2025-12-16-zalando-the-day-our-own-queries-dosed-us-inside-zalando-search.)
What the inspection layer checks¶
Per-query cost predictors the app-side layer can evaluate before submitting the query:
| Predictor | Why it predicts cost | Example cap |
|---|---|---|
| Aggregation field cardinality | terms-on-unique-ID is the canonical pathology | max_cardinality < 10^6 for facet fields |
| Aggregation bucket count | Linear in memory + coordinator merge cost | max_buckets ≤ 10,000 per request |
| Result window (from + size) | Linear in coordinator memory | from + size ≤ 10,000 |
| Filter selectivity | Unfiltered scans hit all shards | Require at least one index-friendly filter |
| Wildcard-leading patterns | Cannot use term index | Reject LIKE '%foo' or *foo* wildcard queries |
| Nested query depth | Compounds per-shard work | Cap nesting depth |
| Scroll size × ttl | Long-lived scroll contexts pin resources | Cap scroll TTL |
For each predictor, a static threshold catches egregious cases; the dynamic threshold catches the marginal cases that depend on current cluster health.
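To make the table concrete, here is a sketch of a shape inspector over an Elasticsearch-style request body, with illustrative caps and hypothetical helper names (inspect_shape, _wildcard_patterns); a production version would cover the remaining predictors the same way:

```python
def inspect_shape(body: dict, caps: dict) -> list[str]:
    """Return a list of violations; an empty list means the query is admissible."""
    violations = []

    # Result window: from + size is linear in coordinator memory.
    window = body.get("from", 0) + body.get("size", 10)
    if window > caps["max_result_window"]:
        violations.append(f"result window {window} > {caps['max_result_window']}")

    # Wildcard-leading patterns cannot use the term index.
    for pattern in _wildcard_patterns(body.get("query", {})):
        if pattern.startswith(("*", "?")):
            violations.append(f"leading wildcard pattern {pattern!r}")

    # Unfiltered scans hit all shards: require a query clause at minimum.
    if "query" not in body:
        violations.append("unfiltered scan: no query clause")

    return violations


def _wildcard_patterns(node) -> list[str]:
    """Recursively collect wildcard pattern strings from a query tree."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "wildcard" and isinstance(value, dict):
                # Both {"field": "*foo"} and {"field": {"value": "*foo"}} occur.
                for spec in value.values():
                    found.append(spec["value"] if isinstance(spec, dict) else spec)
            else:
                found.extend(_wildcard_patterns(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(_wildcard_patterns(item))
    return found
```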
Relation to cluster-side guardrails¶
This pattern is complementary to, not a replacement for, cluster-side guardrails like Elasticsearch's search.max_buckets (see patterns/cluster-wide-aggregation-guardrail).
| Lever | Lives at | Protects | Cost of rejection |
|---|---|---|---|
| App-side limit | Query-builder layer | Backend from ever seeing the query | Low — client gets a clean error locally |
| search.max_buckets | ES cluster setting | Coordinator from unbounded bucket count | Medium — request already accepted, work done before rejection |
| token-bucket slow-query limiter | Observability path | Monitoring pipeline from storm | High — not an admission primitive; it's a telemetry rate-cap |
The defence-in-depth stance is to run both — app-side for early rejection of shape-pathological queries, cluster-side as the final guardrail for queries that slip through (including those from callers that bypass the app-side layer).
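For the cluster-side half of the pairing, search.max_buckets is a dynamic Elasticsearch cluster setting, so it can be adjusted over the REST settings API without a restart; a minimal sketch (host and value are illustrative):

```python
import requests

# search.max_buckets is dynamic, so no node restart is needed.
resp = requests.put(
    "http://localhost:9200/_cluster/settings",   # host is illustrative
    json={"persistent": {"search.max_buckets": 20_000}},
    timeout=5,
)
resp.raise_for_status()
```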
Why dynamic thresholds matter¶
Static thresholds fail in both directions:
- Too loose: the threshold is set at "what we currently see in production", which is exactly the level a production incident has to exceed to be an incident. The threshold can't stop the next incident.
- Too tight: legitimate business queries (analytics users, partner exports, end-of-quarter reporting) get rejected. The business blames the reliability team.
Dynamic thresholds resolve the tension:
- Steady state: loose threshold — legitimate business queries pass, only pathological outliers rejected.
- Degraded state: operator tightens the threshold during incident — rejects queries the cluster would normally absorb but currently cannot.
- Recovery: operator loosens the threshold as capacity returns.
The mechanism that makes this operable is a hot-reloadable config path for the thresholds — feature flag service, etcd watch, ConfigMap + SIGHUP — so no deploy is required mid-incident.
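One plausible shape for that path, assuming the ConfigMap + SIGHUP variant with an invented file path and schema:

```python
import json
import signal

THRESHOLDS_PATH = "/etc/query-limits/thresholds.json"  # e.g. a mounted ConfigMap

# Module-level dict read by the admission check on every query.
current_thresholds = {"max_buckets": 10_000, "max_result_window": 10_000}


def _reload_thresholds(signum, frame):
    """Re-read caps on SIGHUP (Unix only); in-flight checks keep the old dict."""
    global current_thresholds
    with open(THRESHOLDS_PATH) as f:
        current_thresholds = json.load(f)  # atomic rebind, no locking needed


signal.signal(signal.SIGHUP, _reload_thresholds)
```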
Interaction with per-client attribution¶
The pattern pairs naturally with per-client slow-query dashboards via X-Opaque-Id:
- Dashboard identifies the pathological caller.
- Threshold is tightened for that caller — the dynamic threshold can be per-caller-class, not just global.
- Legitimate callers continue unaffected.
Without per-caller attribution, the operator has to choose between tightening globally (punishing innocent callers) or leaving the bad caller alone. With attribution, the dynamic threshold becomes a targeted weapon.
Seen in¶
- sources/2025-12-16-zalando-the-day-our-own-queries-dosed-us-inside-zalando-search — canonical wiki instance. Follow-up engineering action after the 2025-12-16 self-inflicted DoS. The Zalando Search & Browse team added dynamically adjustable query-cost thresholds in the app-side query-builder layer specifically to prevent "queries that would try to scan or aggregate too much data." Paired with X-Opaque-Id client attribution and new per-client slow-query dashboards in a three-piece post-incident defence.
Related¶
- concepts/self-inflicted-dos — the failure mode this pattern pre-empts
- concepts/high-cardinality-aggregation-overload — the specific query pathology the inspection layer looks for
- concepts/load-shedding-at-ingestion — the parent concept family (load shedding at the boundary, not inside)
- concepts/capacity-vs-rate-limit-quota — the axis the dynamic threshold rides on
- patterns/cluster-wide-aggregation-guardrail — cluster-side complementary lever
- patterns/per-client-slow-query-dashboard — attribution pairing
- patterns/token-bucket-slow-query-limiter — adjacent but distinct: rate-caps the observability pipeline, not admission
- systems/elasticsearch