PATTERN
Application-side query limit with dynamic threshold¶
The pattern¶
Application-side query limit with dynamic threshold is an admission-control discipline where the query-builder / client-library layer — upstream of the shared backend — inspects each query's shape and rejects (or caps) those that would be unduly expensive, before the query ever reaches the backend. The thresholds are runtime-tunable (via config service, feature flag, or hot-reloadable rules file) so operators can tighten them during incidents and loosen them when capacity recovers.
The defining properties:
- App-side, not backend-side. The check runs in the caller's process, not in the shared backend's admission code. This protects the backend from ever seeing the pathological query, avoiding the case where the backend's own admission control is already too busy to run.
- Shape-based cost inspection, not volume. The decision criterion is "how expensive would this query be?" — field cardinality, aggregation bucket count, missing selective predicates, wildcard-leading LIKE — not QPS.
- Dynamic threshold. Operators can change the cap without a code deploy. This is load-bearing — fixed thresholds are either too loose (don't stop pathological traffic) or too tight (reject legitimate business queries), and the right number depends on cluster capacity at the moment (see the sketch below).
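A minimal sketch of what this looks like in the query-builder layer, in Python; the names (Thresholds, QueryAdmission, the agg_buckets field) are illustrative, not from the source:

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Caps consulted on every query; values are illustrative."""
    max_buckets: int = 10_000          # aggregation bucket cap
    max_result_window: int = 10_000    # from + size cap


class QueryRejected(Exception):
    """Raised locally, before the query ever reaches the shared backend."""


class QueryAdmission:
    def __init__(self, thresholds: Thresholds):
        # Rebound atomically on config reload, so a tightened cap takes
        # effect on the next query without a deploy.
        self.thresholds = thresholds

    def check(self, query: dict) -> None:
        t = self.thresholds
        if query.get("agg_buckets", 0) > t.max_buckets:
            raise QueryRejected(
                f"aggregation would create {query['agg_buckets']} buckets "
                f"(cap {t.max_buckets})")
        window = query.get("from", 0) + query.get("size", 10)
        if window > t.max_result_window:
            raise QueryRejected(
                f"result window {window} exceeds cap {t.max_result_window}")
```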
The canonical wiki anchor is Zalando's Search & Browse team's 2025-12-16 follow-up list:
"We introduced application-side query limiting with dynamically adjustable thresholds, to prevent queries that would try to scan or aggregate too much data." (Source: sources/2025-12-16-zalando-the-day-our-own-queries-dosed-us-inside-zalando-search.)
What the inspection layer checks¶
Per-query cost predictors the app-side layer can evaluate before submitting the query:
| Predictor | Why it predicts cost | Example cap |
|---|---|---|
| Aggregation field cardinality | terms-on-unique-ID is the canonical pathology | max_cardinality < 10^6 for facet fields |
| Aggregation bucket count | Linear in memory + coordinator merge cost | max_buckets ≤ 10,000 per request |
| Result window (from + size) | Linear in coordinator memory | from + size ≤ 10,000 |
| Filter selectivity | Unfiltered scans hit all shards | Require at least one index-friendly filter |
| Wildcard-leading patterns | Cannot use term index | Reject LIKE '%foo' or *foo* wildcard queries |
| Nested query depth | Compounds per-shard work | Cap nesting depth |
| Scroll size × ttl | Long-lived scroll contexts pin resources | Cap scroll TTL |
For each predictor, a static threshold catches egregious cases; the dynamic threshold catches the marginal cases that depend on current cluster health.
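To make the table concrete, here is a sketch of a shape inspector over an Elasticsearch-style request body, with illustrative caps and hypothetical helper names (inspect_shape, _wildcard_patterns); a production version would cover the remaining predictors the same way:

```python
def inspect_shape(body: dict, caps: dict) -> list[str]:
    """Return a list of violations; an empty list means the query is admissible."""
    violations = []

    # Result window: from + size is linear in coordinator memory.
    window = body.get("from", 0) + body.get("size", 10)
    if window > caps["max_result_window"]:
        violations.append(f"result window {window} > {caps['max_result_window']}")

    # Wildcard-leading patterns cannot use the term index.
    for pattern in _wildcard_patterns(body.get("query", {})):
        if pattern.startswith(("*", "?")):
            violations.append(f"leading wildcard pattern {pattern!r}")

    # Unfiltered scans hit all shards: require a query clause at minimum.
    if "query" not in body:
        violations.append("unfiltered scan: no query clause")

    return violations


def _wildcard_patterns(node) -> list[str]:
    """Recursively collect wildcard pattern strings from a query tree."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "wildcard" and isinstance(value, dict):
                # Both {"field": "*foo"} and {"field": {"value": "*foo"}} occur.
                for spec in value.values():
                    found.append(spec["value"] if isinstance(spec, dict) else spec)
            else:
                found.extend(_wildcard_patterns(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(_wildcard_patterns(item))
    return found
```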
Relation to cluster-side guardrails¶
This pattern is complementary to, not a replacement for, cluster-side guardrails like Elasticsearch's search.max_buckets (see patterns/cluster-wide-aggregation-guardrail).
| Lever | Lives at | Protects | Cost of rejection |
|---|---|---|---|
| App-side limit | Query-builder layer | Backend from ever seeing the query | Low — client gets a clean error locally |
| search.max_buckets | ES cluster setting | Coordinator from unbounded bucket count | Medium — request already accepted, work done before rejection |
| token-bucket slow-query limiter | Observability path | Monitoring pipeline from storm | High — not an admission primitive; it's a telemetry rate-cap |
The defence-in-depth stance is to run both — app-side for early rejection of shape-pathological queries, cluster-side as the final guardrail for queries that slip through (including those from callers that bypass the app-side layer).
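For the cluster-side half of the pairing, search.max_buckets is a dynamic Elasticsearch cluster setting, so it can be adjusted over the REST settings API without a restart; a minimal sketch (host and value are illustrative):

```python
import requests

# search.max_buckets is dynamic, so no node restart is needed.
resp = requests.put(
    "http://localhost:9200/_cluster/settings",   # host is illustrative
    json={"persistent": {"search.max_buckets": 20_000}},
    timeout=5,
)
resp.raise_for_status()
```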
Why dynamic thresholds matter¶
Static thresholds fail in both directions:
- Too loose: the threshold is set at "what we currently see in production", which is exactly the level a production incident has to exceed to be an incident. The threshold can't stop the next incident.
- Too tight: legitimate business queries (analytics users, partner exports, end-of-quarter reporting) get rejected. The business blames the reliability team.
Dynamic thresholds resolve the tension:
- Steady state: loose threshold — legitimate business queries pass, only pathological outliers rejected.
- Degraded state: operator tightens the threshold during incident — rejects queries the cluster would normally absorb but currently cannot.
- Recovery: operator loosens the threshold as capacity returns.
The mechanism that makes this operable is a hot-reloadable config path for the thresholds — feature flag service, etcd watch, ConfigMap + SIGHUP — so no deploy is required mid-incident.
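One plausible shape for that path, assuming the ConfigMap + SIGHUP variant with an invented file path and schema:

```python
import json
import signal

THRESHOLDS_PATH = "/etc/query-limits/thresholds.json"  # e.g. a mounted ConfigMap

# Module-level dict read by the admission check on every query.
current_thresholds = {"max_buckets": 10_000, "max_result_window": 10_000}


def _reload_thresholds(signum, frame):
    """Re-read caps on SIGHUP (Unix only); in-flight checks keep the old dict."""
    global current_thresholds
    with open(THRESHOLDS_PATH) as f:
        current_thresholds = json.load(f)  # atomic rebind, no locking needed


signal.signal(signal.SIGHUP, _reload_thresholds)
```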
Interaction with per-client attribution¶
The pattern pairs naturally with per-client slow-query dashboards via X-Opaque-Id:
- Dashboard identifies the pathological caller.
- Threshold is tightened for that caller — the dynamic threshold can be per-caller-class, not just global.
- Legitimate callers continue unaffected.
Without per-caller attribution, the operator has to choose between tightening globally (punishing innocent callers) or leaving the bad caller alone. With attribution, the dynamic threshold becomes a targeted weapon.
Seen in¶
- sources/2025-12-16-zalando-the-day-our-own-queries-dosed-us-inside-zalando-search — canonical wiki instance. Follow-up engineering action after the 2025-12-16 self-inflicted DoS. The Zalando Search & Browse team added dynamically adjustable query-cost thresholds in the app-side query-builder layer specifically to prevent "queries that would try to scan or aggregate too much data." Paired with X-Opaque-Id client attribution and new per-client slow-query dashboards in a three-piece post-incident defence.
Related¶
- concepts/self-inflicted-dos — the failure mode this pattern pre-empts
- concepts/high-cardinality-aggregation-overload — the specific query pathology the inspection layer looks for
- concepts/load-shedding-at-ingestion — the parent concept family (load shedding at the boundary, not inside)
- concepts/capacity-vs-rate-limit-quota — the axis the dynamic threshold rides on
- patterns/cluster-wide-aggregation-guardrail — cluster-side complementary lever
- patterns/per-client-slow-query-dashboard — attribution pairing
- patterns/token-bucket-slow-query-limiter — adjacent but distinct: rate-caps the observability pipeline, not admission
- systems/elasticsearch