Self-inflicted DoS¶
Definition¶
A self-inflicted denial of service is an outage where the saturating traffic originates from an internal, trusted client — not from an external attacker — and is typically valid, well-formed, and syntactically indistinguishable from legitimate traffic. The client's intent is benign; the outage is produced by a mismatch between query cost (CPU / memory / I/O per request) and the monitoring regime, which is almost always gated on request volume rather than cost.
The signature is:
- A trusted service inside the perimeter starts issuing queries at a rate that is "nothing" compared to normal inbound traffic (20–100 req/s against a cluster handling thousands).
- The queries are syntactically valid and semantically legitimate — a rate-limiter or WAF can't distinguish them from normal traffic.
- The queries' per-request cost is pathologically high (a high-cardinality aggregation, a scatter-gather with no selective filter, a recursive computation, an unbounded LIKE '%...%' scan).
- The coordinator / thread-pool / buffer-pool fills up; tail latency spikes; normal traffic is starved.
- Classic infrastructure dashboards (overall QPS, overall error rate, CPU, memory) say "the cluster is just busy" — they do not indicate a specific caller.
The canonical wiki anchor is sources/2025-12-16-zalando-the-day-our-own-queries-dosed-us-inside-zalando-search:
"It was later discovered that the root cause of the issue was a self-inflicted Denial of Service (DoS) attack. As a result of a maintenance workload coupled with a bug in the processing logic of the application, the internal client application was sending a small, but sufficient number of parallel overwhelming faceting queries to the Elasticsearch cluster."
Why the volume × cost mismatch is the diagnostic pivot¶
Rate limiters and volume-based alerts project all traffic onto a single scalar — requests per second. But the load a cluster actually bears is cost × count, and the cost dimension is wildly variable:
| Query shape | Relative cost | Typical volume-based alarm behaviour |
|---|---|---|
| Primary-key point lookup | 1× | Caught by any alert (high volume tolerated) |
| Indexed range scan (selective) | ~10× | Caught if tail latency instrumented |
| Full-text bool query (cached) | ~10–100× | Typically fine |
| Faceting aggregation on a ~1M-cardinality field (e.g. brand, size) | ~100× | Cached at coordinator, tolerable |
| Faceting aggregation on a ~100M-cardinality field (e.g. SKU) | ~10,000×+ | Invisible to volume alerts because only a few per second fit before the cluster saturates |
At the top end of this table, a caller at 20–100 req/s is "1–3% of normal cluster inbound" by volume, yet can simultaneously consume 100% of the coordinator CPU budget. Volume-altitude monitoring will not register the caller because it is dwarfed by millions of legitimate cheap queries.
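To make the mismatch concrete, here is a minimal arithmetic sketch using the table's illustrative multipliers (the specific numbers are assumptions for illustration, not measurements from the incident):

```python
# Illustrative numbers only: 10,000 cheap point lookups/s at 1x unit cost
# vs. one internal caller sending 50 req/s of ~10,000x-cost faceting queries.
cheap_qps, cheap_cost = 10_000, 1
facet_qps, facet_cost = 50, 10_000

volume_share = facet_qps / (facet_qps + cheap_qps)
cost_share = (facet_qps * facet_cost) / (facet_qps * facet_cost + cheap_qps * cheap_cost)

print(f"volume share: {volume_share:.1%}")  # ~0.5% -- invisible to a QPS alert
print(f"cost share:   {cost_share:.1%}")    # ~98.0% -- effectively the whole CPU budget
```

Any alarm keyed on the first number stays green while the second number takes the cluster down.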
Load-bearing preconditions for this failure mode¶
- A shared backend serving many internal callers (search cluster, database, coordinator service) where queries are generic and caller identity is not load-bearing in the authorisation model.
- No per-query cost model at the client boundary — the client library or gateway accepts arbitrary queries without flagging high-cardinality aggregations, unbounded scans, or missing selective predicates (a minimal sketch of such a check follows this list).
- No per-client slow-query attribution. The slow-query log exists but does not record which caller sent the slow query, so recurrent pathological callers are invisible. Zalando's explicit remediation was propagating an X-Opaque-Id header at the Elasticsearch request boundary.
- An internal caller that is both trusted and automated — the bug path must be triggerable without a human in the loop, because humans correcting themselves in real time don't produce multi-hour incidents.
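A minimal sketch of the missing client-boundary cost model. Everything here is an assumption for illustration — the field-cardinality map, the threshold, and the placement in the query-builder layer; the Zalando post names the gap but not an implementation.

```python
# Hypothetical pre-flight cost check in the query-builder layer, run before
# a query body is sent to the shared cluster. Cardinalities are placeholders.
HIGH_CARDINALITY_FIELDS = {"sku": 100_000_000}  # field -> approx. distinct values
MAX_AGG_CARDINALITY = 1_000_000                 # illustrative threshold

def check_query_cost(body: dict) -> list[str]:
    """Return warnings for pathologically expensive Elasticsearch query shapes."""
    warnings = []
    # Faceting (terms) aggregations on fields known to be enormous.
    for name, agg in body.get("aggs", {}).items():
        field = agg.get("terms", {}).get("field")
        if HIGH_CARDINALITY_FIELDS.get(field, 0) > MAX_AGG_CARDINALITY:
            warnings.append(f"aggregation '{name}' facets on high-cardinality field '{field}'")
    # Leading-wildcard scans, the LIKE '%...%' analogue.
    for field, pattern in body.get("query", {}).get("wildcard", {}).items():
        value = pattern["value"] if isinstance(pattern, dict) else pattern
        if value.startswith("*"):
            warnings.append(f"unbounded leading-wildcard scan on '{field}'")
    return warnings

# A query that is syntactically valid, semantically legitimate, and pathological:
print(check_query_cost({"aggs": {"by_sku": {"terms": {"field": "sku"}}}}))
# -> ["aggregation 'by_sku' facets on high-cardinality field 'sku'"]
```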
Why horses-not-zebras debugging misses this¶
When cluster CPU spikes, the playbook-ordered hypotheses are:
- Recent deploy regression on the cluster → Zalando: no recent deploys.
- Write load spike → Zalando: write load normal.
- External traffic spike → Zalando: inbound QPS normal.
- Infrastructure fault (node failure, AZ degradation) → Zalando: other clusters fine.
- Misconfiguration / GC pause / JVM issue → Zalando: no.
All five are horses — the common causes. Self-inflicted DoS is the zebra: rarer, not in the first-line playbook, and not explained by the metrics the first-line playbook reads. It is the canonical zebra lesson.
Remediation levers¶
| Lever | Canonical example | Where it lives |
|---|---|---|
| Per-client cost attribution | concepts/x-opaque-id-client-attribution + patterns/per-client-slow-query-dashboard | Client HTTP header + slow-query-log pipeline |
| Application-side query limits | patterns/application-side-query-limit-with-dynamic-threshold | Query-builder layer (before hitting the shared backend) |
| Cluster-wide aggregation guardrails | search.max_buckets / patterns/cluster-wide-aggregation-guardrail | Elasticsearch cluster setting |
| Per-client workload isolation | Per-tier thread pools / patterns/tier-tagged-query-isolation / patterns/route-tagged-query-isolation | Coordinator-layer scheduling |
| Market / cell split to contain blast radius | patterns/split-cluster-by-market-for-load-isolation / concepts/market-group-country-isolation | Cluster topology |
| Trace-altitude per-caller anomaly detection | systems/lightstep notebook-exploration workflow | APM / tracing backend |
The Zalando post specifically names "rate limiting based on the type of the client traffic. Not all clients should be equal" as the follow-up direction.
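Two of these levers are directly expressible against Elasticsearch itself. A minimal sketch assuming the elasticsearch-py 8.x client; the caller name, index, fields, and threshold value are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Per-client cost attribution: the X-Opaque-Id header set here is echoed into
# the slow log and the tasks API, so slow queries become attributable to a caller.
es.options(opaque_id="catalog-maintenance-job").search(
    index="products",
    size=0,
    aggs={"by_brand": {"terms": {"field": "brand"}}},
)

# Cluster-wide aggregation guardrail: search.max_buckets caps how many
# aggregation buckets a single request may create (value is illustrative).
es.cluster.put_settings(persistent={"search.max_buckets": 10_000})
```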
Seen in¶
- sources/2025-12-16-zalando-the-day-our-own-queries-dosed-us-inside-zalando-search — canonical wiki instance. An internal Zalando application, triggered by an automated maintenance workload plus a processing-logic bug, sent 20–100 req/s of high-cardinality terms aggregations on the SKU field to an Elasticsearch cluster handling thousands of req/s of normal traffic. The cluster starved, coordinator CPU pinned, and filters broke across two of the largest markets; mitigated by a 5-lever app-side load-shed plus a structural market split via a node-allocation-based cluster split. Root cause identified via a Lightstep trace-exploration notebook that spotted the caller running at 50× its normal volume.
Contrast with external DoS¶
| Dimension | External DoS | Self-inflicted DoS |
|---|---|---|
| Source | Hostile / unknown | Trusted internal service |
| Query validity | Often malformed / exploratory | Syntactically valid + semantically legitimate |
| Volume | Usually high | Often very low vs baseline |
| Per-query cost | Variable | Pathologically high |
| Defence | WAF / rate limit / blocklist | App-side cost limit + per-caller attribution |
| Detection | Volume-based alarms | Per-caller cost-weighted alarms |
External DoS defences (WAFs, IP rate limits, anomaly-detection on request headers) do not catch self-inflicted DoS because the attacker is inside the authenticated perimeter and their requests look normal.
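The last row of the table is the detection side. A minimal sketch of a per-caller cost-weighted alarm, assuming slow-log or trace events have already been parsed into (caller, took_ms) pairs; the thresholds are illustrative:

```python
from collections import defaultdict

COST_SHARE_ALERT = 0.50      # one caller owns >50% of total query time...
VOLUME_SHARE_CEILING = 0.05  # ...while sending <5% of requests: the signature

def pathological_callers(events: list[tuple[str, float]]) -> list[str]:
    """events: (caller id from X-Opaque-Id, took_ms) per completed query."""
    count, cost = defaultdict(int), defaultdict(float)
    for caller, took_ms in events:
        count[caller] += 1
        cost[caller] += took_ms
    total_count, total_cost = sum(count.values()), sum(cost.values())
    if not total_cost:
        return []
    return [
        c for c in cost
        if cost[c] / total_cost > COST_SHARE_ALERT
        and count[c] / total_count < VOLUME_SHARE_CEILING
    ]
```

A volume-based alarm projects onto count alone; this one alerts on the dimension the cluster actually saturates on.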
Related¶
- concepts/high-cardinality-aggregation-overload — the specific per-query mechanism at Elasticsearch
- concepts/x-opaque-id-client-attribution — the observability primitive that closes the attribution gap
- concepts/zebra-not-horse-heuristic — the debugging mental model that surfaces this failure mode
- concepts/blast-radius — what structural isolation bounds
- concepts/tail-latency-spike-during-queueing — the user-visible symptom
- concepts/load-shedding-at-ingestion — the generalised parent concept
- patterns/application-side-query-limit-with-dynamic-threshold
- patterns/per-client-slow-query-dashboard
- patterns/cluster-wide-aggregation-guardrail
- patterns/split-cluster-by-market-for-load-isolation
- systems/elasticsearch