
SYSTEM

Zalando Catalog Search

Identity

Zalando Catalog Search is the wiki's composite-identity page for the multi-layer search substrate operated by Zalando's Search & Browse team. It is not a single service — it is the end-to-end request path from a user's search action in the app to the Elasticsearch candidate set, spanning four presentation / execution layers and two enrichment sidecars.

The canonical architectural description is sources/2025-12-16-zalando-the-day-our-own-queries-dosed-us-inside-zalando-search, which introduces the layering to explain why a pathological facet query at the bottom layer produces the user-visible "search is slow" and "filters are broken" symptoms at the top.

Layers (bottom-up)

┌──────────────────────────────────────────────────────────┐
│  Catalog API                                             │
│  → systems/zalando-catalog-api                           │
└──────────────────┬───────────────────────────────────────┘
                   │ fan-out 1 request → N queries
┌──────────────────┴───────────────────────────────────────┐
│  NER Query Builder                                       │
│  → systems/zalando-ner-query-builder                     │
└──────────────────┬───────────────────────────────────────┘
┌──────────────────┴───────────────────────────────────────┐
│  Search API   ←   Algorithm Gateway                      │
│                   (user-action + ML re-ranking)          │
│               ←   Promotions Bidding Service             │
│                   (sponsored-result blending)            │
│  → systems/zalando-search-api                            │
└──────────────────┬───────────────────────────────────────┘
┌──────────────────┴───────────────────────────────────────┐
│  Base Search  (Elasticsearch, coordinator + data nodes)  │
│  → systems/zalando-base-search                           │
└──────────────────────────────────────────────────────────┘

Each layer carries its own caches on the hot path:

Layer                          Cache role
Catalog API                    Caches popular queries and filter combinations
NER Query Builder              Caches popular queries and filter combinations
Base Search coordinator nodes  Caches search results and aggregations (on separate machines from the data nodes)
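The multi-cache topology amounts to a chain of read-through caches: a request is answered by the first layer that holds the key, and a full miss falls through to Elasticsearch. A minimal sketch of that chain, assuming a read-through design (class and variable names are hypothetical, not Zalando's code):

```python
class LayerCache:
    """One layer's cache on the hot path; falls through to the layer below on a miss."""

    def __init__(self, name, next_layer=None):
        self.name = name
        self.store = {}
        self.next_layer = next_layer  # the layer below, or None at the bottom

    def get(self, key, fetch):
        if key in self.store:
            return self.store[key]            # hit: lower layers are never touched
        if self.next_layer is not None:
            value = self.next_layer.get(key, fetch)
        else:
            value = fetch(key)                # miss at every layer: hits Elasticsearch
        self.store[key] = value               # populate this layer on the way back up
        return value


# Wired bottom-up, mirroring the diagram above.
base_search = LayerCache("base-search-coordinator")
ner_builder = LayerCache("ner-query-builder", next_layer=base_search)
catalog_api = LayerCache("catalog-api", next_layer=ner_builder)
```

A popular query only reaches Elasticsearch once; every repeat is absorbed by whichever layer cached it first.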

Under normal conditions, facet queries (brand, size, colour, price-bucket aggregations) are well-behaved and benefit from this multi-cache topology. Under load, faceting against high-cardinality fields (SKU, unique product IDs) defeats every cache layer and overloads the coordinator-plus-data-node pair — the precise failure the 2025-12-16 incident canonicalises.
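The difference between the two shapes can be sketched as Elasticsearch request bodies. The field names (`category`, `brand`, `sku`) are illustrative assumptions; Zalando's actual mapping is not public:

```python
# Well-behaved facet: terms aggregation over a low-cardinality field.
# Identical across many users, so every cache layer can serve it.
brand_facet = {
    "size": 0,
    "query": {"term": {"category": "jeans"}},
    "aggs": {"brands": {"terms": {"field": "brand", "size": 50}}},
}

# Pathological facet: terms aggregation over a high-cardinality field (SKU).
# The coordinator must merge enormous per-shard bucket sets, and because
# each filter+SKU combination is novel, no cache layer ever gets a hit.
sku_facet = {
    "size": 0,
    "query": {"term": {"category": "jeans"}},
    "aggs": {"skus": {"terms": {"field": "sku", "size": 10000}}},
}
```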

Downstream consumers

  • The customer-facing catalog — the direct search-and-browse surface.
  • The Designer experience — a curated browse view.
  • Full-text search — the app's primary query box.
  • Zalando Assistant — the conversational discovery surface that "depends on us to fetch and recommend products in real time" (sources/2025-12-16-zalando-the-day-our-own-queries-dosed-us-inside-zalando-search).
  • Brand partner campaigns — sponsored placements blended via the Promotions Bidding Service; a catalog-search outage is a partner-campaigns outage.

Operational posture

  • Market-group isolation at the Elasticsearch tier. Multiple ES clusters, each serving a subset of countries, such that saturation in one market cluster does not affect other markets. Validated in the 2025-12-16 incident: two of the largest markets co-tenanted on one cluster were saturated; all other market-group clusters remained healthy. Zalando subsequently split the two co-tenant markets into separate clusters during the incident — see patterns/split-cluster-by-market-for-load-isolation.
  • Shared-across-market failure surface. Countries sharing an ES cluster share blast radius; the number of countries per cluster is a tuning knob that trades cluster operational cost against isolation strength.
  • Presentation layers as control plane. During incidents, Catalog API and Search API act as the fast operator-control surface: turn off non-critical calls, reduce parallel queries per request, increase cache TTL, down-sample heavy ML-model integrations. Canonical instance of load shedding at the presentation boundary.
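The operator knobs listed above can be sketched as an incident-time configuration toggled at the Catalog/Search API boundary. All names and default values here are hypothetical; Zalando's actual controls are not public:

```python
from dataclasses import dataclass


@dataclass
class ShedConfig:
    """Load-shedding knobs exposed at the presentation boundary."""
    non_critical_calls: bool = True       # e.g. enrichment sidecar calls
    max_parallel_queries: int = 8         # fan-out: queries per user request
    cache_ttl_seconds: int = 60           # longer TTL -> fewer backend hits
    ml_rerank_sample_rate: float = 1.0    # fraction of requests sent to ML re-ranking


NORMAL = ShedConfig()

# Degraded mode: shed load at the presentation layers instead of letting
# it reach the saturated Elasticsearch tier.
DEGRADED = ShedConfig(
    non_critical_calls=False,
    max_parallel_queries=2,
    cache_ttl_seconds=600,
    ml_rerank_sample_rate=0.1,
)
```

The design point is that these switches live above Base Search, so flipping them takes effect immediately without touching the overloaded cluster.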

Load-bearing pathology surfaced on 2025-12-16

A single pathological caller pattern — ~20–100 req/s of terms aggregations on the SKU field, triggered by an internal application's maintenance workload + processing-logic bug — saturated the coordinator CPU and search thread pool on one market-pair cluster. "Queries that usually took milliseconds were now dragging on for seconds, and some requests were timing out altogether. Users started seeing empty result pages, or pages with just a few items." The pathology escaped every cache layer because the filter+SKU combinations were novel per-request.
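A toy illustration of why the pathology escaped every cache: each request minted a previously unseen filter+SKU cache key, so the hit rate was zero and every request fell through to the coordinator. The key scheme is a hypothetical stand-in:

```python
cache = set()
hits = misses = 0

# Each maintenance-job request carried a different SKU filter combination,
# so no two requests ever shared a cache key.
for i in range(1000):
    key = (("sku_filter", f"sku-{i}"), ("agg_field", "sku"))
    if key in cache:
        hits += 1
    else:
        misses += 1
        cache.add(key)   # cached, but never reused

# hits == 0, misses == 1000: every single request reaches Elasticsearch.
```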

Follow-up program:
