Skip to content

SYSTEM Cited by 1 source

Zalando Product Offer Enrichment

Product Offer Enrichment is a Apache Flink job owned by Zalando's Search & Browse team that joins four streams into a single enriched per-SKU view powering the website's catalog pages.

Inputs (four streams, keyed on SKU)

Stream Source / meaning
Offer events partner pricing and stock updates
Boost events sorting metadata from
Promotions/Bidding-adjacent
systems
Sponsored events sponsored-products metadata
Product events product master-data state changes

Output

Enriched per-SKU records feeding Catalog Search / Base Search so that catalog browsing pages see current price + stock + boost score + product state + sponsorship in one record.

Implementation history

v1 — Table API & SQL (state-amplifying)

Implemented as a chain of four SQL JOINs with aggregations to compute max timestamps and ranking functions to keep the latest record per join side. Each join operator maintained independent RocksDB state; chain total reached 235–245 GB per application. The resulting hourly savepoint cost made the pipeline operationally unstable: 100 % CPU for 12 min, crash restarts, missed snapshots, 10–20 % KPU overscale margin.

v2 — DataStream API + single keyed operator (current)

Rewritten as DataStream.union(offer, boost, sponsored, product) → keyBy(SKU) → MultiStreamJoinProcessor — a custom KeyedProcessFunction holding a single ValueState[EnrichmentState] per SKU. Incoming events are pattern-matched and update only their owning fields; redundant updates (same price+stock, older boost timestamp, unchanged product state) return before touching state.

Canonical instance of patterns/stream-union-plus-keyed-process-function and patterns/single-valuestate-over-chained-joins.

Impact (sources/2026-03-03-zalando-why-we-ditched-flink-table-api-joins-cutting-state-by-75-with-datastream-unions):

Metric v1 v2 Δ
State 235 GB 56 GB −76 %
Snapshot duration 11 min 2.5 min −77 %
CPU 100 % spikes ~30 % stable stabilised
AWS cost baseline −13 % saved ~1/8

Constraints

  • Flink 1.20 on AWS Managed Flink. No MultiJoin operator available at the time of the rewrite, so the manual DataStream rewrite was the only option.
  • Key = SKU. The rewrite's elegance depends on the natural shared key across all four streams.
Last updated · 507 distilled / 1,218 read