PATTERN Cited by 1 source

Single ValueState over Chained Joins

Single ValueState over chained joins is the pattern of representing an enriched per-key record as one POJO in one ValueState[T], with each event type updating only the fields it owns, in place. It replaces per-join intermediate state spread across the N−1 Flink Table-API join operators needed to chain N input streams, each operator holding its own RocksDB copy.

It is the state layout used inside a KeyedProcessFunction that consumes a union of the input streams (the stream union + KeyedProcessFunction approach).

Shape

// One compact per-SKU POJO holding only what the output needs
case class EnrichmentState(
  var price: Double = 0.0,
  var stock: Double = 0.0,
  var sortingScore: Double = 0.0,
  var productState: String = null
)

// One ValueState[EnrichmentState] per key; typically initialised in open()
// via getRuntimeContext.getState(new ValueStateDescriptor(...))
private var state: ValueState[EnrichmentState] = _

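Each incoming event is pattern-matched on its type and updates only the fields it owns on current, the POJO read from state.value() (or a fresh default instance when nothing is stored yet for the key). A single state.update(current) then persists the result.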
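The read, match, persist-once cycle can be sketched without a Flink dependency. This is an illustrative sketch, not Zalando's code: the event types (PriceEvent, StockEvent, SortingEvent, ProductEvent) and the InMemoryState class are hypothetical. InMemoryState only mimics the value() and update() methods of ValueState so the logic can run on its own; in a real job the body of process would sit inside KeyedProcessFunction.processElement.

```scala
object SingleValueStateSketch {
  // Same shape as the EnrichmentState shown in this section.
  case class EnrichmentState(
    var price: Double = 0.0,
    var stock: Double = 0.0,
    var sortingScore: Double = 0.0,
    var productState: String = null
  )

  // Hypothetical event types, one per input stream; each owns exactly one field.
  sealed trait OfferEvent { def sku: String }
  final case class PriceEvent(sku: String, price: Double) extends OfferEvent
  final case class StockEvent(sku: String, stock: Double) extends OfferEvent
  final case class SortingEvent(sku: String, sortingScore: Double) extends OfferEvent
  final case class ProductEvent(sku: String, productState: String) extends OfferEvent

  // In-memory stand-in for ValueState[EnrichmentState] (an assumption, not a Flink class).
  // It also counts writes so the "persist once per event" rule is observable.
  final class InMemoryState {
    private var stored: EnrichmentState = null
    var writes: Int = 0
    def value(): EnrichmentState = stored
    def update(v: EnrichmentState): Unit = { stored = v; writes += 1 }
  }

  def process(state: InMemoryState, event: OfferEvent): EnrichmentState = {
    // Nothing is stored before the first event for a key, so start from defaults.
    val current = Option(state.value()).getOrElse(EnrichmentState())

    // Each event type touches only the field it owns.
    event match {
      case PriceEvent(_, p)   => current.price = p
      case StockEvent(_, s)   => current.stock = s
      case SortingEvent(_, x) => current.sortingScore = x
      case ProductEvent(_, s) => current.productState = s
    }

    // Persist once, after all in-place mutations.
    state.update(current)
    current
  }
}
```

Feeding a PriceEvent and then a StockEvent for the same key through process leaves one record with both fields set and the other two still at their defaults, with no left or right join side involved.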

Advantages vs chained-join state

| Dimension | Chained Table-API joins | Single ValueState |
|---|---|---|
| State copies per key | N−1 (one per join operator) | 1 |
| Left/right sides | Both held in each operator | None; each event updates its own fields |
| TTL management | Per-join TTL config, often conservative | Often "forever"; one record per SKU |
| Storage shape | Multiple RocksDB column families | One POJO, one state |
| Amplification | Multiplicative | Constant |

From the Zalando post (sources/2026-03-03-zalando-why-we-ditched-flink-table-api-joins-cutting-state-by-75-with-datastream-unions):

"No multiplication: There is no left or right part of the join. When an event arrives, it simply updates the specific field(s) in the existing ValueState object. No TTLs: We keep the state 'forever' to always have the last known value for an SKU. However, because we only store it once, the state is significantly smaller."

Design rules

  1. Dense POJO. Store only the fields that feed the output, not the full payload of each input stream. Zalando's EnrichmentState is four scalars.
  2. Per-type field ownership. Each input stream should own a disjoint subset of the POJO fields so event processing is a simple pattern-match.
  3. Skip state writes when the update is a no-op. Pair with patterns/event-time-filter-for-state-write-reduction to avoid RocksDB churn.
  4. One state.update(current) per event. Mutate current in place during the match; persist once after.
  5. Beware of partial initialisation on first events. Input streams for a key can arrive in any order, so until each stream has delivered at least one event some fields still hold their defaults, and emit logic must tolerate that. Zalando emits an EnrichedOffer(sku, current) on every processed event and lets downstream handle freshness.
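Rules 3 to 5 can be combined in one small sketch. It is a minimal illustration under stated assumptions: the no-op check here is a plain equality comparison between the stored and the updated record, which is a simpler substitute for the event-time filter that the referenced pattern describes. CountingState, PriceEvent, StockEvent and the field layout of EnrichedOffer are hypothetical stand-ins, and only two of the four event types are shown to keep it short.

```scala
object NoOpWriteSkipSketch {
  case class EnrichmentState(
    var price: Double = 0.0,
    var stock: Double = 0.0,
    var sortingScore: Double = 0.0,
    var productState: String = null
  )

  // Hypothetical event types; each owns one field of the POJO.
  sealed trait OfferEvent { def sku: String }
  final case class PriceEvent(sku: String, price: Double) extends OfferEvent
  final case class StockEvent(sku: String, stock: Double) extends OfferEvent

  // Output record; the field names are assumptions.
  final case class EnrichedOffer(sku: String, state: EnrichmentState)

  // In-memory stand-in for ValueState[EnrichmentState] that counts writes.
  // It stores a copy, loosely mimicking a backend that serialises on update.
  final class CountingState {
    private var stored: EnrichmentState = null
    var writes: Int = 0
    def value(): EnrichmentState = stored
    def update(v: EnrichmentState): Unit = { stored = v.copy(); writes += 1 }
  }

  def process(state: CountingState, event: OfferEvent): EnrichedOffer = {
    val previous = state.value() // null until the first write for this key
    val current =
      if (previous == null) EnrichmentState() else previous.copy()

    // Rule 4: mutate in place during the match.
    event match {
      case PriceEvent(_, p) => current.price = p
      case StockEvent(_, s) => current.stock = s
    }

    // Rule 3: persist only if a field actually changed
    // (case-class equality compares all four fields; null never equals a record).
    if (current != previous) state.update(current)

    // Rule 5: emit on every processed event, even if other fields are still defaults.
    EnrichedOffer(event.sku, current)
  }
}
```

With this check, a repeated PriceEvent carrying the same price still produces an output record but does not trigger a second state write. Whether a first event that only carries default values should be written is a design choice; this sketch writes it so that the key has a stored record afterwards.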

Canonical instance

Zalando's Product Offer Enrichment job uses this shape as the state inside the MultiStreamJoinProcessor. State dropped 235 GB → 56 GB (−76 %) by replacing four chained-join states with one ValueState[EnrichmentState] per SKU.