PATTERN Cited by 1 source
Single ValueState over Chained Joins¶
Single ValueState over chained joins is the pattern of
representing an enriched per-key record as one POJO in one
ValueState[T], with each event type updating only its owning
fields in place — rather than storing per-join intermediate
state across N Flink Table-API join operators, each with its own
RocksDB copy.
It is the per-operator-state form chosen inside a
stream
union + KeyedProcessFunction.
Shape¶
// One compact per-SKU POJO holding only what the output needs
case class EnrichmentState(
var price: Double = 0.0,
var stock: Double = 0.0,
var sortingScore: Double = 0.0,
var productState: String = null
)
// One ValueState[EnrichmentState] per key
private var state: ValueState[EnrichmentState] = _
Incoming events pattern-match and update owning fields on
current, then state.update(current) persists once.
Advantages vs chained-join state¶
| Dimension | Chained Table-API joins | Single ValueState |
|---|---|---|
| State copies per key | N−1 (one per join operator) | 1 |
| Left/right sides | Both held in each operator | None — event updates fields |
| TTL management | Per-join TTL config, often conservative | Often "forever" — one record per SKU |
| Storage shape | Multiple RocksDB column families | One POJO, one state |
| Amplification | Multiplicative | Constant |
From the Zalando post (sources/2026-03-03-zalando-why-we-ditched-flink-table-api-joins-cutting-state-by-75-with-datastream-unions):
"No multiplication: There is no left or right part of the join. When an event arrives, it simply updates the specific field(s) in the existing ValueState object. No TTLs: We keep the state 'forever' to always have the last known value for an SKU. However, because we only store it once, the state is significantly smaller."
Design rules¶
- Dense POJO. Store only the fields that feed the output, not
the full payload of each input stream. Zalando's
EnrichmentStateis four scalars. - Per-type field ownership. Each input stream should own a disjoint subset of the POJO fields so event processing is a simple pattern-match.
- Skip state writes when the update is a no-op. Pair with patterns/event-time-filter-for-state-write-reduction to avoid RocksDB churn.
- One
state.update(current)per event. Mutatecurrentin place during the match; persist once after. - Beware of partial initialisation on first events. If the
event stream for a key arrives out of order, emit logic must
tolerate fields being defaults; Zalando emits an
EnrichedOffer(sku, current)on every processed event and lets downstream handle freshness.
Canonical instance¶
Zalando's Product Offer Enrichment job uses this shape as the
state inside the MultiStreamJoinProcessor. State dropped
235 GB → 56 GB (−76 %) by replacing four chained-join states
with one ValueState[EnrichmentState] per SKU.
Related¶
- systems/flink-datastream-api — the API that exposes
ValueStateinsideKeyedProcessFunction. - systems/apache-flink · systems/flink-multijoin-operator · systems/zalando-product-offer-enrichment.
- concepts/flink-stateful-join-state-amplification — the pathology this pattern replaces.
- concepts/flink-keyed-stream-union — the outer operator graph that hosts this state.
- patterns/stream-union-plus-keyed-process-function — the outer pattern.
- patterns/event-time-filter-for-state-write-reduction — complement.