CONCEPT Cited by 1 source
Flink Keyed Stream Union¶
Flink keyed stream union is the DataStream-API operator-graph
shape of DataStream.union(a, b, c, …) → keyBy(k) → process(...):
N heterogeneous streams are merged into one typed stream and routed
into a single keyed operator that owns per-key state.
It is the imperative counterpart to a Table-API N-way JOIN and the
shape that makes
stream union +
KeyedProcessFunction and
single
ValueState over chained joins expressible on Flink 1.x without
the
state
amplification pathology of chained join operators.
Preconditions¶
Two structural preconditions must hold for this shape to be worth it:
- A common event supertype. The unioned streams must be
unifiable into one typed stream. Zalando uses a
DataStream[BaseEvent]that all four event classes (OfferEvent,BoostEvent,SponsoredEvent,ProductEvent) extend, then pattern-matches insideprocessElement. - A natural shared key across all streams. Every event must
carry the same join key so
keyBy(k)routes related events to the same operator instance. Zalando's key is SKU — the product identifier shared across all four upstream streams. Without a shared key, this shape would require synthetic keying and lose most of its benefit.
Why it avoids state amplification¶
A single keyed operator holds one state copy per key regardless of how many input streams contributed events for that key. There is no left/right input concept — every event type updates its owning fields on the stored record in place. The multiplicative compounding of chained Table-API joins disappears by construction.
Published upstream equivalent: Flink 2.1's
MultiJoin operator (FLIP-516)
implements the same idea at the planner altitude, emitting a
single multi-way operator rather than a chain.
Seen in¶
- sources/2026-03-03-zalando-why-we-ditched-flink-table-api-joins-cutting-state-by-75-with-datastream-unions
— Zalando's Product Offer Enrichment pipeline used exactly this
shape (
DataStream.union(offer, boost, sponsored, product) → keyBy(SKU) → MultiStreamJoinProcessor) to collapse a 4-way chained-join job into one keyed operator.
Related¶
- systems/flink-datastream-api — the API where this is written.
- systems/flink-table-api — the API whose N-way chained-join shape this sidesteps.
- concepts/flink-stateful-join-state-amplification — the problem this pattern solves.
- concepts/multi-way-join-operator-flink — the native Flink 2.1 form of the same idea.
- patterns/stream-union-plus-keyed-process-function · patterns/single-valuestate-over-chained-joins — the concrete patterns.