CONCEPT Cited by 1 source

Flink Keyed Stream Union

Flink keyed stream union is the DataStream-API operator-graph shape a.union(b, c, …) → keyBy(k) → process(...): N streams carrying different kinds of events are merged into one stream of a common type and routed into a single keyed operator that owns the per-key state.

It is the imperative counterpart to a Table-API N-way JOIN. It is also the shape that makes two related patterns, "stream union + KeyedProcessFunction" and "single ValueState over chained joins", expressible on Flink 1.x without the state amplification that chained join operators suffer from.

Preconditions

Two structural preconditions must hold for this shape to be worth it:

  1. A common event supertype. The unioned streams must be unifiable into one typed stream. Zalando uses a DataStream[BaseEvent] that all four event classes (OfferEvent, BoostEvent, SponsoredEvent, ProductEvent) extend, then pattern-matches inside processElement.
  2. A natural shared key across all streams. Every event must carry the same join key so keyBy(k) routes related events to the same operator instance. Zalando's key is SKU — the product identifier shared across all four upstream streams. Without a shared key, this shape would require synthetic keying and lose most of its benefit.
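The two preconditions can be illustrated with a minimal plain-Java sketch that does not depend on Flink. Only the four event class names and the SKU key come from the text above. The payload fields, the `UnionKeyBySketch` class, and its `union` and `route` helpers are hypothetical stand-ins, and the modulo routing is a simplification of Flink's actual key-group assignment.

```java
import java.util.ArrayList;
import java.util.List;

// Precondition 1: a common supertype. It exposes the shared join key (SKU).
interface BaseEvent {
    String sku();
}

// Payload fields are hypothetical; only the class names come from the text.
record OfferEvent(String sku, double price) implements BaseEvent {}
record BoostEvent(String sku, double boost) implements BaseEvent {}
record SponsoredEvent(String sku, boolean sponsored) implements BaseEvent {}
record ProductEvent(String sku, String title) implements BaseEvent {}

final class UnionKeyBySketch {
    // Stand-in for a.union(b, c, ...): merge streams that share the supertype.
    @SafeVarargs
    static List<BaseEvent> union(List<? extends BaseEvent>... streams) {
        List<BaseEvent> merged = new ArrayList<>();
        for (List<? extends BaseEvent> stream : streams) {
            merged.addAll(stream);
        }
        return merged;
    }

    // Precondition 2: a shared key. Stand-in for keyBy(BaseEvent::sku):
    // the key alone decides which operator instance receives the event.
    // (Simplified; Flink itself hashes keys into key groups.)
    static int route(BaseEvent event, int parallelism) {
        return Math.floorMod(event.sku().hashCode(), parallelism);
    }
}
```

Because `route` looks only at `sku()`, an `OfferEvent` and a `ProductEvent` for the same SKU always land on the same instance, whichever upstream stream they came from.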

Why it avoids state amplification

A single keyed operator holds one state copy per key, regardless of how many input streams contributed events for that key. There is no notion of a left or right input: each event type updates, in place, the fields it owns on the stored record. The multiplicative compounding of state seen in chained Table-API joins disappears by construction.
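A rough plain-Java model of that single-state operator follows, again without any Flink dependency. The `HashMap` stands in for keyed `ValueState`, and `processElement` mirrors the pattern-matching dispatch described earlier. The `ProductState` class, its fields, and the event payloads are assumptions made for illustration, not the source's actual schema.

```java
import java.util.HashMap;
import java.util.Map;

interface BaseEvent { String sku(); }

// Payload fields are hypothetical; only the class names come from the text.
record OfferEvent(String sku, double price) implements BaseEvent {}
record BoostEvent(String sku, double boost) implements BaseEvent {}
record SponsoredEvent(String sku, boolean sponsored) implements BaseEvent {}
record ProductEvent(String sku, String title) implements BaseEvent {}

// One mutable record per SKU; each event type owns a disjoint set of fields.
final class ProductState {
    Double price;
    Double boost;
    Boolean sponsored;
    String title;
}

final class SingleStateOperator {
    // Stand-in for keyed ValueState: exactly one entry per key.
    final Map<String, ProductState> stateByKey = new HashMap<>();

    // Stand-in for processElement: dispatch on the concrete event type
    // and overwrite only the fields that type owns.
    ProductState processElement(BaseEvent event) {
        ProductState state =
            stateByKey.computeIfAbsent(event.sku(), key -> new ProductState());
        if (event instanceof OfferEvent offer) {
            state.price = offer.price();
        } else if (event instanceof BoostEvent boost) {
            state.boost = boost.boost();
        } else if (event instanceof SponsoredEvent sponsored) {
            state.sponsored = sponsored.sponsored();
        } else if (event instanceof ProductEvent product) {
            state.title = product.title();
        }
        return state;
    }
}
```

In this model the number of state entries tracks the number of distinct SKUs, not the number of input streams or events: feeding all four event types for one SKU still leaves a single `ProductState` for it.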

Upstream equivalent: Flink 2.1's MultiJoin operator (FLIP-516) implements the same idea at the planner level, emitting a single multi-way join operator rather than a chain of joins.

Seen in
