Skip to content

CONCEPT Cited by 1 source

Multi-Way Join Operator (Flink)

A multi-way join operator in Flink is a single keyed operator that joins N streams at once, holding one shared state per key, rather than the Flink 1.x pattern of chaining N−1 binary join operators that each maintain their own state.

The Flink project formalised this in FLIP-516 (Multi-Way Join Operator, 2025-05-19) and shipped it as experimental in Flink 2.1 — see systems/flink-multijoin-operator.

Why it matters

The shape directly addresses Flink stateful join state amplification: instead of cloning the relevant data once per binary join in a chain, the operator holds one record per key for the entire multi-way join. Published upstream benchmarks report 2× to over 100×+ increase in processed records and 3× to over 1000×+ smaller state versus the chained-join baseline.

Relationship to the hand-rolled DataStream form

The Zalando team reimplemented the same idea on the DataStream API — via stream union + keyBy(SKU) + a custom KeyedProcessFunction holding a single ValueState — because AWS Managed Flink did not yet offer Flink 2.x. The shape is structurally the same: keyed stream union into one operator, one state copy per key, no intermediate state. The DataStream form is more verbose and less planner-friendly but works on any Flink version.

Comparison:

Aspect Chained binary joins (Flink 1.x Table API) Hand-rolled KeyedProcessFunction Native MultiJoin (Flink 2.1)
Number of operators N−1 1 1
State copies per key N−1 1 1
API Declarative SQL Imperative DataStream Declarative SQL, planner-emitted
Flink version needed 1.x 1.x 2.1+
Join types supported All Whatever you code INNER / LEFT (experimental; RIGHT coming)

Caveats

Flink docs explicitly disclaim: "This is currently in an experimental state — there are open optimizations and breaking changes might be implemented in this version. We currently support only streaming INNER/LEFT joins. Support for RIGHT joins will be added soon."

Seen in

Last updated · 507 distilled / 1,218 read