SYSTEM Cited by 1 source
Flink MultiJoin Operator (FLIP-516)¶
MultiJoin Operator is an experimental Flink 2.1 feature (introduced via FLIP-516, 2025-05-19 ) that plans an N-way streaming join into a single keyed operator holding one shared state, rather than the Flink 1.x model of chained independent join operators each keeping their own RocksDB copy of both inputs.
Motivation¶
The problem it solves is exactly
Flink stateful
join state amplification: in Flink 1.20's
Table API, each JOIN in a chain
materialises its own full state for late/updated-arrival
correctness, so four joins carry four copies of the keyspace's
relevant fields. MultiJoin unifies them under a single operator,
keyed on the common join key, and stores one record per key.
Flink docs disclaimer¶
"This is currently in an experimental state — there are open optimizations and breaking changes might be implemented in this version. We currently support only streaming INNER/LEFT joins. Support for RIGHT joins will be added soon."
Published benchmark¶
Flink docs report MultiJoin vs chained joins:
- 2× to over 100×+ increase in processed records.
- 3× to over 1000×+ smaller state.
The ratio range reflects how state amplification scales with the number of joined streams and per-stream cardinality.
Seen in¶
- sources/2026-03-03-zalando-why-we-ditched-flink-table-api-joins-cutting-state-by-75-with-datastream-unions
— Zalando cites FLIP-516 as the structural idea they
reimplemented by hand in DataStream API because AWS Managed
Flink did not yet offer Flink 2.x: "We reimplemented the same
idea: keyed state, one operator for all streams, no
intermediate state. … This feature alone will be worth the
wait when we get there, but until then, we're covered by our
home-baked solution." The Zalando
stream
union +
KeyedProcessFunctionshape is the imperative DataStream-API equivalent of what MultiJoin does at the Table-API / planner altitude.
Related¶
- systems/apache-flink — parent engine; MultiJoin ships in 2.1 as experimental.
- systems/flink-table-api — the API layer MultiJoin plugs into (planner emits it instead of a chain).
- systems/flink-datastream-api — the lower-level API where the same shape can be written manually on older Flink versions.
- concepts/multi-way-join-operator-flink — concept page with canonical framing.
- concepts/flink-stateful-join-state-amplification — the pathology it addresses.
- patterns/single-valuestate-over-chained-joins — the hand-rolled DataStream equivalent.