SYSTEM Cited by 1 source

Flink Table API & SQL¶

Flink Table API & SQL is Apache Flink's declarative streaming API: users express queries as SQL or as table-builder method chains (select, join, group-by-window), and Flink's query optimizer plans them into a DAG of stateful operators. It is the natural fit for "90 % of use cases — fast, elegant, and maintainable" per Zalando's framing (sources/2026-03-03-zalando-why-we-ditched-flink-table-api-joins-cutting-state-by-75-with-datastream-unions).

Architectural model¶

Streams as continuously-updating tables. Rows are appended, updated, or retracted; the API hides the append/retract channel behind SQL semantics.
Planner-generated operator DAG. Each SQL clause lowers to one or more stateful operators (sources, joins, aggregations, sinks); the planner inserts state-management and watermark handling.
Each join operator keeps its own state. In Flink 1.20, a JOIN between two streams is implemented as an independent operator that holds RocksDB-backed state for both sides to handle late arrivals and updates. Chained joins therefore do not share state: operator N stores the N−1 intermediate stream in full so it can re-join with stream N. This is the root of the state amplification pathology in multi-way joins.

sources/2026-03-03-zalando-why-we-ditched-flink-table-api-joins-cutting-state-by-75-with-datastream-unions — Zalando's Product Offer Enrichment job used Table API & SQL to join four streams (offer, boost, sponsored, product). Independent-operator state management grew per-application RocksDB state to 235–245 GB, breaking hourly savepoints (100 % CPU for 12 min, crash-restart loop, missed snapshots, 10–20 % overscale margin on KPUs). Team moved to DataStream API with a stream union + KeyedProcessFunction — state 235 GB → 56 GB (−76 %), cost −13 %. The post's closing framing is not "Table API bad" but "Table API right for 90 %, and the software engineer's value is recognising the 10 % where the abstraction costs too much."

API	Shape	Join-state behaviour	When it fits
Table API & SQL	Declarative, planner-chosen operators	Independent operator per join — state multiplies in chains	90 % of cases; ad-hoc analytics, small-N joins
DataStream API	Imperative operator graph	Whatever the user writes; `KeyedProcessFunction` → one state per key across N streams	Per-key temporal / cardinality logic, state-amplification avoidance
`MultiJoin` operator (Flink 2.1, experimental)	Declarative but planner emits a single multi-way operator	Single keyed state for all N streams	Future replacement for the Table-API N-way case once 2.x lands on managed runtimes

systems/apache-flink — parent engine.
systems/flink-datastream-api — imperative sibling; the rewrite target when Table API's operator independence becomes costly.
systems/flink-multijoin-operator — Flink 2.1's experimental native multi-way join operator (FLIP-516, 2025-05).
concepts/declarative-vs-imperative-stream-api — the general tradeoff axis.
concepts/flink-stateful-join-state-amplification — the specific state pathology this API's operator model produces for N-way chains.