SYSTEM Cited by 1 source
Flink Table API & SQL¶
Flink Table API & SQL is Apache Flink's declarative streaming API: users express queries as SQL or as table-builder method chains (select, join, group-by-window), and Flink's query optimizer plans them into a DAG of stateful operators. It is the natural fit for "90 % of use cases — fast, elegant, and maintainable" per Zalando's framing (sources/2026-03-03-zalando-why-we-ditched-flink-table-api-joins-cutting-state-by-75-with-datastream-unions).
Architectural model¶
- Streams as continuously-updating tables. Rows are appended, updated, or retracted; the API hides the append/retract channel behind SQL semantics.
- Planner-generated operator DAG. Each SQL clause lowers to one or more stateful operators (sources, joins, aggregations, sinks); the planner inserts state-management and watermark handling.
- Each join operator keeps its own state. In Flink 1.20, a
JOINbetween two streams is implemented as an independent operator that holds RocksDB-backed state for both sides to handle late arrivals and updates. Chained joins therefore do not share state: operator N stores the N−1 intermediate stream in full so it can re-join with stream N. This is the root of the state amplification pathology in multi-way joins.
Seen in¶
- sources/2026-03-03-zalando-why-we-ditched-flink-table-api-joins-cutting-state-by-75-with-datastream-unions
— Zalando's Product Offer Enrichment job used Table API &
SQL to join four streams (offer, boost, sponsored, product).
Independent-operator state management grew per-application
RocksDB state to 235–245 GB, breaking hourly savepoints
(100 % CPU for 12 min, crash-restart loop, missed snapshots,
10–20 % overscale margin on
KPUs). Team moved to
DataStream API with a
stream
union +
KeyedProcessFunction— state 235 GB → 56 GB (−76 %), cost −13 %. The post's closing framing is not "Table API bad" but "Table API right for 90 %, and the software engineer's value is recognising the 10 % where the abstraction costs too much."
Relationship to DataStream API and Multi-Way Join Operator¶
| API | Shape | Join-state behaviour | When it fits |
|---|---|---|---|
| Table API & SQL | Declarative, planner-chosen operators | Independent operator per join — state multiplies in chains | 90 % of cases; ad-hoc analytics, small-N joins |
| DataStream API | Imperative operator graph | Whatever the user writes; KeyedProcessFunction → one state per key across N streams |
Per-key temporal / cardinality logic, state-amplification avoidance |
MultiJoin operator (Flink 2.1, experimental) |
Declarative but planner emits a single multi-way operator | Single keyed state for all N streams | Future replacement for the Table-API N-way case once 2.x lands on managed runtimes |
Related¶
- systems/apache-flink — parent engine.
- systems/flink-datastream-api — imperative sibling; the rewrite target when Table API's operator independence becomes costly.
- systems/flink-multijoin-operator — Flink 2.1's experimental native multi-way join operator (FLIP-516, 2025-05).
- concepts/declarative-vs-imperative-stream-api — the general tradeoff axis.
- concepts/flink-stateful-join-state-amplification — the specific state pathology this API's operator model produces for N-way chains.