Flink Table API & SQL

Flink Table API & SQL is Apache Flink's declarative streaming API: users express queries as SQL or as table-builder method chains (select, join, group-by-window), and Flink's query optimizer plans them into a DAG of stateful operators. It is the natural fit for "90 % of use cases — fast, elegant, and maintainable" per Zalando's framing (sources/2026-03-03-zalando-why-we-ditched-flink-table-api-joins-cutting-state-by-75-with-datastream-unions).

Architectural model

  • Streams as continuously-updating tables. Rows are appended, updated, or retracted; the API hides the append/retract channel behind SQL semantics.
  • Planner-generated operator DAG. Each SQL clause lowers to one or more stateful operators (sources, joins, aggregations, sinks); the planner inserts state-management and watermark handling.
  • Each join operator keeps its own state. In Flink 1.20, a JOIN between two streams is implemented as an independent operator that holds RocksDB-backed state for both of its inputs so it can handle late arrivals and updates. Chained joins therefore do not share state: the operator that joins in stream N stores the full intermediate result of the first N−1 streams so that it can re-join that result with stream N. This is the root of the state amplification pathology in multi-way joins.
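The amplification described in the last bullet can be made concrete with a little arithmetic. The sketch below is a toy accounting model, not Flink code: it assumes a left-deep chain of binary joins, exactly one row per key in every stream, no state TTL, and it counts payload copies rather than bytes. The function names are hypothetical.

```python
def chained_join_state(num_streams: int, keys: int) -> int:
    """Payload copies held by a left-deep chain of binary joins (toy model).

    Assumption: the join that brings in stream i+1 keeps its left input
    (an intermediate row carrying the payloads of streams 1..i) and its
    right input (one payload from stream i+1) for every key.
    """
    total = 0
    for join_index in range(1, num_streams):
        left_width = join_index   # payloads carried by the intermediate row
        right_width = 1           # payload of the newly joined stream
        total += keys * (left_width + right_width)
    return total


def single_keyed_state(num_streams: int, keys: int) -> int:
    """Payload copies when one keyed state holds each stream's row once."""
    return keys * num_streams


if __name__ == "__main__":
    for n in (2, 3, 4, 6):
        chained = chained_join_state(n, keys=1000)
        single = single_keyed_state(n, keys=1000)
        print(f"N={n}: chained={chained} single={single}")
```

Under these assumptions the two layouts cost the same for a single join (N = 2), and the chained layout grows quadratically in N while the single keyed state grows linearly. Real jobs will differ with row widths, join cardinality, and TTL settings.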

Seen in

Relationship to DataStream API and Multi-Way Join Operator

| API | Shape | Join-state behaviour | When it fits |
| Table API & SQL | Declarative, planner-chosen operators | Independent operator per join; state multiplies in chains | 90 % of cases; ad-hoc analytics, small-N joins |
| DataStream API | Imperative operator graph | Whatever the user writes; KeyedProcessFunction → one state per key across N streams | Per-key temporal / cardinality logic, state-amplification avoidance |
| MultiJoin operator (Flink 2.1, experimental) | Declarative, but the planner emits a single multi-way operator | Single keyed state for all N streams | Future replacement for the Table-API N-way case once 2.x lands on managed runtimes |
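The "one state per key across N streams" idea from the DataStream row can be sketched in plain Python. This is an illustration of the pattern only, not the Flink API: `UnionJoinState` is a hypothetical stand-in for a keyed process function fed by a union of tagged streams, and the source names and payload fields are made up. It assumes the latest row per source wins and that a joined row is emitted once every source has contributed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class UnionJoinState:
    """Hypothetical stand-in for one keyed state shared by all inputs."""
    sources: Tuple[str, ...]
    # key -> source name -> latest payload seen from that source
    rows: Dict[str, Dict[str, dict]] = field(default_factory=dict)

    def process(self, source: str, key: str, payload: dict) -> Optional[dict]:
        """Store one tagged event; return a joined row once all sides exist."""
        if source not in self.sources:
            raise ValueError(f"unknown source: {source}")
        slot = self.rows.setdefault(key, {})
        slot[source] = payload  # an update overwrites the earlier row
        if len(slot) < len(self.sources):
            return None  # still waiting for at least one side
        merged = {"key": key}
        for name in self.sources:
            merged.update(slot[name])
        return merged


if __name__ == "__main__":
    state = UnionJoinState(sources=("orders", "payments", "shipments"))
    print(state.process("orders", "o1", {"amount": 30}))
    print(state.process("shipments", "o1", {"carrier": "dhl"}))
    print(state.process("payments", "o1", {"paid": True}))
```

Each source's row is stored once per key, regardless of arrival order, which is the property the table attributes to both the hand-written DataStream approach and the single multi-way operator. A production version would also need state expiry and retraction handling, which this sketch omits.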