

Declarative vs Imperative Stream API

The declarative vs imperative stream API tradeoff is the choice between expressing a streaming computation as a query (SQL / relational, planner-optimised) and expressing it as an explicit operator graph (map / keyBy / process, with state handled by hand).

The canonical framing comes from Zalando's 2026-03 Flink post (sources/2026-03-03-zalando-why-we-ditched-flink-table-api-joins-cutting-state-by-75-with-datastream-unions):

"Flink SQL is perfect for 90% of use cases — it's fast, elegant, and maintainable. But a software engineer's value is in recognizing the remaining 10%: the use cases where the abstraction starts costing too much."

The 90% case

Declarative APIs (Flink Table API / SQL, ksqlDB on Kafka Streams, Spark Structured Streaming SQL) win on:

  • Expressive density. Joins, aggregations, and windowing fit in a few lines (see the sketch after this list).
  • Planner-authored correctness. Watermarks, late-arrival handling, retractions are handled by the engine.
  • Maintainability. Readable by team members who don't know the runtime internals.
  • Portability across engines. Standard SQL subsets transfer.
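
To make the density point concrete, here is a minimal sketch of an event-time interval join in Flink SQL via the Table API. The orders / shipments tables, their columns, and the four-hour bound are assumptions for illustration; the tables are presumed registered elsewhere with event-time attributes (e.g. via CREATE TABLE ... WITH ('connector' = 'kafka', ...)).

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DeclarativeJoinSketch {
    public static void main(String[] args) {
        // Streaming-mode Table API entry point.
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.inStreamingMode());

        // One statement covers the join, the event-time bound, and the
        // state retention the planner derives from that bound. Equivalent
        // DataStream code needs a keyed two-input operator plus
        // hand-managed state and timers.
        tEnv.executeSql(
                "SELECT o.order_id, o.amount, s.carrier " +
                "FROM orders o JOIN shipments s " +
                "ON o.order_id = s.order_id " +
                "AND s.ship_ts BETWEEN o.order_ts " +
                "AND o.order_ts + INTERVAL '4' HOUR")
            .print();
    }
}
```

Watermarks, state cleanup for the interval, and retraction handling are all planner-authored here, which is exactly the 90% value proposition.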

The 10% case: where the abstraction leaks

Three common leak shapes observed in this wiki's corpus:

  1. State amplification across N-way joins. The Flink 1.x Table-API pattern of independent per-join operators compounds state multiplicatively — see concepts/flink-stateful-join-state-amplification. The planner cannot share state across the chain; the user cannot reach in to fix it.
  2. Per-key temporal logic the planner can't represent cheaply. When the application knows that an incoming event whose timestamp <= stored.timestamp is a no-op, the imperative code can return before writing state (patterns/event-time-filter-for-state-write-reduction); a sketch follows this list. Expressing the same semantics in SQL requires aggregations plus ranking functions that the planner evaluates expensively.
  3. Dominant one-shot workload inside the job. When one operator does 80% of the work (hot-key handling, tail-latency filtering, specialised dedupe), the planner's even-handed optimisation is a penalty, not a benefit.
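
A minimal sketch of leak shape 2's guard, assuming a hypothetical Event POJO and a "latest record per key wins" rule; this illustrates the pattern, not Zalando's actual operator:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Used after keying, e.g.:
//   stream.keyBy(e -> e.userId).process(new LatestWinsFunction())
public class LatestWinsFunction extends
        KeyedProcessFunction<String, LatestWinsFunction.Event, LatestWinsFunction.Event> {

    /** Illustrative event POJO; field names are assumptions. */
    public static class Event {
        public String userId;
        public long timestamp;
    }

    // Per-key state holding only the latest accepted event timestamp.
    private transient ValueState<Long> lastTs;

    @Override
    public void open(Configuration parameters) {
        lastTs = getRuntimeContext().getState(
                new ValueStateDescriptor<>("lastTs", Long.class));
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<Event> out)
            throws Exception {
        Long stored = lastTs.value();
        // The "few if guards": an event at or before the stored timestamp
        // is a no-op, so return before writing state or emitting.
        if (stored != null && event.timestamp <= stored) {
            return;
        }
        lastTs.update(event.timestamp);
        out.collect(event);
    }
}
```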

Warning signs that you're in the 10%

  • State outgrows the storage allocated per capacity unit (e.g., KPU on AWS Managed Flink).
  • Snapshot cost dominates normal workload (concepts/flink-snapshot-savepoint).
  • The SQL version has aggregations + ranking functions just to recover the semantics the application naturally encodes.
  • Observed cost or throughput is orders of magnitude worse than a back-of-envelope estimate (an illustrative estimate follows this list).
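
As an illustrative back-of-envelope, with all numbers invented for the example: 100 million keys at roughly 200 bytes of joined payload is about 20 GB of keyed state for a single join operator. Chain three Table-API joins that each materialise their inputs, and the footprint multiplies well past the ~50 GB of running-application storage a single KPU provides on AWS Managed Flink, so the job is scaled up to hold state rather than to do compute.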

The imperative rewrite payoff

When the 10% signature fits, the imperative version is often less verbose, not more. Zalando's remark is worth quoting: "The 'more manual' approach turned out to be even less verbose than the SQL version, because our SQL was quite complex, with aggregations for calculating the maximal timestamps between several parts of the join and with ranking functions for making sure the last record from the same part of the join always wins." Once the temporal logic is "just a few if guards," what becomes visible is how much SQL the declarative version spends encoding that same logic.
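
For contrast, the declarative shape of that "last record wins" rule is typically a ranking subquery. A hedged sketch using Flink SQL's documented ROW_NUMBER() deduplication pattern, with assumed table and column names (not Zalando's actual query):

```java
// The "last record per key wins" rule as a ranking subquery. `events`,
// `user_id`, `payload`, and `ts` (an event-time attribute) are illustrative
// names; `tEnv` is the TableEnvironment from the sketch above.
tEnv.executeSql(
        "SELECT user_id, payload, ts " +
        "FROM ( " +
        "  SELECT *, ROW_NUMBER() OVER ( " +
        "    PARTITION BY user_id ORDER BY ts DESC) AS rn " +
        "  FROM events " +
        ") WHERE rn = 1");
```

Depending on how the planner recognises the pattern, this can keep ranked per-key state to support retractions; the if-guard version writes at most one timestamp per key.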
