SYSTEM Cited by 1 source

Apache Flink

Apache Flink is a JVM-based, open-source distributed stream-processing engine that became the de facto default for stateful event-time stream processing in the 2010s and 2020s. Its defining properties are:

  • Native streaming-first runtime (events are first-class; batch is a bounded-stream special case), distinct from Spark Streaming's micro-batch model.
  • Event-time semantics with watermarks: windows fire based on event timestamps rather than processing time, and watermarks track event-time progress so that out-of-order events are still assigned to the correct window.
  • Durable, asynchronous, incremental state checkpoints: operator state is persisted to durable storage (S3, GCS, or HDFS) so jobs can restore from the latest checkpoint after a failure.
  • End-to-end exactly-once semantics across Kafka source → transformation → Kafka sink, via a two-phase-commit transactional sink whose commits are tied to checkpoints.
  • Rich state backends: an embedded RocksDB backend keeps keyed state on local disk, so operators can hold GB–TB of keyed state.
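
To make the event-time property concrete, here is a minimal pure-Python sketch of tumbling windows driven by a bounded-out-of-orderness watermark. It does not use Flink's API: the function name `tumbling_counts`, its parameters, and `SAMPLE_EVENTS` are hypothetical and exist only for illustration. It also simplifies in two ways that real deployments differ on: the watermark is recomputed after every event rather than periodically, and late events are simply dropped.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Event = Tuple[str, int]              # (key, event timestamp in ms)
WindowResult = Tuple[str, int, int]  # (key, window start in ms, event count)

# Hypothetical input: note that ("a", 3_000) arrives after ("a", 12_000).
SAMPLE_EVENTS: List[Event] = [
    ("a", 1_000), ("a", 4_000), ("b", 2_000), ("a", 12_000),
    ("a", 3_000), ("a", 17_000), ("b", 9_000), ("a", 26_000),
]


def tumbling_counts(
    events: Iterable[Event],
    window_ms: int,
    max_out_of_orderness_ms: int,
) -> Tuple[List[WindowResult], List[Event]]:
    """Count events per (key, window); fire a window once the watermark passes its end."""
    open_windows: Dict[Tuple[str, int], int] = {}
    emitted: List[WindowResult] = []
    dropped: List[Event] = []
    watermark: float = float("-inf")
    max_ts: Optional[int] = None

    for key, ts in events:
        start = ts - ts % window_ms
        if start + window_ms <= watermark:
            # The window this event belongs to has already fired: the event is late.
            dropped.append((key, ts))
        else:
            open_windows[(key, start)] = open_windows.get((key, start), 0) + 1

        # Watermark = highest timestamp seen so far minus the tolerated disorder.
        max_ts = ts if max_ts is None else max(max_ts, ts)
        watermark = max_ts - max_out_of_orderness_ms

        # Fire every open window whose end is at or before the watermark.
        ready = sorted((s, k) for (k, s) in open_windows if s + window_ms <= watermark)
        for s, k in ready:
            emitted.append((k, s, open_windows.pop((k, s))))

    # Windows still open when the input ends are left unfired in this sketch.
    return emitted, dropped


if __name__ == "__main__":
    fired, late = tumbling_counts(SAMPLE_EVENTS, window_ms=10_000, max_out_of_orderness_ms=5_000)
    print(fired)
    print(late)
```

With 10-second windows and 5 seconds of tolerated disorder, the out-of-order event `("a", 3_000)` is still counted in the first window because the watermark has only reached 7,000 when it arrives. `("b", 9_000)` arrives after the watermark has reached 12,000 and is treated as late. A processing-time window would have no way to make that distinction.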

Typical role for this wiki

Flink appears whenever a company describes production stream processing with non-trivial state or event-time correctness: fraud detection, real-time feature engineering, CDC-derived materializations, clickstream aggregation, anomaly detection. The alternatives that come up most often in practice are Kafka Streams (a library, not an engine) for simpler single-cluster deployments, and Spark Streaming for teams already heavy on Spark.

Seen in
