Apache Flink
Apache Flink is a JVM-based, open-source distributed stream-processing engine that became the de facto default for stateful, event-time stream processing in the 2010s and 2020s. Its defining properties are:
- Native streaming-first runtime (events are first-class; batch is a bounded-stream special case), distinct from Spark Streaming's micro-batch model.
- Event-time semantics with watermarks — windows fire based on event timestamps, not processing-time, with built-in watermark handling for out-of-order events.
- Durable, asynchronous, incremental state checkpoints — operator state persisted to object storage (S3 / GCS / HDFS) so jobs can restart cleanly after failure.
- Exactly-once semantics across Kafka source → transformation → Kafka sink, via the two-phase-commit transactional writer contract.
- Rich state backends — RocksDB-backed local state for operators holding GB–TB of keyed state.
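The event-time bullet above can be made concrete with a toy model. The sketch below is plain Python, not the Flink API: it counts events per tumbling window, keeps a watermark that trails the largest timestamp seen by a fixed lag, and fires a window once the watermark reaches the window's end. The window size, the lag, and the function name are illustrative assumptions, and the boundary handling is simplified relative to Flink's own.

```python
from collections import defaultdict

WINDOW_SIZE = 10          # tumbling window length in event-time units (assumed)
MAX_OUT_OF_ORDERNESS = 3  # how far the watermark trails the max timestamp (assumed)


def tumbling_window_counts(events):
    """Count events per (key, window) and fire windows as the watermark advances.

    `events` is an iterable of (key, event_timestamp) pairs in arrival order.
    Returns (fired, dropped): `fired` holds (key, window_start, count) tuples,
    `dropped` holds events that arrived after their window had already fired.
    """
    open_windows = defaultdict(int)  # (key, window_start) -> running count
    fired, dropped = [], []
    max_ts = float("-inf")
    watermark = float("-inf")

    for key, ts in events:
        window_start = ts - (ts % WINDOW_SIZE)
        if window_start + WINDOW_SIZE <= watermark:
            # The window already fired, so this toy model discards the late event.
            dropped.append((key, ts))
            continue
        open_windows[(key, window_start)] += 1

        # Bounded-out-of-orderness watermark: max timestamp seen minus a fixed lag.
        max_ts = max(max_ts, ts)
        watermark = max_ts - MAX_OUT_OF_ORDERNESS

        # Fire every open window whose end is at or before the watermark.
        ready = sorted(
            (w for w in open_windows if w[1] + WINDOW_SIZE <= watermark),
            key=lambda w: (w[1], w[0]),
        )
        for w in ready:
            fired.append((w[0], w[1], open_windows.pop(w)))

    return fired, dropped


if __name__ == "__main__":
    arrivals = [("a", 1), ("a", 4), ("a", 12), ("a", 8), ("a", 14), ("a", 2)]
    print(tumbling_window_counts(arrivals))
```

In this run the event at timestamp 8 arrives after the event at 12 but is still counted, because the watermark has not yet passed the end of window [0, 10). The event at timestamp 2 arrives after that window fired and is dropped. Windows fire on event timestamps and watermark progress, not on wall-clock time.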
Typical role for this wiki
Flink appears whenever a company describes production stream processing that needs non-trivial state or event-time correctness: fraud detection, real-time feature engineering, CDC-derived materializations, clickstream aggregation, anomaly detection. The alternatives that come up most often in practice are Kafka Streams (a library, not an engine) for simpler single-cluster deployments, and systems/spark-streaming for teams already heavily invested in Spark.
Seen in
- sources/2026-01-06-lyft-feature-store-architecture-optimization-and-evolution: Lyft's Feature Store streaming lane is Flink-first. Customer Flink applications read from Kafka (or sometimes Kinesis), transform + create metadata + format values, then sink payloads into a dedicated central Flink ingest application (spfeaturesingest) that owns (de)serialization and dsfeatures WRITE API interaction. The two-tier Flink architecture (customer apps + central ingest app) is the mechanism that preserves uniform metadata across all streaming producers.
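The two-tier split described above can be sketched as a division of responsibilities. The code below is plain Python, not Flink and not Lyft's code. Every name in it (`FeaturePayload`, `customer_app`, `CentralIngest`, the field names, the JSON encoding) is a hypothetical stand-in chosen for illustration. It shows only the idea that producers emit one shared envelope while a single component owns serialization and the write call.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FeaturePayload:
    """Hypothetical envelope every customer app emits to the central ingest tier."""
    feature_name: str
    entity_id: str
    value: float
    event_time_ms: int
    producer: str


def customer_app(raw_events, producer):
    """Tier 1 (assumed shape): transform events, attach metadata, format values."""
    for event in raw_events:
        yield FeaturePayload(
            feature_name=event["feature"],
            entity_id=str(event["entity"]),
            value=round(float(event["value"]), 4),
            event_time_ms=int(event["ts_ms"]),
            producer=producer,
        )


class CentralIngest:
    """Tier 2 (assumed shape): the only place that serializes and writes."""

    def __init__(self, write_fn):
        # write_fn stands in for the real write API, which is not modeled here.
        self._write_fn = write_fn

    def ingest(self, payloads):
        written = 0
        for payload in payloads:
            self._write_fn(json.dumps(asdict(payload), sort_keys=True))
            written += 1
        return written


if __name__ == "__main__":
    store = []
    ingest = CentralIngest(store.append)
    raw = [{"feature": "trip_count_1h", "entity": 42, "value": "3.14159", "ts_ms": 1700000000000}]
    print(ingest.ingest(customer_app(raw, producer="pricing-app")), store)
```

Because every producer goes through the same envelope and the same writer, a change to the serialization format or the write call is made in one place, which is the uniformity property the source attributes to the central ingest application.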
Related
- systems/kafka — the dominant event-source Flink consumes.
- systems/amazon-kinesis-data-streams — alternative event source.
- systems/kafka-streams — library-based alternative.
- systems/spark-streaming — micro-batch alternative.
- concepts/feature-freshness — the streaming lane's reason to exist in a feature store.
- patterns/hybrid-batch-streaming-ingestion