CONCEPT Cited by 1 source

Stateful stream processing

Definition

Stateful stream processing is a stream computation model in which operators maintain mutable state across events. That state enables aggregations, joins, sessionization, pattern detection, and any other computation whose output depends on the history of events rather than on each event in isolation.
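The difference from stateless processing can be sketched in a few lines of plain Python. This is an illustration only, not any engine's API: `stateless_double`, `RunningSum`, and the sample events are hypothetical names introduced here.

```python
from collections import defaultdict


def stateless_double(events):
    """Stateless: each output depends only on the current event."""
    return [(key, value * 2) for key, value in events]


class RunningSum:
    """Stateful: keeps a per-key running sum that persists across events."""

    def __init__(self):
        self.state = defaultdict(int)  # key -> sum of values seen so far

    def process(self, key, value):
        self.state[key] += value
        return key, self.state[key]


events = [("device-a", 3), ("device-b", 5), ("device-a", 4)]

op = RunningSum()
outputs = [op.process(key, value) for key, value in events]
print(outputs)  # [('device-a', 3), ('device-b', 5), ('device-a', 7)]
```

The third output is 7 rather than 4 because the operator remembered the earlier `device-a` event; the stateless function would produce the same result for that event regardless of what came before it.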

Key requirements for stateful stream processors:

  • State primitives — key-value stores, map state, list state, per-key timers.
  • Fault tolerance — periodic checkpointing/snapshotting of state to durable storage so jobs can recover without data loss.
  • State partitioning — state is sharded by key (e.g. device ID, user ID) across parallel instances so each key's state is co-located with its processing.
  • Timer support — processing-time and event-time timers that fire callbacks independent of incoming data, enabling proactive output like heartbeats or timeouts.
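The four requirements above can be illustrated together with a toy in-process model. This is a sketch under simplifying assumptions, not how any particular engine is implemented: `KeyedTimeoutOperator`, `partition_for`, and `route` are hypothetical names, snapshots are kept in memory rather than written to durable storage, and the caller advances the watermark by hand where a real engine would drive time itself.

```python
import copy
import heapq
import zlib


class KeyedTimeoutOperator:
    """Toy operator instance: per-key last-seen state plus event-time timers."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # keyed state: key -> last event timestamp
        self.timers = []     # min-heap of (fire_time, key)

    def on_event(self, key, ts):
        self.last_seen[key] = ts
        heapq.heappush(self.timers, (ts + self.timeout, key))

    def advance_watermark(self, watermark):
        """Fire every timer due at or before the watermark; return timed-out keys."""
        fired = []
        while self.timers and self.timers[0][0] <= watermark:
            fire_time, key = heapq.heappop(self.timers)
            last = self.last_seen.get(key)
            # Ignore stale timers that a newer event for the same key superseded.
            if last is not None and last + self.timeout == fire_time:
                fired.append(key)
                del self.last_seen[key]
        return fired

    def snapshot(self):
        """Checkpoint: a deep copy stands in for a write to durable storage."""
        return copy.deepcopy({"last_seen": self.last_seen, "timers": self.timers})

    def restore(self, snap):
        snap = copy.deepcopy(snap)
        self.last_seen = snap["last_seen"]
        self.timers = snap["timers"]


def partition_for(key, parallelism):
    """Stable hash partitioning: a key always routes to the same instance."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


parallelism = 2
operators = [KeyedTimeoutOperator(timeout=10) for _ in range(parallelism)]


def route(key, ts):
    operators[partition_for(key, parallelism)].on_event(key, ts)


route("device-a", 0)
route("device-b", 5)
route("device-a", 8)

checkpoint = [op.snapshot() for op in operators]

# device-b was last seen at 5, so its timer at 15 fires; device-a's live timer is at 18.
timed_out = sorted(k for op in operators for k in op.advance_watermark(16))
print(timed_out)  # ['device-b']

# Simulated failure: fresh instances restore from the checkpoint and replay time.
recovered = [KeyedTimeoutOperator(timeout=10) for _ in range(parallelism)]
for op, snap in zip(recovered, checkpoint):
    op.restore(snap)
timed_out_after = sorted(k for op in recovered for k in op.advance_watermark(20))
print(timed_out_after)  # ['device-a', 'device-b']
```

The timeout for `device-b` is produced without any new `device-b` event arriving, which is the point of timer support. Because the checkpoint captures both the keyed state and the pending timers, the recovered instances reach the same decisions as the originals would have.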

Execution models

| Engine | State model | Timers | Latency floor |
| --- | --- | --- | --- |
| Flink | RocksDB-backed keyed state + async incremental checkpoints | Event-time + processing-time | Milliseconds (continuous) |
| Spark Structured Streaming (micro-batch) | In-memory state store, per-batch checkpoint | Processing-time (batch-aligned) | Batch interval (100 ms–seconds) |
| Spark Structured Streaming (Real-Time Mode) | MapState / ListState via transformWithState | Processing-time (sub-second precision) | ~432 ms p99 (Source: sources/2026-06-03-databricks-apache-spark-real-time-mode-for-gaming) |
| Custom (Akka actors) | Per-actor state, manual distribution | Actor scheduling | Application-dependent |

Seen in
