CONCEPT Cited by 1 source
Stateful stream processing
Definition
Stateful stream processing is a stream computation model where operators maintain mutable state across events — enabling aggregations, joins, sessionization, pattern detection, and any computation that depends on history rather than processing each event in isolation.
Key requirements for stateful stream processors:
- State primitives — key-value stores, map state, list state, per-key timers.
- Fault tolerance — periodic checkpointing/snapshotting of state to durable storage so jobs can recover without data loss.
- State partitioning — state is sharded by key (e.g. device ID, user ID) across parallel instances so each key's state is co-located with its processing.
- Timer support — processing-time and event-time timers that fire callbacks independent of incoming data, enabling proactive output like heartbeats or timeouts.
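The first three requirements can be illustrated with a toy, in-process model. This is only a sketch: `KeyedCounter`, its methods, and the use of CRC32 for routing are hypothetical names and choices made for illustration, not the API of Flink, Spark, or any other engine. An in-memory deep copy stands in for a checkpoint written to durable storage.

```python
import copy
import zlib


class KeyedCounter:
    """Toy keyed operator: counts events per key across parallel partitions."""

    def __init__(self, num_partitions):
        # One dict per parallel instance; each holds only its own keys.
        self.partitions = [{} for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stable hash so a given key always routes to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def process(self, key):
        # Read and update the state co-located with this key.
        state = self.partitions[self.partition_for(key)]
        state[key] = state.get(key, 0) + 1
        return state[key]

    def snapshot(self):
        # Stand-in for a checkpoint written to durable storage (assumption).
        return copy.deepcopy(self.partitions)

    def restore(self, checkpoint):
        # Roll all partitions back to the checkpointed state.
        self.partitions = copy.deepcopy(checkpoint)


op = KeyedCounter(num_partitions=4)
for device_id in ["dev-a", "dev-b", "dev-a"]:
    op.process(device_id)
checkpoint = op.snapshot()

op.process("dev-a")                          # progress made after the checkpoint
op.restore(checkpoint)                       # simulated failure and recovery
count_after_recovery = op.process("dev-a")   # the lost event is replayed
```

Restoring the snapshot discards the update made after the checkpoint, so the event has to be replayed to arrive at the same count. Real engines typically pair the state snapshot with source offsets so that this replay happens automatically.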
Execution models
| Engine | State model | Timers | Latency floor |
|---|---|---|---|
| Flink | RocksDB-backed keyed state + async incremental checkpoints | Event-time + processing-time | Milliseconds (continuous) |
| Spark Structured Streaming (micro-batch) | In-memory state store, per-batch checkpoint | Processing-time (batch-aligned) | Batch interval (100 ms to seconds) |
| Spark Structured Streaming (Real-Time Mode) | MapState / ListState via transformWithState | Processing-time (sub-second precision) | ~432 ms p99 (Source: sources/2026-06-03-databricks-apache-spark-real-time-mode-for-gaming) |
| Custom (Akka actors) | Per-actor state, manual distribution | Actor scheduling | Application-dependent |
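The "batch-aligned" timer entry in the table can be made concrete with a small piece of arithmetic. The sketch below assumes a deliberately simplified model in which batches start at exact multiples of the batch interval and a due timer fires at the next batch boundary; it ignores batch processing time and scheduling jitter. The function name `batch_aligned_fire_time` is hypothetical.

```python
import math


def batch_aligned_fire_time(due_ms, batch_interval_ms):
    """Earliest batch boundary at or after the timer's due time (simplified model)."""
    return math.ceil(due_ms / batch_interval_ms) * batch_interval_ms


due_ms = 30_250  # a timer that becomes due partway through a batch

# Extra delay added by waiting for the next batch boundary.
delay_small_batch = batch_aligned_fire_time(due_ms, 100) - due_ms
delay_large_batch = batch_aligned_fire_time(due_ms, 2_000) - due_ms
```

Under this model the added delay is always less than one batch interval, which is why the batch interval acts as the latency floor for timer-driven output in micro-batch execution.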
Seen in
- sources/2026-06-03-databricks-apache-spark-real-time-mode-for-gaming — Gaming sessionization tracking 4M concurrent sessions with
  the transformWithState operator: MapState<sessionId, SessionState> partitioned by deviceId, with 30-second processing-time timers for heartbeat emission.
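The shape of that sessionization job can be sketched in plain Python. This is not Spark's transformWithState API: `DeviceOperator`, `SessionState` fields, `on_event`, and `advance_to` are hypothetical names, and the clock is passed in explicitly so the example is deterministic. Only the 30-second interval and the per-device map of session state come from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional

HEARTBEAT_INTERVAL_S = 30  # timer period taken from the description above


@dataclass
class SessionState:
    started_at: float
    last_event_at: float


@dataclass
class DeviceOperator:
    """Toy per-device operator: a map of sessionId -> SessionState plus one timer."""
    device_id: str
    sessions: dict = field(default_factory=dict)
    next_timer_at: Optional[float] = None

    def on_event(self, session_id, now):
        # Create or update the session's state.
        state = self.sessions.get(session_id)
        if state is None:
            self.sessions[session_id] = SessionState(now, now)
        else:
            state.last_event_at = now
        # Register the first timer when the device first becomes active.
        if self.next_timer_at is None:
            self.next_timer_at = now + HEARTBEAT_INTERVAL_S

    def advance_to(self, now):
        """Fire every timer due by `now`, even if no new event has arrived."""
        heartbeats = []
        while self.next_timer_at is not None and self.next_timer_at <= now:
            fired_at = self.next_timer_at
            for session_id in self.sessions:
                heartbeats.append((self.device_id, session_id, fired_at))
            # Re-register the timer for the next interval.
            self.next_timer_at = fired_at + HEARTBEAT_INTERVAL_S
        return heartbeats


op = DeviceOperator("device-1")
op.on_event("s1", now=0.0)
op.on_event("s2", now=10.0)
heartbeats = op.advance_to(65.0)  # no events since t=10, yet timers still fire
```

The key point is that `advance_to` produces output without any incoming event, which is what distinguishes timer-driven heartbeats from purely event-triggered processing.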
Related
- concepts/micro-batching — the batch-aligned execution model that stateful stream processing can run atop (with latency trade-offs)
- systems/apache-flink — the canonical stateful stream processor
- systems/spark-streaming — Spark's streaming engine, now with Real-Time Mode for sub-second stateful processing
- patterns/timer-driven-heartbeat-emission — a pattern enabled by stateful timers