PATTERN Cited by 1 source

Reactive-plus-proactive stream processing

Definition

A stateful stream processing pattern where a single processor class implements two distinct execution paths:

  1. Reactive path — responds to incoming events (data-driven): e.g., processing session starts and ends as they arrive from Kafka.
  2. Proactive path — fires on a timer schedule independent of input data: e.g., emitting heartbeats, checking timeouts, producing derived signals.

This separates input-triggered logic from time-triggered logic within one stateful operator, avoiding the need for external cron schedulers or secondary systems.

Shape (Spark transformWithState)

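class SessionizationProcessor extends StatefulProcessor {
    // Reactive: called per grouping key with the input rows that arrived for that key
    handleInputRows(key, rows, timerValues) { /* session start/end logic; register the first timer */ }

    // Proactive: called when a registered timer expires (e.g., 30s after it was set)
    handleExpiredTimer(key, timerValues, expiredTimerInfo) { /* heartbeat / timeout logic; re-register for the next tick */ }
}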
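
The shape above can be exercised without Spark. The following is a minimal sketch in plain Python, not the Spark API: the `advance` driver, the `state` and `timers` dictionaries, the `path` tag on each output record, and the 30-second `HEARTBEAT_MS` are all assumptions made for illustration. It shows the two paths sharing state, and the proactive path re-arming its own one-shot timer so heartbeats keep flowing with no input.

```python
from dataclasses import dataclass, field

HEARTBEAT_MS = 30_000  # assumed interval, mirroring the 30s in the shape above


@dataclass
class SessionizationProcessor:
    """Engine-free stand-in for a stateful processor (not the Spark API)."""
    state: dict = field(default_factory=dict)   # key -> session start time (ms)
    timers: dict = field(default_factory=dict)  # key -> next timer expiry (ms)

    def handle_input_rows(self, key, rows, now_ms):
        """Reactive path: runs when events arrive for a key."""
        out = []
        for event in rows:
            if event == "start":
                self.state[key] = now_ms
                self.timers[key] = now_ms + HEARTBEAT_MS  # arm the first timer
            elif event == "end":
                self.state.pop(key, None)
                self.timers.pop(key, None)  # stop heartbeats for this key
            out.append({"key": key, "type": event, "path": "reactive", "ts": now_ms})
        return out

    def handle_expired_timer(self, key, expiry_ms):
        """Proactive path: runs when a timer expires, with no input needed."""
        started = self.state[key]  # same state the reactive path wrote
        self.timers[key] = expiry_ms + HEARTBEAT_MS  # one-shot timer: re-arm it
        return [{"key": key, "type": "heartbeat", "path": "proactive",
                 "ts": expiry_ms, "age_ms": expiry_ms - started}]


def advance(proc, now_ms):
    """Toy driver (hypothetical): fire every timer due at or before now_ms."""
    out = []
    while True:
        due = sorted((t, k) for k, t in proc.timers.items() if t <= now_ms)
        if not due:
            return out
        expiry, key = due[0]
        out.extend(proc.handle_expired_timer(key, expiry))


proc = SessionizationProcessor()
events = proc.handle_input_rows("s1", ["start"], now_ms=0)
events += advance(proc, now_ms=90_000)   # heartbeats at 30s, 60s, 90s; no input
events += proc.handle_input_rows("s1", ["end"], now_ms=95_000)
events += advance(proc, now_ms=200_000)  # nothing fires: the timer was cancelled

for e in events:
    print(e["ts"], e["path"], e["type"])
```

Tagging each record with the path that produced it is a design choice of this sketch; it makes timer-driven and input-driven output easy to tell apart downstream.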

When to use

  • Workloads requiring both immediate response to events AND scheduled output (heartbeats, SLA checks, timeouts).
  • Session-lifecycle tracking where downstream needs periodic updates even when nothing is happening upstream.
  • Any scenario with high output amplification, where the proactive path dominates output volume: a few input events per entity produce many timer-driven records.
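
To make output amplification concrete, here is a small worked calculation. It is a sketch under stated assumptions: each input event is echoed once by the reactive path, one heartbeat is emitted per full interval while the session is open, and the 10-minute session and 30-second interval are illustrative numbers, not figures from a real workload.

```python
def output_amplification(session_ms: int, heartbeat_ms: int, input_events: int = 2) -> float:
    """Output records per input record for a single session.

    Assumes each input event yields one reactive record and that one
    heartbeat is emitted per full heartbeat interval the session stays open.
    """
    heartbeats = session_ms // heartbeat_ms
    return (input_events + heartbeats) / input_events


# A 10-minute session (start + end = 2 inputs) with a heartbeat every 30s:
# 20 heartbeats + 2 reactive records = 22 outputs for 2 inputs.
ratio = output_amplification(session_ms=10 * 60_000, heartbeat_ms=30_000)
print(ratio)  # 11.0
```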

Trade-offs

  • Pro: Single processor manages entire entity lifecycle — simpler than coordinating an event-driven service + a separate timer/cron service.
  • Pro: State is shared between both paths — the timer can access the same session state that input processing writes.
  • Con: Timer granularity depends on the engine's execution model. In micro-batch mode, expired timers fire only when a batch runs, so they align to batch boundaries and can fire later than scheduled; Real-Time Mode gives sub-second precision.
  • Con: Debugging is harder. Any output record may come from either path, so tracing an issue means distinguishing timer-driven output from input-driven output.
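
The timer-granularity trade-off can be illustrated with a simplified model. This sketch assumes batches start at exact multiples of a fixed interval and that an expired timer is handled by the first batch starting at or after its expiry; real engines add scheduling and processing delay on top. The function name and the numbers are hypothetical.

```python
def micro_batch_fire_time(expiry_ms: int, batch_interval_ms: int) -> int:
    """Earliest batch boundary at or after the timer's expiry (simplified model)."""
    # Integer ceiling division avoids floating-point rounding.
    batches = -(-expiry_ms // batch_interval_ms)
    return batches * batch_interval_ms


# A 30s timer with 20s batches fires at the 40s boundary, 10s late.
fired = micro_batch_fire_time(expiry_ms=30_000, batch_interval_ms=20_000)
delay = fired - 30_000
print(fired, delay)  # 40000 10000
```

In this model the worst-case delay approaches one full batch interval, which is why a shorter trigger interval (or a non-batched mode) tightens timer precision.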

Seen in
