
PATTERN

Fire-and-forget rollup trigger

Problem

Writes in an event-log-based aggregation system need to schedule a background rollup for the affected key, but must not block on the rollup — the write path is latency-critical and the rollup tier runs asynchronously. Losing a trigger must not lose data, because the underlying event store is already durable.

Pattern

After durably persisting the event, emit a lightweight rollup event to the rollup tier, fire-and-forget. The write completes the moment the event is durable in the event store; the rollup trigger is a best-effort signal saying "this key has changed, please re-aggregate".

Canonical shape (Netflix Distributed Counter):

  1. Client calls AddCount(namespace, counter, delta, token).
  2. Service writes event to TimeSeries durably.
  3. Service updates last-write-timestamp on the Rollup Store (Cassandra USING TIMESTAMP = event's event_time) — this is LWW and also durable.
  4. Service sends {namespace, counter} to the Rollup tier fire-and-forget (no ACK, no retry).
  5. Service returns success to the client.
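
The five steps above can be sketched in Python. The store objects and their methods are hypothetical stand-ins, not Netflix's actual APIs; the bounded in-process queue plays the role of the fire-and-forget channel to the rollup tier:

```python
import queue
import time

# Hypothetical in-process channel standing in for fire-and-forget
# delivery to the rollup tier. Bounded: if it fills up, triggers drop.
rollup_queue: queue.Queue = queue.Queue(maxsize=10_000)

def add_count(event_store, rollup_store, namespace: str, counter: str,
              delta: int, token: str) -> bool:
    event_time = time.time_ns()
    # Steps 1-2: durably persist the event -- this is the source of truth.
    event_store.append(namespace, counter, delta, token, event_time)
    # Step 3: last-write-wins timestamp update, keyed by the event's time.
    rollup_store.set_last_write(namespace, counter, ts=event_time)
    # Step 4: fire-and-forget trigger -- no ACK, no retry, drop when full.
    try:
        rollup_queue.put_nowait((namespace, counter))
    except queue.Full:
        pass  # safe to drop: the event is already durable
    # Step 5: the write succeeded the moment the event became durable.
    return True
```

Note that nothing after the durable append can fail the write: the trigger send is non-blocking and its failure is silently absorbed.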

The rollup trigger is handled asynchronously by the Counter-Rollup server tier — see patterns/sliding-window-rollup-aggregation.

Why fire-and-forget works

Three properties make the trigger safe to drop:

  1. The event is already durable in the event store — the primary source of truth isn't the trigger, it's the event log.
  2. Reads emit triggers too — Netflix's GetCount also fires a rollup event, so an infrequently-accessed counter whose write-path trigger was lost self-heals on the next read.
  3. last-write-timestamp drives rollup circulation. Counters whose pending events haven't been aggregated stay in circulation until they catch up, giving the rollup tier an independent signal beyond the in-memory trigger queue.
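
Property 2's read-side self-heal can be sketched the same way as the write path; `get_rolled_up` and the queue are hypothetical names, with the same drop-on-full semantics:

```python
import queue

# Hypothetical best-effort trigger channel to the rollup tier.
trigger_queue: queue.Queue = queue.Queue(maxsize=10_000)

def get_count(rollup_store, namespace: str, counter: str) -> int:
    # Serve the (possibly slightly stale) rolled-up value.
    value = rollup_store.get_rolled_up(namespace, counter)
    # Re-emit the trigger: a counter whose write-path trigger was lost
    # self-heals on its next read.
    try:
        trigger_queue.put_nowait((namespace, counter))
    except queue.Full:
        pass  # best-effort by design
    return value
```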

Drop rate in steady state is low because the trigger is an in-process/in-cluster delivery — but even under an instance crash, the only cost is delayed aggregation. No data loss, no double-counting.
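
On the rollup side, property 3 gives the tier a signal that is independent of any delivered trigger. A minimal sketch of the circulation check, with hypothetical `last_write_ts` / `last_rollup_ts` accessors:

```python
def stays_in_circulation(rollup_store, namespace: str, counter: str) -> bool:
    # A counter keeps getting re-queued for aggregation until the rollup
    # has caught up with the most recent durable write, even if the
    # original fire-and-forget trigger was dropped.
    return (rollup_store.last_write_ts(namespace, counter) >
            rollup_store.last_rollup_ts(namespace, counter))
```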

Where it can bite

Netflix's post names three caveats:

  • In-memory queues lose triggers on instance crash — the first-version Counter uses simple in-memory queues "to reduce provisioning complexity, save on infrastructure costs, and make rebalancing fairly straightforward." Durable queues and rollup handoffs are named as future work.
  • Infrequently-accessed counters can stay stale longer, because there may be no subsequent read to self-heal them.
  • No observability of dropped triggers — fire-and-forget by definition has no ACK.

When to use

  • Event-log-based aggregation where the event store is already the durable record.
  • Rollup tier is sized for bursts and doesn't need strong ordering of triggers.
  • Bounded-staleness reads are acceptable.

When NOT to use

  • Rollup must run before the write is acknowledged (e.g. a contract requires the aggregate to reflect the new value immediately) — prefer synchronous aggregation or an in-place counter with CAS.
  • No independent signal (read-triggered, last-write-timestamp) exists for self-healing — you'd risk permanent staleness on dropped triggers.

Seen in

  • Netflix Distributed Counter (AddCount write path, GetCount read path)
