Fire-and-forget rollup trigger¶
Problem¶
Writes to an event-log-based aggregation system need to schedule a background rollup for the affected key, but must not block on the rollup — the write path is latency-critical and the rollup tier runs asynchronously. Also, losing the trigger must not lose data — the underlying event store is already durable.
Pattern¶
After durably persisting the event, send a lightweight rollup event to the rollup tier in fire-and-forget mode. The write completes the moment the event is durable in the event store; the rollup trigger is only a best-effort signal saying "this key has changed, please re-aggregate".
Canonical shape (Netflix Distributed Counter):
- Client calls AddCount(namespace, counter, delta, token).
- Service writes event to TimeSeries durably.
- Service updates last-write-timestamp on the Rollup Store (Cassandra USING TIMESTAMP = event's event_time); this is last-write-wins (LWW) and also durable.
- Service sends {namespace, counter} to the Rollup tier fire-and-forget (no ACK, no retry).
- Returns success to client.
The rollup trigger is handled asynchronously by the Counter-Rollup server tier — see patterns/sliding-window-rollup-aggregation.
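The write path above can be sketched in a few lines. This is a minimal in-process illustration, not Netflix's implementation: the class CounterService, its fields, and the bounded queue.Queue standing in for the rollup tier's intake are all hypothetical, and plain Python containers stand in for TimeSeries and the Rollup Store.

```python
import queue
import time
from dataclasses import dataclass, field


@dataclass
class CounterService:
    """Hypothetical stand-in for the write path; every name here is illustrative."""
    # Stand-in for the durable event store (TimeSeries in the source).
    events: list = field(default_factory=list)
    # Stand-in for the last-write-timestamp column on the Rollup Store.
    last_write_ts: dict = field(default_factory=dict)
    # Stand-in for the rollup tier's intake; deliberately tiny so drops are easy to see.
    rollup_queue: queue.Queue = field(default_factory=lambda: queue.Queue(maxsize=2))

    def add_count(self, namespace, counter, delta, token, event_time=None):
        event_time = time.time() if event_time is None else event_time
        key = (namespace, counter)

        # 1. Persist the event first; this is the source of truth.
        self.events.append((key, delta, token, event_time))

        # 2. Last-write-wins update: only a newer event_time overwrites the stored one.
        prev = self.last_write_ts.get(key)
        if prev is None or event_time > prev:
            self.last_write_ts[key] = event_time

        # 3. Fire-and-forget trigger: never block, never retry, ignore failure.
        try:
            self.rollup_queue.put_nowait(key)
        except queue.Full:
            pass  # trigger dropped; the event itself is already stored

        # 4. Success depends only on steps 1 and 2, not on the trigger.
        return True
```

With the queue capped at two entries, a third write still returns success and still lands in events; only its trigger is dropped. That is the property the pattern relies on: the trigger carries no data, so losing it delays aggregation rather than losing a count.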
Why fire-and-forget works¶
Three properties make the trigger safe to drop:
- The event is already durable in the event store — the primary source of truth isn't the trigger, it's the event log.
- Reads emit triggers too. Netflix's GetCount also fires a rollup event, so an infrequently-accessed counter whose write-path trigger was lost self-heals on the next read.
- last-write-timestamp drives rollup circulation. Counters whose pending events haven't been aggregated stay in circulation until they catch up, giving the rollup tier an independent signal besides the in-memory trigger queue.
Drop rate in steady state is low because the trigger is an in-process/in-cluster delivery, but even under instance crash, the only cost is delayed aggregation. No data loss, no double-counting.
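The two self-healing signals can be illustrated with a toy rollup tier. This is a sketch under stated assumptions rather than the Counter-Rollup server's actual logic: RollupTier, run_once, and the safe_until cutoff (a stand-in for "events old enough to aggregate") are hypothetical names, and the event log and last-write-timestamp map are plain Python objects shared with the write path.

```python
from collections import deque


class RollupTier:
    """Hypothetical single-node rollup tier; all names are illustrative."""

    def __init__(self, events, last_write_ts):
        self.events = events                # shared event log: (key, delta, event_time)
        self.last_write_ts = last_write_ts  # shared last-write-timestamp per key
        self.last_rollup_ts = {}            # newest event_time already aggregated per key
        self.rollup_count = {}              # aggregated value per key
        self.pending = deque()              # lossy in-memory trigger queue

    def trigger(self, key):
        self.pending.append(key)

    def get_count(self, key):
        # Reads emit a trigger too, then return the (possibly stale) aggregate.
        self.trigger(key)
        return self.rollup_count.get(key, 0)

    def run_once(self, safe_until):
        # Process only the keys queued at the start of this pass.
        for _ in range(len(self.pending)):
            key = self.pending.popleft()
            # Re-aggregate from the event log, up to the assumed cutoff.
            seen = [(d, t) for k, d, t in self.events if k == key and t <= safe_until]
            self.rollup_count[key] = sum(d for d, _ in seen)
            self.last_rollup_ts[key] = max((t for _, t in seen), default=0.0)
            # A write newer than the last rollup keeps the key in circulation.
            if self.last_write_ts.get(key, 0.0) > self.last_rollup_ts[key]:
                self.pending.append(key)
```

If a write-path trigger is lost, run_once has nothing to process for that key and the aggregate stays stale, but the first get_count call re-queues the key and the next pass recomputes the value from the event log. Because each pass recomputes from stored events instead of applying deltas carried by triggers, a duplicate or late trigger cannot double-count.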
Where it can bite¶
Netflix's post names three caveats:
- In-memory queues lose triggers on instance crash. The first-version Counter uses simple in-memory queues "to reduce provisioning complexity, save on infrastructure costs, and make rebalancing fairly straightforward." Durable queues and rollup handoffs are named as future work.
- Infrequently-accessed counters can stay stale longer, because no read arrives to re-trigger the rollup.
- No observability of dropped triggers — fire-and-forget by definition has no ACK.
When to use¶
- Event-log-based aggregation where the event store is already the durable record.
- Rollup tier is sized for bursts and doesn't need strong ordering of triggers.
- Bounded-staleness reads are acceptable.
When NOT to use¶
- Rollup must run before the write is acknowledged (e.g. a contract requires the aggregate to reflect the new value immediately) — prefer synchronous aggregation or an in-place counter with CAS.
- No independent signal (read-triggered, last-write-timestamp) exists for self-healing — you'd risk permanent staleness on dropped triggers.
Seen in¶
- sources/2024-11-13-netflix-netflixs-distributed-counter-abstraction — canonical wiki instance. Post-durability rollup trigger on writes + read-triggered rollups + last-write-timestamp as independent signal.