CONCEPT Cited by 1 source
Light-weight rollup event¶
Definition¶
A light-weight rollup event is a notification emitted from the write path of an event-log-based aggregation system that carries no payload — only enough information to tell the aggregator which key needs re-aggregation, not what changed. The aggregator handles the delta computation later, by reading the underlying event store.
Canonical shape (Netflix Distributed Counter):
(Source: sources/2024-11-13-netflix-netflixs-distributed-counter-abstraction)
Why it matters¶
Three properties fall out of keeping the event light-weight:
- Avoid scanning the event store to find work. The alternative — periodic full scans of the event store to find dirty keys — costs O(total keys) per sweep regardless of activity. Rollup events narrow to O(recently-mutated keys).
- Avoid a parallel source of truth. If the rollup event carried the delta, the delta in the event store and the delta in the rollup event would both be authoritative and could drift under retries/duplication. Keeping the event light-weight forces the aggregator to always re-read the event store, which remains the single source of truth.
- Cheap to drop. Because the event has no payload, losing it doesn't cost data — the next access triggers another rollup event and self-heals the counter's checkpoint. Netflix exploits this: rollup events land in in-memory per-instance queues, and instance crashes losing those queues just delay aggregation rather than lose increments.
Mechanics inside Netflix's Counter¶
- Producers: every
AddCount+ClearCount+GetCountemits a rollup event fire-and-forget (see patterns/fire-and-forget-rollup-trigger). - Consumers: Counter-Rollup server instances run a set of in-memory queues. A fast hash (XXHash) routes the same counter to the same queue deterministically, enabling local per-queue dedup.
- Coalescing: within a rollup window (e.g.
coalesce_ms: 10000), duplicate events for the same counter are absorbed into a Set — the counter is aggregated at most once per window. - Batching: the consumer pulls a batch of counters (e.g. 32) from its Set and runs their TimeSeries aggregation queries in parallel; adaptive back-pressure smooths load on the underlying event store.
Read path also emits¶
An interesting wrinkle: every read (GetCount) also emits a
rollup event. Two benefits:
- Latency: the next read gets a fresher checkpoint because the prior read scheduled a refresh.
- Self-healing: infrequently-accessed counters with a previously-failed rollup get a chance to recover on the next access, so stale counts are bounded by the inter-access gap.
Sibling patterns¶
- Cache-invalidation on write in distributed caches has the same shape: the invalidation carries the key, not the new value; consumers re-read from the source of truth.
- Notification-then-query in pub-sub APIs (e.g. GitHub webhook
pushevents that just name the ref; consumersgit fetchfor the details).
Seen in¶
- sources/2024-11-13-netflix-netflixs-distributed-counter-abstraction
— canonical wiki instance.
{namespace, counter}-only rollup event; in-memory queues; XXHash-routed; Set-coalesced per window; consumed by the batch rollup pipeline.