Event-log-based counter¶
Definition¶
An event-log-based counter stores each counter mutation as an individual event in an event store, and computes the counter's value by aggregating events. This contrasts with an in-place counter that stores only the current value and updates it via CAS or locked increment.
The event log preserves the individual provenance of every increment and decrement, enabling audit, recounting, and reset, but it requires a background aggregation pipeline and a bucketed partitioning schema to keep reads fast and to prevent the event log from becoming a hot partition. Canonical wiki instance: Netflix's Eventually-Consistent counter mode. (Source: sources/2024-11-13-netflix-netflixs-distributed-counter-abstraction)
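The contrast can be made concrete with a minimal sketch (names and structure are illustrative, not Netflix's actual API): the in-place counter mutates a single value, while the event-log counter appends immutable events and derives the value by aggregation.

```python
import itertools

class InPlaceCounter:
    """Stores only the current value; each increment mutates it in place."""
    def __init__(self):
        self.value = 0

    def add(self, delta):
        # In a real store this would be a CAS or locked increment,
        # which contends on hot keys.
        self.value += delta

    def get(self):
        return self.value

class EventLogCounter:
    """Stores each mutation as an immutable event; value = aggregate."""
    def __init__(self):
        self.events = []              # append-only log, no write contention
        self._ids = itertools.count()

    def add(self, delta, event_time):
        # Each event keeps its own provenance (time, id, delta).
        self.events.append({"event_time": event_time,
                            "event_id": next(self._ids),
                            "delta": delta})

    def get(self):
        # Naive aggregate-on-read; see the rollup discussion below.
        return sum(e["delta"] for e in self.events)
```

Both report the same count, but only the event-log counter retains the individual events needed for audit and recounting.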
Trade-offs vs. in-place counters¶
| Property | In-place counter | Event-log counter |
|---|---|---|
| Write contention on hot keys | High (CAS/lock) | Low (append-only) |
| Idempotency | Requires external mechanism | Natural via event composite key |
| Audit / recounting | Infeasible (only current value) | Feasible (events retained) |
| Read latency | O(1) point-read | Requires pre-aggregation |
| Reset semantics | Simple overwrite | Requires ordering (LWW + event_time) |
| Storage cost | Low per counter | Higher; governed by retention |
Why not just aggregate on read¶
In the simplest form, the event log is scanned and aggregated on every GetCount call.
Netflix's post lists the drawbacks of this approach:
- Read latency scales with event count for a given counter.
- Duplicate work across concurrent readers each computing the same aggregate.
- Wide partitions in Cassandra for hot counters (see concepts/wide-partition-problem).
- Large data footprint without aggressive retention.
The fix is a background sliding-window rollup that continuously aggregates events into a checkpoint; reads serve the checkpoint + optionally a small real-time delta (see patterns/sliding-window-rollup-aggregation).
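A minimal sketch of that read path, under assumed names (`rollup`, `get_count`, a `checkpoint` dict with `value` and `as_of` fields — none of these are Netflix's actual identifiers): the background job folds events older than the window boundary into the checkpoint, and reads serve the checkpoint plus the delta of events newer than it.

```python
def rollup(events, checkpoint, now, window):
    """Background step: fold events older than (now - window) into the
    checkpoint. Events inside the window stay raw so late arrivals and
    retries can still land without a recount."""
    boundary = now - window
    folded = sum(e["delta"] for e in events
                 if checkpoint["as_of"] < e["event_time"] <= boundary)
    return {"value": checkpoint["value"] + folded, "as_of": boundary}

def get_count(events, checkpoint):
    """Read path: checkpoint plus a small real-time delta, instead of
    scanning the whole log."""
    delta = sum(e["delta"] for e in events
                if e["event_time"] > checkpoint["as_of"])
    return checkpoint["value"] + delta
```

The read cost now scales with the events inside the window, not with the counter's lifetime event count.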
Schema requirements¶
For the event log to be a workable counter backing store:
- Natural idempotency key via `(event_time, event_id, event_item_key)` so retries + hedges don't double-count.
- Bucketed event-time partitioning (time-bucket + event-bucket columns) to prevent wide partitions under high throughput.
- Descending event-time order so the most recent `ClearCount` is seen first when scanning the aggregation window.
- Retention: the event log isn't the long-term store of record; aggregated counts are moved to a cheaper store for audits, and events age out via TTL.
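The first two requirements can be sketched together (bucket width, bucket count, and all names here are assumptions for illustration, not the TimeSeries schema): the composite key makes duplicate writes no-ops, and the `(time_bucket, event_bucket)` pair spreads a hot counter's writes across many partitions.

```python
import zlib

BUCKET_WIDTH_SECONDS = 60   # assumed time-bucket width
EVENT_BUCKETS = 16          # assumed event buckets per time bucket

def partition_for(event_time, event_id):
    """Bucketed event-time partitioning: a hot counter's writes land in
    many (time_bucket, event_bucket) partitions instead of one wide one."""
    time_bucket = int(event_time // BUCKET_WIDTH_SECONDS)
    # crc32 rather than hash() so the bucket is stable across processes.
    event_bucket = zlib.crc32(str(event_id).encode()) % EVENT_BUCKETS
    return (time_bucket, event_bucket)

class EventStore:
    """Idempotent append: the composite key (event_time, event_id,
    event_item_key) makes retried or hedged writes no-ops."""
    def __init__(self):
        self.rows = {}  # composite key -> delta

    def append(self, event_time, event_id, event_item_key, delta):
        key = (event_time, event_id, event_item_key)
        # A duplicate key is silently absorbed, so it can't double-count.
        self.rows.setdefault(key, delta)
```

A client retrying a timed-out write (or racing a hedged request) simply re-sends the same key; the second append changes nothing.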
Enables audit + recounting¶
Two properties pre-aggregation loses:
- Auditing — extracting events to an offline system to verify every increment was applied correctly to reach the final value.
- Recounting — if increments need adjustment within a time window, the aggregate has already collapsed the individual events; the log preserves them.
Netflix names both as requirements its event-log-based counter must preserve, and notes that this is why it rejected the durable-queue-with-stream-processor-aggregation approach: that design pre-aggregates and so loses both auditing and recounting.
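Both properties reduce to replaying the retained events; a minimal sketch, assuming events are dicts with `event_time` and `delta` fields (function names are illustrative):

```python
def audit(events, reported_value):
    """Offline audit: replay the log and check it reproduces the
    reported final value. Impossible once events are collapsed."""
    return sum(e["delta"] for e in events) == reported_value

def recount(events, adjust, start, end):
    """Recount: apply an adjustment to every event inside the time
    window [start, end] and re-aggregate from the preserved events."""
    return sum(adjust(e["delta"]) if start <= e["event_time"] <= end
               else e["delta"]
               for e in events)
```

With only a pre-aggregated value, neither check is possible: the per-event deltas that `audit` replays and `recount` adjusts no longer exist.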
Seen in¶
- sources/2024-11-13-netflix-netflixs-distributed-counter-abstraction — Netflix's Eventually-Consistent counter is canonical. Events land in TimeSeries with composite-key idempotency; bucketed partitioning prevents wide partitions; background rollup into a Cassandra-backed Rollup Store + EVCache checkpoint; TTL retention on the event log.