Netflix TimeSeries Abstraction

Netflix TimeSeries Abstraction is a distributed service on the Data Gateway platform for storing and querying large volumes of temporal event data with low-millisecond latencies. It was announced in a post published before 2024-11-13 (Netflix TimeSeries Data Abstraction Layer). This wiki now covers it in depth via the 2026-06-03 dynamic-partition-splitting post (Source: sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads), which canonicalises both the table-level auto-tuning control loop and the per-ID dynamic partition-splitting pipeline that the team built to fight wide partitions on top of Apache Cassandra 4.x at petabyte scale.

The 2026-06-03 post upgrades this page from "named in passing" to a first-class deep disclosure — the substrate's anti-wide-partition machinery is now the most fully described aspect of TimeSeries on the wiki.

Role on the platform

TimeSeries is the second mature abstraction on the Data Gateway, sibling to KV DAL, and is composed by downstream services as their event store. The Distributed Counter Abstraction uses TimeSeries directly as its backing event store, demonstrating Netflix's "compose multiple abstraction layers using containers deployed on the same host" Data-Gateway property.

Data model + schema shape (as used by Counter)

Each TimeSeries event carries at least:

  • event_time — the logical timestamp of the event; also doubles as part of the idempotency key.
  • event_id — per-event identifier (can encode source provenance).
  • event_item_key — further key scoping within an event.
  • time_bucket, event_bucket — schema-level partitioning columns that break up otherwise-wide Cassandra partitions (see patterns/bucketed-event-time-partitioning).
  • Payload columns for the event contents (e.g. the counter delta).
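As a minimal sketch of how the two bucketing columns could be derived, the snippet below assumes time_bucket is the event time divided by a fixed interval and event_bucket is a stable hash of event_id modulo a bucket count. The function name, the parameter names, and the default values are hypothetical and are not taken from the TimeSeries schema.

```python
import zlib


def partition_key(time_series_id: str, event_time_ms: int, event_id: str,
                  seconds_per_bucket: int = 60, event_buckets: int = 4):
    """Return a hypothetical (id, time_bucket, event_bucket) partition key.

    Assumptions (not from the source): time_bucket is a fixed-width window
    index, and event_bucket spreads one window's events over a small number
    of partitions via a stable hash of event_id.
    """
    time_bucket = (event_time_ms // 1000) // seconds_per_bucket
    # crc32 is stable across processes, unlike Python's built-in hash().
    event_bucket = zlib.crc32(event_id.encode("utf-8")) % event_buckets
    return (time_series_id, time_bucket, event_bucket)


k_now = partition_key("id-1", 1_700_000_000_000, "evt-1")
k_one_second_later = partition_key("id-1", 1_700_000_001_000, "evt-1")
k_one_minute_later = partition_key("id-1", 1_700_000_060_000, "evt-1")
```

Events for the same ID that fall in the same window share a time_bucket, while a hot window is still spread over event_buckets partitions.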

The composite (event_time, event_id, event_item_key) forms a natural idempotency key: retried and hedged writes with the same token produce the same row.
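The idempotency property can be illustrated with a toy in-memory store keyed by the composite. This is a sketch only: the Event and InMemoryEventStore names are hypothetical, and a dictionary upsert stands in for a write to a row addressed by its primary key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_time: int       # epoch milliseconds
    event_id: str
    event_item_key: str
    payload: int          # e.g. a counter delta


class InMemoryEventStore:
    """Toy stand-in for a table keyed by the composite idempotency key."""

    def __init__(self):
        self._rows = {}

    def write(self, event: Event) -> None:
        key = (event.event_time, event.event_id, event.event_item_key)
        # Upsert: a retried or hedged write lands on the same row.
        self._rows[key] = event.payload

    def __len__(self) -> int:
        return len(self._rows)


store = InMemoryEventStore()
original = Event(1_700_000_000_000, "evt-1", "item-a", 5)
store.write(original)
store.write(original)  # retry or hedged duplicate of the same write
rows_after_retry = len(store)
store.write(Event(1_700_000_000_000, "evt-2", "item-a", 5))
rows_after_new_event = len(store)
```

The duplicate write leaves a single row, while an event with a different event_id creates a second one.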

Key operational properties (disclosed via the Counter post)

  • Backed by Cassandra as the underlying event store. Can be configured to work with other persistent stores; the Counter-Rollup Store reuses the TimeSeries abstraction with different tuning.
  • Wide-partition prevention via bucketing: time_bucket + event_bucket columns prevent high-throughput events for a given identifier from overwhelming a single Cassandra partition. See concepts/wide-partition-problem.
  • Descending time order: TimeSeries returns events in descending event-time order. Counter Abstraction exploits this for reset semantics — the most recent ClearCount event is seen first when scanning the window.
  • acceptLimit bounds event skew: incoming events with timestamps beyond this limit (example config: 5 s) are rejected. This doubles as the safety lag under which downstream consumers can treat a time window as an immutable aggregation window.
  • Retention policies (lifecycle config): events close after close_after seconds and are deleted after delete_after seconds. Example config: 6 days close, 7 days delete. Events aggregated into a downstream rollup store don't need to live in TimeSeries past the audit window.
  • Range-scan optimisation for low millisecond latency on aggregation workloads — Counter-Rollup issues parallel range queries across time partitions per batch.
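Two of the properties above, the acceptLimit skew bound and the descending-order scan used for reset semantics, can be sketched as follows. The helper names and the ("add" / "clear") event encoding are hypothetical, and the symmetric treatment of past and future skew is an assumption rather than something the source states.

```python
from datetime import datetime, timedelta, timezone

ACCEPT_LIMIT = timedelta(seconds=5)  # example value from the config above


def accept(event_time, now, limit=ACCEPT_LIMIT):
    """Accept an event only if its timestamp is within `limit` of `now`.

    Assumption: skew is bounded in both directions.
    """
    return abs(now - event_time) <= limit


def window_is_immutable(window_end, now, limit=ACCEPT_LIMIT):
    """A window can be aggregated once no accepted event can still land in it."""
    return now - window_end > limit


def count_since_last_clear(events_desc):
    """Sum deltas from events ordered newest first, stopping at the first clear.

    `events_desc` holds (kind, delta) tuples; "clear" stands in for a
    ClearCount event, which a descending scan encounters before older adds.
    """
    total = 0
    for kind, delta in events_desc:
        if kind == "clear":
            break
        total += delta
    return total


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
fresh_ok = accept(now - timedelta(seconds=3), now)
stale_ok = accept(now - timedelta(seconds=6), now)
count = count_since_last_clear(
    [("add", 2), ("add", 3), ("clear", 0), ("add", 100)]
)
```

Because the scan is newest first, everything older than the most recent clear is never read, so the count covers only the two adds after it.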

Seen in

(1) Table-level auto-tuning control loop ("DynamicTimeSliceConfigWorker"): a background worker that polls per-table partition-size histograms via Cassandra virtual tables mirroring nodetool tablehistograms, detects drift between observed and target partition density (target window 2–10 MiB), and rewrites the partition strategy used for future Time Slices without touching past slices. The live example fixes over-partitioning (60-second time buckets producing partitions under 10 KB) by widening the time bucket to 7 days (time_bucket interval: 60s -> 604800s). Canonicalised as patterns/auto-tuning-control-loop-on-storage-histograms.
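The decision step of such a control loop might look like the sketch below. It assumes partition size grows roughly linearly with the time-bucket width, aims for the midpoint of the 2–10 MiB window, and clamps the result; the function name, the clamp bounds, and the linear model are all assumptions, and the sketch does not try to reproduce the specific 604800 s choice from the post's example.

```python
MIB = 1024 * 1024
TARGET_MIN_BYTES = 2 * MIB
TARGET_MAX_BYTES = 10 * MIB


def propose_interval(current_interval_s, observed_partition_bytes,
                     min_interval_s=60, max_interval_s=7 * 24 * 3600):
    """Propose a time-bucket width for future time slices only.

    Hypothetical policy: leave the interval alone while the observed size is
    inside the target window; otherwise rescale it toward the window's
    midpoint, assuming size is proportional to bucket width.
    """
    if TARGET_MIN_BYTES <= observed_partition_bytes <= TARGET_MAX_BYTES:
        return current_interval_s
    target = (TARGET_MIN_BYTES + TARGET_MAX_BYTES) // 2
    scaled = current_interval_s * target / max(observed_partition_bytes, 1)
    return int(min(max(scaled, min_interval_s), max_interval_s))


unchanged = propose_interval(60, 5 * MIB)        # already in the window
widened = propose_interval(60, 10 * 1024)        # tiny partitions: widen
narrowed = propose_interval(3600, 100 * MIB)     # wide partitions: narrow
```

Applying the proposal only to slices that have not started yet is what lets the worker change strategy without rewriting existing data.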

(2) Per-ID dynamic partition splitting — asynchronous pipeline that detects wide partitions on the read path (not write path), splits immutable partitions only, validates with pre/post checksums, and serves reads transparently via a Bloom-filter gate + wide_row metadata table that routes the query to the post-split partitions. The original wide partition is never deleted — preserved as fallback for partial failures and eventual-consistency windows. Canonicalised as concepts/dynamic-partition-splitting / patterns/dynamic-partition-split-async-pipeline / patterns/keep-original-partition-as-fallback-during-split / patterns/bloom-filter-redirect-to-split-partition.
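The read-path redirect can be sketched as a Bloom-filter check followed by a metadata lookup, with the original partition as the fallback. Everything here is illustrative: the BloomFilter class is a minimal hand-rolled filter, a plain dictionary stands in for the wide_row metadata table, and the partition-key strings are hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def resolve_partitions(partition_key, bloom, wide_row_metadata):
    """Return the partitions a read should hit for `partition_key`."""
    if not bloom.might_contain(partition_key):
        return [partition_key]          # definitely not split: skip the lookup
    split = wide_row_metadata.get(partition_key)
    if not split:
        return [partition_key]          # false positive: read the original
    return split


bloom = BloomFilter()
bloom.add("id-1|bucket-7")
metadata = {"id-1|bucket-7": ["id-1|bucket-7#0", "id-1|bucket-7#1"]}
routed = resolve_partitions("id-1|bucket-7", bloom, metadata)
untouched = resolve_partitions("id-2|bucket-7", bloom, metadata)
```

The filter keeps the common case (partitions that were never split) from paying for a metadata read, and a false positive only costs one extra lookup before falling back to the original partition.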

Provisioning uses a Monte-Carlo simulation pipeline (concepts/monte-carlo-simulation-under-uncertainty) over user-provided workload characteristics; the post links to the open-source service-capacity-modeling time_series.py module.

Mid-stack remedies for wide partitions that don't qualify for dynamic splitting: partial-return on SLO breach (patterns/partial-return-on-slo-breach) for latency-prioritising clients, and manual ID block-listing (dgwts.config.<dataset>.block.Ids).
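Partial return on SLO breach can be sketched as a paginated read that stops fetching once a deadline has passed and hands back what it has. The function name, the page iterator, and the injectable clock are hypothetical; the returned flag is conservative, so it may report a possibly partial result when the deadline expires right after the last page.

```python
import time


def read_with_deadline(pages, slo_seconds, clock=time.monotonic):
    """Collect rows page by page until the pages run out or the SLO expires.

    Returns (rows, maybe_partial). `pages` is any iterable of row lists,
    standing in for successive paginated reads of one wide partition.
    """
    deadline = clock() + slo_seconds
    rows = []
    page_iter = iter(pages)
    while True:
        if clock() >= deadline:
            return rows, True           # SLO breached: return what we have
        page = next(page_iter, None)
        if page is None:
            return rows, False          # all pages read within the SLO
        rows.extend(page)


# Deterministic demo with a fake clock that advances on each call.
ticks = iter([0.0, 0.0, 1.0, 2.0])
partial_rows, partial_flag = read_with_deadline(
    [[1], [2], [3]], slo_seconds=1.5, clock=lambda: next(ticks)
)
full_rows, full_flag = read_with_deadline(
    [[1], [2], [3]], slo_seconds=1.0, clock=lambda: 0.0
)
```

A latency-prioritising client can act on the rows it received and decide whether to request the remainder, instead of waiting for a read that would otherwise time out.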

Phased rollout per dataset with a byte-level shadow-comparison phase as the load-bearing gate (patterns/phased-rollout-of-read-mode + patterns/shadow-mode-bytes-comparison).
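A shadow comparison of this kind could be as simple as the sketch below: the caller is always served from the existing read path, while the new path is exercised on the side and compared byte for byte. The function names and the mismatch list are hypothetical, and the actual rollout phases are not modelled.

```python
def shadow_compare_read(key, primary_read, candidate_read, mismatches):
    """Serve `primary_read(key)` and record any divergence of the candidate.

    Both readers are assumed to return bytes. A candidate failure is recorded
    as a mismatch and never affects the response.
    """
    primary = primary_read(key)
    try:
        if candidate_read(key) != primary:
            mismatches.append(key)
    except Exception:
        mismatches.append(key)
    return primary


original = {"a": b"\x01\x02", "b": b"\x03"}
post_split = {"a": b"\x01\x02", "b": b"\x04"}   # "b" diverges on purpose
mismatches = []
served = [
    shadow_compare_read(k, original.__getitem__, post_split.__getitem__, mismatches)
    for k in ("a", "b", "c" if False else "b")
]
```

Gating the next phase on an empty mismatch list is one way such a comparison can act as the rollout gate.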

Defence-in-depth correctness: pre/post checksum (concepts/checksum-validated-data-migration) + offline Spark verification via Data Bridge + byte-comparison shadow + original-partition-fallback.
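The pre/post checksum step can be illustrated with an order-independent digest over a partition's rows, compared before and after the split. This is a sketch under assumptions: rows are modelled as bytes, and the combination function (row count plus a sum of SHA-256 digests modulo 2^256) is a choice made for the example, not the checksum the source describes.

```python
import hashlib
from itertools import chain

MODULUS = 2 ** 256


def partition_checksum(rows):
    """Order-independent checksum: (row count, sum of row digests mod 2^256)."""
    count = 0
    acc = 0
    for row in rows:
        digest = hashlib.sha256(row).digest()
        acc = (acc + int.from_bytes(digest, "big")) % MODULUS
        count += 1
    return count, acc


def split_is_valid(original_rows, split_partitions):
    """True when the split partitions together hold exactly the original rows."""
    before = partition_checksum(original_rows)
    after = partition_checksum(chain.from_iterable(split_partitions))
    return before == after


rows = [b"row-1", b"row-2", b"row-3", b"row-4"]
good_split = [[b"row-3", b"row-1"], [b"row-4", b"row-2"]]
lossy_split = [[b"row-1"], [b"row-2", b"row-3"]]
good_ok = split_is_valid(rows, good_split)
lossy_ok = split_is_valid(rows, lossy_split)
```

Using a sum rather than XOR means a duplicated row changes the checksum instead of cancelling out, and the row count catches simple losses cheaply.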

Operational outcomes: average wide-partition read latency "from seconds … to low double-digit milliseconds"; tail latency "from several seconds … to around 200 ms or better"; near-zero read timeouts; 500 MB+ partitions paginated successfully while remaining available (gRPC SearchEventRecords example with time_taken: 41.072410142s trading latency for availability).

Future work explicitly named: splitting mutable wide partitions (deferred to reduce surface area); re-processing previously failed splits.