SYSTEM Cited by 4 sources

Netflix TimeSeries Abstraction¶

Netflix TimeSeries Abstraction is a distributed service on the Data Gateway platform for storing and querying large volumes of temporal event data with low millisecond latencies. Announced in a pre-2024-11-13 post (Netflix TimeSeries Data Abstraction Layer) and now deeply disclosed on this wiki via the 2026-06-03 dynamic-partition-splitting post (Source: sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads) which canonicalises both the table-level auto-tuning control loop and the per-ID dynamic partition-splitting pipeline the team built to fight wide partitions on top of Apache Cassandra 4.x at petabyte scale.

The 2026-06-03 post upgrades this page from "named in passing" to a first-class deep disclosure — the substrate's anti-wide-partition machinery is now the most fully described aspect of TimeSeries on the wiki.

Role on the platform¶

TimeSeries is the second mature abstraction on the Data Gateway, sibling to KV DAL, and is composed by downstream services as their event store. The Distributed Counter Abstraction uses TimeSeries directly as its backing event store, demonstrating Netflix's "compose multiple abstraction layers using containers deployed on the same host" Data-Gateway property.

Data model + schema shape (as used by Counter)¶

Each TimeSeries event carries at least:

event_time — the logical timestamp of the event; also doubles as part of the idempotency key.
event_id — per-event identifier (can encode source provenance).
event_item_key — further key scoping within an event.
time_bucket, event_bucket — schema-level partitioning columns that break up otherwise-wide Cassandra partitions (see patterns/bucketed-event-time-partitioning).
Payload columns for the event contents (e.g. the counter delta).

The composite (event_time, event_id, event_item_key) forms a natural idempotency key: retried and hedged writes with the same token produce the same row.

Key operational properties (disclosed via the Counter post)¶

Backed by Cassandra as the underlying event store. Can be configured to work with other persistent stores — the Counter-Rollup Store reuses TimeSeries abstraction but with different tuning.
Wide-partition prevention via bucketing: time_bucket + event_bucket columns prevent high-throughput events for a given identifier from overwhelming a single Cassandra partition. See concepts/wide-partition-problem.
Descending time order: TimeSeries returns events in descending event-time order. Counter Abstraction exploits this for reset semantics — the most recent ClearCount event is seen first when scanning the window.
acceptLimit bounds event skew: incoming events with timestamps beyond this limit (example config: 5 s) are rejected. This doubles as the safety lag under which downstream consumers can treat a time window as an immutable aggregation window.
Retention policies (lifecycle config): events close after close_after seconds and are deleted after delete_after seconds. Example config: 6 days close, 7 days delete. Events aggregated into a downstream rollup store don't need to live in TimeSeries past the audit window.
Range-scan optimisation for low millisecond latency on aggregation workloads — Counter-Rollup issues parallel range queries across time partitions per batch.

Seen in¶

sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads — First wiki canonical deep disclosure of TimeSeries' two anti-wide-partition mechanisms. Author: Rajiv Shringi, Kaidan Fullerton, Oleksii Tkachuk, Kartik Sathyanarayanan. Two distinct remediations canonicalised:

(1) Table-level auto-tuning control loop — "DynamicTimeSliceConfigWorker" — background worker that polls per-table partition-size histograms via Cassandra virtual tables mirroring nodetool tablehistograms, detects observed-vs-target density drift (target window 2–10 MiB), and rewrites the partition strategy used for future Time Slices without touching past slices. Live example fixes over-partitioning (60-second time buckets producing < 10 KB partitions) by widening the time bucket to 7 days (time_bucket interval: 60s -> 604800s). Canonicalised as patterns/auto-tuning-control-loop-on-storage-histograms.

(2) Per-ID dynamic partition splitting — asynchronous pipeline that detects wide partitions on the read path (not write path), splits immutable partitions only, validates with pre/post checksums, and serves reads transparently via a Bloom-filter gate + wide_row metadata table that routes the query to the post-split partitions. The original wide partition is never deleted — preserved as fallback for partial failures and eventual-consistency windows. Canonicalised as concepts/dynamic-partition-splitting / patterns/dynamic-partition-split-async-pipeline / patterns/keep-original-partition-as-fallback-during-split / patterns/bloom-filter-redirect-to-split-partition.

Provisioning is via a Monte-Carlo simulation pipeline (concepts/monte-carlo-simulation-under-uncertainty) over user- provided workload characteristics — links to the open-source service-capacity-modeling time_series.py module.

Mid-stack remedies for wide partitions that don't qualify for dynamic splitting: partial-return on SLO breach (patterns/partial-return-on-slo-breach) for latency-prioritising clients, and manual ID block-listing (dgwts.config.<dataset>.block.Ids).

Phased rollout per dataset with a byte-level shadow-comparison phase as the load-bearing gate (patterns/phased-rollout-of-read-mode + patterns/shadow-mode-bytes-comparison).

Defence-in-depth correctness: pre/post checksum (concepts/checksum-validated-data-migration) + offline Spark verification via Data Bridge + byte-comparison shadow + original-partition-fallback.

Operational outcomes: average wide-partition read latency "from seconds … to low double-digit milliseconds"; tail latency "from several seconds … to around 200 ms or better"; near-zero read timeouts; 500 MB+ partitions paginated successfully while remaining available (gRPC SearchEventRecords example with time_taken: 41.072410142s trading latency for availability).

Future work explicitly named: splitting mutable wide partitions (deferred for surface-area-reduction); re-processing previously failed splits.

sources/2024-11-13-netflix-netflixs-distributed-counter-abstraction — TimeSeries named as the event store underneath the Counter Abstraction. All the schema and operational-property detail on this page comes from this source.
sources/2024-09-19-netflix-netflixs-key-value-data-abstraction-layer — TimeSeries named in passing as one of "several foundational abstraction services" on the Data Gateway alongside KV DAL.
sources/2026-05-29-netflix-high-throughput-graph-abstraction-at-netflix-part-i — second-named graph-substrate face: TimeSeries Abstraction is the optional substrate Netflix Graph Abstraction composes for historical / time-travel views of how the graph evolves over time. The Part-I post only names the composition; full decomposition deferred to the announced Part III of the series. Verbatim: "Optionally, users can plug-in the TimeSeries (TS) Abstraction if they are interested in a historical view of how the graph evolves over time."

systems/netflix-data-gateway
systems/netflix-kv-dal
systems/netflix-distributed-counter
systems/netflix-graph-abstraction
systems/apache-cassandra
systems/kafka — detection-event transport for dynamic partition splitting.
systems/apache-spark — offline split verification engine.
systems/netflix-data-bridge — offline split verification substrate.
patterns/data-abstraction-layer
patterns/namespace-backed-storage-routing
patterns/bucketed-event-time-partitioning
patterns/dynamic-partition-split-async-pipeline · patterns/auto-tuning-control-loop-on-storage-histograms · patterns/keep-original-partition-as-fallback-during-split · patterns/bloom-filter-redirect-to-split-partition · patterns/shadow-mode-bytes-comparison · patterns/phased-rollout-of-read-mode · patterns/partial-return-on-slo-breach — 2026-06-03 dynamic-partition-splitting patterns.
concepts/wide-partition-problem · concepts/dynamic-partition-splitting · concepts/read-side-detection-of-storage-pathology · concepts/immutable-partition · concepts/checksum-validated-data-migration · concepts/bloom-filter · concepts/over-partitioning · concepts/monte-carlo-simulation-under-uncertainty — 2026-06-03 dynamic-partition-splitting concepts.
concepts/idempotency-token
concepts/immutable-aggregation-window
companies/netflix

Netflix TimeSeries Abstraction¶

Role on the platform¶

Data model + schema shape (as used by Counter)¶

Key operational properties (disclosed via the Counter post)¶

Seen in¶

Related¶