SYSTEM Cited by 4 sources
Netflix TimeSeries Abstraction¶
Netflix TimeSeries Abstraction is a distributed service on the Data Gateway platform for storing and querying large volumes of temporal event data with low millisecond latencies. Announced in a pre-2024-11-13 post (Netflix TimeSeries Data Abstraction Layer) and now deeply disclosed on this wiki via the 2026-06-03 dynamic-partition-splitting post (Source: sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads) which canonicalises both the table-level auto-tuning control loop and the per-ID dynamic partition-splitting pipeline the team built to fight wide partitions on top of Apache Cassandra 4.x at petabyte scale.
The 2026-06-03 post upgrades this page from "named in passing" to a first-class deep disclosure — the substrate's anti-wide-partition machinery is now the most fully described aspect of TimeSeries on the wiki.
Role on the platform¶
TimeSeries is the second mature abstraction on the Data Gateway, sibling to KV DAL, and is composed by downstream services as their event store. The Distributed Counter Abstraction uses TimeSeries directly as its backing event store, demonstrating Netflix's "compose multiple abstraction layers using containers deployed on the same host" Data-Gateway property.
Data model + schema shape (as used by Counter)¶
Each TimeSeries event carries at least:
event_time— the logical timestamp of the event; also doubles as part of the idempotency key.event_id— per-event identifier (can encode source provenance).event_item_key— further key scoping within an event.time_bucket,event_bucket— schema-level partitioning columns that break up otherwise-wide Cassandra partitions (see patterns/bucketed-event-time-partitioning).- Payload columns for the event contents (e.g. the counter delta).
The composite (event_time, event_id, event_item_key) forms a
natural idempotency key: retried and
hedged writes with the same token produce the same row.
Key operational properties (disclosed via the Counter post)¶
- Backed by Cassandra as the underlying event store. Can be configured to work with other persistent stores — the Counter-Rollup Store reuses TimeSeries abstraction but with different tuning.
- Wide-partition prevention via bucketing:
time_bucket+event_bucketcolumns prevent high-throughput events for a given identifier from overwhelming a single Cassandra partition. See concepts/wide-partition-problem. - Descending time order: TimeSeries returns events in descending
event-time order. Counter Abstraction exploits this for reset
semantics — the most recent
ClearCountevent is seen first when scanning the window. acceptLimitbounds event skew: incoming events with timestamps beyond this limit (example config:5 s) are rejected. This doubles as the safety lag under which downstream consumers can treat a time window as an immutable aggregation window.- Retention policies (lifecycle config): events close after
close_afterseconds and are deleted afterdelete_afterseconds. Example config: 6 days close, 7 days delete. Events aggregated into a downstream rollup store don't need to live in TimeSeries past the audit window. - Range-scan optimisation for low millisecond latency on aggregation workloads — Counter-Rollup issues parallel range queries across time partitions per batch.
Seen in¶
- sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads — First wiki canonical deep disclosure of TimeSeries' two anti-wide-partition mechanisms. Author: Rajiv Shringi, Kaidan Fullerton, Oleksii Tkachuk, Kartik Sathyanarayanan. Two distinct remediations canonicalised:
(1) Table-level auto-tuning control loop — "DynamicTimeSliceConfigWorker" —
background worker that polls per-table partition-size histograms
via Cassandra virtual tables mirroring nodetool tablehistograms,
detects observed-vs-target density drift (target window 2–10 MiB),
and rewrites the partition strategy used for future Time Slices
without touching past slices. Live example fixes
over-partitioning (60-second time
buckets producing < 10 KB partitions) by widening the time bucket
to 7 days (time_bucket interval: 60s -> 604800s). Canonicalised
as patterns/auto-tuning-control-loop-on-storage-histograms.
(2) Per-ID dynamic partition splitting — asynchronous pipeline
that detects wide partitions on the read path (not write path),
splits immutable partitions only, validates with
pre/post checksums, and serves reads transparently via a
Bloom-filter gate + wide_row metadata table that routes the
query to the post-split partitions. The original wide partition is
never deleted — preserved as fallback for partial failures and
eventual-consistency windows. Canonicalised as
concepts/dynamic-partition-splitting /
patterns/dynamic-partition-split-async-pipeline /
patterns/keep-original-partition-as-fallback-during-split /
patterns/bloom-filter-redirect-to-split-partition.
Provisioning is via a Monte-Carlo simulation pipeline (concepts/monte-carlo-simulation-under-uncertainty) over user- provided workload characteristics — links to the open-source service-capacity-modeling time_series.py module.
Mid-stack remedies for wide partitions that don't qualify for
dynamic splitting: partial-return on SLO breach
(patterns/partial-return-on-slo-breach) for latency-prioritising
clients, and manual ID block-listing
(dgwts.config.<dataset>.block.Ids).
Phased rollout per dataset with a byte-level shadow-comparison phase as the load-bearing gate (patterns/phased-rollout-of-read-mode + patterns/shadow-mode-bytes-comparison).
Defence-in-depth correctness: pre/post checksum (concepts/checksum-validated-data-migration) + offline Spark verification via Data Bridge + byte-comparison shadow + original-partition-fallback.
Operational outcomes: average wide-partition read latency
"from seconds … to low double-digit milliseconds"; tail latency
"from several seconds … to around 200 ms or better"; near-zero
read timeouts; 500 MB+ partitions paginated successfully
while remaining available (gRPC SearchEventRecords example with
time_taken: 41.072410142s trading latency for availability).
Future work explicitly named: splitting mutable wide partitions (deferred for surface-area-reduction); re-processing previously failed splits.
- sources/2024-11-13-netflix-netflixs-distributed-counter-abstraction — TimeSeries named as the event store underneath the Counter Abstraction. All the schema and operational-property detail on this page comes from this source.
- sources/2024-09-19-netflix-netflixs-key-value-data-abstraction-layer — TimeSeries named in passing as one of "several foundational abstraction services" on the Data Gateway alongside KV DAL.
- sources/2026-05-29-netflix-high-throughput-graph-abstraction-at-netflix-part-i — second-named graph-substrate face: TimeSeries Abstraction is the optional substrate Netflix Graph Abstraction composes for historical / time-travel views of how the graph evolves over time. The Part-I post only names the composition; full decomposition deferred to the announced Part III of the series. Verbatim: "Optionally, users can plug-in the TimeSeries (TS) Abstraction if they are interested in a historical view of how the graph evolves over time."
Related¶
- systems/netflix-data-gateway
- systems/netflix-kv-dal
- systems/netflix-distributed-counter
- systems/netflix-graph-abstraction
- systems/apache-cassandra
- systems/kafka — detection-event transport for dynamic partition splitting.
- systems/apache-spark — offline split verification engine.
- systems/netflix-data-bridge — offline split verification substrate.
- patterns/data-abstraction-layer
- patterns/namespace-backed-storage-routing
- patterns/bucketed-event-time-partitioning
- patterns/dynamic-partition-split-async-pipeline · patterns/auto-tuning-control-loop-on-storage-histograms · patterns/keep-original-partition-as-fallback-during-split · patterns/bloom-filter-redirect-to-split-partition · patterns/shadow-mode-bytes-comparison · patterns/phased-rollout-of-read-mode · patterns/partial-return-on-slo-breach — 2026-06-03 dynamic-partition-splitting patterns.
- concepts/wide-partition-problem · concepts/dynamic-partition-splitting · concepts/read-side-detection-of-storage-pathology · concepts/immutable-partition · concepts/checksum-validated-data-migration · concepts/bloom-filter · concepts/over-partitioning · concepts/monte-carlo-simulation-under-uncertainty — 2026-06-03 dynamic-partition-splitting concepts.
- concepts/idempotency-token
- concepts/immutable-aggregation-window
- companies/netflix