Netflix KV Data Abstraction Layer (KV DAL)¶
Netflix KV DAL is the mature, production-exemplar Data Abstraction Layer built on Netflix's Data Gateway Platform. It is a gRPC service that sits between Netflix microservices and multiple storage engines (Cassandra, EVCache, DynamoDB, RocksDB) and exposes a uniform two-level-map data model over four CRUD APIs. It powers streaming metadata, user profiles, Netflix's push-messaging registry (Pushy), and large-scale impression persistence (Bulldozer). (Source: sources/2024-09-19-netflix-netflixs-key-value-data-abstraction-layer)
Problem it solves¶
Netflix ran many microservices against many online key-value and wide-column stores (Cassandra, EVCache, DynamoDB, RocksDB) and kept paying three recurring taxes:
- Developers mis-reasoning about consistency/durability/perf across stores deployed globally.
- Same data-access pitfalls re-learned per-team — tail latency, idempotency, wide partitions, fat columns, slow pagination.
- Tight coupling with evolving native DB APIs — backward-incompatible client-library changes dragged every microservice with them.
KV DAL centralizes the solutions in one service: developers describe a data problem via a namespace; the service routes to the right backend(s), handles idempotency, pagination, compression, large values, and tombstones.
Data model¶
HashMap<String, SortedMap<Bytes, Bytes>>:
- Level 1: hashed string id (partition key on Cassandra).
- Level 2: sorted key → value map of bytes (clustering column on Cassandra).
One structure covers HashMaps (id → {"" → value}), Sets (id → {key → ""}), Records, and Events. See concepts/two-level-map-kv-model.
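The two-level-map shapes above can be sketched in plain Python (names like put_items/get_items are illustrative, not the actual gRPC API; KV DAL exposes this model over gRPC, not in-process):

```python
# Two-level map: HashMap<String, SortedMap<Bytes, Bytes>>.
# A plain dict models the hashed outer level; inner keys are sorted on read.
store: dict[str, dict[bytes, bytes]] = {}

def put_items(id_: str, items: dict[bytes, bytes]) -> None:
    # Level 1: hashed string id; Level 2: key -> value map of bytes.
    store.setdefault(id_, {}).update(items)

def get_items(id_: str) -> list[tuple[bytes, bytes]]:
    # Inner map is returned in key order (the Cassandra clustering order).
    return sorted(store.get(id_, {}).items())

# One structure covers several shapes:
put_items("profile:42", {b"": b"whole-record-blob"})  # HashMap: id -> {"" -> value}
put_items("watchlist:42", {b"title:tt0111161": b""})  # Set: id -> {key -> ""}
put_items("events:42", {b"\x00\x02": b"e2", b"\x00\x01": b"e1"})  # Events: ordered keys
```

The sorted inner level is what makes range predicates and ordered pagination cheap on the backing store.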
Canonical Cassandra DDL:
CREATE TABLE IF NOT EXISTS <ns>.<table> (
id text,
key blob,
value blob,
value_metadata blob,
PRIMARY KEY (id, key))
WITH CLUSTERING ORDER BY (key <ASC|DESC>)
Each item carries an explicit value_metadata field alongside its value, holding chunk and other per-item metadata.
Namespace: one unit of logical + physical config¶
A namespace bundles:
- Where data lives (which clusters / caches / regions).
- Access-pattern config — consistency_scope, consistency_target, cache TTLs, SLO targets.
- Which backends serve which tier (primary, cache, chunk).
Example from the post (ngsegment):
persistence_configuration:
- id: PRIMARY_STORAGE
physical_storage:
type: CASSANDRA
cluster: cassandra_kv_ngsegment
dataset: ngsegment
table: ngsegment
regions: [us-east-1]
config:
consistency_scope: LOCAL
consistency_target: READ_YOUR_WRITES
- id: CACHE
physical_storage:
type: CACHE
cluster: evcache_kv_ngsegment
config:
default_cache_ttl: 180s
This gives a durable Cassandra primary plus an EVCache cache tier in one namespace. Developers pick the namespace; the routing + combination logic lives in KV DAL. See patterns/namespace-backed-storage-routing.
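A minimal sketch of namespace-backed routing under this config shape (the dict layout and route function are hypothetical simplifications of the YAML above, not KV DAL's internal representation):

```python
# Namespace config maps a logical name to physical tiers, mirroring the
# ngsegment example: Cassandra primary + EVCache cache tier.
NAMESPACES = {
    "ngsegment": {
        "primary": {"type": "CASSANDRA", "cluster": "cassandra_kv_ngsegment"},
        "cache": {"type": "CACHE", "cluster": "evcache_kv_ngsegment", "ttl_s": 180},
    },
}

def read_tiers(namespace: str) -> list[str]:
    # The caller names only the namespace; the DAL resolves the tier order:
    # reads try the cache first, then fall through to the primary store.
    cfg = NAMESPACES[namespace]
    tiers = []
    if "cache" in cfg:
        tiers.append(cfg["cache"]["cluster"])
    tiers.append(cfg["primary"]["cluster"])
    return tiers
```

The point of the indirection is that tiers can be added, removed, or re-homed by editing the namespace config, with no client code change.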
Four CRUD APIs¶
PutItems — upsert with idempotency¶
message PutItemRequest (
IdempotencyToken idempotency_token,
string namespace,
string id,
List<Item> items
)
Idempotency token = (generation_time, 128-bit token). Chunked
payloads are staged then committed with metadata (number of chunks)
— all tied atomically by one token. See
concepts/idempotency-token and
patterns/transparent-chunking-large-values.
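The token-protected upsert can be sketched as follows (make_idempotency_token and put_once are hypothetical names; the dedupe map stands in for the staged-then-committed chunk protocol):

```python
import time
import uuid

def make_idempotency_token() -> tuple[int, int]:
    # (generation_time, 128-bit token): the client clock gives coarse monotonic
    # ordering; the random 128-bit value disambiguates same-millisecond writes.
    return (time.time_ns() // 1_000_000, uuid.uuid4().int)

applied: dict[tuple[int, int], bytes] = {}

def put_once(token: tuple[int, int], value: bytes) -> bytes:
    # A retry reuses the same token, so the replay is a no-op: staged chunk
    # writes and the main-table commit all hang off this one token.
    if token not in applied:
        applied[token] = value
    return applied[token]
```

Because the token, not the request attempt, identifies the logical write, retries and hedged duplicates converge on one stored value.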
GetItems — Predicate + Selection¶
message GetItemsRequest (
string namespace,
string id,
Predicate predicate,
Selection selection,
Map<String, Struct> signals
)
- Predicate: match_all / match_keys / match_range.
- Selection: page_size_bytes, item_limit, include/exclude (trim large values from the response).
- Signals: per-request capability flags (compression, chunking).
Response: (items, next_page_token) with
byte-budgeted + adaptive
pagination, and SLO-aware
early response when the server projects it will miss the request
deadline.
DeleteItems — same Predicate shape as reads¶
message DeleteItemsRequest (
IdempotencyToken idempotency_token,
string namespace,
string id,
Predicate predicate
)
- Record-level (match_all): one tombstone, constant latency regardless of item count.
- Range (match_range): one tombstone for the range.
- Item-level (match_keys): would normally create many tombstones; KV DAL instead marks metadata expired with a randomised TTL — see concepts/ttl-based-deletion-with-jitter — so compaction work staggers rather than spikes.
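The TTL-jitter trick for item-level deletes can be sketched as below (the base retention and jitter window are invented for illustration; the post does not give concrete values):

```python
import random

BASE_TTL_S = 24 * 3600      # assumed base retention before compaction reclaims the row
JITTER_SPREAD_S = 4 * 3600  # assumed spread window for staggering compaction work

def deletion_ttl_s() -> int:
    # Instead of writing one tombstone per deleted item, mark its metadata
    # expired with a randomised TTL: rows age out over the spread window,
    # so compaction load staggers rather than spiking at one instant.
    return BASE_TTL_S + random.randint(0, JITTER_SPREAD_S)
```

Each deleted item gets an independent draw, so a bulk delete of N items produces N expiries smeared across the window instead of N simultaneous tombstones.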
MutateItems / ScanItems — deferred¶
Complex multi-item / multi-record ops. Architecture detail promised in a future post.
Reliability primitives¶
Idempotency-token-protected writes¶
Tokens are client-generated and monotonic: generation_time (client
clock) plus a 128-bit UUID. Rationale: "client-generated monotonic
tokens are preferred due to their reliability, especially in
environments where network delays could impact server-side token
generation." Safety relies on measured sub-1 ms clock skew on
Netflix's EC2 Nitro fleet. KV rejects writes whose tokens are far
in the past (silent-discard prevention) or far in the future
(doomstone prevention). Stronger-ordering use cases substitute
Zookeeper-issued regional tokens or globally-unique transaction IDs.
This is what makes retries / hedging / parallel speculative requests safe on last-write-wins stores like Cassandra — see concepts/tail-latency-at-scale.
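The past/future rejection window can be sketched as follows (the one-minute bounds are invented for illustration; the post states the policy but not the thresholds):

```python
MAX_PAST_MS = 60_000    # hypothetical bound: far-past tokens risk silent discard
MAX_FUTURE_MS = 60_000  # hypothetical bound: far-future tokens act as "doomstones"

def accept_token(generation_time_ms: int, now_ms: int) -> bool:
    # Reject writes whose tokens sit too far in the past (the write would lose
    # last-write-wins resolution silently) or too far in the future (the token
    # would shadow legitimate later writes until its timestamp passes).
    skew_ms = now_ms - generation_time_ms
    return -MAX_FUTURE_MS <= skew_ms <= MAX_PAST_MS
```

This check is cheap because it needs only the server clock and the token; sub-millisecond measured skew on the fleet is what makes tight bounds safe.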
Transparent chunking (> 1 MiB threshold)¶
- Items ≤ 1 MiB → main table.
- Items > 1 MiB → id/key/metadata in main table; body split into chunks in a separately-partitioned chunk store (which may itself be Cassandra, but with a partitioning scheme tuned for large values).
- One idempotency token binds chunk writes + main-table commit into one atomic logical write.
Result: "latency scales linearly with the size of the data." See patterns/transparent-chunking-large-values.
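Transparent chunking can be sketched like this (the two dicts stand in for the main table and the separately-partitioned chunk store; the staged-then-committed token protocol is elided):

```python
CHUNK_THRESHOLD = 1 << 20  # 1 MiB

def write_item(main: dict, chunkstore: dict, id_: str, key: bytes, value: bytes) -> None:
    if len(value) <= CHUNK_THRESHOLD:
        main[(id_, key)] = {"value": value, "chunks": 0}
        return
    # Large value: stage fixed-size chunks in the chunk store, then commit
    # metadata (the chunk count) to the main table as the last step.
    chunks = [value[i:i + CHUNK_THRESHOLD] for i in range(0, len(value), CHUNK_THRESHOLD)]
    for n, chunk in enumerate(chunks):
        chunkstore[(id_, key, n)] = chunk
    main[(id_, key)] = {"value": None, "chunks": len(chunks)}

def read_item(main: dict, chunkstore: dict, id_: str, key: bytes) -> bytes:
    meta = main[(id_, key)]
    if meta["chunks"] == 0:
        return meta["value"]
    return b"".join(chunkstore[(id_, key, n)] for n in range(meta["chunks"]))
```

Committing the main-table row last means a reader never sees metadata pointing at chunks that have not all landed.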
Client-side compression¶
Moved from server to client: saves server CPU + network + disk I/O. One deployment (Netflix Search) reported 75% payload reduction. See concepts/client-side-compression.
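A minimal client-side compression sketch (zlib chosen for illustration; the post does not name the codec, and the compressed flag would travel in value_metadata):

```python
import zlib

def compress_value(value: bytes) -> tuple[bytes, bool]:
    # Compress on the client, saving server CPU plus network and disk I/O.
    # Keep the compressed form only if it actually shrinks the payload; the
    # returned flag tells readers whether to decompress.
    packed = zlib.compress(value)
    if len(packed) < len(value):
        return packed, True
    return value, False
```

The only-if-smaller rule matters for already-compressed media blobs, where a codec pass would otherwise inflate the value.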
Byte-size + adaptive pagination¶
- Page limit is bytes not items → predictable SLO ("single-digit millisecond SLO on a 2 MiB page read").
- Backing stores (Cassandra, DynamoDB) limit by row count — KV queries a static number of rows, processes, discards excess (big items) or issues more queries (small items).
- Adaptive: page-token carries observed item-size; server also keeps a per-namespace rolling estimate for initial-page tuning.
See concepts/byte-size-pagination and concepts/adaptive-pagination.
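The row-count estimate behind adaptive pagination can be sketched as below (the slack factor and function shape are assumptions; KV DAL's actual estimator is not published):

```python
def rows_to_fetch(page_size_bytes: int, avg_item_bytes: float,
                  slack: float = 1.25) -> int:
    # Backing stores (Cassandra, DynamoDB) limit queries by row count, so the
    # byte budget must be converted to rows. Use the rolling per-namespace
    # average item size, with slack so small items rarely force a second query;
    # oversized results are simply discarded after processing.
    return max(1, int(page_size_bytes / max(avg_item_bytes, 1.0) * slack))
```

Subsequent pages can refine the estimate with the item sizes actually observed, carried in the page token.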
SLO-aware early response¶
If the server projects retrieving enough data to fill the page will miss the request's end-to-end latency SLO, it stops early and returns what it has + a continuation token. Combined with client-side gRPC deadlines this prevents useless downstream work. See patterns/slo-aware-early-response.
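The early-response loop can be sketched as follows (fetch_next and the injectable clock are hypothetical scaffolding; the real server works against storage iterators and gRPC deadlines):

```python
import time

def get_page(fetch_next, budget_bytes: int, deadline_s: float, now=time.monotonic):
    # Fill the byte-budgeted page, but stop early the moment the deadline is
    # projected to be missed, returning the partial page + continuation token
    # instead of burning downstream work the client will never use.
    start = now()
    page, used, cursor = [], 0, 0
    while used < budget_bytes:
        if now() - start >= deadline_s:
            return page, cursor          # early response: partial page + token
        item = fetch_next(cursor)
        if item is None:
            return page, None            # data exhausted: no continuation
        page.append(item)
        used += len(item)
        cursor += 1
    return page, cursor
```

The continuation token lets the client resume exactly where the server stopped, so an early response costs a round trip but never loses progress.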
In-band signaling (periodic handshake)¶
- Server-side signals (on handshake + periodic refresh): target / max latency SLOs → client dynamically tunes timeouts + hedging.
- Client-side signals (per-request): capability flags — does this client support compression, chunking, etc.
Eliminates the static-config + coordinated-redeploy cost of evolving client/server behaviour independently. See concepts/in-band-signaling-handshake.
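The handshake exchange can be sketched like this (the SLO numbers and field names are illustrative, not Netflix's actual values or wire format):

```python
def server_handshake_signals() -> dict:
    # Sent on handshake and refreshed periodically: the server advertises its
    # current latency SLOs so clients need no static timeout config.
    return {"target_latency_ms": 10, "max_latency_ms": 50}

def tune_client(signals: dict) -> dict:
    # The client derives its hedging trigger and hard timeout from the
    # advertised SLOs; when the server's SLOs change, clients retune on the
    # next refresh with no coordinated redeploy.
    return {
        "hedge_after_ms": signals["target_latency_ms"],
        "timeout_ms": signals["max_latency_ms"],
    }
```

Client-side capability flags flow the other way, per request, so a server can emit compressed or chunked responses only to clients that declared support.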
Named production consumers¶
- Streaming metadata — high-throughput, low-latency for real-time personalised delivery.
- User profiles — preferences + history across devices.
- Messaging push registry — backs Netflix's Pushy WebSocket proxy.
- Real-time analytics impressions — integrates with the Bulldozer batch→online pipeline.
Future work (named in the post)¶
- Lifecycle management — fine-grained retention / deletion.
- Summarisation — collapse records with many items into fewer backing rows.
- New storage engines — integration to support more use cases.
- Dictionary compression — further payload reduction while preserving performance.
Documented downstream consumer use cases¶
Beyond the platform-launch post, the first wiki-documented KVDAL consumer case study is:
- Netflix Druid interval-aware cache (2026-04-06) — KVDAL two-level-map as the store for per-granularity-aligned-bucket cached query fragments in front of Apache Druid. Exercises two KVDAL features as load-bearing architecture:
- Independent TTLs on each inner key-value pair — used for the age-based exponential TTL ladder (5 s for buckets <2 min old, doubling per minute, capped at 1 hour); eliminates manual eviction logic.
- Efficient range queries over inner keys — "give me all cached buckets between A and B for query hash X"; directly answers the cache's contiguous-prefix lookup. Production: 82% of dashboard queries get ≥partial hit, 84% of result data from cache, P90 ~5.5 ms, ~33% drop in queries to Druid, ~66% P90 improvement (Source: sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid).
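The age-based TTL ladder can be sketched as below (this encodes my reading of the stated rule — 5 s under 2 minutes, doubling per additional minute, capped at 1 hour — the exact step boundaries are not spelled out in the post):

```python
def bucket_ttl_s(age_s: float) -> int:
    # Fresh buckets churn, so they get a short TTL; older buckets are stable,
    # so their TTL doubles per minute of age up to a 1-hour cap. Each inner
    # key-value pair carries its own TTL, so no manual eviction pass is needed.
    if age_s < 120:
        return 5
    minutes_past_floor = int(age_s // 60) - 1  # whole minutes beyond the 2-min floor
    return min(3600, 5 * 2 ** minutes_past_floor)
```

Pairing this with range scans over inner keys is what lets one KVDAL read answer "all cached buckets between A and B for query hash X" while stale entries evict themselves.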
Seen in¶
- sources/2024-09-19-netflix-netflixs-key-value-data-abstraction-layer — introducing KV DAL; data model; namespace config; CRUD APIs with idempotency + chunking + pagination + SLO-aware early response + signaling; tombstone + TTL-jitter deletion discipline; named production consumers; future-work list.
- sources/2026-04-06-netflix-stop-answering-the-same-question-twice-interval-aware-caching-for-druid — first wiki-documented KVDAL downstream consumer use case; exercises the independent-per-inner-key TTL feature and range-scan over inner keys as the two load-bearing KVDAL capabilities for Netflix's interval-aware Druid cache.
Related¶
- systems/netflix-data-gateway — the parent platform on which KV DAL is the most mature abstraction service.
- systems/apache-cassandra · systems/evcache · systems/dynamodb · systems/rocksdb — backing engines KV can route to.
- patterns/data-abstraction-layer · patterns/namespace-backed-storage-routing · patterns/transparent-chunking-large-values · patterns/slo-aware-early-response
- concepts/two-level-map-kv-model · concepts/database-agnostic-abstraction · concepts/idempotency-token · concepts/byte-size-pagination · concepts/adaptive-pagination · concepts/client-side-compression · concepts/in-band-signaling-handshake · concepts/ttl-based-deletion-with-jitter
- companies/netflix