
Netflix KV Data Abstraction Layer (KV DAL)

Netflix KV DAL is the mature, production-exemplar Data Abstraction Layer built on Netflix's Data Gateway Platform. It is a gRPC service that sits between Netflix microservices and multiple storage engines (Cassandra, EVCache, DynamoDB, RocksDB) and exposes a uniform two-level-map data model over four CRUD APIs. It powers streaming metadata, user profiles, Netflix's push-messaging registry (Pushy), and large-scale impression persistence (Bulldozer). (Source: sources/2024-09-19-netflix-netflixs-key-value-data-abstraction-layer)

Problem it solves

Netflix ran many microservices against many online key-value and wide-column stores (Cassandra, EVCache, DynamoDB, RocksDB) and kept paying three recurring taxes:

  1. Mis-reasoning by developers about consistency, durability, and performance guarantees across globally deployed stores.
  2. Same data-access pitfalls re-learned per-team — tail latency, idempotency, wide partitions, fat columns, slow pagination.
  3. Tight coupling with evolving native DB APIs — backward-incompatible client-library changes dragged every microservice with them.

KV DAL centralizes the solutions in one service: developers describe a data problem via a namespace; the service routes to the right backend(s), handles idempotency, pagination, compression, large values, and tombstones.

Data model

HashMap<String, SortedMap<Bytes, Bytes>>:

  • Level 1: hashed string id (partition key on Cassandra).
  • Level 2: sorted key → value map of bytes (clustering column on Cassandra).

One structure covers HashMaps (id → {"" → value}), Sets (id → {key → ""}), Records, and Events. See concepts/two-level-map-kv-model.
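The two-level map can be sketched in a few lines of Python. This is illustrative only — `TwoLevelMap` and its method names are not the KV DAL API — but it shows how HashMaps and Sets both collapse into the one structure:

```python
class TwoLevelMap:
    """Sketch of HashMap<String, SortedMap<Bytes, Bytes>>."""

    def __init__(self):
        self._store = {}  # id -> dict of key(bytes) -> value(bytes)

    def put(self, id_: str, key: bytes, value: bytes) -> None:
        self._store.setdefault(id_, {})[key] = value

    def get(self, id_: str, key: bytes) -> bytes:
        return self._store[id_][key]

    def scan(self, id_: str, reverse: bool = False):
        # Keys come back in clustering order, mirroring Cassandra's
        # CLUSTERING ORDER BY (key ASC|DESC).
        inner = self._store.get(id_, {})
        return [(k, inner[k]) for k in sorted(inner, reverse=reverse)]

# A plain HashMap is modelled as id -> {b"" -> value}:
kv = TwoLevelMap()
kv.put("profile:123", b"", b"serialized-profile")

# A Set is modelled as id -> {member -> b""}:
kv.put("watchlist:123", b"title-42", b"")
kv.put("watchlist:123", b"title-7", b"")
```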

Canonical Cassandra DDL:

CREATE TABLE IF NOT EXISTS <ns>.<table> (
  id             text,
  key            blob,
  value          blob,
  value_metadata blob,
  PRIMARY KEY (id, key))
WITH CLUSTERING ORDER BY (key <ASC|DESC>)

Items carry explicit chunk and metadata fields:

message Item ( Bytes key, Bytes value, Metadata metadata, Integer chunk )

Namespace: one unit of logical + physical config

A namespace bundles:

  • Where data lives (which clusters / caches / regions).
  • Access-pattern config — consistency_scope, consistency_target, cache TTLs, SLO targets.
  • Which backends serve which tier (primary, cache, chunk).

Example from the post (ngsegment):

persistence_configuration:
  - id: PRIMARY_STORAGE
    physical_storage:
      type: CASSANDRA
      cluster: cassandra_kv_ngsegment
      dataset: ngsegment
      table:   ngsegment
      regions: [us-east-1]
      config:
        consistency_scope:  LOCAL
        consistency_target: READ_YOUR_WRITES
  - id: CACHE
    physical_storage:
      type: CACHE
      cluster: evcache_kv_ngsegment
    config:
      default_cache_ttl: 180s

Durable Cassandra primary + EVCache cache tier in one namespace. Developers pick the namespace; the routing + combination logic lives in KV DAL. See patterns/namespace-backed-storage-routing.
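The read path implied by this namespace — check the cache tier, fall through to the primary, repopulate the cache with the configured TTL — can be sketched as follows. The stores are stub dicts and the class name is illustrative, not the KV DAL implementation:

```python
import time

class NamespaceRouter:
    """Read-through routing sketch for a namespace with a durable
    primary (Cassandra) and a cache tier (EVCache)."""

    def __init__(self, primary: dict, cache: dict, cache_ttl_s: int = 180):
        self.primary = primary        # stub for the Cassandra tier
        self.cache = cache            # stub: id -> (value, expires_at)
        self.cache_ttl_s = cache_ttl_s

    def get(self, id_):
        entry = self.cache.get(id_)
        if entry is not None and entry[1] > time.time():
            return entry[0]           # cache hit, still within TTL
        value = self.primary.get(id_) # fall through to the primary
        if value is not None:         # repopulate the cache on the way back
            self.cache[id_] = (value, time.time() + self.cache_ttl_s)
        return value
```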

Four CRUD APIs

PutItems — upsert with idempotency

message PutItemRequest (
  IdempotencyToken idempotency_token,
  string           namespace,
  string           id,
  List<Item>       items
)

Idempotency token = (generation_time, 128-bit token). Chunked payloads are staged then committed with metadata (number of chunks) — all tied atomically by one token. See concepts/idempotency-token and patterns/transparent-chunking-large-values.
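The at-most-once behaviour a token buys can be sketched with a dict-backed write log (illustrative; the real service also ties chunk staging and the main-table commit to the same token):

```python
class IdempotentPutLog:
    """Sketch: a PutItems applied at most once per idempotency token."""

    def __init__(self):
        self._applied = set()   # tokens already committed
        self._table = {}        # (namespace, id, key) -> value

    def put_items(self, token, namespace, id_, items):
        if token in self._applied:
            return False        # retry / hedge replay: safe no-op
        for key, value in items:
            self._table[(namespace, id_, key)] = value
        self._applied.add(token)
        return True
```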

GetItems — Predicate + Selection

message GetItemsRequest (
  String              namespace,
  String              id,
  Predicate           predicate,
  Selection           selection,
  Map<String, Struct> signals
)
  • Predicate: match_all / match_keys / match_range.
  • Selection: page_size_bytes, item_limit, include / exclude (trim large values from response).
  • Signals: per-request capability flags (compression, chunking).

Response: (items, next_page_token) with byte-budgeted + adaptive pagination, and SLO-aware early response when the server projects it will miss the request deadline.

DeleteItems — same Predicate shape as reads

message DeleteItemsRequest (
  IdempotencyToken idempotency_token,
  String           namespace,
  String           id,
  Predicate        predicate
)
  • Record-level (match_all): one tombstone, constant latency regardless of item count.
  • Range (match_range): one tombstone for the range.
  • Item-level (match_keys): would normally create many tombstones; KV DAL instead marks metadata expired with a randomised TTL — see concepts/ttl-based-deletion-with-jitter — so compaction work staggers rather than spikes.
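The randomised-TTL trick for item-level deletes can be sketched as computing a jittered expiry instead of writing per-item tombstones. Parameter names here are illustrative:

```python
import random

def item_delete_expiry(now_s: float, base_ttl_s: float, jitter_s: float,
                       rng=random) -> float:
    """Mark an item's metadata expired at a jittered future time so
    that compaction work is staggered across items rather than
    spiking when many items are deleted at once."""
    return now_s + base_ttl_s + rng.uniform(0, jitter_s)
```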

MutateItems / ScanItems — deferred

Complex multi-item / multi-record ops. Architecture detail promised in a future post.

Reliability primitives

Idempotency-token-protected writes

Tokens are client-generated monotonic: generation_time (client clock) + 128-bit UUID. Rationale: "client-generated monotonic tokens are preferred due to their reliability, especially in environments where network delays could impact server-side token generation." Safety relies on measured sub-1 ms clock skew on Netflix's EC2 Nitro fleet. KV rejects writes whose tokens are far in the past (silent-discard prevention) or far in the future (doomstone prevention). Stronger-ordering use cases substitute Zookeeper-issued regional tokens or globally-unique transaction IDs.

This is what makes retries / hedging / parallel speculative requests safe on last-write-wins stores like Cassandra — see concepts/tail-latency-at-scale.
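Token generation and the server's acceptance window can be sketched as below. Window sizes are illustrative; the safety of the scheme assumes the tightly bounded clock skew described above:

```python
import time
import uuid

def new_idempotency_token():
    """(generation_time_ms, 128-bit random token) — a sketch of the
    client-generated monotonic token."""
    return (int(time.time() * 1000), uuid.uuid4().int)

def server_accepts(token, now_ms, max_past_ms, max_future_ms):
    """Reject tokens far in the past (silent-discard prevention) or
    far in the future (doomstone prevention)."""
    generation_ms, _ = token
    return (now_ms - max_past_ms) <= generation_ms <= (now_ms + max_future_ms)
```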

Transparent chunking (> 1 MiB threshold)

  • Items ≤ 1 MiB → main table.
  • Items > 1 MiB → id/key/metadata in main table; body split into chunks in a separately-partitioned chunk store (which may itself be Cassandra, but with a partitioning scheme tuned for large values).
  • One idempotency token binds chunk writes + main-table commit into one atomic logical write.

Result: "latency scales linearly with the size of the data." See patterns/transparent-chunking-large-values.
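The stage-then-commit shape of a chunked write can be sketched with dict-backed tables (illustrative; the real commit is bound atomically by the idempotency token):

```python
CHUNK_THRESHOLD = 1 * 1024 * 1024  # 1 MiB, per the post

def write_value(main_table, chunk_store, id_, key, value: bytes,
                chunk_size: int = CHUNK_THRESHOLD):
    """Small values stay inline in the main table; large values are
    staged in the chunk store, then the main table commits metadata
    recording the chunk count."""
    if len(value) <= chunk_size:
        main_table[(id_, key)] = {"value": value, "chunks": 0}
        return
    chunks = [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)]
    for n, chunk in enumerate(chunks):                    # stage chunks first
        chunk_store[(id_, key, n)] = chunk
    main_table[(id_, key)] = {"value": None, "chunks": len(chunks)}  # then commit

def read_value(main_table, chunk_store, id_, key) -> bytes:
    row = main_table[(id_, key)]
    if row["chunks"] == 0:
        return row["value"]
    return b"".join(chunk_store[(id_, key, n)] for n in range(row["chunks"]))
```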

Client-side compression

Moved from server to client: saves server CPU + network + disk I/O. One deployment (Netflix Search) reported 75% payload reduction. See concepts/client-side-compression.
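A minimal sketch of the client side, with zlib standing in for whatever codec the client and server negotiate via signals:

```python
import zlib

def compress_for_put(payload: bytes):
    """Compress on the client before PutItems; keep the original if
    compression doesn't actually shrink the payload."""
    compressed = zlib.compress(payload)
    if len(compressed) < len(payload):
        return compressed, True
    return payload, False

def decompress_after_get(payload: bytes, is_compressed: bool) -> bytes:
    return zlib.decompress(payload) if is_compressed else payload
```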

Byte-size + adaptive pagination

  • Page limit is bytes not items → predictable SLO ("single-digit millisecond SLO on a 2 MiB page read").
  • Backing stores (Cassandra, DynamoDB) limit by row count — KV queries a static number of rows, processes, discards excess (big items) or issues more queries (small items).
  • Adaptive: page-token carries observed item-size; server also keeps a per-namespace rolling estimate for initial-page tuning.

See concepts/byte-size-pagination and concepts/adaptive-pagination.
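The byte-budget loop can be sketched as follows. `items` is a list of (key, value) byte pairs and the page token is just a list index — both simplifications for illustration:

```python
def paginate_by_bytes(items, page_size_bytes, page_token=0):
    """Byte-budgeted pagination: the page limit is bytes, not item
    count. An oversize first item is still returned so progress is
    always made."""
    page, used, i = [], 0, page_token
    while i < len(items):
        key, value = items[i]
        size = len(key) + len(value)
        if page and used + size > page_size_bytes:
            break                     # budget exhausted; close the page
        page.append(items[i])
        used += size
        i += 1
    next_token = i if i < len(items) else None
    return page, next_token
```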

SLO-aware early response

If the server projects retrieving enough data to fill the page will miss the request's end-to-end latency SLO, it stops early and returns what it has + a continuation token. Combined with client-side gRPC deadlines this prevents useless downstream work. See patterns/slo-aware-early-response.
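The projection logic can be sketched deterministically by modelling each backend batch fetch as a fixed cost (an assumption for illustration; the real server measures elapsed time):

```python
def slo_aware_read(batches, deadline_ms, batch_cost_ms):
    """Before fetching each further batch, project whether doing so
    would miss the request's latency budget; if so, return the items
    gathered so far plus a continuation token."""
    items, elapsed = [], 0
    for i, batch in enumerate(batches):
        if elapsed + batch_cost_ms > deadline_ms:
            return items, i           # early response + continuation token
        items.extend(batch)
        elapsed += batch_cost_ms
    return items, None                # filled the page within SLO
```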

In-band signaling (periodic handshake)

  • Server-side signals (on handshake + periodic refresh): target / max latency SLOs → client dynamically tunes timeouts + hedging.
  • Client-side signals (per-request): capability flags — does this client support compression, chunking, etc.

Eliminates the static-config + coordinated-redeploy cost of evolving client/server behaviour independently. See concepts/in-band-signaling-handshake.
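The two directions of signaling can be sketched as plain dicts; field names here are illustrative, not the wire format:

```python
def tune_from_server_signals(signals: dict) -> dict:
    """On handshake (and periodic refresh) the server advertises its
    latency SLOs; the client derives timeout and hedging delay from
    them instead of static config."""
    return {
        "request_timeout_ms": signals["max_latency_slo_ms"],
        "hedge_after_ms": signals["target_latency_slo_ms"],
    }

def client_request_signals(supports_compression: bool,
                           supports_chunking: bool) -> dict:
    """Per-request capability flags the client attaches in-band."""
    return {"compression": supports_compression, "chunking": supports_chunking}
```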

Named production consumers

  • Streaming metadata — high-throughput, low-latency for real-time personalised delivery.
  • User profiles — preferences + history across devices.
  • Messaging push registry — backs Netflix's Pushy WebSocket proxy.
  • Real-time analytics impressions — integrates with the Bulldozer batch→online pipeline.

Future work (named in the post)

  • Lifecycle management — fine-grained retention / deletion.
  • Summarisation — collapse records with many items into fewer backing rows.
  • New storage engines — integration to support more use cases.
  • Dictionary compression — further payload reduction while preserving performance.

Documented downstream consumer use cases

Beyond the platform-launch post, at least one downstream KV DAL consumer case study is documented in this wiki.
