CONCEPT Cited by 1 source

Two-level map KV model

The two-level map KV model is a key-value data shape of the form

HashMap<String, SortedMap<Bytes, Bytes>>

where the first level is a hashed string id (the partition key) and the second level is a sorted map of byte-keys to byte-values. One structure covers flat maps, named sets, structured records, and time-ordered events — which is why it's attractive as the uniform surface of an abstraction layer over multiple storage engines.

The shape

Level 1 — id (string, hashed). Partitioning unit: all items under one id live together on a single replica set.
Level 2 — SortedMap<Bytes, Bytes>. Ordered by key; enables efficient range scans, prefix lookups, and "n newest by key" deletes.
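The two levels and the operations they enable can be sketched as a toy in-memory structure. This is a minimal illustration under stated assumptions, not the KV DAL's API: the class and method names are hypothetical, sha256 stands in for a storage engine's real partitioner, and "newest" assumes keys encode ascending timestamps.

```python
import bisect
import hashlib
from typing import Dict, List, Optional, Tuple

Item = Tuple[bytes, bytes]


class TwoLevelMap:
    """Toy in-memory HashMap<String, SortedMap<Bytes, Bytes>> (hypothetical API)."""

    def __init__(self, num_replica_sets: int = 4) -> None:
        self.num_replica_sets = num_replica_sets
        # Level 1: id -> (keys kept in byte order, key -> value lookup).
        self._rows: Dict[str, Tuple[List[bytes], Dict[bytes, bytes]]] = {}

    def replica_set_for(self, id_: str) -> int:
        # Hash of the id only, so every item under one id lands on the same
        # replica set. Illustrative; real engines use their own partitioner.
        digest = hashlib.sha256(id_.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_replica_sets

    def put(self, id_: str, key: bytes, value: bytes) -> None:
        keys, values = self._rows.setdefault(id_, ([], {}))
        if key not in values:
            bisect.insort(keys, key)  # keep level-2 keys sorted
        values[key] = value

    def get(self, id_: str, key: bytes) -> Optional[bytes]:
        row = self._rows.get(id_)
        return None if row is None else row[1].get(key)

    def scan(self, id_: str, start: bytes, end: bytes) -> List[Item]:
        # Range scan: items with start <= key < end, in key order.
        keys, values = self._rows.get(id_, ([], {}))
        lo, hi = bisect.bisect_left(keys, start), bisect.bisect_left(keys, end)
        return [(k, values[k]) for k in keys[lo:hi]]

    def prefix(self, id_: str, prefix: bytes) -> List[Item]:
        # Prefix lookup: matching keys are contiguous in sorted order.
        keys, values = self._rows.get(id_, ([], {}))
        out: List[Item] = []
        for k in keys[bisect.bisect_left(keys, prefix):]:
            if not k.startswith(prefix):
                break
            out.append((k, values[k]))
        return out

    def delete_newest(self, id_: str, n: int) -> None:
        # Delete the n items with the largest keys; these are the newest only
        # if keys are ascending timestamps (an assumption about the encoding).
        row = self._rows.get(id_)
        if row is None:
            return
        keys, values = row
        cut = max(0, len(keys) - max(0, n))
        for k in keys[cut:]:
            del values[k]
        del keys[cut:]
```

Because each id's keys stay sorted, range and prefix reads are a binary search followed by a contiguous slice, which is the property the sorted second level buys.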

The canonical Netflix statement:

"At its core, the KV abstraction is built around a two-level map architecture. The first level is a hashed string ID (the primary key), and the second level is a sorted map of a key-value pair of bytes." (Source: sources/2024-09-19-netflix-netflixs-key-value-data-abstraction-layer)

Why this one shape covers many use cases

Flat KV map — id → {"" → value} (one item, empty key).
Named Set — id → {key → ""} (only keys matter).
Structured Record — field names as keys, field payloads as values.
Time-ordered Event log — timestamp-prefixed keys in the sorted map, encoded so that byte order matches time order (e.g. fixed-width big-endian); range scans then become time-range queries.
Graphs / adjacency lists — node-id as id, neighbor-ids as keys.
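The encodings listed above can be sketched as small helpers that build one id's level-2 map. This is a hedged illustration: the helper names are hypothetical, a Python dict stands in for the sorted map (sorting its keys reproduces byte order), and the 8-byte big-endian millisecond timestamp prefix is an assumed key format, not Netflix's documented one.

```python
import struct
from typing import Dict, Iterable, List

# One id's level-2 map; sorted(row) yields keys in unsigned byte order.
Row = Dict[bytes, bytes]


def flat_kv(value: bytes) -> Row:
    # Flat KV map: a single item under the empty key.
    return {b"": value}


def named_set(members: Iterable[bytes]) -> Row:
    # Named set: only keys matter, values are empty.
    return {m: b"" for m in members}


def record(fields: Dict[str, bytes]) -> Row:
    # Structured record: field names as keys, field payloads as values.
    return {name.encode("utf-8"): payload for name, payload in fields.items()}


def event_key(ts_millis: int, event_id: bytes) -> bytes:
    # Fixed-width big-endian prefix (non-negative timestamps only), so byte
    # order equals time order. Decimal strings would not work: b"10" < b"9".
    return struct.pack(">Q", ts_millis) + event_id


def time_range(row: Row, start_ms: int, end_ms: int) -> List[bytes]:
    # Events with start_ms <= timestamp < end_ms, as a plain key-range scan.
    lo, hi = struct.pack(">Q", start_ms), struct.pack(">Q", end_ms)
    return [row[k] for k in sorted(row) if lo <= k < hi]


def adjacency(neighbors: Iterable[str]) -> Row:
    # Graph adjacency list: neighbor ids as keys under the node's id.
    return {n.encode("utf-8"): b"" for n in neighbors}
```

The only encoding that needs care is the event log: a time-range query is correct only if the key's byte order agrees with numeric time order.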

All of these get the same partitioning / replication / pagination / compression / chunking semantics under one abstraction.
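Pagination over the sorted second level can be sketched with a "last key seen" cursor, which works the same way for every encoding above. This is an assumption-laden illustration: the function name is hypothetical, pages are cut by item count, and the KV DAL's actual paging criteria and token format may differ.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Item = Tuple[bytes, bytes]


def page(sorted_keys: List[bytes], values: Dict[bytes, bytes],
         after: Optional[bytes], page_size: int) -> Tuple[List[Item], Optional[bytes]]:
    """Return items with key > after (at most page_size) and the next cursor.

    The cursor is the last key returned, or None once no items remain.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Resume strictly after the cursor key; sorted order makes this a bisect.
    start = 0 if after is None else bisect.bisect_right(sorted_keys, after)
    keys = sorted_keys[start:start + page_size]
    more = start + page_size < len(sorted_keys)
    return [(k, values[k]) for k in keys], (keys[-1] if more else None)
```

A key cursor stays valid even if items are inserted between calls, unlike a numeric offset.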

Cassandra mapping

The two-level map corresponds directly to Cassandra's native data model:

id → partition key.
key → clustering column (ordered).
value → blob.

DDL as given in the Netflix post (<ns>, <table> and the clustering order are placeholders):

CREATE TABLE IF NOT EXISTS <ns>.<table> (
  id             text,
  key            blob,
  value          blob,
  value_metadata blob,
  PRIMARY KEY (id, key))
WITH CLUSTERING ORDER BY (key <ASC|DESC>)
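Filling in that DDL template and building the statements for common operations shows how level-2 operations become clustering-column predicates within one partition. A minimal sketch under assumptions: the class and method names are hypothetical, and this is not the KV DAL's query layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KvCqlTable:
    """Hypothetical helper: fills in the DDL template and builds CQL
    statements (with ? bind markers) for one namespace table."""

    keyspace: str
    table: str
    order: str = "ASC"  # clustering order of `key`

    def __post_init__(self) -> None:
        if self.order not in ("ASC", "DESC"):
            raise ValueError("order must be 'ASC' or 'DESC'")

    @property
    def name(self) -> str:
        return f"{self.keyspace}.{self.table}"

    def ddl(self) -> str:
        # id = partition key (level 1); key = clustering column (level 2).
        return (
            f"CREATE TABLE IF NOT EXISTS {self.name} (\n"
            "  id             text,\n"
            "  key            blob,\n"
            "  value          blob,\n"
            "  value_metadata blob,\n"
            "  PRIMARY KEY (id, key))\n"
            f"WITH CLUSTERING ORDER BY (key {self.order})"
        )

    def get_item(self) -> str:
        # Full primary key: at most one row.
        return f"SELECT value, value_metadata FROM {self.name} WHERE id = ? AND key = ?"

    def range_scan(self) -> str:
        # A level-2 range scan is a clustering-column slice inside one partition.
        return (
            f"SELECT key, value, value_metadata FROM {self.name} "
            "WHERE id = ? AND key >= ? AND key < ?"
        )

    def delete_item(self) -> str:
        return f"DELETE FROM {self.name} WHERE id = ? AND key = ?"
```

With the DataStax Python driver, such strings would typically be passed to session.prepare() and executed with bound values; that wiring is omitted here.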

Netflix's KV DAL exposes this same shape on other backends (EVCache, DynamoDB, RocksDB), so a namespace's configuration can switch storage engines without callers changing their data model.
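The engine-swapping idea can be sketched as one interface with interchangeable implementations selected by configuration. This is a toy under stated assumptions: two in-memory engines stand in for real backends, and the interface, engine names and config key are hypothetical, not the KV DAL's configuration format.

```python
import bisect
from typing import Dict, List, Optional, Protocol, Tuple

Item = Tuple[bytes, bytes]


class KVBackend(Protocol):
    """The two-level map surface every engine provides (hypothetical)."""
    def put(self, id_: str, key: bytes, value: bytes) -> None: ...
    def get(self, id_: str, key: bytes) -> Optional[bytes]: ...
    def scan(self, id_: str, start: bytes, end: bytes) -> List[Item]: ...


class SortOnReadEngine:
    """Toy engine: unordered dict per id, sorted at scan time."""
    def __init__(self) -> None:
        self._rows: Dict[str, Dict[bytes, bytes]] = {}

    def put(self, id_: str, key: bytes, value: bytes) -> None:
        self._rows.setdefault(id_, {})[key] = value

    def get(self, id_: str, key: bytes) -> Optional[bytes]:
        return self._rows.get(id_, {}).get(key)

    def scan(self, id_: str, start: bytes, end: bytes) -> List[Item]:
        row = self._rows.get(id_, {})
        return [(k, row[k]) for k in sorted(row) if start <= k < end]


class SortedListEngine:
    """Toy engine: keeps (key, value) pairs sorted on write."""
    def __init__(self) -> None:
        self._rows: Dict[str, List[Item]] = {}

    def put(self, id_: str, key: bytes, value: bytes) -> None:
        row = self._rows.setdefault(id_, [])
        i = bisect.bisect_left(row, (key,))  # (key,) sorts before (key, v)
        if i < len(row) and row[i][0] == key:
            row[i] = (key, value)
        else:
            row.insert(i, (key, value))

    def get(self, id_: str, key: bytes) -> Optional[bytes]:
        row = self._rows.get(id_, [])
        i = bisect.bisect_left(row, (key,))
        return row[i][1] if i < len(row) and row[i][0] == key else None

    def scan(self, id_: str, start: bytes, end: bytes) -> List[Item]:
        row = self._rows.get(id_, [])
        return row[bisect.bisect_left(row, (start,)):bisect.bisect_left(row, (end,))]


ENGINES = {"sort_on_read": SortOnReadEngine, "sorted_list": SortedListEngine}


def open_namespace(config: Dict[str, str]) -> KVBackend:
    # Swapping engines is a config change; callers keep the same calls.
    return ENGINES[config["engine"]]()
```

Callers only see put/get/scan, so changing the "engine" value changes the storage internals without touching the data model.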

Trade-offs

Wide-partition risk — since everything under one id lives on one replica set, adversarial use (unbounded items per id) produces a Cassandra "wide partition" that concentrates storage and request load on that one replica set. KV DAL explicitly treats wide partitions as a known anti-pattern and uses chunking to handle large single items.
Fat-column risk — a single huge value under one key can exhaust memory on reads. The same chunking mechanism applies.
Not a relational model — joins, transactions across ids, and secondary indexes are out of scope; the KV DAL explicitly trades these for simpler scaling and cache-friendliness.
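The general idea of chunking a large value can be sketched as splitting it into fixed-size pieces plus a small metadata record used to verify reassembly. This is a generic illustration, not the KV DAL's format: the chunk size, the chunk-key encoding, the metadata layout and the function names are all assumptions, and where the chunks are stored is left open.

```python
import struct
from typing import Dict, Tuple

# Hypothetical metadata layout: 4-byte chunk count + 8-byte total length.
_META = struct.Struct(">IQ")


def split_into_chunks(value: bytes, chunk_size: int) -> Tuple[bytes, Dict[bytes, bytes]]:
    """Split one large value into metadata plus chunk-index -> chunk bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = {
        # Fixed-width big-endian index, so chunk keys sort in chunk order.
        struct.pack(">I", i // chunk_size): value[i:i + chunk_size]
        for i in range(0, len(value), chunk_size)
    }
    return _META.pack(len(chunks), len(value)), chunks


def reassemble(metadata: bytes, chunks: Dict[bytes, bytes]) -> bytes:
    """Join chunks in key order and check them against the metadata."""
    count, total = _META.unpack(metadata)
    data = b"".join(chunks[k] for k in sorted(chunks))
    if len(chunks) != count or len(data) != total:
        raise ValueError("missing or extra chunks")
    return data
```

Readers can then fetch chunks incrementally instead of materializing one huge value at once, which is the memory problem the fat-column bullet describes.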

Seen in

sources/2024-09-19-netflix-netflixs-key-value-data-abstraction-layer — canonical wiki introduction. Named as the data-model substrate covering hundreds of Netflix use cases (streaming metadata, user profiles, messaging push registry, impression analytics).

systems/netflix-kv-dal — canonical DAL built on this data model.
systems/apache-cassandra — partition-key + clustering-column model is the direct mapping.
concepts/database-agnostic-abstraction — two-level map is abstract enough that multiple engines can back it.
concepts/wide-partition-problem — the failure mode of this shape when a single id grows unboundedly.
patterns/data-abstraction-layer