CONCEPT Cited by 1 source
Two-level map KV model
The two-level map KV model is a key-value data shape of the form
HashMap<String, SortedMap<Bytes, Bytes>>
where the first level is a hashed string id (the partition key) and the second level is a sorted map of byte-keys to byte-values. One structure covers flat maps, named sets, structured records, and time-ordered events, which is why it is attractive as the uniform surface of an abstraction layer over multiple storage engines.
The shape
- Level 1: id (string, hashed). Partitioning unit: all items under one id live together on a single replica set.
- Level 2: SortedMap<Bytes, Bytes>. Ordered by key; enables efficient range scans, prefix lookups, and "n newest by key" deletes.
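The two levels can be modelled in a few lines. What follows is a minimal in-memory sketch, not Netflix's implementation: the class name TwoLevelMap and the methods put, scan, and prefix_scan are hypothetical names chosen for illustration, and a sorted Python list stands in for the storage engine's ordered index.

```python
from bisect import bisect_left, insort
from typing import Dict, List, Optional, Tuple


class TwoLevelMap:
    """Illustrative in-memory model of id -> SortedMap<Bytes, Bytes>."""

    def __init__(self) -> None:
        # Level 1 is the outer dict keyed by id.
        # Level 2 is a sorted key list plus a key -> value dict per id.
        self._keys: Dict[str, List[bytes]] = {}
        self._values: Dict[str, Dict[bytes, bytes]] = {}

    def put(self, partition_id: str, key: bytes, value: bytes) -> None:
        keys = self._keys.setdefault(partition_id, [])
        values = self._values.setdefault(partition_id, {})
        if key not in values:
            insort(keys, key)  # keep level-2 keys in byte order
        values[key] = value

    def scan(
        self, partition_id: str, start: bytes = b"", end: Optional[bytes] = None
    ) -> List[Tuple[bytes, bytes]]:
        """Return items with start <= key < end, in key order."""
        keys = self._keys.get(partition_id, [])
        values = self._values.get(partition_id, {})
        lo = bisect_left(keys, start)
        hi = len(keys) if end is None else bisect_left(keys, end)
        return [(k, values[k]) for k in keys[lo:hi]]

    def prefix_scan(self, partition_id: str, prefix: bytes) -> List[Tuple[bytes, bytes]]:
        """Return items whose key starts with prefix, in key order."""
        keys = self._keys.get(partition_id, [])
        values = self._values.get(partition_id, {})
        out = []
        # Keys sharing a prefix are contiguous in sorted order.
        for k in keys[bisect_left(keys, prefix):]:
            if not k.startswith(prefix):
                break
            out.append((k, values[k]))
        return out


store = TwoLevelMap()
store.put("user:1", b"b", b"2")
store.put("user:1", b"a", b"1")
store.put("user:1", b"ab", b"3")
print(store.scan("user:1"))
print(store.prefix_scan("user:1", b"a"))
```

Because level-2 keys stay sorted, both the range scan and the prefix lookup reduce to a binary search followed by a contiguous read, which is the property the sorted map is meant to provide.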
The canonical Netflix statement:
"At its core, the KV abstraction is built around a two-level map architecture. The first level is a hashed string ID (the primary key), and the second level is a sorted map of a key-value pair of bytes." (Source: sources/2024-09-19-netflix-netflixs-key-value-data-abstraction-layer)
Why this one shape covers many use cases
- Flat KV map: id → {"" → value} (one item, empty key).
- Named Set: id → {key → ""} (only keys matter).
- Structured Record: fields encoded as keys, values as the field payloads.
- Time-ordered Event log: timestamp-prefixed keys in the sorted map; range scans = time-range queries.
- Graphs / adjacency lists: node-id as id, neighbor-ids as keys.
All of these get the same partitioning / replication / pagination / compression / chunking semantics under one abstraction.
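A few of these encodings can be sketched concretely. The helper names below (flat_kv, named_set, event_key, time_range) are hypothetical, and the key layout for events, an 8-byte big-endian millisecond timestamp followed by an event id, is an assumption made for illustration rather than a format taken from the Netflix post. Big-endian encoding is used so that byte-wise key order matches numeric timestamp order.

```python
import struct
from typing import Dict, Iterable, List, Tuple


def flat_kv(value: bytes) -> Dict[bytes, bytes]:
    # Flat KV map: a single item stored under the empty key.
    return {b"": value}


def named_set(members: Iterable[bytes]) -> Dict[bytes, bytes]:
    # Named set: members are the keys; values are empty.
    return {m: b"" for m in members}


def event_key(ts_millis: int, event_id: bytes) -> bytes:
    # Assumed layout: 8-byte big-endian timestamp, then the event id.
    return struct.pack(">Q", ts_millis) + event_id


def time_range(
    events: Dict[bytes, bytes], start_ms: int, end_ms: int
) -> List[Tuple[bytes, bytes]]:
    """Return events with start_ms <= timestamp < end_ms, oldest first.

    This filters linearly for brevity; a sorted store would seek to the
    lower bound instead of visiting every key.
    """
    lo = struct.pack(">Q", start_ms)
    hi = struct.pack(">Q", end_ms)
    return [(k, v) for k, v in sorted(events.items()) if lo <= k < hi]


events = {
    event_key(2000, b"b"): b"logout",
    event_key(1000, b"a"): b"login",
    event_key(300, b"c"): b"signup",
}
print(time_range(events, 1000, 2000))
```

The same dictionaries could be stored under any id, so partitioning and pagination behave identically regardless of which encoding a caller picks.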
Cassandra mapping
The two-level map corresponds directly to Cassandra's native model:
- id → partition key.
- key → clustering column (ordered).
- value → blob.
Direct DDL from the Netflix post:
CREATE TABLE IF NOT EXISTS <ns>.<table> (
id text,
key blob,
value blob,
value_metadata blob,
PRIMARY KEY (id, key))
WITH CLUSTERING ORDER BY (key <ASC|DESC>)
Netflix's KV DAL exposes this same shape over other backends (EVCache, DynamoDB, RocksDB), so a namespace's configuration can swap engines without callers changing their data model.
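The engine-swap idea can be illustrated with a small interface. This is a sketch under stated assumptions: KVBackend, DictBackend, SortedRowsBackend, write_profile, and the NAMESPACES table are all hypothetical names, and both backends are in-memory stand-ins rather than real EVCache, DynamoDB, RocksDB, or Cassandra clients.

```python
from typing import Dict, List, Protocol, Tuple


class KVBackend(Protocol):
    """Hypothetical minimal surface shared by every engine."""

    def put(self, partition_id: str, key: bytes, value: bytes) -> None: ...

    def scan(self, partition_id: str) -> List[Tuple[bytes, bytes]]: ...


class DictBackend:
    """Stand-in engine that stores one dict per id."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[bytes, bytes]] = {}

    def put(self, partition_id: str, key: bytes, value: bytes) -> None:
        self._data.setdefault(partition_id, {})[key] = value

    def scan(self, partition_id: str) -> List[Tuple[bytes, bytes]]:
        return sorted(self._data.get(partition_id, {}).items())


class SortedRowsBackend:
    """Stand-in engine that stores (id, key, value) rows in sorted order."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, bytes, bytes]] = []

    def put(self, partition_id: str, key: bytes, value: bytes) -> None:
        # Drop any existing row for (id, key), then re-sort.
        self._rows = [r for r in self._rows if (r[0], r[1]) != (partition_id, key)]
        self._rows.append((partition_id, key, value))
        self._rows.sort()

    def scan(self, partition_id: str) -> List[Tuple[bytes, bytes]]:
        return [(k, v) for (i, k, v) in self._rows if i == partition_id]


# Hypothetical namespace configuration: swapping an engine is a one-line change.
NAMESPACES: Dict[str, KVBackend] = {
    "profiles": DictBackend(),
    "events": SortedRowsBackend(),
}


def write_profile(backend: KVBackend, user_id: str, fields: Dict[bytes, bytes]) -> None:
    # Caller code depends only on the two-level shape, not on the engine.
    for name, payload in fields.items():
        backend.put(user_id, name, payload)


write_profile(NAMESPACES["profiles"], "user:1", {b"name": b"Ada", b"lang": b"en"})
print(NAMESPACES["profiles"].scan("user:1"))
```

The caller function never names a concrete engine, which is the property that lets a namespace move between backends without a data-model change.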
Trade-offs
- Wide-partition risk: since everything under one id is on one replica set, adversarial use (unbounded items per id) produces a Cassandra "wide partition" that concentrates load on the replicas holding it. KV DAL mitigates large single items with chunking and treats wide partitions as a known anti-pattern.
- Fat-column risk: a single huge value under one key can blow out memory on reads. The same chunking mechanism applies.
- Not a relational model: joins, transactions across ids, and secondary indexes are out of scope; the KV DAL explicitly trades these for simpler scaling and cache-friendliness.
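To make the chunking idea concrete, here is a simplified sketch of splitting one large value into bounded pieces. The functions chunk_value and reassemble, and the layout of appending a 4-byte big-endian chunk index to the original key, are assumptions for illustration only; they do not describe how KV DAL actually lays out or stores its chunks.

```python
import struct
from typing import Dict


def chunk_value(key: bytes, value: bytes, chunk_size: int) -> Dict[bytes, bytes]:
    """Split value into pieces of at most chunk_size bytes.

    Each piece is stored under key + a 4-byte big-endian index, so the
    pieces sort in their original order (assumed layout).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks: Dict[bytes, bytes] = {}
    for index, offset in enumerate(range(0, len(value), chunk_size)):
        chunks[key + struct.pack(">I", index)] = value[offset:offset + chunk_size]
    return chunks


def reassemble(chunks: Dict[bytes, bytes]) -> bytes:
    # Sorting by key restores chunk order because the index is big-endian.
    return b"".join(chunks[k] for k in sorted(chunks))


payload = bytes(range(10))
pieces = chunk_value(b"avatar", payload, chunk_size=4)
print(len(pieces), reassemble(pieces) == payload)
```

Bounding each stored piece caps the memory a single read needs, at the cost of extra round trips to fetch and reassemble the full value.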
Seen in
- sources/2024-09-19-netflix-netflixs-key-value-data-abstraction-layer — canonical wiki introduction. Named as the data-model substrate covering hundreds of Netflix use cases (streaming metadata, user profiles, messaging push registry, impression analytics).
Related
- systems/netflix-kv-dal: canonical DAL built on this data model.
- systems/apache-cassandra: partition-key + clustering-column model is the direct mapping.
- concepts/database-agnostic-abstraction: two-level map is abstract enough that multiple engines can back it.
- concepts/wide-partition-problem: the failure mode of this shape when a single id grows unboundedly.
- patterns/data-abstraction-layer