CONCEPT Cited by 1 source
Adjacency list storage¶
Adjacency list storage represents a graph by associating each
source node with the list of its out-edges (or in-edges) — the
direct opposite of an adjacency matrix that would represent
edges as a sparse |V|×|V| boolean grid. For graph databases
running over key-value substrates, adjacency-list storage is
the canonical layout: the source node id becomes the partition
key, and the connected target node ids are sorted clustering
items within the partition.
On a KV substrate¶
If the underlying KV layer offers a HashMap<String,
SortedMap<Bytes, Bytes>> shape (Netflix
KV Abstraction does, and Cassandra's
(partition key, clustering column) model does), the adjacency
list maps onto the substrate without translation:
KV namespace = edge-link namespace (per direction, per edge type)
partition key : source-node-id
clustering key : target-node-id (lex-sorted by default)
value : edge metadata (last_write_time, etc.)
A traversal "give me all edges from source S" becomes a single
range read on partition S — O(neighbours(S)) rather than
the O(|E|) full scan the matrix layout would force.
Direction matters¶
Adjacency-list storage is intrinsically directional — partitioning by source node gives fast forward traversal but no reverse traversal. Two solutions in practice:
- Bidirectional edges — write the link twice, once per direction, so the link namespace contains both forward and reverse pairs (the forward + reverse edge index pattern).
- Reverse-only index — for unidirectional edge types, an auxiliary reverse-direction adjacency list provides reverse traversal at the cost of a second write per edge.
Netflix Graph Abstraction does both: every edge link is written to both forward and reverse adjacency lists at write time (patterns/forward-and-reverse-adjacency-index), so any traversal is a single range read on the appropriate partition.
Wide-row hazard¶
The adjacency list of a high-fanout node ("hub") can grow to millions of entries. On databases like Cassandra this triggers wide-row problems:
- Partition size limits — Cassandra recommends partitions stay below ~100 MB for predictable performance; a hub with millions of edges blows past this.
- Read amplification — a range read on a wide row touches many SSTables; latency degrades with partition size.
Netflix mitigates this by separating edge links from edge properties (patterns/separate-edge-links-from-properties) — the link namespace stays narrow because each link record is small (just the link identity + last-write timestamp), while property bags live in a separate property namespace. Verbatim: "Decoupling edge links from their properties prevents large partitions in databases like Cassandra, enabling efficient storage and low-latency reads — even for edges with millions of connections." (Source: sources/2026-05-29-netflix-high-throughput-graph-abstraction-at-netflix-part-i)
Even with the split, the post names a per-source-node edge-count bound — "to ensure optimal performance without exerting too much memory pressure, we aim to limit the number of edges per source node within the system" — as an operational invariant for the latest-sort fetch path (which sorts in memory). The specific cap is not disclosed.
Sort orders inside the adjacency list¶
Two canonical sort orders for the clustering items:
- Lex-sorted by target node id (default in Netflix Graph Abstraction) — supports range queries and pagination without any post-processing.
- Time-sorted by last-write timestamp — supports "latest connections" queries; requires either a separate index, an explicit secondary clustering column, or in-memory sort after fetch.
Netflix uses default lex order on disk + in-memory sort by last-write time for latest-connections traversals, with the edge-count bound above to keep the in-memory sort affordable.
Seen in¶
- sources/2026-05-29-netflix-high-throughput-graph-abstraction-at-netflix-part-i — canonical wiki disclosure; verbatim "The Edge links are arranged as an adjacency list mapping source nodes to their connected neighbors." Range-read, forward + reverse, lex- sorted, wide-row prevention via edge-link/property split, and bounded edges-per-source.