SYSTEM Cited by 2 sources
Netflix Graph Abstraction¶
Netflix Graph Abstraction is Netflix's internal OLTP-graph service: a strongly-typed property- graph runtime composed on top of the existing Key-Value Abstraction (durable index), EVCache (read cache), and optional TimeSeries Abstraction (historical view), with schemas managed in the Data Gateway Control Plane. As of the 2026-05-29 disclosure: "close to 10 million operations per second across 650 TB of graph datasets with low latency and cost efficiency" — single-digit-ms p99 on edge / node persistence, single-digit-ms 1-hop traversals, 2-hop traversals p90 < 50 ms under high fan-out (Source: sources/2026-05-29-netflix-high-throughput-graph-abstraction-at-netflix-part-i).
The earlier 2026-05-29 Service Topology post named Graph Abstraction as substrate but did not decompose it; the same-day Part-I post is the dedicated decomposition. This page is its full canonical home (was an MVP stub on the wiki prior to that ingest).
Position in the corpus¶
Graph Abstraction is the third-named shape in Netflix's "build taller on existing data abstractions" trajectory:
| Abstraction | Wiki page | Workload |
|---|---|---|
| Key-Value | systems/netflix-kv-dal | HashMap<String, SortedMap<Bytes, Bytes>> over Cassandra / EVCache / DynamoDB / RocksDB |
| TimeSeries | systems/netflix-timeseries-abstraction | append-only events with idempotency |
| Distributed Counter | systems/netflix-distributed-counter | best-effort + eventually-consistent counter shapes |
| Graph | this page | strongly-typed property graph; OLTP traversals |
Verbatim positioning: "Instead of building the persistence and caching layers from scratch, we chose to build taller on top of existing Netflix data abstractions."
OLTP-not-OLAP framing¶
Graph Abstraction is specifically built for the OLTP graph workload class, not the OLAP one. The Part-I post makes this disambiguation explicitly:
- OLAP graph — "open-ended and algorithmic exploration of large graph datasets" using RDF/SPARQL, property graphs with Gremlin / openCypher / SQL. Focus: in-depth analysis. Trade- offs: latency and throughput give way to expressiveness.
- OLTP graph — "extremely high throughput — up to millions of operations per second — while delivering traversal results within milliseconds". Trade-offs: "accepting eventual consistency or restricting query complexity. For example, the service can demand a specified starting point for traversals and enforce a maximum traversal depth."
Graph Abstraction is the OLTP shape. "Netflix's Graph Abstraction was designed specifically for this second category of use cases." (concepts/oltp-vs-olap)
Architecture¶
┌────────────────────────────────────┐
│ Data Gateway Control Plane │
│ - graph schema authority │
│ - provisioning / deletion │
│ - capacity-modelled hardware │
└────────────┬───────────────────────┘
│ poll periodically
▼
┌─────────────────────────────────────────────────┐
│ Graph Abstraction servers │
│ - in-memory metadata graph (built from schema) │
│ - schema-aware traversal planner │
│ - gRPC traversal API + Count API │
└────┬────────────────────────────────────┬───────┘
│ point/range reads on KV │
▼ │
┌───────────────────────────────┐ │ Kafka
│ Netflix KV Abstraction │ │ entropy-repair
│ - per-node-type namespace │ │ pipeline
│ - edge-link namespace │ ▼
│ (forward + reverse index) │ ┌────────────────────────┐
│ - edge-property namespace │ │ Kafka durable retry │
│ (lex-sorted-concat ID) │ │ queue for failed multi-│
│ - LWW via idempotency token │ │ namespace writes │
└─────────────┬─────────────────┘ └────────────────────────┘
│
┌─────────┴───────────┐
▼ ▼
┌────────────────┐ ┌────────────────────────────────────┐
│ EVCache │ │ Optional TimeSeries Abstraction │
│ - read-aside │ │ (historical view, Part III) │
│ - inv-on-write │ └────────────────────────────────────┘
│ OR TTL-based │
└────────────────┘
┌────────────────────────────────────────────────────┐
│ In-process write-aside cache (edge links) │
│ - short TTL · inv-on-delete · leases + backoff │
└────────────────────────────────────────────────────┘
Async multi-region replication on KV + EVCache
→ strict eventual consistency on graph layer
Namespace = isolation unit¶
Verbatim: "The Abstraction separates data into isolated units called 'namespaces.' Each namespace is associated with a physical storage layer, as configured in the Data Gateway Control Plane, and can be deployed on either dedicated or shared hardware." The optimal hardware spec is decided by Netflix's provisioning automation based on user-provided requirements (throughput, latency, dataset size, workload criticality) — see the Joey Lynch AWS re:Invent talk.
A graph dataset is therefore not one namespace — it is multiple namespaces (per-node-type + per-edge-link + per-edge-property + per-direction), each with its own KV storage configuration and optional caching. Service Topology, for example, runs the network- flow graph and IPC graph as separate Graph Abstraction namespaces (Source: sources/2026-05-29-netflix-from-silos-to-service-topology-why-netflix-built-a-real-time-service-map).
Strongly-typed property graph + in-memory metadata graph¶
Each namespace is associated with an explicit graph schema in the Data Gateway Control Plane. The schema consists of:
- Edge mappings —
(fromNodeType, edgeType, toNodeType, directionType ∈ {UNIDIRECTIONAL, BIDIRECTIONAL}). - Property schemas per edge mapping —
(propertyKey, propertyValueType).
Verbatim sample:
{
"edgeConfig": {
"edgeMappings": [
{
"edgeMappingKey": {
"fromNodeType": "account",
"edgeType": "owns",
"toNodeType": "profile"
},
"directionType": "UNIDIRECTIONAL"
},
{
"edgeMappingKey": {
"fromNodeType": "profile",
"edgeType": "linked_to",
"toNodeType": "device"
},
"directionType": "BIDIRECTIONAL"
}
]
}
}
{
"edgeMappingKey": {
"fromNodeType": "profile",
"edgeType": "linked_to",
"toNodeType": "device"
},
"propertySchema": {
"propertyMappings": [
{ "propertyKey": "registration_time", "propertyValueType": "TIMESTAMP" },
{ "propertyKey": "status", "propertyValueType": "STRING" }
]
}
}
At server startup the schema is loaded and "build[s] an in-memory metadata graph of possible relationships, enabling several key optimizations":
| Optimisation | Use |
|---|---|
| Data Quality | reject non-conforming nodes / edges / properties at write time |
| Query Planning | compute possible traversal paths for a given user query |
| Bidirectional Dedup | for traversals on edges between same node type, avoid double-walking via the schema |
| Path Elimination | drop traversal paths whose relationships are impossible or whose property filters / types are incompatible |
Schema is hot-reloaded — "the Abstraction servers periodically poll the schema from the Data Gateway Control Plane in order to keep it updated with user changes". Roadmap: edge-cardinality awareness for fanout-aware path selection, schema-aware Gremlin-like type-safe client API. (concepts/in-memory-schema-metadata-graph, patterns/schema-aware-traversal-planning)
Storage layout (the load-bearing disclosure)¶
KV substrate primitives reused¶
KV organises data within a namespace as a "map of sorted maps":
- Data partitioning — "a namespace is associated with a table in the underlying storage layer. Within the table, data is partitioned into records by unique IDs, with each record holding multiple sorted items as key-value pairs."
- Idempotency — "writes to a given ID and key are idempotent, enabling request hedging and safe retries. The idempotency token contains a timestamp, which KV uses to enforce Last-Write-Wins (LWW) semantics at the storage layer."
(See concepts/idempotency-token and concepts/last-write-wins for the canonical wiki framings.)
Node storage — per-node-type KV namespace¶
Each node type lives in its own KV namespace, holding all properties for nodes of that type. Access patterns enabled:
| Pattern | Effect |
|---|---|
| Efficient reads | one node + all properties in single partition lookup, single-digit-ms |
| Property selection pushdown | target keys pushed to KV layer; less data on wire |
| Property filtering pushdown | filter at KV layer before returning |
| Efficient exports | parallelised by node type |
Edge storage — links and properties separated¶
Edges use two distinct types of indexes:
- Link index — adjacency list mapping source node → connected neighbours.
- Property index — per-edge property bag.
Verbatim disclosed benefits:
- Efficient property upserts — "Allows individual properties to be upserted over time without needing to read the entire property set for an edge."
- Wide row prevention — "Decoupling edge links from their properties prevents large partitions in databases like Cassandra, enabling efficient storage and low-latency reads — even for edges with millions of connections."
The cost: non-atomic writes across link + property namespaces — addressed via Kafka entropy repair (see Consistency Enforcement). (patterns/separate-edge-links-from-properties)
Forward + reverse indexes on the link namespace¶
Edge links live in both a forward and a reverse index so
traversals can fan out in either direction with a single range
read on a partition. The post shows a forward and reverse pair
on the linked_to edges between profile and device.
(patterns/forward-and-reverse-adjacency-index,
concepts/forward-reverse-edge-index)
Direction-agnostic edge property storage¶
Verbatim: "To ensure consistent record identifiers when updating edge properties in either direction, the Abstraction lexicographically sorts and concatenates the source and destination node IDs to create a direction-agnostic identifier for property storage. This ensures that properties can be accessed or mutated in a single database call regardless of the direction specified in the request."
The structural reasoning: edge links are inherently directional
(forward index ≠ reverse index), but edge properties are
direction-symmetric — a property of the (A, B) edge is the same
property whether you traversed A→B or B→A. Lex-sorting the
endpoints before concatenation gives a unique identifier per
edge-pair, indexed once in the property namespace.
(concepts/direction-agnostic-edge-id,
patterns/lex-sorted-concatenated-edge-id)
Edge access patterns enabled¶
| Pattern | Mechanism |
|---|---|
| Point reads | given edge id → all properties in single partition lookup on the property index |
| Range reads | given source node → range read on the link partition; forward / reverse selected by direction |
| Property filtering | properties only fetched for links matching the record / page-limit criteria |
| Sort orders | default lex-sorted by target node; latest-connections requires fetch-then-sort-by-last-write-time in memory; bounded by per-source-node edge limit to control memory pressure |
Caching strategies¶
Two motivations from the post:
- Write amplification — "A single write on the fronting service can result in multiple writes to the backing durable storage due to the use of multiple indexes."
- Read amplification — "A single traversal request on the fronting service may translate into thousands of fetch operations on the backend, especially for highly interconnected graphs."
(concepts/write-amplification, concepts/read-amplification)
Write-aside cache of edge links¶
Edge link records carry no payload beyond the link itself + last- write timestamp, so caching them lets the server skip a durable-store write of an edge link that already exists.
Verbatim: "This mechanism is balanced with configurable TTL windows, cache invalidation on deletes, and lease acquisitions with exponential backoff. These strategies provide the necessary consistency guarantees while still allowing the last-write timestamp to be refreshed according to the predefined staleness."
Three structural pieces:
- Configurable TTL per namespace — bounds staleness of the "this link exists" assertion.
- Invalidate on delete — explicit cache eviction when a link is removed.
- Lease + exponential backoff — when contending writers see
a recent write of the same edge link, only one of them races
to update the durable store; others back off, the leaseholder
refreshes the cached
last_write_time, and contention drops to a single durable write.
(patterns/write-aside-cache-for-edge-links)
Read-aside cache of properties on EVCache¶
Verbatim: "To reduce read amplification on the durable store, the Graph Abstraction leverages KV's integration with EVCache. Multiple KV namespaces can share the same caching clusters for cost efficiency. The Abstraction first fetches data from durable storage, while subsequent reads are served from the cache. Caching is applied at both the record and item levels, benefiting all graph objects."
Two per-namespace selectable invalidation strategies:
| Strategy | When |
|---|---|
| Invalidate on write | record + item caches invalidated on every write — strong cross-region consistency, higher cache RPS |
| TTL-driven | wait for TTL — best for high-write-throughput graphs that tolerate staleness |
(patterns/read-aside-cache-with-dual-invalidation)
Work-in-progress: write-through cache¶
Verbatim: "We are also developing a write-through caching strategy designed to store most of the data required by the Abstraction during traversals. This caching mechanism can organize indexes by different sort orders (e.g., sorting data by last-write timestamp), at the cost of increased memory consumption."
Not yet shipped at time of disclosure. Implication: traversal- shaped cache layouts — keyed by the traversal access pattern, not the storage access pattern, so traversals don't have to re-shape the read.
Consistency enforcement¶
The graph layer guarantees strict eventual consistency across regions (concepts/strict-eventual-consistency), enforced by three named mechanisms:
1. LWW via timestamped idempotency tokens¶
Inherited from the KV substrate: every write carries a timestamped idempotency token; the storage layer applies LWW. Hedged or retried writes are idempotent under this rule. (concepts/idempotency-token, concepts/last-write-wins)
2. Kafka entropy repair for multi-namespace writes¶
Verbatim: "Each write in the Abstraction persists data for both inward and outward indices in parallel to support high throughput. Further, each write happens on multiple KV namespaces. To prevent inconsistencies or lasting entropy from failures in any operation, the Abstraction uses a robust retry mechanism using Kafka."
Mechanism shape (verbatim disclosure is partial; the post has a diagram but only this paragraph of text): the failed-write intent is published to Kafka; consumers retry the multi-namespace write until it converges. The graph operates without distributed transactions across namespaces — Kafka becomes the durable retry queue that guarantees eventual cross-namespace consistency.
This is the first canonical wiki disclosure of Kafka as an entropy-repair substrate for a graph-storage system, distinct from its uses as an event log, a stream-processing substrate, or a CDC carrier. (concepts/entropy-repair, patterns/kafka-entropy-repair-for-multi-namespace-writes)
3. Asynchronous cascade delete with LWW¶
Verbatim: "Deleting nodes in a highly connected graph is more complex than simply removing a KV record as each node may have thousands of connected edges that must be handled to maintain graph integrity. Further, synchronously deleting all such connections would introduce unacceptable latency for the Abstraction callers. The Abstraction employs an asynchronous deletion strategy to manage this issue. The consequence of this approach, however, is that the observed mutated state is only eventually consistent. Further, to ensure correctness of asynchronous deletes during concurrent updates, the Last-Write- Wins (LWW) conflict resolution mechanism is essential."
Disclosed observed latency: "Asynchronous operations such as node deletions can be slightly latent, but typically perform with sub-second latency."
The structural argument (verbatim-derived): an asynchronous cascade delete that races a concurrent update could otherwise resurrect deleted state if naive write ordering picked the non-delete write. LWW guarantees the delete carries a higher timestamp than any update that races but did not actually happen-before the delete, making the eventual state correct. (concepts/asynchronous-cascade-delete, patterns/asynchronous-cascade-delete-for-high-fanout-graph-nodes)
Multi-region semantics¶
Verbatim: "As illustrated in the diagram below, both the caching layer and durable storage replicate data asynchronously across regions, resulting in an eventually consistent system."
Graph Abstraction does not add cross-region semantics on top — it inherits async replication from the KV Abstraction + EVCache underlying it. The payoff: high availability across regions; the cost: cross-region reads may briefly see stale state. The strict-EC framing covers correctness under partition + recovery.
Traversal API¶
Custom gRPC API "inspired by Gremlin". Capabilities: chain traversals, filter, sort, limit results, select properties (pushdown to KV), filter by direction.
Worked example from the post — "recommend shows to users on a shared device, by considering the duration of the most recent viewing session for each show across all profiles and accounts associated with that device":
TraversalRequest.newBuilder()
.setNamespace("<graph-namespace>")
.setTraversalQuery(
TraversalQuery.newBuilder()
.setStartNode(node("device", "my-device-id"))
.setTraversal(
Traversal.newBuilder()
.setEdgeLimit(5)
.setDirectionTraversal(
DirectionTraversal.newBuilder()
.setDirection(IN)
.addNodePropertiesSelections(propSelection("account", "created_at"))
.addNodePropertiesSelections(propSelection("profile", "last_active"))
.setDirectionFilter(
DirectionFilter.newBuilder()
.setTypeMatchingStrategy(EXCLUDE_NON_TARGETED)
.addAllNodeFilters(typeFilters("account", "profile"))))
.addNextTraversals(
Traversal.newBuilder()
.setOrder(LATEST)
.setEdgeLimit(200)
.setDirectionTraversal(
DirectionTraversal.newBuilder()
.setDirection(OUT)
.addEdgePropertiesSelections(propSelection("watched", "view_time"))
.addEdgePropertiesSelections(propSelection("has_plan", "active"))
.setDirectionFilter(
DirectionFilter.newBuilder()
.setTypeMatchingStrategy(EXCLUDE_NON_TARGETED)
.addAllNodeFilters(typeFilters("title", "plan")))))))
.build();
The shape: device → IN(account|profile, edge limit 5) →
OUT(title|plan, edge limit 200, latest order). Property
selections are pushed down — only created_at, last_active,
watched.view_time, and has_plan.active cross the wire.
Traversal-engine deep-dive deferred to Part II of the series.
Count API¶
Companion API: "a Count API that performs counting traversals at a very high rate with similar latencies". Detailed coverage deferred to Part II.
Gremlin influence is API-shape only¶
The Gremlin influence is at the API surface layer (traversal-chaining ergonomics). The underlying engine is Netflix's own — schema-aware planner, mandatory traversal start node, bounded depth — diverging from canonical Gremlin / TinkerPop runtime semantics.
Operational numbers¶
| Metric | Value |
|---|---|
| Aggregate throughput at peak | ~10 M ops/sec |
| Total stored | ~650 TB globally |
| Edge / node persistence | single-digit-ms p99 |
| 1-hop traversal | single-digit-ms p99 |
| 2-hop traversal | upwards of 100 ms; p90 < 50 ms with high fan-out |
| Async node deletion | typically sub-second |
| Count API | similar to traversal latencies |
The post tracks average and max edge fan-out at different depths as a first-class operational metric. (concepts/graph-traversal-fanout)
Use cases¶
Three named first-party Netflix consumers:
- Real-Time Distributed Graph (RDG) — "a graph capturing dynamic relationships across entities and interactions throughout the Netflix ecosystem." Successor to the original RDG implementation; "This functionality has since been integrated into the Graph Abstraction." Verbatim disclosure of the RDG workload shape: "powered by 2-hop traversals with a higher degree of fan-out" — drives the p90<50ms 2-hop number.
- Social Graph — "A graph of social connections within Netflix Gaming, designed to boost user engagement."
- Service Topology — the sibling 2026-05-29 post's real-time service-dependency graph; the network-flow graph and IPC graph each ride a Graph Abstraction namespace; the trace graph rides a separate columnar substrate outside Graph Abstraction.
Forward-looking framing: "As Netflix scales further into new verticals such as live content, games, and ads, Graph Abstraction will remain crucial for uncovering and leveraging rich connections."
Trade-offs and known limits¶
| Trade-off | Resolution |
|---|---|
| Traversal must specify start node | by-design OLTP-graph constraint |
| Traversal has a depth bound | by-design OLTP-graph constraint |
| Cross-namespace writes are non-atomic | Kafka entropy repair makes them eventually-consistent |
| Cascade delete latency unbounded by sync wait | asynchronous pipeline; LWW preserves correctness vs concurrent updates |
| Cross-region replication is async | strict eventual consistency framing; LWW resolves cross-region conflicts |
| Per-source-node edge count must be bounded | named operational invariant for memory pressure on latest-sort traversals; specific cap not disclosed |
| Write-through cache not yet shipped | WIP at time of disclosure |
| OLAP graph workloads | out of scope; substrate not addressed in this post |
What's coming (per the post)¶
- Part II — design and implementation of traversal planning and execution; different traversal types; Count API.
- Part III — temporal index implementation and integration with TimeSeries Abstraction for historical / time-travel views.
Seen in¶
- sources/2026-05-29-netflix-high-throughput-graph-abstraction-at-netflix-part-i — canonical decomposition; first-of-series; this page is built from this source.
- sources/2026-05-29-netflix-from-silos-to-service-topology-why-netflix-built-a-real-time-service-map — substrate reference; first-named consumer (Service Topology network and IPC graphs each in their own namespace).
Related¶
- systems/netflix-kv-dal — durable substrate per namespace
- systems/evcache — read-aside cache substrate
- systems/netflix-timeseries-abstraction — optional historical- view substrate
- systems/netflix-data-gateway — schema authority + provisioner
- systems/apache-cassandra — example backend whose wide-row hazard the link/property split protects against
- systems/netflix-service-topology — first-named consumer
- concepts/property-graph · concepts/oltp-vs-olap · concepts/adjacency-list-storage · concepts/forward-reverse-edge-index · concepts/direction-agnostic-edge-id · concepts/in-memory-schema-metadata-graph · concepts/read-amplification · concepts/write-amplification · concepts/idempotency-token · concepts/last-write-wins · concepts/strict-eventual-consistency · concepts/asynchronous-cascade-delete · concepts/graph-traversal-fanout · concepts/entropy-repair
- patterns/separate-edge-links-from-properties · patterns/forward-and-reverse-adjacency-index · patterns/lex-sorted-concatenated-edge-id · patterns/write-aside-cache-for-edge-links · patterns/read-aside-cache-with-dual-invalidation · patterns/kafka-entropy-repair-for-multi-namespace-writes · patterns/asynchronous-cascade-delete-for-high-fanout-graph-nodes · patterns/schema-aware-traversal-planning
- companies/netflix