Skip to content

PATTERN Cited by 1 source

Forward and reverse adjacency index

Forward and reverse adjacency index is a graph-storage pattern: every edge link is written to two separate KV partitions — once keyed by the source node (forward index) and once keyed by the target node (reverse index) — so that either-direction traversal becomes a single partition range read. The cost paid is a doubling of edge-link writes; the payoff is O(neighbours) traversal in either direction.

Structure

Edge: (source S, target T) with edge type "linked_to"

Forward link namespace                Reverse link namespace
  partition: source                     partition: target
  clustering: target                    clustering: source

Write of (S, T) emits two records:
  forward[S][T] = {last_write: t}      reverse[T][S] = {last_write: t}

Traversals:
  outbound from S → range read on forward partition S
  inbound  to   T → range read on reverse partition T

Each direction is O(neighbours(node)), not O(|E|).

Why this is necessary

Adjacency-list storage on a KV substrate is intrinsically directional. A single forward-only index means "give me every node that points to T" (a reverse traversal) requires either a full scan or a separate lookup table. For an OLTP graph with sub-millisecond traversal budgets on either direction, the only option is to materialise both at write time.

Why this composes with edge-link/property split

Doubling the edge-link write count is acceptable only because the link record carries a tiny payload (the link identity + a last-write timestamp — no properties). If property bags lived in the link record, doubling edge-link writes would double property storage and force every property update to write two partitions atomically.

patterns/separate-edge-links-from-properties is therefore a prerequisite — it makes the link record small enough that the write doubling is affordable. Edge properties are written once, under a lex-sorted- concat ID (patterns/lex-sorted-concatenated-edge-id).

Disclosed in Netflix Graph Abstraction

Verbatim from Part-I:

"Additionally, edge indexes are separated into forward and reverse indexes to support traversals in either direction. The illustration below shows an example of the reverse index counterpart for the links namespace shown above."

Combined with the edge-link/property split + direction-agnostic edge ID, the storage layout produces:

Direction Storage cost Read cost
Forward traversal S→T 1 link record at forward[S] range read on forward[S]
Reverse traversal T→S 1 link record at reverse[T] range read on reverse[T]
Property fetch (any direction) 1 record at property[concat(sort(S,T))] point read on property partition
Total per edge 3 records (2 link + 1 property) independent of direction

Operational consequences

  • Write amplification — every edge mutation produces 2 link writes (versus 1 in a forward-only system) plus 1 property write. The write-aside link cache suppresses redundant writes when the link already exists, partly offsetting the doubling.
  • Storage overhead — 2× the link-record count vs. forward- only. Small in absolute terms because link records are small.
  • Symmetric latency — forward and reverse traversals have equivalent latency. Useful for queries that need to walk either direction depending on context.

When to use

  • Property graphs where reverse traversals are first-class (e.g., social: "who follows me?").
  • OLTP graph workloads with strict sub-millisecond traversal budgets in both directions.
  • Substrate is wide-column / KV with cheap-per-record write cost.

When not to use

  • One-directional graphs (e.g., DAG-only workloads where reverse traversal is rare).
  • Substrate has cheap reverse-index machinery built in (e.g., a dedicated graph database with native bidirectional adjacency).
  • Storage budget cannot absorb the 2× link-record overhead.

Risks

  • Drift between forward and reverse indexes — non-atomic multi-namespace writes mean a forward-only or reverse-only state is briefly possible. Mitigation: patterns/kafka-entropy-repair-for-multi-namespace-writes.
  • Inconsistent fanout views — if forward and reverse indexes are queried independently and not yet converged, the visible neighbour set differs by direction. Bounded by the entropy-repair pipeline lag.

Seen in

Last updated · 542 distilled / 1,571 read