Skip to content

PATTERN Cited by 1 source

Schema-aware traversal planning

Schema-aware traversal planning is a query-planning pattern for graph databases: the runtime maintains the graph schema as an in-memory metadata graph (built once at server startup, hot-reloaded from a control plane) and uses it to enumerate, validate, and prune traversal paths before any storage I/O.

The pattern unlocks four query-time optimisations from a single shared substrate:

  1. Data-quality enforcement at write time.
  2. Possible-path enumeration at query time.
  3. Bidirectional-traversal deduplication.
  4. Impossible-path elimination.

Disclosure

Verbatim from Netflix Graph Abstraction Part I:

"The Abstraction servers load this schema on startup and build an in-memory metadata graph of possible relationships, enabling several key optimizations:

Data Quality: The Abstraction rejects non-conforming nodes, edges, and properties during writes, ensuring high data quality and consistent exports.

Query Planning: The Abstraction uses the schema to quickly construct the possible traversal paths the service should take to answer a given user query.

Deduplication of Traversed Edges: For bidirectional traversals on edges between the same node type, the schema helps avoid redundant processing by deduplicating traversed paths.

Eliminating Traversal paths: For a given user query, the Abstraction removes traversal paths associated with impossible relationships, as well as those where filters or property types are incompatible."

Structure

┌────────────────────────────────────────────────────────────┐
│ Control Plane authority                                    │
│  - graph schema as JSON                                    │
│    edge mappings: (fromNodeType, edgeType, toNodeType,     │
│                    directionType)                          │
│    property schemas: (propertyKey, propertyValueType)      │
└──────────────────────────────┬─────────────────────────────┘
                               │ poll periodically
┌────────────────────────────────────────────────────────────┐
│ Graph Abstraction server                                   │
│  In-memory metadata graph (built from schema at startup)   │
│  ┌────────────┐  edge-mapping  ┌────────────┐              │
│  │ NodeType A │ ───────────────│ NodeType B │              │
│  └────────────┘                └────────────┘              │
│       │                              │                     │
│       └──── property schemas ────────┘                     │
│             with allowed property types                     │
└──────────────────────────────┬─────────────────────────────┘
┌────────────────────────────────────────────────────────────┐
│ Traversal planner uses metadata graph for:                 │
│  • write-time validation (reject non-conforming)           │
│  • path enumeration (only schema-permitted paths)          │
│  • bidirectional dedup (same-node-type self-edges)         │
│  • path elimination (impossible relationships, type-       │
│    incompatible filters)                                   │
└────────────────────────────────────────────────────────────┘

Why "in-memory" + "graph"

A schema-on-disk-per-query model would either re-fetch the schema (latency hit) or maintain a hot cache (which is exactly what the in-memory metadata graph is, made explicit). Storing the schema as a graph itself is the right move because the planner walks it the same way it walks the data — same data structure, same primitives.

Hot reload

"The Abstraction servers periodically poll the schema from the Data Gateway Control Plane in order to keep it updated with user changes." The pattern requires:

  • Idempotent schema reload — applying the same schema multiple times converges to the same state.
  • Tolerance for partial-schema-version skew across servers during rollout — different servers in the fleet briefly hold different schema versions.
  • Staged rollout for destructive schema changes (drop edge type, drop property) — first stop emitting writes that depend on the removed element, then drop after all servers have reloaded.

The post does not detail consistency semantics across the rollout window.

Forward-looking optimisations from the same substrate

Two named in the post but not yet shipped:

  • Edge-cardinality-aware fanout planning"using edge cardinality within edge mappings, we aim to select the most efficient traversal paths and minimize query fanout."
  • Type-safe schema-aware client API"The schema will support generating a type-safe data access layer and enhance the Gremlin-like API with schema awareness."

Both ride the same in-memory metadata graph.

Trade-offs

Trade-off Mechanism
Schema must exist Cannot serve schemaless graphs; OLAP-graph workloads use different substrate
Schema reload latency vs. freshness Polling interval is the staleness floor
Memory footprint per server Bounded by schema size, typically negligible vs. data
Planner complexity Higher than schemaless dispatch; offset by orders-of-magnitude gain on the four optimisations
Rolling schema changes Need staged rollout discipline for destructive ops

Composition

The pattern composes with:

Seen in

Last updated · 542 distilled / 1,571 read