CONCEPT Cited by 1 source
In-memory schema metadata graph
An in-memory schema metadata graph is a runtime data structure that materialises a graph database's schema as a small graph in process memory, and uses it as the planner's substrate for every incoming query. Each schema element — node type, edge mapping, property type — becomes a vertex or edge in the metadata graph; the planner walks this metadata graph to enumerate possible traversal paths, validate writes, and prune impossible filters.
The technique is one of several "compile the schema once at startup, exploit it on every request" tricks that show up across graph databases, query engines, and ORMs. Netflix Graph Abstraction's [[sources/2026-05-29-netflix-high-throughput-graph-abstraction-at-netflix-part-i|2026-05-29 Part-I disclosure]] is the wiki's canonical instance.
Mechanism
- The schema authority (in Netflix's case, the Data Gateway Control Plane) stores graph schemas as JSON: edge mappings + property schemas.
- At server startup, each Graph Abstraction node fetches the schemas for the namespaces it serves and builds an in-memory metadata graph of possible relationships.
- The metadata graph stays resident; the planner consults it on every incoming write, read, and traversal.
- The control plane is polled periodically so user-driven schema changes propagate without restart — schema is hot-reloaded, not deploy-time-frozen.
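The startup step (fetch the JSON schema, turn node types into vertices and edge mappings into adjacency) can be sketched in Python as below. The JSON layout (`nodeTypes`, `edgeMappings`, `direction`), the class names and the example types are all hypothetical. The post only says that schemas are stored as JSON edge mappings plus property schemas, and it does not publish the format.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EdgeMapping:
    label: str
    src: str            # source node type
    dst: str            # destination node type
    bidirectional: bool
    properties: dict    # property name -> type name, e.g. "INT"


@dataclass
class MetadataGraph:
    # Vertices of the metadata graph: node type -> {property name: type name}.
    node_types: dict = field(default_factory=dict)
    # Edges of the metadata graph: node type -> mappings traversable from it.
    adjacency: dict = field(default_factory=dict)

    @classmethod
    def from_json(cls, text: str) -> "MetadataGraph":
        doc = json.loads(text)
        g = cls()
        for name, spec in doc["nodeTypes"].items():
            g.node_types[name] = dict(spec.get("properties", {}))
            g.adjacency[name] = []
        for m in doc["edgeMappings"]:
            edge = EdgeMapping(
                label=m["label"],
                src=m["from"],
                dst=m["to"],
                bidirectional=m.get("direction") == "BIDIRECTIONAL",
                properties=dict(m.get("properties", {})),
            )
            # Reject mappings that point at node types the schema never declared.
            for end in (edge.src, edge.dst):
                if end not in g.node_types:
                    raise ValueError(f"edge {edge.label!r} references unknown node type {end!r}")
            g.adjacency[edge.src].append(edge)
            # A bidirectional mapping is also traversable from its destination type;
            # when both ends are the same type it is recorded only once.
            if edge.bidirectional and edge.dst != edge.src:
                g.adjacency[edge.dst].append(edge)
        return g


# Hypothetical schema document for one namespace.
SCHEMA = """
{
  "nodeTypes": {
    "account": {"properties": {"name": "STRING", "age": "INT"}},
    "device":  {"properties": {"model": "STRING"}}
  },
  "edgeMappings": [
    {"label": "owns", "from": "account", "to": "device", "properties": {"since": "INT"}},
    {"label": "follows", "from": "account", "to": "account", "direction": "BIDIRECTIONAL"}
  ]
}
"""

graph = MetadataGraph.from_json(SCHEMA)  # built once at startup, then kept resident
```

Because the result is an ordinary object graph, every later lookup ("which mappings leave `account`?") is a dictionary access rather than a call to the control plane.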
Four optimisations the metadata graph drives
Verbatim from the post:
"The Abstraction servers load this schema on startup and build an in-memory metadata graph of possible relationships, enabling several key optimizations:
- Data Quality: The Abstraction rejects non-conforming nodes, edges, and properties during writes, ensuring high data quality and consistent exports.
- Query Planning: The Abstraction uses the schema to quickly construct the possible traversal paths the service should take to answer a given user query.
- Deduplication of Traversed Edges: For bidirectional traversals on edges between the same node type, the schema helps avoid redundant processing by deduplicating traversed paths.
- Eliminating Traversal paths: For a given user query, the Abstraction removes traversal paths associated with impossible relationships, as well as those where filters or property types are incompatible."
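The data-quality bullet amounts to checking every write against the resident schema before it reaches storage. A minimal sketch, assuming a schema already loaded into plain dictionaries. The node types, edge labels and the use of Python types to encode property types are illustrative assumptions, not Netflix's format.

```python
# Hypothetical resident schema (illustration only): node type -> property types,
# and (edge label, source type, destination type) -> edge property types.
NODE_TYPES = {
    "account": {"name": str, "age": int},
    "device": {"model": str},
}
EDGE_MAPPINGS = {
    ("owns", "account", "device"): {"since": int},
    ("follows", "account", "account"): {},
}


def check_properties(kind, props, allowed):
    """Return one message per non-conforming property; an empty list means valid."""
    errors = []
    for key, value in props.items():
        if key not in allowed:
            errors.append(f"{kind}: unknown property {key!r}")
        elif type(value) is not allowed[key]:  # exact match, so True is not an int
            errors.append(
                f"{kind}: property {key!r} expects {allowed[key].__name__}, "
                f"got {type(value).__name__}"
            )
    return errors


def validate_node(node_type, props):
    if node_type not in NODE_TYPES:
        return [f"unknown node type {node_type!r}"]
    return check_properties(f"node {node_type}", props, NODE_TYPES[node_type])


def validate_edge(label, src_type, dst_type, props):
    allowed = EDGE_MAPPINGS.get((label, src_type, dst_type))
    if allowed is None:  # no such mapping: an illegal relationship
        return [f"no edge mapping {label!r} from {src_type!r} to {dst_type!r}"]
    return check_properties(f"edge {label}", props, allowed)
```

A write path would reject the request whenever either function returns a non-empty list; for example `validate_edge("owns", "device", "account", {})` fails because the schema only maps `owns` from `account` to `device`. Required-property checks are left out of this sketch.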
| Optimisation | When it fires |
|---|---|
| Data quality | every write — type mismatch, illegal edge mapping, missing property all rejected up front |
| Query planning | every traversal — enumerate paths schema permits |
| Bidirectional dedup | for self-loops and BIDIRECTIONAL edges between the same node type, drop the duplicate side |
| Path elimination | every traversal, before execution: drop paths through impossible relationships or with type-incompatible filters (e.g. an INT filter on a STRING property) |
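Query planning, bidirectional dedup and path elimination can all be expressed as a walk over the metadata graph. The sketch below is a minimal illustration under stated assumptions: a hypothetical three-type schema, a breadth-first enumeration bounded by `max_hops`, and a single equality-style filter on the target type. The names `hops_from` and `plan` are invented here and are not part of Netflix's API.

```python
from collections import namedtuple

# Hypothetical schema, for illustration only.
NODE_PROPS = {
    "account": {"name": "STRING", "age": "INT"},
    "device": {"model": "STRING"},
    "title": {"name": "STRING"},
}
# (label, source type, destination type, bidirectional)
EDGE_MAPPINGS = [
    ("follows", "account", "account", True),
    ("owns", "account", "device", False),
    ("watched", "device", "title", False),
]

Hop = namedtuple("Hop", "label direction to_type")


def hops_from(node_type):
    """Single hops the schema permits from node_type, in either direction."""
    hops = []
    for label, src, dst, bidirectional in EDGE_MAPPINGS:
        if src == node_type:
            hops.append(Hop(label, "out", dst))
        if dst == node_type:
            # Dedup: a bidirectional mapping between the same node type reaches
            # the same edges either way, so the "in" hop would be redundant work.
            if bidirectional and src == dst:
                continue
            hops.append(Hop(label, "in", src))
    return hops


def plan(start, target, max_hops, filter_prop=None, filter_type=None):
    """Enumerate schema-permitted hop sequences from start to target."""
    # Path elimination: an unknown target, or a filter whose type cannot match
    # the target's property type, rules out every path before any I/O happens.
    if target not in NODE_PROPS:
        return []
    if filter_prop is not None and NODE_PROPS[target].get(filter_prop) != filter_type:
        return []
    paths, frontier = [], [(start, ())]
    for _ in range(max_hops):
        next_frontier = []
        for node_type, path in frontier:
            # Only mappings present in the schema are expanded, so impossible
            # relationships never enter the plan.
            for hop in hops_from(node_type):
                new_path = path + (hop,)
                if hop.to_type == target:
                    paths.append(new_path)
                next_frontier.append((hop.to_type, new_path))
        frontier = next_frontier
    return paths
```

With this schema, `plan("account", "title", 3)` yields two paths (`owns` then `watched`, and `follows` then `owns` then `watched`), while `plan("account", "title", 3, "name", "INT")` returns nothing because `title.name` is a STRING. The enumeration grows exponentially with `max_hops`, which is tolerable only because it runs over the small schema graph, not over the data.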
Hot reload
"The Abstraction servers periodically poll the schema from the Data Gateway Control Plane in order to keep it updated with user changes." The implementation must therefore tolerate partial schema rollouts — different servers within a fleet may briefly hold different schema versions. The post does not detail consistency semantics across the rollout window. Practical implications:
- Additive changes (new edge mapping, new optional property) propagate safely.
- Removals (dropping an edge mapping or a property) need a staged rollout: first stop emitting writes that depend on the element, then drop it from the schema and wait for every server to reload before treating the removal as complete.
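One way to get hot reload without readers ever seeing a half-built schema is to build each new version off to the side and publish it with a single reference swap. The sketch below assumes a hypothetical `fetch()` callable returning `(version, edge labels)` and a monotonically increasing version number; the post says only that servers poll periodically, so both are assumptions, and a set of edge labels stands in for the full metadata graph.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaSnapshot:
    version: int
    edge_labels: frozenset  # stand-in for a fully built metadata graph


class SchemaHolder:
    """Serves one immutable schema snapshot at a time and swaps it on refresh."""

    def __init__(self, fetch):
        # fetch() is a hypothetical Control Plane client returning (version, labels).
        self._fetch = fetch
        version, labels = fetch()
        self._current = SchemaSnapshot(version, frozenset(labels))

    @property
    def current(self):
        # Read this once per request so a single query sees a single version.
        return self._current

    def refresh(self):
        """One poll. Returns True only if a newer schema was installed."""
        try:
            version, labels = self._fetch()
        except Exception:
            return False  # control plane unreachable: keep serving the last good schema
        if version <= self._current.version:
            return False
        # Build the new snapshot fully, then publish it with one reference swap.
        self._current = SchemaSnapshot(version, frozenset(labels))
        return True

    def poll_forever(self, interval_s: float, stop: threading.Event):
        # Event.wait returns True once stop is set, which ends the loop.
        while not stop.wait(interval_s):
            self.refresh()
```

In practice `poll_forever` would run on a background thread. The swap keeps each server internally consistent, but nothing in it coordinates servers with each other, which is why removals still need the staged rollout.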
Forward-looking
Netflix names two further optimisations the in-memory metadata graph could power:
- Edge-cardinality awareness for fanout-aware path selection — "using edge cardinality within edge mappings, we aim to select the most efficient traversal paths and minimize query fanout."
- Type-safe schema-aware client API — "The schema will support generating a type-safe data access layer and enhance the Gremlin-like API with schema awareness."
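Netflix has not said how edge cardinality would be used, so the following is only one plausible cost model: estimate the vertices a path touches as the running product of per-mapping average fanouts, then pick the cheapest of several candidate paths assumed to answer the same query. The fanout figures, the `profile` type and the `has_profile`/`viewed` labels are hypothetical.

```python
# Hypothetical average fanout (edges per source vertex) per edge mapping,
# keyed by (source type, edge label, destination type).
AVG_FANOUT = {
    ("account", "owns", "device"): 3.0,
    ("device", "watched", "title"): 40.0,
    ("account", "has_profile", "profile"): 5.0,
    ("profile", "viewed", "title"): 60.0,
}

# Two candidate paths assumed to reach the same titles from an account.
VIA_DEVICE = [("account", "owns", "device"), ("device", "watched", "title")]
VIA_PROFILE = [("account", "has_profile", "profile"), ("profile", "viewed", "title")]


def estimated_visits(path, start_vertices=1.0):
    """Expected vertices touched: each hop multiplies the frontier by its fanout."""
    frontier, total = start_vertices, 0.0
    for mapping in path:
        frontier *= AVG_FANOUT[mapping]
        total += frontier
    return total


def cheapest(paths):
    """Pick the candidate path with the smallest estimated fanout."""
    return min(paths, key=estimated_visits)


best = cheapest([VIA_PROFILE, VIA_DEVICE])  # VIA_DEVICE: 3 + 120 = 123 vs 5 + 300 = 305
```

Because the fanout table would live alongside the edge mappings, the estimate needs no I/O and fits naturally into the same planning walk over the metadata graph.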
Why "in-memory"
Reading the schema from storage on every query would either add a fetch to each request's latency or end up maintaining a hot cache anyway. The in-memory metadata graph is that cache, with an explicit shape (a graph) and an explicit refresh policy (polling). The structural choice is to make the planner's substrate the same kind of data structure as the data being planned over: both queries and schema rules operate on graphs.
Comparison
| System | Equivalent structure |
|---|---|
| Apache Iceberg | catalog-managed schema; planner consults catalog at scan-plan time |
| GraphQL servers | parsed schema in memory; resolver dispatch walks it |
| RDBMS | system catalog tables; planner consults them on every parse, usually through a cache |
| Netflix Graph Abstraction | in-memory metadata graph, hot-reloaded from Control Plane |
Seen in
- sources/2026-05-29-netflix-high-throughput-graph-abstraction-at-netflix-part-i — canonical wiki disclosure; the metadata graph drives all four of Netflix Graph Abstraction's load-bearing query-time optimisations.