Skip to content

CONCEPT Cited by 1 source

Schema dependency graph

Definition

A schema dependency graph is the directed graph whose nodes are schema entities (tables, views, columns, indexes, foreign-key constraints, check constraints, triggers) and whose edges encode the "X references Y" relationships between them. A view referencing columns of a table points to those columns and indirectly to the table. A foreign-key constraint points to the parent table's primary key. An index points to its columns.

The graph is the substrate schemadiff walks to validate schema correctness (no dangling references, no cyclic views) and to partition diff statements into equivalence classes for the near-atomic multi-change deployment model.

Validation obligations

schemadiff uses the graph to enforce structural correctness before any DDL executes:

  • Reference completeness: every column / table / index referenced by a view or FK must exist in the target schema.
  • Acyclic views: no view v1 may read from v2 if v2 reads from v1 (directly or transitively).
  • Type compatibility: FK columns must have types compatible with their parent-table primary-key columns.
  • Intermediate validity: during a multi-step DDL sequence, every intermediate schema state must also be valid (a rename step cannot leave the graph dangling-referenced at any point).

Canonical verbatim: "When schemadiff reads a schema, it maps and validates any dependency between entities. For example, it can validate that table and columns referenced by some view exist or that there are no cyclic view definitions (v1 reads from v2, which reads from v1)." (Source: sources/2026-04-21-planetscale-deploying-multiple-schema-changes-at-once)

Dependency edges in the canonical Vitess domain

schemadiff handles (at minimum) these dependency types:

  • Column → Table: every column belongs to exactly one table.
  • Index → Column(s): an index references one or more columns of its table.
  • View → Columns (of underlying table(s)): views are query-defined; every column or expression in the view's SELECT resolves to columns of base tables.
  • View → View: nested views reference other views.
  • Foreign key → Parent table: an FK constraint references a parent table's primary key.
  • Check constraint → Columns: a check constraint references columns of its table.
  • Charset / collation hierarchy: column charset inherits from table, which inherits from database, which inherits from server — see character set and collation for the four-level inheritance hierarchy.

Diff → sub-graph → equivalence class

A schema diff (e.g. ALTER TABLE t DROP COLUMN info) mutates specific nodes in the graph. schemadiff overlays the diffs onto the graph and identifies which diffs touch entities with dependency relationships. Two diffs whose entities share a dependency chain end up in the same equivalence class; cross-class diffs are independent.

Worked example from the post:

  • ALTER TABLE t ADD COLUMN info VARCHAR(128) — mutates the t table node, adding an info column sub-node.
  • ALTER VIEW v AS SELECT id, info FROM t — mutates the v view node; new definition references both id and info on t.

Edges: v → info (new); v → id (already existed). The view diff depends on the table diff. Both diffs are in the same equivalence class; valid order is table-first, view- second.

Generalisation to nested views

"The scenarios may be more complex when multiple, nested views are involved, which are based on yet multiple tables being changed in the deployment request." The graph framework naturally generalises:

  • Every entity reachable via dependency chains from a diff's target node is in the same equivalence class.
  • schemadiff computes transitive reachability to determine class membership.
  • Within each class, topological sort respecting the graph edges gives a valid execution permutation.

The algorithm does not fail on nested views per se — it fails on graphs where no valid permutation exists (cyclic constraints, mutually-referencing FK adds) or on deploy- request branches whose complexity exceeds a "reliably safe path" threshold.

Relationship to general DAG / DBMS theory

The schema dependency graph is a specialisation of the general directed acyclic graph (DAG) widely used in database-internals contexts:

  • Query-planner join graphs — different edges (JOIN predicates), same algorithms.
  • Module / package dependency graphs — software build systems (Bazel, cargo) solve the same topological- sort-under-validity-constraint problem at a different altitude.
  • Foreign-key constraint graphs — the DBMS itself uses these for cascade-delete ordering, integrity checks, and referential-integrity enforcement during DML.

What makes the schema dependency graph distinct in the schemadiff context is the diff-overlay — classical DAGs assume node set is static; schemadiff's problem is "how does this set of mutations partition the graph into independent sub- problems?"

Seen in

  • sources/2026-04-21-planetscale-deploying-multiple-schema-changes-at-once — canonical wiki framing. Shlomi Noach introduces the graph implicitly through the dependency-validation language ("maps and validates any dependency between entities," "cyclic view definitions") and explicitly through the diff-to-equivalence-class partition algorithm. The graph is the substrate that makes near-atomic deployment mechanically possible — without it, PlanetScale could not safely compute the ordering of multi-table migrations without hand-authored dependency metadata.
Last updated · 347 distilled / 1,201 read