CONCEPT

Schema-diff equivalence class¶

Definition¶

A schema-diff equivalence class is a connected component of the dependency graph formed by a set of schema-diff statements. Two diffs are in the same class iff there is a chain of dependency relationships linking them through entities referenced by either diff.

Canonical verbatim framing:

"If any two diff statements affect entities with a dependency relationship in the schema(s), then schemadiff knows it needs to resolve the ordering of those two diffs. If yet another diff affects entities used by either of these two, then schemadiff needs to resolve the ordering of all three. All the diffs are thus divided into equivalence classes: distinct sets where nothing is shared between any two sets and where the total union of all sets is the total set of diffs."

(Source: )

Properties¶

Let D be the set of diffs in a deploy-request. schemadiff partitions D into classes C_1, C_2, ..., C_k such that:

Disjoint: for all i ≠ j, C_i ∩ C_j = ∅.
Complete: C_1 ∪ C_2 ∪ ... ∪ C_k = D — the union of all classes equals the full diff set.
Dependency-closed: every diff has no dependency relationship with any diff outside its class.

Two consequences that make this partition load-bearing for deploy orchestration:

Cross-class ordering is arbitrary¶

"If you take a sample diff from one equivalence class and then some sample diff from a different equivalence class, you know there's absolutely no dependency between the two. They can be executed in any order."

This means the deploy controller can parallelise across classes without ordering risk — two equivalence classes can be cut over simultaneously, or in any arbitrary sequence, without correctness impact.

Within-class ordering must be topologically sorted¶

"However, any two diffs within the same equivalence class can have a dependency and should be treated as if they do."

Inside a class, the diffs must be executed in an order that respects dependencies — [[patterns/topological-order-by- equivalence-class|topologically sorted]] on the sub-graph induced by the class. schemadiff finds a valid permutation via in-memory schema migration and validation at every step.

Worked example from the post¶

Shlomi's view / table dependency case:

Forward (add column + add to view): - Diff A: ALTER TABLE t ADD COLUMN info VARCHAR(128) NOT NULL DEFAULT '' AFTER id - Diff B: ALTER VIEW v AS SELECT id, info FROM t

B depends on A (B references info, which A adds). A and B are in the same equivalence class. Valid order: A → B.

Reverse (drop column + remove from view): - Diff A': ALTER TABLE t DROP COLUMN info - Diff B': ALTER VIEW v AS SELECT id FROM t

A' depends on B' (running A' first makes v invalid). Same equivalence class. Valid order: B' → A'.

Complication at the deploy-unit altitude: A' is a long-running migration; B' is immediate. The deploy controller's sequencing is: 1. Begin A' in the background (staged catch-up). 2. Wait for A' to be ready to complete. 3. Apply B' (immediate). 4. Cut over A' (complete).

The sequence respects the dependency order B' → A' at the cut-over boundary while overlapping A' with B'-staging for throughput.

Nested views — "The scenarios may be more complex when multiple, nested views are involved, which are based on yet multiple tables being changed in the deployment request." The equivalence-class framework generalises to arbitrary nesting: every entity reachable via dependency chains from any diff is in the class; schemadiff walks the graph to identify the connected component.

Relationship to graph theory¶

The concept is a direct application of connected components from undirected graph theory — two diffs are in the same component iff a path connects them through shared / dependent entities. schemadiff additionally performs topological sort inside each component to find a valid execution permutation.

The formal reduction:

Nodes = schema entities (tables, views, columns, indexes, constraints, FKs).
Edges = "uses" / "references" relationships.
Diff-graph overlay = mark which nodes each diff mutates.
Equivalence class = connected component of the diff-graph overlay.
Within-class ordering = topological sort on the component respecting in-memory validity.

Why the partition matters¶

Without equivalence-class analysis, a deploy controller has two bad options:

Assume everything depends on everything — serialise all N diffs into one order, sacrificing cross-class parallelism.
Assume nothing depends on anything — execute in arbitrary order, risk correctness bugs from unresolved dependencies.

The equivalence-class partition cleanly splits the problem: parallelise across classes, serialise within each class. This is the maximum parallelism a correctness-preserving deploy controller can extract.

Seen in¶

— canonical wiki definition. The four-panel diagram in the post visualises the partition: "(a) Given a set of diffs, (b) Group them into equivalence classes, where changes to elements that have dependencies are grouped together. (c) Ordering of equivalence classes is arbitrary. (d) Within an equivalence class there must be a valid ordering." The framework load-bears the near-atomic deployment story — without equivalence classes, PlanetScale could not safely parallelise across unrelated migrations.