
SYSTEM Cited by 2 sources

Vitess schemadiff

What it is

schemadiff is a Vitess library (Go package in the Vitess source tree) that reads schemas, validates dependency constraints, computes diff DDL between two schemas, and partitions the resulting diffs into equivalence classes with a valid in-order execution permutation inside each class. It is the analytical substrate underneath PlanetScale's near-atomic multi-change schema deployment model.

Introduced on the Vitess blog in April 2023 (vitess.io/blog/2023-04-24-schemadiff), schemadiff is continuously extended — the Vitess 21 release notes call out that more Online DDL analysis ("scenarios beyond the documented limitations," charset conversion, INSTANT eligibility) is now "delegated to the schemadiff library for programmatic power + testability."

Responsibilities

Schema validation

When schemadiff reads a schema, it maps and validates any dependency between entities — for example, verifying that tables and columns referenced by a view actually exist, and that there are no cyclic view definitions (v1 reads from v2, which reads from v1). This produces a schema dependency graph — nodes are schema entities (tables, views, columns, indexes, constraints), edges are reference relationships.
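The validation described above can be illustrated with a small, self-contained Go sketch. It is not schemadiff's code or API: the Schema type and Validate method are hypothetical, and the model is reduced to table names plus the entities each view reads from. It uses a depth-first walk to report a missing reference or a cyclic view definition.

```go
package main

import (
	"fmt"
	"sort"
)

// Schema is a toy model (hypothetical, not schemadiff's representation):
// a set of table names, plus each view mapped to the tables or views it reads.
type Schema struct {
	Tables map[string]bool
	Views  map[string][]string
}

// Validate returns an error for the first missing reference or cyclic
// view definition it finds, and nil if every view resolves cleanly.
func (s Schema) Validate() error {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := map[string]int{}
	var visit func(view string) error
	visit = func(view string) error {
		switch state[view] {
		case inProgress:
			// We re-entered a view that is still being resolved: a cycle.
			return fmt.Errorf("cyclic view definition involving %q", view)
		case done:
			return nil
		}
		state[view] = inProgress
		for _, dep := range s.Views[view] {
			if s.Tables[dep] {
				continue
			}
			if _, isView := s.Views[dep]; !isView {
				return fmt.Errorf("view %q references missing entity %q", view, dep)
			}
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[view] = done
		return nil
	}
	// Visit views in sorted order so the reported error is deterministic.
	names := make([]string, 0, len(s.Views))
	for name := range s.Views {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if err := visit(name); err != nil {
			return err
		}
	}
	return nil
}
```

With v1 reading from v2 and v2 reading from v1, Validate returns a cycle error; with a view reading from a table that does not exist, it returns a missing-entity error. Column-level, index, and constraint checks are left out of this sketch.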

Diff computation

Given two schemas (typically "current production state" and "desired state from the deploy-request branch"), schemadiff emits the sequence of DDL statements (CREATE TABLE, ALTER TABLE, ALTER VIEW, DROP TABLE, etc.) needed to transform the first into the second.
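As a rough illustration of what "computing the diff" means, the following Go sketch compares two toy schemas and emits DDL-like strings. It is an assumption-laden simplification, not schemadiff's algorithm: the Table alias and Diff function are hypothetical, tables are modelled as column-name-to-type maps, and column order, indexes, keys, and views are ignored. It needs Go 1.18 or later for the generic helper.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Table maps column name to column type (toy model, hypothetical).
type Table = map[string]string

// sortedKeys returns map keys in sorted order for deterministic output.
func sortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// Diff returns DDL-like statements that turn schema `from` into schema `to`.
func Diff(from, to map[string]Table) []string {
	var ddl []string
	// Tables that exist only in `from` are dropped.
	for _, name := range sortedKeys(from) {
		if _, kept := to[name]; !kept {
			ddl = append(ddl, "DROP TABLE "+name)
		}
	}
	for _, name := range sortedKeys(to) {
		target := to[name]
		current, exists := from[name]
		if !exists {
			// Tables that exist only in `to` are created.
			var cols []string
			for _, c := range sortedKeys(target) {
				cols = append(cols, c+" "+target[c])
			}
			ddl = append(ddl, fmt.Sprintf("CREATE TABLE %s (%s)", name, strings.Join(cols, ", ")))
			continue
		}
		// Tables in both schemas get one ALTER with all column changes.
		var clauses []string
		for _, c := range sortedKeys(current) {
			if _, kept := target[c]; !kept {
				clauses = append(clauses, "DROP COLUMN "+c)
			}
		}
		for _, c := range sortedKeys(target) {
			old, had := current[c]
			switch {
			case !had:
				clauses = append(clauses, "ADD COLUMN "+c+" "+target[c])
			case old != target[c]:
				clauses = append(clauses, "MODIFY COLUMN "+c+" "+target[c])
			}
		}
		if len(clauses) > 0 {
			ddl = append(ddl, fmt.Sprintf("ALTER TABLE %s %s", name, strings.Join(clauses, ", ")))
		}
	}
	return ddl
}
```

The statements come back in a fixed alphabetical order that says nothing about safe execution order; deciding that order is the job of the dependency analysis described in the following sections.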

Dependency analysis on the diff

After generating the diffs, schemadiff analyses the dependencies between the diff statements. From the canonical source:

"If any two diff statements affect entities with a dependency relationship in the schema(s), then schemadiff knows it needs to resolve the ordering of those two diffs. If yet another diff affects entities used by either of these two, then schemadiff needs to resolve the ordering of all three."

(Source: sources/2026-04-21-planetscale-deploying-multiple-schema-changes-at-once)

Equivalence-class partitioning

Diffs are partitioned into equivalence classes, the connected components of the diff-dependency graph. "All the diffs are thus divided into equivalence classes: distinct sets where nothing is shared between any two sets and where the total union of all sets is the total set of diffs." Cross-class ordering is arbitrary; within-class ordering is determined by a permutation search that verifies schema validity at every step.
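One way to picture the partitioning is as a connected-components computation. The Go sketch below uses a union-find structure over a hypothetical Diff record that lists the entities each statement touches or depends on; two diffs land in the same class when they share an entity, directly or transitively. This is an illustrative simplification, not schemadiff's implementation, and the names Diff and EquivalenceClasses are assumptions.

```go
package main

import "sort"

// Diff is a hypothetical record: a DDL statement plus the schema
// entities it touches or depends on.
type Diff struct {
	Statement string
	Entities  []string
}

// EquivalenceClasses groups diff indexes into connected components:
// diffs that share an entity, directly or through a chain of other
// diffs, end up in the same class.
func EquivalenceClasses(diffs []Diff) [][]int {
	parent := make([]int, len(diffs))
	for i := range parent {
		parent[i] = i
	}
	var find func(int) int
	find = func(i int) int {
		if parent[i] != i {
			parent[i] = find(parent[i]) // path compression
		}
		return parent[i]
	}
	firstUser := map[string]int{} // entity -> first diff seen touching it
	for i, d := range diffs {
		for _, e := range d.Entities {
			if j, seen := firstUser[e]; seen {
				parent[find(i)] = find(j) // merge the two components
			} else {
				firstUser[e] = i
			}
		}
	}
	groups := map[int][]int{}
	for i := range diffs {
		root := find(i)
		groups[root] = append(groups[root], i)
	}
	classes := make([][]int, 0, len(groups))
	for _, members := range groups {
		classes = append(classes, members)
	}
	// Order classes by their smallest member for deterministic output.
	sort.Slice(classes, func(a, b int) bool { return classes[a][0] < classes[b][0] })
	return classes
}
```

For four diffs touching {t}, {v, t}, {u}, and {w, v}, the function returns two classes: indexes 0, 1, 3 together (linked through t and v) and index 2 alone. The classes are disjoint and their union is the full set of diffs, matching the property quoted above.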

Permutation search with in-memory validity check

Within each equivalence class, schemadiff searches for a permutation of the diffs that preserves schema validity at every step of the sequence:

"For each equivalence class, schemadiff finds a permutation of the diffs such that if executed in order, the validity of the entire schema is preserved. It's worth reiterating that changes to the underlying database can only be applied sequentially. Thus, we must validate that the schema remains valid throughout the in-order execution. schemadiff achieves this by running in-memory schema migration and validation at every step."

The in-memory validity check means the library does not execute DDL against a real database during planning — it maintains an in-memory representation of the schema and mutates it step-by-step, catching invalid intermediate states (dangling view references, missing foreign-key targets, type-incompatible FK mismatches) before any production DDL is issued.
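The idea of searching for a permutation while validating every intermediate state can be sketched with a small backtracking search. The canonical post does not disclose the actual algorithm, so the following Go code is only an illustration under stated assumptions: the Schema, Diff, and FindValidOrder names are hypothetical, the schema is reduced to one table's columns plus the columns each view selects, and each diff is modelled as a function that mutates a copy of that in-memory schema.

```go
package main

// Schema is a toy in-memory model (hypothetical): the columns of a
// single table t, and for each view the columns it selects from t.
type Schema struct {
	Columns map[string]bool
	Views   map[string][]string
}

// clone makes a deep copy so each trial step mutates its own schema.
func (s Schema) clone() Schema {
	c := Schema{Columns: map[string]bool{}, Views: map[string][]string{}}
	for col, present := range s.Columns {
		c.Columns[col] = present
	}
	for name, cols := range s.Views {
		c.Views[name] = append([]string(nil), cols...)
	}
	return c
}

// valid reports whether every column used by a view exists in t.
func (s Schema) valid() bool {
	for _, cols := range s.Views {
		for _, col := range cols {
			if !s.Columns[col] {
				return false
			}
		}
	}
	return true
}

// Diff pairs a statement with its effect on the in-memory schema.
type Diff struct {
	Statement string
	Apply     func(*Schema)
}

// FindValidOrder backtracks over permutations and returns the first
// ordering in which the schema stays valid after every single step.
func FindValidOrder(start Schema, diffs []Diff) ([]string, bool) {
	used := make([]bool, len(diffs))
	var order []string
	var search func(cur Schema) bool
	search = func(cur Schema) bool {
		if len(order) == len(diffs) {
			return true
		}
		for i, d := range diffs {
			if used[i] {
				continue
			}
			next := cur.clone()
			d.Apply(&next)
			if !next.valid() {
				continue // this step would break the schema; try another diff
			}
			used[i] = true
			order = append(order, d.Statement)
			if search(next) {
				return true
			}
			used[i] = false
			order = order[:len(order)-1]
		}
		return false
	}
	if !search(start) {
		return nil, false
	}
	return order, true
}
```

Starting from a view v that selects id and info, and given the two diffs "ALTER TABLE t DROP COLUMN info" and "ALTER VIEW v AS SELECT id FROM t", the search rejects dropping the column first (v would reference a missing column) and returns the view change followed by the column drop. No database is contacted at any point. The worst case of this naive search is factorial in the size of the class, which is one reason keeping classes small matters.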

Canonical use cases

PlanetScale deploy-request orchestration

A PlanetScale deploy-request consists of N schema changes staged on a branch. When the deploy-request is submitted, schemadiff:

  1. Computes the diff DDL from production → branch state.
  2. Partitions the diff into equivalence classes.
  3. For each class, computes a valid execution permutation.
  4. Hands the blueprint to the deploy controller, which runs long-running changes via VReplication staged in catch-up, and serialises immediate changes in the computed order at cut-over time.

This gives the near-atomic multi-change deployment property: N migrations complete "a few seconds apart" in the order schemadiff computed, rather than hours apart in operator-authored order.

Vitess Online DDL analysis

The Vitess 21 release notes list multiple schemadiff extensions inside the single-table Online DDL path:

  • ALGORITHM=INSTANT eligibility analysis beyond MySQL's documented limitations — schemadiff models more scenarios in-memory to determine whether a change can use the cheap metadata-only path.
  • Charset-change handling. schemadiff analyses when programmatic text conversion can replace MySQL's built-in CONVERT(... USING utf8mb4) for performance in primary-key / iteration-key columns (utf8mb4 vs utf8).

Design properties

  • Pure-Go library, no runtime state. schemadiff is an analytical library, not a service — it takes two schemas as strings, returns an analysis tree. No database connections, no persistent state, no coordination with Vitess control-plane components.
  • In-memory simulation of DDL. The validity checker mutates an in-memory schema representation and verifies each intermediate state — no real DDL is issued during planning.
  • Complete dependency coverage. Table ↔ view ↔ column dependencies, foreign-key target verification, index structure validity, and (per Vitess 21) charset / collation / INSTANT eligibility are all modelled.
  • Complementary to the Online DDL executor. schemadiff produces the plan; the Vitess Online DDL executor (via VReplication or pt-online-schema-change / gh-ost strategies) executes the plan. Each layer is independently testable.

Limitations and caveats

  • Resource-bounded. "Resources are not infinite, and only so many changes can run concurrently. Altering a hundred tables in one deployment request is not feasible and possibly not the best utilization of database branching. It is possible to go too far with a branch so that the changes are logically impossible to deploy (or rather, so complex that it is not possible to determine a reliably safe path)." (Source: sources/2026-04-21-planetscale-deploying-multiple-schema-changes-at-once)
  • Permutation search complexity not disclosed. The in-memory-validation-at-every-step algorithm's asymptotic complexity, its behaviour on pathological graphs (cycles the user intended but schemadiff rejects), and its termination guarantees for very large equivalence classes are not documented in the canonical post.
  • "Reliably safe path" is a judgement call. The post acknowledges some deploy-request branches can be mechanically un-deployable by construction — no well-defined rule is given for the boundary; schemadiff returns failure and the operator must decompose the branch into smaller deploy-requests.
  • Relies on MySQL-only DDL semantics. Schema dependency analysis is MySQL-flavoured (view semantics, FK semantics, charset / collation hierarchy). Generalisation to Postgres under PlanetScale Postgres / Neki is not disclosed as of 2026-04-21.

Seen in

  • sources/2026-04-21-planetscale-deploying-multiple-schema-changes-at-once: canonical first wiki disclosure of the library's role in multi-change deployment. Shlomi Noach frames schemadiff as the analytical substrate underneath PlanetScale's near-atomic deployment model: it partitions diffs into equivalence classes ("distinct sets where nothing is shared between any two sets and where the total union of all sets is the total set of diffs"), computes a valid permutation inside each class via in-memory schema migration and validation at every step, and hands the blueprint to the deploy controller. Canonical four-panel diagram of "given a set of diffs → group into equivalence classes → arbitrary ordering across classes → valid ordering within each class." Canonical view-drop vs view-add worked example: ALTER TABLE t DROP COLUMN info + ALTER VIEW v AS SELECT id FROM t requires begin-t-wait-immediate-v-complete-t sequencing, not the naive "do v first" intuition.

  • sources/2026-04-21-planetscale-announcing-vitess-21 — extension of schemadiff's responsibilities in the single-table Online DDL path. Vitess 21 release notes list "more INSTANT DDL scenario analysis beyond the documented limitations" and "charset-change handling [that] now uses programmatic text conversion rather than MySQL's CONVERT(... USING utf8mb4) for performance in primary-key / iteration-key columns" both delegated to schemadiff "for programmatic power + testability." The April-2023 library continues to be a load-bearing extensibility point for Vitess Online DDL three years later.
