gh-ost

Definition

gh-ost (GitHub Online Schema Migration Tool) is a triggerless, binlog-based online schema change tool for MySQL, open-sourced by GitHub in 2016. It applies an ALTER TABLE statement to a live production table without blocking writes and without causing sustained replication lag, by (a) creating a ghost table that is an empty copy of the original with the new schema applied, (b) backfilling it from the original in key-ordered chunks, (c) tailing the binlog to capture concurrent writes and replay them onto the ghost, and (d) atomically renaming the tables at a brief cut-over. It is the canonical implementation of the shadow-table online schema change pattern. Repo: github.com/github/gh-ost.

Positioned against pt-online-schema-change (pt-osc, part of Percona Toolkit): pt-osc uses triggers on the original table to mirror writes onto the ghost; gh-ost replaces triggers with binlog tailing, which decouples the migration load from the primary's write path and makes progress externally observable and pausable.
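
The contrast between the two mirroring styles can be sketched in a few lines. This is a toy in-memory model, not code from either tool: the class names, the dict-based "tables", and the paused flag are all assumptions made for illustration.

```python
from collections import deque


class TriggerMirrored:
    """Trigger style (hypothetical model): each write to the original
    also writes the shadow copy inline, as part of the same write."""

    def __init__(self):
        self.original, self.shadow = {}, {}

    def write(self, pk, row):
        self.original[pk] = row
        self.shadow[pk] = row  # extra work done on the writer's own path


class LogMirrored:
    """Binlog style (hypothetical model): a write only appends to a log;
    a separate applier replays the log onto the shadow copy later."""

    def __init__(self):
        self.original, self.shadow = {}, {}
        self.log = deque()
        self.paused = False  # external pause switch for the applier

    def write(self, pk, row):
        self.original[pk] = row
        self.log.append((pk, row))  # no shadow-table work on the write path

    def apply_pending(self):
        """Replay logged writes unless paused; return how many were applied."""
        applied = 0
        while self.log and not self.paused:
            pk, row = self.log.popleft()
            self.shadow[pk] = row
            applied += 1
        return applied

    def backlog(self):
        """Observable progress: number of writes not yet replayed."""
        return len(self.log)
```

In the log-based model the applier can be paused while writes continue, and the backlog length shows how far behind the shadow copy is. In the trigger-based model the mirroring cost cannot be separated from the write itself.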

Mechanism summary

Four-phase shape (see patterns/shadow-table-online-schema-change for the full pattern write-up):

  1. Create ghost table: CREATE TABLE _tbl_gho LIKE tbl, then apply the user's ALTER to _tbl_gho. The ghost starts empty.
  2. Backfill: copy rows from the original in chunks ordered by a unique key. Rows that change while the copy is running are reconciled by step 3.
  3. Apply binlog events: tail the primary's binlog; each concurrent INSERT / UPDATE / DELETE on tbl is replayed onto _tbl_gho. Runs concurrently with step 2.
  4. Cut-over: atomic rename of tbl → _tbl_del and _tbl_gho → tbl. The original table is kept as _tbl_del for quick rollback.
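
The four phases can be walked through with a small in-memory simulation. It is a sketch under stated assumptions, not gh-ost's implementation: tables are plain dicts keyed by primary key, BinlogEvent and every function name are hypothetical, the "ALTER" is a stand-in that adds one column, and backfill and event replay alternate in a single loop instead of running as concurrent threads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BinlogEvent:
    """Hypothetical row-level change event (not gh-ost's real type)."""
    op: str                     # "insert", "update", or "delete"
    pk: int
    row: Optional[dict] = None  # full row image for insert/update


def transform(row):
    """Stand-in for the user's ALTER: add a column with a default value."""
    return {**row, "status": "new"}


def write_original(original, ev):
    """The application's own write against the live table."""
    if ev.op == "delete":
        original.pop(ev.pk, None)
    else:
        original[ev.pk] = dict(ev.row)


def copy_chunk(original, ghost, after_pk, chunk_size):
    """Phase 2: copy the next key-ordered chunk. Rows already present in
    the ghost are left alone, so replayed events win over copied rows."""
    pks = sorted(pk for pk in original if after_pk is None or pk > after_pk)
    pks = pks[:chunk_size]
    for pk in pks:
        ghost.setdefault(pk, transform(original[pk]))
    return pks[-1] if pks else None


def apply_event(ghost, ev):
    """Phase 3: replay one concurrent write onto the ghost table."""
    if ev.op == "delete":
        ghost.pop(ev.pk, None)
    else:
        ghost[ev.pk] = transform(ev.row)


def migrate(tables, name, write_batches, chunk_size=2):
    original = tables[name]
    ghost_name, old_name = f"_{name}_gho", f"_{name}_del"
    tables[ghost_name] = ghost = {}          # phase 1: empty ghost table
    batches = list(write_batches)
    last_pk, copying = None, True
    while copying or batches:
        for ev in (batches.pop(0) if batches else []):
            write_original(original, ev)     # live traffic continues
            apply_event(ghost, ev)           # and is replayed onto the ghost
        if copying:
            next_pk = copy_chunk(original, ghost, last_pk, chunk_size)
            if next_pk is None:
                copying = False
            else:
                last_pk = next_pk
    # Phase 4: cut-over. A real server does this as one atomic rename.
    tables[old_name] = tables.pop(name)
    tables[name] = tables.pop(ghost_name)
```

The detail worth noticing is the asymmetry between the two writers: the backfill never overwrites a row the ghost already has, while replayed events always overwrite or delete. That ordering is what lets the copy proceed chunk by chunk while the original keeps changing.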

Key distinguishing traits vs pt-online-schema-change:

  • Triggerless. Uses binlog tailing instead of per-row triggers on the original table. Reduces primary-write overhead.
  • Throttle-aware. Throttles on replica lag, on server load (thresholds on MySQL status variables such as Threads_running, not OS load average), and on the presence of a control file, so a migration can be paused and resumed externally without killing the job. This design heavily influenced the later Vitess throttler abstraction.
  • Interruptible / resumable. The migration writes progress state to the ghost table itself; restarts continue from the last chunk.
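
The throttling trait can be illustrated with a short sketch. All names here (ThrottlePolicy, throttle_reason, run) are hypothetical and are not gh-ost options or APIs; the load value is a generic number supplied by the caller. Where a real tool would sleep and re-check while throttled, this sketch simply returns its position so the caller can resume later.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThrottlePolicy:
    """Hypothetical thresholds; names are illustrative only."""
    max_replica_lag_s: float
    max_load: float
    flag_file: str  # if this file exists, the migration pauses


def throttle_reason(policy, replica_lag_s, load) -> Optional[str]:
    """Return why copying should pause right now, or None to proceed."""
    if os.path.exists(policy.flag_file):
        return "flag-file"
    if replica_lag_s > policy.max_replica_lag_s:
        return "replica-lag"
    if load > policy.max_load:
        return "load"
    return None


def run(chunks, policy, read_metrics, copy_chunk, start=0):
    """Copy chunks from index `start`, checking the throttle before each.
    Returns (next_index, reason); reason is None once all chunks are done."""
    i = start
    while i < len(chunks):
        lag, load = read_metrics()
        reason = throttle_reason(policy, lag, load)
        if reason is not None:
            return i, reason  # paused, position kept, nothing killed
        copy_chunk(chunks[i])
        i += 1
    return i, None
```

Creating the flag file pauses the copy from outside the process, and removing it lets the next call to run continue from the returned index.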

Relationship to Vitess

Vitess has its own online-DDL implementation (see VReplication-driven schema changes and systems/vitess-schemadiff), which largely supersedes gh-ost for Vitess-native deployments. gh-ost was the earlier, standalone tool, and it influenced the design of both. PlanetScale's 2021 architecture composes the two: Vitess for orchestration, gh-ost for the migration engine.
