
CONCEPT

Traffic-aware migration throttling

Definition

Traffic-aware migration throttling is the operator discipline of scaling down an in-progress schema migration's resource consumption when production traffic spikes — treating the migration as an elastic background workload that must yield to latency-critical OLTP when the database is busy. The migration pauses (or runs at a reduced chunk-copy rate) when production load exceeds a threshold, and resumes when load subsides. The goal is to make migrations invisible to the production workload: no latency spike, no replication-lag blow-up, no operator-visible contention.

Canonical verbatim definition (Burns, 2021 PlanetScale):

"When the deploy request reaches the front of the queue, the deployment to production begins. This process happens in the background and is sensitive to production traffic. If there's a spike in traffic, the schema change migration will scale down to avoid using resources needed to handle the increased traffic."

(Source: sources/2026-04-21-planetscale-non-blocking-schema-changes.)
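The pause-and-resume policy in the definition can be sketched as a chunk-copy loop that yields whenever a load probe crosses a threshold. This is a minimal illustration, not PlanetScale's implementation; the threshold, probe, and helper functions are all hypothetical stand-ins.

```python
import time

# Illustrative threshold and interval -- not PlanetScale's actual values.
LOAD_THRESHOLD = 0.8        # fraction of capacity above which the migration yields
CHECK_INTERVAL_SECS = 1.0

def production_load() -> float:
    """Stand-in for a real load probe (e.g. a normalized Threads_running reading)."""
    return 0.2

def copy_next_chunk(table: str, chunk: int) -> None:
    """Stand-in for copying one chunk of rows into the shadow table."""
    pass

def run_migration(table: str, total_chunks: int) -> int:
    """Copy all chunks, yielding to production whenever load is high."""
    copied = 0
    while copied < total_chunks:
        if production_load() > LOAD_THRESHOLD:
            # Production traffic spiked: pause and re-check later.
            time.sleep(CHECK_INTERVAL_SECS)
            continue
        copy_next_chunk(table, copied)
        copied += 1
    return copied
```

The migration never fails under load in this sketch; it simply makes no progress until the probe drops back below the threshold.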

The workload-priority framing

The key framing is workload priority inversion avoidance. A shadow-table migration is fundamentally I/O- and CPU-hungry: it copies potentially terabytes of data from the original table into a new one, and tails the binlog to keep the copy current. Run unconstrained, it will saturate the primary's disk, inflate replication lag, and degrade customer-facing query latency.

The throttling contract: migration is always lower priority than OLTP. When they compete, OLTP wins. The migration's progress is sacrificed for production latency, not the other way around.

Relationship to Vitess throttler

The 2021-era PlanetScale disclosure is terse — one paragraph. The later wiki coverage of the Vitess throttler (see concepts/database-throttler and the throttler pattern family) fills in the mechanism:

  • Metric-driven — typically replica lag (via heartbeat), load average, or a combination (see patterns/multi-metric-throttling).
  • Per-shard-scoped — each shard has its own throttler, so migration rate responds to local load.
  • Client-identity-aware — migration tooling (gh-ost, VReplication) identifies itself so the throttler can distinguish it from user traffic.
  • Probabilistic rejection — when over threshold, the throttler rejects a fraction of migration chunk-copy calls; fraction grows as lag grows.
  • Fail-open (see concepts/throttler-fail-open-vs-fail-closed) — if the throttler itself is down, migrations continue rather than halt.

The 2021 Burns post describes the policy (scale down under traffic) without describing the mechanism (how scale-down is computed, which metrics drive it, what thresholds are used).
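The probabilistic-rejection mechanism described above can be sketched as a rejection probability that is zero below the lag threshold and grows toward one as lag approaches a hard cap. The thresholds and the linear ramp here are illustrative assumptions; the Vitess throttler derives lag from replica heartbeats and its actual response curve may differ.

```python
import random

# Hypothetical thresholds; the real throttler measures lag via heartbeats.
LAG_THRESHOLD_SECS = 5.0   # below this, no migration calls are rejected
MAX_LAG_SECS = 20.0        # at or above this, all migration calls are rejected

def rejection_probability(lag_secs: float) -> float:
    """0 below the threshold, ramping linearly to 1 at the hard cap."""
    if lag_secs <= LAG_THRESHOLD_SECS:
        return 0.0
    if lag_secs >= MAX_LAG_SECS:
        return 1.0
    return (lag_secs - LAG_THRESHOLD_SECS) / (MAX_LAG_SECS - LAG_THRESHOLD_SECS)

def admit_chunk_copy(lag_secs: float, rng: random.Random) -> bool:
    """Admit or reject a single migration chunk-copy call."""
    return rng.random() >= rejection_probability(lag_secs)
```

Rejecting a growing *fraction* of calls, rather than flipping a binary on/off switch, lets migration throughput degrade smoothly instead of oscillating around the threshold.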

Upstream mechanism disclosure — gh-ost throttle hooks

gh-ost — the migration engine PlanetScale used in 2021 — exposes the throttle interface directly:

  • --max-lag-millis — pause when replica lag exceeds the threshold.
  • --max-load — pause when a named MySQL status variable (e.g. Threads_running) exceeds the threshold.
  • --critical-load — abort the migration entirely (not just pause) when a named status variable exceeds a critical threshold.
  • --throttle-control-replicas — monitor additional replicas' lag and throttle based on the worst.
  • --throttle-flag-file / --throttle-query — external controls that let operators pause/resume without restarting the migration.
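The decision logic behind the lag and load flags above can be sketched as a check gh-ost runs between chunk copies. The flag names are real; the numeric values and the three-way decision shape here are illustrative simplifications, not gh-ost's source.

```python
from enum import Enum

class Decision(Enum):
    PROCEED = "proceed"
    PAUSE = "pause"
    ABORT = "abort"

# Illustrative values; gh-ost reads these from --max-lag-millis,
# --max-load, and --critical-load.
MAX_LAG_MILLIS = 1500
MAX_THREADS_RUNNING = 25        # e.g. --max-load=Threads_running=25
CRITICAL_THREADS_RUNNING = 500  # e.g. --critical-load=Threads_running=500

def throttle_decision(lag_millis: int, threads_running: int) -> Decision:
    """Simplified gh-ost-style check run between chunk copies."""
    if threads_running >= CRITICAL_THREADS_RUNNING:
        return Decision.ABORT   # --critical-load: abort the migration entirely
    if threads_running >= MAX_THREADS_RUNNING:
        return Decision.PAUSE   # --max-load: yield until load subsides
    if lag_millis >= MAX_LAG_MILLIS:
        return Decision.PAUSE   # --max-lag-millis: yield until replicas catch up
    return Decision.PROCEED
```

Note the asymmetry: lag and ordinary load cause a pause that is automatically reversed, while critical load aborts, on the theory that a server in distress should shed the migration permanently rather than hover near collapse.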

PlanetScale's 2021 architecture layered its own traffic-awareness policy on top of these gh-ost primitives. The later Vitess-native VReplication / Vitess throttler stack subsumes this.

Contrast: naive uniform-rate migration

Without throttling, a migration runs at the rate its chunk-copy loop can issue reads/writes — bounded only by the underlying storage. On a quiet primary this is fine; on a busy primary it steals IOPS from OLTP, inflates replication lag, and in the worst case forces a replica-set reconfiguration. The early-2010s solution (before tools like gh-ost / pt-osc) was to schedule migrations in maintenance windows — an operator-painful, user-impactful compromise. Traffic-aware throttling replaces the maintenance window with a continuous-yielding policy: migrations run any time, but yield every time production needs the resources.

Why migrations can afford to yield

The migration-as-elastic-workload framing relies on a specific property: the migration doesn't have a hard deadline. Deploy requests are expected to take however long they take; a 3-hour migration that pauses for an hour during peak traffic and completes in 4 hours is acceptable. This is the opposite of OLTP queries, which have p99.9 latency SLOs measured in milliseconds. The priority inversion is safe because one side is inelastic and the other is elastic.
