CONCEPT Cited by 3 sources

Binlog replication

Definition

Binlog replication is MySQL's native change-data-capture mechanism: every committed transaction is appended, in commit order, to the server's binary log (binlog), and any replica, or any external system speaking the MySQL replication protocol, can tail that log to see every committed change in that order. The binlog format (binlog_format) can be statement-based (STATEMENT), row-based (ROW), or mixed (MIXED); row-based is the default and the one that permits general-purpose logical replication across non-identical servers.

The MySQL replication protocol exposes the binlog to clients via two command families:

  • Legacy file/offset-based: COM_BINLOG_DUMP + binlog filename + offset. Brittle across primary failover and binlog rotation.
  • Modern GTID-based: COM_BINLOG_DUMP_GTID + a GTID set. The server streams every transaction the client is missing from that GTID set forward. Portable across failovers and topology changes.
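
The GTID-based handshake amounts to a set difference: the server holds one set of executed transactions, the client presents another, and the stream consists of whatever the client lacks. The following is a minimal Python sketch of that difference, not MySQL's implementation (the server-side SQL function GTID_SUBTRACT() does the equivalent). The names parse_gtid_set and missing_transactions are hypothetical, and the parser assumes plain untagged GTID sets of the form uuid:first-last[:first-last...].

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]           # inclusive (first, last) transaction numbers
GtidSet = Dict[str, List[Interval]]  # source UUID -> sorted, non-overlapping intervals


def parse_gtid_set(text: str) -> GtidSet:
    """Parse 'uuid:1-5:7,uuid2:1-3' into a GtidSet (hypothetical helper)."""
    result: GtidSet = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        intervals = result.setdefault(uuid, [])
        for r in ranges:
            first, _, last = r.partition("-")
            # A bare number such as '7' is the single-transaction interval 7-7.
            intervals.append((int(first), int(last or first)))
        intervals.sort()
    return result


def missing_transactions(server: GtidSet, client: GtidSet) -> GtidSet:
    """Return the transactions present in `server` but absent from `client`."""
    missing: GtidSet = {}
    for uuid, server_ivs in server.items():
        have = client.get(uuid, [])
        gaps: List[Interval] = []
        for first, last in server_ivs:
            cursor = first  # lowest transaction number not yet accounted for
            for h_first, h_last in have:
                if h_last < cursor or h_first > last:
                    continue  # this client interval does not touch the remainder
                if h_first > cursor:
                    gaps.append((cursor, h_first - 1))
                cursor = max(cursor, h_last + 1)
            if cursor <= last:
                gaps.append((cursor, last))
        if gaps:
            missing[uuid] = gaps
    return missing
```

Because the client's position is a set of transaction identifiers rather than a filename and byte offset, the same computation works against any server that has executed those transactions, which is what makes the GTID form portable across failover.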

The retention / throughput dependency

Binlogs are finite on disk. MySQL purges them after a retention threshold (binlog_expire_logs_seconds, or the older, deprecated expire_logs_days). Any consumer that falls behind the retention horizon cannot catch up, because the binlog events it needs are gone, and it has to restart from a fresh snapshot.
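
The retention rule can be expressed as a simple time comparison. The sketch below is an illustration under stated assumptions, not a MySQL API: the function name can_resume is hypothetical, and it assumes the consumer records the source-side timestamp of the last event it applied. It is also conservative, since MySQL purges whole binlog files rather than individual events, so events can survive somewhat longer than the threshold.

```python
from datetime import datetime, timedelta, timezone


def can_resume(last_applied_at: datetime,
               now: datetime,
               binlog_expire_logs_seconds: int) -> bool:
    """Return True if events after `last_applied_at` should still be on disk.

    Hypothetical helper: compares the consumer's position against the
    time-based retention horizon of the source.
    """
    if binlog_expire_logs_seconds == 0:
        # A value of 0 disables automatic time-based purging.
        return True
    horizon = now - timedelta(seconds=binlog_expire_logs_seconds)
    return last_applied_at >= horizon


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    three_days = 3 * 86400
    print(can_resume(now - timedelta(days=1), now, three_days))  # inside the horizon
    print(can_resume(now - timedelta(days=4), now, three_days))  # behind the horizon
```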

This couples source-side retention to consumer-side throughput: if a long-running copy or a downstream replication stream cannot stay current enough, it has to pause the copy, catch up on the binlog events that accumulated in the meantime, and resume before the retention horizon laps it. This is exactly why VReplication's copy-phase cycle duration is a tunable knob: interleaving copy work with binlog catch-up is what keeps petabyte-scale migrations inside the retention horizon.
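
To make the effect of the cycle duration concrete, here is a deliberately simplified model of alternating copy and catch-up phases. It is a sketch, not Vitess's scheduler: plan_copy, Plan, and all rate parameters are hypothetical, rates are assumed constant, and no binlog events are assumed to be applied while a copy phase runs. The quantity of interest is the worst lag, meaning the age of the oldest unapplied binlog event, which must stay below the source's retention threshold.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Plan:
    phases: List[Tuple[str, float]]  # ("copy" or "catchup", duration in seconds)
    worst_lag_sec: float             # age of the oldest unapplied event at its worst


def plan_copy(total_rows: int, rows_per_sec: int, copy_cycle_sec: int,
              write_rate: float, apply_rate: float) -> Plan:
    """Alternate bounded copy phases with catch-up phases (toy model)."""
    if rows_per_sec <= 0 or copy_cycle_sec <= 0:
        raise ValueError("rows_per_sec and copy_cycle_sec must be positive")
    if apply_rate <= write_rate:
        raise ValueError("apply_rate must exceed write_rate or catch-up never ends")
    rows_per_cycle = rows_per_sec * copy_cycle_sec
    phases: List[Tuple[str, float]] = []
    worst = 0.0
    rows_left = total_rows
    while rows_left > 0:
        rows = min(rows_per_cycle, rows_left)
        copy_sec = rows / rows_per_sec
        rows_left -= rows
        # Lag starts at zero after a catch-up and grows with wall time while copying.
        worst = max(worst, copy_sec)
        phases.append(("copy", copy_sec))
        # The backlog written during the copy drains at (apply_rate - write_rate).
        backlog = write_rate * copy_sec
        phases.append(("catchup", backlog / (apply_rate - write_rate)))
    return Plan(phases, worst)


if __name__ == "__main__":
    interleaved = plan_copy(10_000_000, 1000, 3600, 100.0, 500.0)
    one_shot = plan_copy(10_000_000, 1000, 10_000_000, 100.0, 500.0)
    print(interleaved.worst_lag_sec, one_shot.worst_lag_sec)
```

In this model the worst lag is bounded by the cycle duration rather than by the total copy time, so shortening the cycle trades extra catch-up phases for more headroom under the retention horizon.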

Seen in

  • sources/2026-04-21-planetscale-behind-the-scenes-how-schema-reverts-work: the online-DDL application of binlog tailing. Guevara + Noach describe the forward flow (shadow table catches up to production through binlog events filtered through the DDL's column projection) and the inverse flow (post-cut-over, every commit on the new table is tailed from the binlog and applied back to the now-former production table through the inverse projection, so the old-schema table stays current). Binlog replication is the substrate that keeps the shadow table synchronised in both phases: before cut-over for the forward-direction shadow-table schema change, and after cut-over for the inverse-replication schema revert.

  • sources/2026-02-16-planetscale-zero-downtime-migrations-at-petabyte-scale — canonical wiki description of the two binlog-replication moments in a VReplication workflow: (1) copy-phase catch-up — the workflow interleaves row-copy cycles with binlog-event catch-up cycles (controlled by the vreplication_copy_phase_duration flag) specifically to stay inside the source's binlog retention horizon; "this regular catchup step is important to ensure that we don't complete the row copy only to then be unable to replicate from where we left off because the source MySQL instance no longer has binary log events that we need as they have been purged, in which case we would be forced to start the entire migration over again." (2) continuous-replication phase — after all rows are copied, each per-shard stream issues COM_BINLOG_DUMP_GTID to the source MySQL with the stream's GTID position, receives binlog events filtered by the destination shard's sharding scheme, applies them on the target PRIMARY tablet, and advances the persisted GTID position on every batch commit.
