CONCEPT Cited by 3 sources
Binlog replication¶
Definition¶
Binlog replication is MySQL's native change-data-capture
mechanism: every committed transaction is appended, in order,
to the server's binary log (binlog), and any replica —
or any external system speaking the MySQL replication
protocol — can tail that log to see every committed change
in commit order. The replay format can be statement-based
(STATEMENT), row-based (ROW), or mixed; row-based is
the default and the one that permits general-purpose logical
replication across non-identical servers.
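The append-and-tail contract can be sketched in plain Python. This is an illustrative model only: `RowEvent`, `Binlog`, and the integer sequence number are stand-ins, not the real binlog format, which identifies positions by file/offset or GTID.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RowEvent:
    table: str
    before: Optional[dict]  # None for an INSERT
    after: Optional[dict]   # None for a DELETE

@dataclass
class Binlog:
    """Append-only log: every committed transaction lands here, in commit order."""
    events: list = field(default_factory=list)
    next_seq: int = 1

    def commit(self, event: RowEvent) -> int:
        seq, self.next_seq = self.next_seq, self.next_seq + 1
        self.events.append((seq, event))
        return seq

    def tail(self, from_seq: int):
        """What a replica (or any replication-protocol client) does: read
        every committed change at or after from_seq, in commit order."""
        return [(seq, ev) for seq, ev in self.events if seq >= from_seq]

log = Binlog()
log.commit(RowEvent("users", None, {"id": 1, "name": "ada"}))
log.commit(RowEvent("users", {"id": 1, "name": "ada"}, {"id": 1, "name": "ada l."}))
replayed = log.tail(1)  # both changes, in commit order
```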
The MySQL replication protocol exposes the binlog to clients via two command families:
- Legacy file/offset-based — COM_BINLOG_DUMP + binlog filename + offset. Brittle across primary failover and binlog rotation.
- Modern GTID-based — COM_BINLOG_DUMP_GTID + a GTID set. The server streams every transaction the client is missing, from that GTID set forward. Portable across failovers and topology changes.
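A toy model of why the two addressing schemes behave differently after a failover. The server layouts and function names here are invented (the real protocol speaks binary packets); the point is that (file, offset) is meaningful only on the server that wrote the file, while a GTID set names the transactions themselves.

```python
def resume_by_file_offset(server_files: dict, filename: str, offset: int):
    """Legacy resume: (filename, offset) only means something on the server
    that wrote that exact file; rotation or failover invalidates it."""
    if filename not in server_files:
        raise LookupError(f"{filename} not on this server")
    return server_files[filename][offset:]

def resume_by_gtid_set(server_txns: list, have: set):
    """GTID resume: the client names the transactions it already has; the
    server streams everything missing, regardless of file layout."""
    return [(gtid, tx) for gtid, tx in server_txns if gtid not in have]

# Old primary A wrote up to binlog.000003; new primary B starts its own file
# series but carries the same transactions, identified by the same GTIDs.
primary_b_files = {"binlog.000001": ["tx3", "tx4"]}
primary_b_txns = [("uuidA:1", "tx1"), ("uuidA:2", "tx2"),
                  ("uuidA:3", "tx3"), ("uuidA:4", "tx4")]

stale = False
try:
    resume_by_file_offset(primary_b_files, "binlog.000003", 120)
except LookupError:
    stale = True  # the old position is useless on the new primary

missing = resume_by_gtid_set(primary_b_txns, {"uuidA:1", "uuidA:2"})
# missing == [("uuidA:3", "tx3"), ("uuidA:4", "tx4")]
```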
The retention / throughput dependency¶
Binlogs are finite on disk. MySQL purges them after a retention
threshold (binlog_expire_logs_seconds / expire_logs_days). Any
consumer that falls behind the retention horizon cannot catch up —
the binlog events it needs are gone — and has to restart from a
fresh snapshot.
This forces a coupling between source-side retention and consumer-side throughput: if a long-running copy or a downstream replica can't stay current enough, it has to pause, catch up on what's already been applied, and resume before the binlog horizon laps it. This is exactly why VReplication's copy-phase cycle duration is a tunable knob — the interleaving of copy work and binlog catch-up is what keeps petabyte-scale migrations inside the retention horizon.
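The coupling can be simulated in a few lines. All numbers are invented, and retention is modeled as an event count rather than the time-based threshold MySQL actually uses; the sketch only shows why skipping catch-up eventually puts the resume position behind the purge horizon.

```python
def migrate(total_rows, rows_per_cycle, events_per_cycle,
            retention_events, catch_up=True):
    """Interleave row-copy cycles with binlog catch-up. resume_pos is the
    oldest binlog event the stream still needs; if the source purges past
    it, the migration must restart from a fresh snapshot."""
    copied, resume_pos, log_head = 0, 0, 0
    while copied < total_rows:
        copied += rows_per_cycle        # one copy cycle...
        log_head += events_per_cycle    # ...while the source keeps committing
        purge_horizon = log_head - retention_events
        if resume_pos < purge_horizon:
            return False                # needed events purged: start over
        if catch_up:
            resume_pos = log_head       # catch-up cycle: replay to head
    return True

ok_interleaved = migrate(1000, 100, 50, 200)                 # stays inside horizon
lapped = migrate(1000, 100, 50, 200, catch_up=False)         # copy-only: lapped
```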
Seen in¶
-
sources/2026-04-21-planetscale-behind-the-scenes-how-schema-reverts-work — the online-DDL application of binlog tailing. Guevara + Noach describe the forward flow (shadow table catches up to production through binlog events filtered through the DDL's column projection) and the inverse flow (post-cut-over, every commit on the new table is tailed from the binlog and applied back to the now-former-production table through the inverse projection, so the old-schema table stays current). Binlog replication is the substrate that keeps the shadow table synchronised in both phases — before cut-over for the forward-direction shadow-table schema change, and after cut-over for the inverse-replication schema revert.
-
sources/2026-02-16-planetscale-zero-downtime-migrations-at-petabyte-scale — canonical wiki description of the two binlog-replication moments in a VReplication workflow: (1) copy-phase catch-up — the workflow interleaves row-copy cycles with binlog-event catch-up cycles (controlled by the vreplication_copy_phase_duration flag) specifically to stay inside the source's binlog retention horizon; "this regular catchup step is important to ensure that we don't complete the row copy only to then be unable to replicate from where we left off because the source MySQL instance no longer has binary log events that we need as they have been purged, in which case we would be forced to start the entire migration over again." (2) continuous-replication phase — after all rows are copied, each per-shard stream issues COM_BINLOG_DUMP_GTID to the source MySQL with the stream's GTID position, receives binlog events filtered by the destination shard's sharding scheme, applies them on the target PRIMARY tablet, and advances the persisted GTID position on every batch commit.
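The continuous-replication apply loop in (2) can be sketched as a pure-Python stand-in. `in_keyrange`, the tuple event shape, and the dict-based state are illustrative, not Vitess APIs; the key behaviour shown is that the persisted GTID advances with each batch commit, including past events the shard filter drops, so a restarted stream never re-requests them.

```python
BATCH_SIZE = 2

def in_keyrange(keyspace_id: int, keyrange: tuple) -> bool:
    """Keep only rows whose keyspace id falls in this shard's keyrange."""
    lo, hi = keyrange
    return lo <= keyspace_id < hi

def apply_stream(events, keyrange, target_rows, state):
    """events: (gtid, keyspace_id, row) tuples as streamed by the source.
    Rows outside the keyrange are dropped, but the GTID position still
    advances past them with each batch commit."""
    pending = 0
    gtid = state.get("gtid")
    for gtid, keyspace_id, row in events:
        if in_keyrange(keyspace_id, keyrange):
            target_rows.append(row)   # applied on the target PRIMARY
        pending += 1
        if pending == BATCH_SIZE:
            state["gtid"] = gtid      # persisted atomically with the batch
            pending = 0
    if pending:
        state["gtid"] = gtid          # final short batch
    return state

events = [("uuid:1", 0x10, {"id": 1}),
          ("uuid:2", 0x90, {"id": 2}),   # belongs to another shard
          ("uuid:3", 0x20, {"id": 3})]
target, state = [], {}
apply_stream(events, (0x00, 0x80), target, state)
# target == [{"id": 1}, {"id": 3}]; state["gtid"] == "uuid:3"
```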
Related¶
- concepts/gtid-position
- concepts/consistent-non-locking-snapshot
- concepts/online-database-import
- concepts/online-ddl
- concepts/shadow-table
- concepts/pre-staged-inverse-replication
- systems/mysql
- systems/vitess-vreplication
- patterns/snapshot-plus-catchup-replication
- patterns/shadow-table-online-schema-change
- patterns/instant-schema-revert-via-inverse-replication
- companies/planetscale