
CONCEPT Cited by 1 source

WAL replication

WAL replication is the technique of shipping a primary's write-ahead log (WAL) to one or more replicas, which apply it to reach the same logical state as the primary. It is the replication substrate under HBase primary-standby inter-cluster replication, MySQL binlog replication, PostgreSQL streaming replication, and many others.

Definition

Every mutation on the primary is first appended to the WAL (for local crash recovery — see concepts/wal-write-ahead-logging). The same WAL record is then streamed to a replica, which replays it to apply the mutation. Because the WAL is the authoritative order-of-operations log, replicas that apply it deterministically converge on the same state.
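The append, ship, replay sequence can be sketched with a toy in-memory model. This is a minimal illustration, not any particular database's implementation: `WalRecord`, `Node`, `Primary`, and `records_after` are hypothetical names, and the log sequence number (LSN) is assumed to be a simple counter starting at 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class WalRecord:
    lsn: int               # position in the primary's total order (hypothetical counter)
    key: str
    value: Optional[str]   # None marks a delete


@dataclass
class Node:
    """Anything that can replay WAL records: a replica, or the primary itself."""
    state: Dict[str, str] = field(default_factory=dict)
    applied_lsn: int = 0

    def apply(self, rec: WalRecord) -> None:
        # Replay must be in order and gap-free, otherwise states can diverge.
        if rec.lsn != self.applied_lsn + 1:
            raise ValueError(f"expected LSN {self.applied_lsn + 1}, got {rec.lsn}")
        if rec.value is None:
            self.state.pop(rec.key, None)
        else:
            self.state[rec.key] = rec.value
        self.applied_lsn = rec.lsn


@dataclass
class Primary(Node):
    wal: List[WalRecord] = field(default_factory=list)

    def write(self, key: str, value: Optional[str]) -> WalRecord:
        rec = WalRecord(len(self.wal) + 1, key, value)
        self.wal.append(rec)   # 1. append to the WAL first
        self.apply(rec)        # 2. then apply the mutation locally
        return rec

    def records_after(self, lsn: int) -> List[WalRecord]:
        # Records with LSN > lsn; LSN n is stored at index n - 1.
        return self.wal[lsn:]


primary = Primary()
primary.write("a", "1")
primary.write("b", "2")
primary.write("a", None)   # delete

replica = Node()
for rec in primary.records_after(replica.applied_lsn):   # "ship" the WAL
    replica.apply(rec)                                    # replay in order

assert replica.state == primary.state == {"b": "2"}
```

The replica tracks only `applied_lsn`, so after a disconnect it can resume by asking for the records after that position.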

Two common variants:

  • Intra-cluster WAL replication — replicas inside a single cluster share the same WAL stream for high availability (e.g. MySQL binlog replication between primary and read-replicas).
  • Inter-cluster WAL replication — the WAL stream crosses a cluster boundary for disaster recovery or geographic distribution. The canonical Pinterest HBase shape (patterns/primary-standby-wal-replication) uses this — primary cluster and standby cluster each have their own intra-cluster replication but are kept in sync with each other via WAL shipping (Source: sources/2024-05-14-pinterest-hbase-deprecation-at-pinterest).

Why it works

  • Same order on every replica. The WAL is a totally-ordered per-primary log; replaying it in order gives a deterministic result given deterministic operators.
  • Minimal overhead on the primary. The primary already wrote the WAL for its own durability; sending it to the replica is mostly network work.
  • Natural point-in-time-recovery primitive. The WAL can be archived alongside a base backup and replayed on top of it to reconstruct state as of a chosen point in time.
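The point-in-time-recovery property can be illustrated with a small sketch: start from a snapshot taken at a known log position and replay archived records until a target time is reached. The names `ArchivedRecord`, `commit_ts`, and `restore_to` are hypothetical, and the sketch assumes commit timestamps never decrease as the LSN increases.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ArchivedRecord:
    lsn: int                # position in the primary's log
    commit_ts: float        # commit time on the primary (hypothetical field)
    key: str
    value: Optional[str]    # None marks a delete


def restore_to(
    base_snapshot: Dict[str, str],
    base_lsn: int,
    archive: Iterable[ArchivedRecord],
    target_ts: float,
) -> Dict[str, str]:
    """Rebuild state as of target_ts from a snapshot plus archived WAL."""
    state = dict(base_snapshot)
    for rec in sorted(archive, key=lambda r: r.lsn):
        if rec.lsn <= base_lsn:
            continue        # already reflected in the snapshot
        if rec.commit_ts > target_ts:
            break           # assumes commit_ts is non-decreasing in LSN order
        if rec.value is None:
            state.pop(rec.key, None)
        else:
            state[rec.key] = rec.value
    return state


archive = [
    ArchivedRecord(1, 10.0, "a", "1"),
    ArchivedRecord(2, 20.0, "b", "2"),
    ArchivedRecord(3, 30.0, "a", None),   # an unwanted delete at t=30
]

# Recover the state just before the delete.
assert restore_to({}, 0, archive, target_ts=25.0) == {"a": "1", "b": "2"}
```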

Tradeoffs

  • Replicas can lag. If the WAL shipping channel is slow or the replica is overloaded, the replica drifts behind the primary — the root of most replication lag incidents.
  • Bad writes are replicated faithfully. WAL replication replays whatever the primary logged, including erroneous or corrupting writes, so it does not protect against application-level data corruption.
  • Large transactions can create replay hotspots. A single big transaction on the primary arrives at the replica as a single batch that must be applied atomically, which can delay the records queued behind it.
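Replica lag is commonly reasoned about as the gap between the primary's latest log position and the position a replica has applied. The sketch below assumes positions are plain integer LSNs; `replication_lag`, `fresh_replicas`, and the `max_lag` threshold are hypothetical names for illustration, not part of any specific system's API.

```python
from typing import Dict, List


def replication_lag(primary_lsn: int, replica_applied_lsn: int) -> int:
    """Number of WAL records the replica still has to apply."""
    if replica_applied_lsn > primary_lsn:
        raise ValueError("replica cannot be ahead of the primary")
    return primary_lsn - replica_applied_lsn


def fresh_replicas(primary_lsn: int, applied: Dict[str, int], max_lag: int) -> List[str]:
    """Names of replicas whose lag is within max_lag records, sorted by name."""
    return sorted(
        name
        for name, lsn in applied.items()
        if replication_lag(primary_lsn, lsn) <= max_lag
    )


applied = {"r1": 100, "r2": 60, "r3": 95}
assert replication_lag(100, 95) == 5
assert fresh_replicas(100, applied, max_lag=10) == ["r1", "r3"]
```

A router could use a check like this to keep lag-sensitive reads away from replicas that have fallen too far behind.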

Seen in
