Primary-standby WAL replication¶
Shape¶
A two-cluster deployment topology where both clusters hold a full copy of the data and are kept in sync via write-ahead-log (WAL) shipping between them:
- Primary cluster — serves online request traffic.
- Standby cluster — receives the WAL stream from the primary in near-real time; it also hosts resource-intensive offline workflows and daily backups that would otherwise disturb the primary.
- Cluster-level failover — if the primary cluster fails, ops flips the active designation so the standby becomes the primary. This is coarser than per-node replica failover inside a single cluster; it is a cluster-as-unit operation.
- Intra-cluster replication — each cluster is itself internally replicated (three-way in the canonical Pinterest instance), so the total replica count per logical record is 2 × 3 = 6.
This shape gives strong disaster recovery (whole-cluster or whole-region loss is survivable via one flip) at the cost of double the storage footprint of a single-cluster deployment — the canonical instance of concepts/replica-cost-tradeoff.
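To make the replica arithmetic concrete, here is a back-of-the-envelope sketch; the constant names and helper are illustrative, and the 6 PB logical figure comes from the tradeoffs below:

```python
# Replica multiplier for a primary-standby pair of internally
# replicated clusters: copies = clusters * intra_cluster_replicas.
CLUSTERS = 2                 # primary + standby
INTRA_CLUSTER_REPLICAS = 3   # three-way replication inside each cluster

def provisioned_bytes(logical_bytes: float) -> float:
    """Storage actually provisioned per logical byte of data."""
    return logical_bytes * CLUSTERS * INTRA_CLUSTER_REPLICAS

PB = 1e15
print(provisioned_bytes(6 * PB) / PB)  # 36.0 -- 6x multiplier at 6 PB logical
```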
Why it works¶
- Failure-domain separation. The two clusters are independent blast radii: configuration pushes, bad releases, or JVM hangs in one cluster do not propagate to the other. A cluster-wide incident (e.g. a coordinated HBase master failure) still has a clean escape valve.
- Separation of online from offline work. Daily backups, bulk scans, and resource-heavy ops can run on the standby without degrading p99 latency on the primary. This is the same logical shape as read-replicas for OLTP databases, lifted to a whole-cluster granularity.
- WAL stream is the natural replication primitive. HBase already writes every mutation to its WAL for local crash recovery; forwarding that stream across clusters reuses the same durability semantics and orders operations deterministically — see concepts/wal-replication.
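The ordering property is easy to model. A minimal sketch, assuming hypothetical WalEntry and StandbyApplier names (HBase's real replication machinery is internal to its region servers; this only captures sequence-ordered, idempotent replay of the same entries already written for local crash recovery):

```python
from dataclasses import dataclass

@dataclass
class WalEntry:
    seq: int         # monotonically increasing sequence number
    mutation: bytes  # serialized put/delete

class StandbyApplier:
    """Replays shipped WAL entries in sequence order, exactly once."""
    def __init__(self) -> None:
        self.applied_seq = 0
        self.store: list[bytes] = []

    def apply(self, entry: WalEntry) -> None:
        if entry.seq <= self.applied_seq:
            return  # duplicate shipment after a retry: idempotent skip
        assert entry.seq == self.applied_seq + 1, "gap: entries must ship in log order"
        self.store.append(entry.mutation)
        self.applied_seq = entry.seq

# Primary side forwards entries in log order; retries are harmless.
standby = StandbyApplier()
for e in [WalEntry(1, b"put:a"), WalEntry(2, b"del:b"), WalEntry(2, b"del:b")]:
    standby.apply(e)
print(standby.applied_seq)  # 2
```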
Tradeoffs¶
- 6 replicas is expensive. At Pinterest scale (6 PB logical → 36 PB provisioned) the replica multiplier is a load-bearing line item. Alternatives like TiDB, MySQL with careful placement, or RocksDB-based KV stores hit comparable availability SLAs with 3 replicas — one of the five reasons cited in the HBase deprecation (see patterns/nosql-to-newsql-deprecation).
- Cluster-level failover is a human operation. Automatic failover of a whole cluster is structurally riskier than per-node failover because the unit of failure is much larger; most teams keep the flip manual, which raises MTTR.
- Doesn't protect against correlated logical corruption. WAL replication faithfully replays writes — including the bad ones. Point-in-time recovery from backups remains necessary.
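For the failover mechanic, a sketch assuming a hypothetical routing-config indirection; the point is that the flip is one coarse-grained pointer swap, which is exactly why teams keep it manual:

```python
# Hypothetical routing config: clients resolve "which cluster is
# primary" through one level of indirection, so cluster-level
# failover is a single atomic flip of that designation.
routing = {"primary": "cluster-a", "standby": "cluster-b"}

def failover(routing: dict) -> dict:
    """Swap the active designation; cluster-as-unit, not per-node."""
    return {"primary": routing["standby"], "standby": routing["primary"]}

# Ops runs this by hand after confirming the primary is truly down;
# automating a flip at this granularity risks split-brain.
routing = failover(routing)
print(routing)  # {'primary': 'cluster-b', 'standby': 'cluster-a'}
```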
Seen in¶
- sources/2024-05-14-pinterest-hbase-deprecation-at-pinterest — canonical wiki instance. Pinterest's standard HBase production deployment used this shape for ~50 clusters / ~9,000 EC2 instances. "A typical production deployment consists of a primary cluster and a standby cluster, inter-replicated between each other using write-ahead-logs (WALs) for extra availability. Online requests are routed to the primary cluster, while offline workflows and resource-intensive cluster operations (e.g., daily backups) are executed on the standby cluster. Upon failure of the primary cluster, a cluster-level failover is performed to switch the primary and standby clusters." The 6-replica cost of this shape is axis 4 of Pinterest's deprecation framework.
Related¶
- concepts/wal-replication — the underlying replication primitive.
- concepts/primary-standby-failover — the failover mechanic.
- concepts/replica-cost-tradeoff — the economic tradeoff this shape foregrounds.
- patterns/nosql-to-newsql-deprecation — the org-level move that retires this shape for cost reasons.
- systems/hbase — the canonical substrate this shape was built on at Pinterest.