Primary-standby WAL replication¶
Shape¶
A two-cluster deployment topology where both clusters hold a full copy of the data and are kept in sync via write-ahead-log (WAL) shipping between them:
- Primary cluster — serves online request traffic.
- Standby cluster — receives the WAL stream from the primary in near-real time; it also hosts resource-intensive offline workflows and daily backups that would otherwise disturb the primary.
- Cluster-level failover — if the primary cluster fails, ops flips the active designation so the standby becomes the primary. This is coarser than per-node replica failover inside a single cluster; it is a cluster-as-unit operation.
- Intra-cluster replication — each cluster is itself internally replicated (three-way in the canonical Pinterest instance), so the total replica count per logical record is 2 × 3 = 6.
This shape gives strong disaster recovery (whole-cluster or whole-region loss is survivable via one flip) at the cost of double the storage footprint of a single-cluster deployment — the canonical instance of concepts/replica-cost-tradeoff.
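To make the replica arithmetic concrete, here is a back-of-the-envelope sketch; the constant names and helper are illustrative, and the 6 PB logical figure comes from the tradeoffs below:

```python
# Replica multiplier for a primary-standby pair of internally
# replicated clusters: copies = clusters * intra_cluster_replicas.
CLUSTERS = 2                 # primary + standby
INTRA_CLUSTER_REPLICAS = 3   # three-way replication inside each cluster

def provisioned_bytes(logical_bytes: float) -> float:
    """Storage actually provisioned per logical byte of data."""
    return logical_bytes * CLUSTERS * INTRA_CLUSTER_REPLICAS

PB = 1e15
print(provisioned_bytes(6 * PB) / PB)  # 36.0 -- 6x multiplier at 6 PB logical
```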
Why it works¶
- Failure-domain separation. The two clusters are independent blast radii: configuration pushes, bad releases, or JVM hangs in one cluster do not propagate to the other. A cluster-wide incident (e.g. a coordinated HBase master failure) still has a clean escape valve.
- Separation of online from offline work. Daily backups, bulk scans, and resource-heavy ops can run on the standby without degrading p99 latency on the primary. This is the same logical shape as read-replicas for OLTP databases, lifted to a whole-cluster granularity.
- WAL stream is the natural replication primitive. HBase already writes every mutation to its WAL for local crash recovery; forwarding that stream across clusters reuses the same durability semantics and orders operations deterministically — see concepts/wal-replication.
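The ordering property is easy to model. A minimal sketch, assuming hypothetical WalEntry and StandbyApplier names (HBase's real replication machinery is internal to its region servers; this only captures sequence-ordered, idempotent replay of the same entries already written for local crash recovery):

```python
from dataclasses import dataclass

@dataclass
class WalEntry:
    seq: int         # monotonically increasing sequence number
    mutation: bytes  # serialized put/delete

class StandbyApplier:
    """Replays shipped WAL entries in sequence order, exactly once."""
    def __init__(self) -> None:
        self.applied_seq = 0
        self.store: list[bytes] = []

    def apply(self, entry: WalEntry) -> None:
        if entry.seq <= self.applied_seq:
            return  # duplicate shipment after a retry: idempotent skip
        assert entry.seq == self.applied_seq + 1, "gap: entries must ship in log order"
        self.store.append(entry.mutation)
        self.applied_seq = entry.seq

# Primary side forwards entries in log order; retries are harmless.
standby = StandbyApplier()
for e in [WalEntry(1, b"put:a"), WalEntry(2, b"del:b"), WalEntry(2, b"del:b")]:
    standby.apply(e)
print(standby.applied_seq)  # 2
```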
Tradeoffs¶
- 6 replicas is expensive. At Pinterest scale (6 PB logical → 36 PB provisioned) the replica multiplier is a load-bearing line item. Alternatives like TiDB, MySQL with careful placement, or RocksDB-based KV stores hit comparable availability SLAs with 3 replicas — one of the five reasons cited in the HBase deprecation (see patterns/nosql-to-newsql-deprecation).
- Cluster-level failover is a human operation. Automatic failover of a whole cluster is structurally riskier than per-node failover because the unit of failure is much larger; most teams keep the flip manual, which raises MTTR.
- Doesn't protect against correlated logical corruption. WAL replication faithfully replays writes — including the bad ones. Point-in-time recovery from backups remains necessary.
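For the failover mechanic, a sketch assuming a hypothetical routing-config indirection; the point is that the flip is one coarse-grained pointer swap, which is exactly why teams keep it manual:

```python
# Hypothetical routing config: clients resolve "which cluster is
# primary" through one level of indirection, so cluster-level
# failover is a single atomic flip of that designation.
routing = {"primary": "cluster-a", "standby": "cluster-b"}

def failover(routing: dict) -> dict:
    """Swap the active designation; cluster-as-unit, not per-node."""
    return {"primary": routing["standby"], "standby": routing["primary"]}

# Ops runs this by hand after confirming the primary is truly down;
# automating a flip at this granularity risks split-brain.
routing = failover(routing)
print(routing)  # {'primary': 'cluster-b', 'standby': 'cluster-a'}
```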
Seen in¶
- sources/2024-05-14-pinterest-hbase-deprecation-at-pinterest — canonical wiki instance. Pinterest's standard HBase production deployment used this shape for ~50 clusters / ~9,000 EC2 instances. "A typical production deployment consists of a primary cluster and a standby cluster, inter-replicated between each other using write-ahead-logs (WALs) for extra availability. Online requests are routed to the primary cluster, while offline workflows and resource-intensive cluster operations (e.g., daily backups) are executed on the standby cluster. Upon failure of the primary cluster, a cluster-level failover is performed to switch the primary and standby clusters." The 6-replica cost of this shape is axis 4 of Pinterest's deprecation framework.
Related¶
- concepts/wal-replication — the underlying replication primitive.
- concepts/primary-standby-failover — the failover mechanic.
- concepts/replica-cost-tradeoff — the economic tradeoff this shape foregrounds.
- patterns/nosql-to-newsql-deprecation — the org-level move that retires this shape for cost reasons.
- systems/hbase — the canonical substrate this shape was built on at Pinterest.