

Cross-DC semi-sync for durability

Shape of the pattern

Place semi-sync replicas in a different datacenter / availability zone from the primary, so that an ack from a semi-sync replica implies the transaction is persisted outside the primary's failure domain. A commit acknowledged to the user is therefore durable even against a total loss of the primary's DC.

Shlomi Noach's canonical framing:

"If we only run semi-sync replicas in a different datacenter than the primary's, we first pay with increased write latency, as each commit runs a roundtrip outside the primary's datacenter and back. With multiple semi-sync replicas it's the time it takes for the fastest replica to respond. When the primary goes down, we have the data durable outside its datacenter." (Source: sources/2026-04-21-planetscale-mysql-semi-sync-replication-durability-consistency-and-split-brains)

The pattern is the natural answer to the weakness Noach identifies when semi-sync replicas live in the same DC as the primary: a same-DC semi-sync ack confirms durability to a replica that the primary's DC outage can take out simultaneously. Cross-DC placement closes that window.

Mechanism

The pattern is topological, not configurational — rpl_semi_sync_master_wait_for_slave_count=k is untouched. What changes is the physical placement of the replicas that carry the semi-sync plugin:

  1. Tag the semi-sync replica role by DC. Only replicas in DCs other than the primary's participate.
  2. Size the cross-DC replica count to match wait_for_slave_count + a survival margin. For k=1, at least 2 cross-DC replicas is typical (so a single replica failure doesn't trigger timeout fallback).
  3. Deploy local non-semi-sync replicas in the primary's DC for fast read traffic and failover-seeding (they just don't ack).

When the primary commits, it waits for an ack from whichever cross-DC replica responds first (for wait_for_slave_count=1). Commit latency = max(local write, fastest cross-DC RTT + replica relay-log flush).
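The placement rule in steps 1–3 can be sketched as a small selection function. This is an illustrative model, not a real MySQL or orchestration API; the names (Replica, pick_semi_sync_acks, the margin parameter) are assumptions made for the sketch.

```python
# Sketch: choose which replicas carry the semi-sync ack role.
# Assumes a simple in-memory model of the fleet; all names are illustrative.
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    dc: str

def pick_semi_sync_acks(replicas, primary_dc, wait_for_slave_count, margin=1):
    """Only replicas outside the primary's DC participate. Require at least
    wait_for_slave_count + margin of them, so a single replica failure does
    not push commits into the semi-sync timeout fallback."""
    cross_dc = [r for r in replicas if r.dc != primary_dc]
    needed = wait_for_slave_count + margin
    if len(cross_dc) < needed:
        raise ValueError(f"need {needed} cross-DC replicas, have {len(cross_dc)}")
    return cross_dc

fleet = [Replica("r1", "dc1"), Replica("r2", "dc2"),
         Replica("r3", "dc2"), Replica("r4", "dc3")]
# r1 shares the primary's DC: it serves reads and can seed a failover,
# but it never acks.
ackers = pick_semi_sync_acks(fleet, primary_dc="dc1", wait_for_slave_count=1)
```

For k=1 this yields the "at least 2 cross-DC replicas" sizing from step 2: one to ack, one as survival margin.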

Trade-offs

| Axis | Same-DC semi-sync | Cross-DC semi-sync |
|---|---|---|
| Write latency | ~1 ms (intra-DC RTT + disk) | 5–50 ms (inter-DC RTT + disk) |
| Durability vs DC outage | ✗ same DC fails together | ✓ data persisted outside failure domain |
| Failover complexity | Low: promote local replica | Higher: cross-DC promotion has application impact |
| Operational cost | Lower (no cross-DC bandwidth) | Higher: semi-sync traffic is committed data + metadata |

The latency tax is the price paid, and deploying multiple cross-DC replicas mitigates it: as Noach observes, "With multiple semi-sync replicas it's the time it takes for the fastest replica to respond", so commit latency tracks the fastest responder rather than any single replica's tail.
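The fastest-responder effect can be made concrete with a toy latency model. The RTT and flush figures below are illustrative, and the function name is an assumption of the sketch, not a MySQL metric.

```python
# Toy model: with rpl_semi_sync_master_wait_for_slave_count=k, the commit
# returns once the k-th fastest cross-DC ack arrives (RTT + relay-log flush).
# All figures are illustrative.
def semi_sync_commit_latency_ms(rtts_ms, flush_ms, wait_for_slave_count=1):
    """Latency until the k-th fastest ack arrives."""
    acks = sorted(rtt + flush_ms for rtt in rtts_ms)
    return acks[wait_for_slave_count - 1]

# One slow replica (50 ms RTT) does not hurt commits while a 5 ms peer is up:
latency = semi_sync_commit_latency_ms([5.0, 12.0, 50.0], flush_ms=1.0)
# With k=1 the commit returns after the fastest ack (~6 ms), not ~51 ms.
```

Raising wait_for_slave_count makes the slowest required replica the bottleneck, which is why the pattern is usually discussed with k=1.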

The failover procedure

On primary-DC outage:

  1. Verify the primary is truly gone. Cross-DC semi-sync replicas' relay logs contain all acknowledged writes; non-semi-sync replicas in the primary's DC may hold them too, and may even be further ahead. Noach notes: "we can then also compare with non semi-sync replicas in the primary's datacenter: they may yet have all the transactions, too."
  2. Decide: promote in the primary's DC or remotely. Promoting in the primary's DC (seeded from a cross-DC replica if needed) is usually less disruptive to applications; promoting remotely effectively moves the cluster's primary DC.
  3. Reassign the semi-sync replica set if promoting remotely — the new primary's DC must not contain semi-sync replicas, so some reconfiguration is required. "We then must reassign semi-sync replicas, and ensure none run from within the new primary's datacenter."
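Step 1's comparison can be sketched as candidate selection over replica positions. Real tooling compares GTID sets; the integers and function name here are stand-ins invented for the sketch.

```python
# Sketch of failover step 1: pick the most up-to-date replica, including
# non-semi-sync replicas in the primary's DC, which may be further ahead.
# Positions are plain integers standing in for GTID sets; a real tool
# compares GTID sets properly.
def pick_failover_candidate(replicas):
    """replicas: list of (name, position, is_semi_sync).
    Returns the most advanced replica; ties prefer a semi-sync one,
    since its position is the durability floor."""
    return max(replicas, key=lambda r: (r[1], r[2]))

candidates = [
    ("crossdc-1", 1042, True),   # semi-sync: holds every acknowledged write
    ("local-1",   1045, False),  # primary-DC replica: may be ahead of that
]
name, pos, _ = pick_failover_candidate(candidates)
# local-1 wins here: it carries transactions beyond the last acked one.
```

If the winner sits in the primary's DC, step 2's trade-off applies; if it is remote, step 3's semi-sync reassignment follows.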

What this pattern does NOT buy

Consistency under partition. If the primary's DC is network-isolated rather than down — still alive, just unreachable — the pattern does not prevent the primary from continuing to commit to any still-reachable cross-DC semi-sync replica. The pattern buys durability; it does not buy the consistency of a consensus protocol. See concepts/durability-vs-consistency-guarantee.

Split-brain immunity. The 1-n split-brain topology still applies: the primary plus one cross-DC replica forms a minority quorum that can commit in isolation. See concepts/minority-quorum-writeability. Operational mitigations (fencing the old primary, anti-flapping on reparenting) remain necessary.
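The 1-n exposure reduces to simple arithmetic, sketched here with illustrative numbers (the function and fleet sizes are assumptions of the sketch, not semi-sync internals).

```python
# Sketch of the 1-n split-brain exposure: with wait_for_slave_count=1, the
# primary keeps committing as long as ONE semi-sync replica is reachable,
# even when that pair is a minority of the fleet. Figures are illustrative.
def can_commit(reachable_ackers, wait_for_slave_count):
    return reachable_ackers >= wait_for_slave_count

fleet_size = 5           # primary + 4 replicas
reachable_ackers = 1     # one cross-DC semi-sync replica still reachable
isolated_side = 2        # the primary plus that one replica

committing = can_commit(reachable_ackers, wait_for_slave_count=1)
minority = isolated_side < fleet_size / 2
# Both are true: the isolated side commits despite being a minority,
# which is why fencing and anti-flapping remain necessary.
```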

When to use

  • Regulatory / business requirement that acknowledged writes cannot be lost to a single DC.
  • Architecture tolerates cross-DC write latency on the hot path — typically OLTP systems where p50 commits are fine but regulatory durability is mandatory.
  • Multi-DC deployment is already in place for read scalability or DR; adding semi-sync cross-DC is a topology adjustment, not a new capability.

When not to use

  • Latency-sensitive workload where cross-DC RTT dominates commit time. The pattern is incompatible with sub-millisecond commit SLOs.
  • Small deployments (2-DC or single-region). The marginal cost is high and the semantic benefit is small — a full consensus protocol or a cross-region replica in a third site is often the better choice.
