
Primary-Replica Topology Alignment

Primary-replica topology alignment is the structural principle that the replication topology of your storage layer should mirror the write-ownership topology of the application layer that uses it. If the app has a strict primary (accepts writes) / replica (read-only) invariant, the storage layer's replication direction should be locked to match: primary-storage is leader, replica-storage is follower, and the storage layer must not independently rebalance write responsibility across them.
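As a minimal sketch of the invariant (all names hypothetical, not from the source), alignment can be checked by comparing the node the app designates as primary against the node the storage layer has made leader for each index:

```python
# Hypothetical model: alignment holds when every storage-level write owner
# (leader/primary shard) sits on the node the app designates as primary.

def is_aligned(app_primary: str, storage_leaders: dict[str, str]) -> bool:
    """True iff every storage-layer write owner lives on the app's primary node."""
    return all(node == app_primary for node in storage_leaders.values())

# Aligned: all write-owning shards live on the app primary.
assert is_aligned("node-a", {"idx-1": "node-a", "idx-2": "node-a"})

# Misaligned: the storage layer rebalanced a write owner onto the replica.
assert not is_aligned("node-a", {"idx-1": "node-a", "idx-2": "node-b"})
```

The point of the principle is that this check should hold by construction, not merely be asserted at runtime.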

Symptom of misalignment: the stuck-replica class

Misalignment shows up as deadlocks between the app's primary/replica lifecycle and the storage layer's rebalancing logic. Canonical instance: the pre-2026 GitHub Enterprise Server (GHES) search topology.

  • GHES runs one Elasticsearch cluster spanning the primary GHES node and the replica GHES node.
  • ES is free to rebalance a primary shard onto either node — this is a normal cluster-health action, not a bug.
  • But GHES's application layer insists the replica node is read-only.
  • If ES has moved a primary shard onto the replica node and the replica is then taken down for maintenance, startup deadlocks: the replica waits for ES cluster health before coming back up, but ES cannot report healthy until the replica rejoins — a mutual dependency.
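The last bullet is a classic wait-for cycle. A toy model (not GHES code; names are illustrative) makes the shape explicit:

```python
# Toy wait-for graph: each party blocks on exactly one condition.
waits_on = {
    "replica-node-startup": "es-cluster-health",   # replica gates startup on health
    "es-cluster-health": "replica-node-startup",   # health needs the replica's shards back
}

def has_deadlock(graph: dict[str, str]) -> bool:
    """Detect a cycle in a wait-for graph where each node waits on one thing."""
    for start in graph:
        seen, cur = set(), start
        while cur in graph:
            if cur in seen:
                return True
            seen.add(cur)
            cur = graph[cur]
    return False

assert has_deadlock(waits_on)  # the stuck-replica condition
```

If the primary shard had stayed on the primary node, `es-cluster-health` would not depend on the replica rejoining, and the cycle would not exist.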

The structural root cause is that the storage layer's freedom to rebalance write-owning shards is wider than the application layer's invariant allows. No amount of mitigation at the operational layer (health-check gates, drift-correction, in-house search-mirroring) can fix this without constraining the storage layer to stay inside the app's invariant. (Source: sources/2026-03-03-github-how-we-rebuilt-the-search-architecture-for-high-availability)

Structural fix: make the storage leader/follower match the app leader/follower

The GHES rewrite fixes this by moving each node into its own single-node ES cluster and wiring them up with Cross Cluster Replication (CCR). Now the storage layer has no rebalancing freedom at all: the leader cluster lives on the primary node and the follower cluster lives on the replica node. By construction, no primary shard can ever land on the replica node — not because of a policy, but because there is no cluster spanning the two nodes anymore. See patterns/single-node-cluster-per-app-replica.
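A sketch of the wiring, assuming Elasticsearch's CCR APIs (request-body shapes follow the ES documentation for `PUT /_cluster/settings` and `PUT /<follower_index>/_ccr/follow`; host names, the remote alias, and index names are placeholders, not from the source):

```python
# On the follower (replica-node) cluster: register the leader cluster as a
# remote, then create a follower index that replicates the leader index.

def remote_cluster_settings(alias: str, seed: str) -> dict:
    # Body for PUT /_cluster/settings on the follower cluster.
    return {"persistent": {f"cluster.remote.{alias}.seeds": [seed]}}

def follow_request(remote_alias: str, leader_index: str) -> dict:
    # Body for PUT /<follower_index>/_ccr/follow on the follower cluster.
    return {"remote_cluster": remote_alias, "leader_index": leader_index}

settings = remote_cluster_settings("primary", "primary-node:9300")
follow = follow_request("primary", "code-search")

assert settings["persistent"]["cluster.remote.primary.seeds"] == ["primary-node:9300"]
assert follow == {"remote_cluster": "primary", "leader_index": "code-search"}
```

Note the asymmetry is baked into the configuration: only the follower cluster knows about the leader, and replication flows one way by construction.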

This is stronger than adding a constraint to the existing cluster: it eliminates the mechanism that could violate the invariant, rather than adding a guard that could be misconfigured or defeated by a future rebalancing heuristic.

When alignment is not the right answer

  • When the app has no primary/replica invariant — stateless services fronting a distributed store don't need alignment; they already delegate ownership to the storage layer.
  • When horizontal scale of storage is the hard requirement — a strict single-node-per-app-replica pattern foregoes the storage cluster's ability to shard an index across multiple machines, capping the per-replica capacity. Alignment is appropriate when the app's ownership invariant is the constraint; it's a trade-off when raw storage scale is.
  • When the app already tolerates storage-level rebalancing — some systems (e.g. a search product that reads stale-but-live results) don't care if the storage layer moves writers around, as long as reads land eventually.

Related

  • concepts/control-plane-data-plane-separation — orthogonal axis: "decide" vs "deliver" separation. Topology alignment is specifically about the replication-direction axis of the data plane, not the decide-vs-deliver split.
  • concepts/split-brain — the class of failure that alignment prevents by making it impossible for two nodes to independently believe they own the write path.
  • Segment-level replication — the how (what replicates between aligned clusters) is orthogonal; GHES uses Lucene-segment CCR, but the alignment principle is independent of replication grain.
