
CONCEPT Cited by 1 source

Active/Standby Replication

Active/standby replication is a high-availability topology in which one node (the active, or primary) serves all reads and writes, while a second node (the standby, or secondary) holds a replica of the state and is ready to be promoted on failure. Only one node is authoritative at any time; the standby does not serve traffic under normal operation.

Contrast with:

  • Active/active — both nodes serve traffic; requires conflict resolution or deterministic request routing.
  • Leader/follower with read scaling — followers serve reads, but that is a primary/secondary arrangement for load-balancing the read path, not a high-availability topology in itself.

The defining property is that the standby is idle from a traffic-serving perspective, which simplifies the design: there is no conflict resolution, no split-brain on writes during normal operation, and no read/write routing logic. The cost is that half the fleet's capacity is held in reserve purely for failover.

Replication mechanic

Replication can happen at several layers:

  • Block-level — every write to the active's disk is replicated synchronously or asynchronously to the standby's disk. DRBD is the canonical Linux kernel primitive. See concepts/synchronous-block-replication.
  • Filesystem-level — rsync, lsyncd; coarser granularity, usually asynchronous.
  • Database-level — log shipping, streaming replication (MySQL binlog, PostgreSQL WAL).
  • Application-level — custom event shipping, message queues.

Lower layers mean the application doesn't need to know about replication; higher layers mean the application can make semantic choices (e.g. deferred writes, retry idempotency).
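
As an illustration of the kind of semantic choice available at the application level, the sketch below shows a standby that applies shipped events idempotently, so a retried delivery does not change state twice. It is a minimal sketch, not a reference implementation: the `Event` and `Standby` names and the per-event sequence number are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    seq: int    # hypothetical sequence number assigned by the active, starting at 1
    key: str
    value: str


@dataclass
class Standby:
    state: dict = field(default_factory=dict)
    last_applied: int = 0  # highest sequence number applied so far

    def apply(self, event: Event) -> bool:
        """Apply an event once. Returns False for a duplicate, raises on a gap."""
        if event.seq <= self.last_applied:
            return False  # already applied: a retry delivered it again
        if event.seq != self.last_applied + 1:
            raise ValueError(
                f"gap: expected {self.last_applied + 1}, got {event.seq}"
            )
        self.state[event.key] = event.value
        self.last_applied = event.seq
        return True


standby = Standby()
events = [Event(1, "a", "1"), Event(2, "b", "2")]

# The second event is shipped twice, as a retrying sender might do.
for e in events + [events[1]]:
    standby.apply(e)
```

Because duplicates are detected by sequence number, the sender can retry freely after a timeout without coordinating with the standby. Block-level replication cannot make this kind of decision, since it sees only disk writes.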

Failover

A cluster manager (Pacemaker, Heartbeat, custom tooling) detects active-node failure and promotes the standby. Concerns:

  • Split-brain — both nodes believe they are primary. Prevented by quorum, fencing, or STONITH ("shoot the other node in the head") mechanics.
  • Failover latency — the detection and promotion gap during which neither node serves traffic.
  • Data freshness at promotion — with asynchronous replication, the standby may lag, so failover can lose writes that were acknowledged but not yet replicated. With synchronous replication (e.g. DRBD in synchronous mode), a write is acknowledged only after it is on both nodes.
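
The three concerns above can be seen together in a small in-memory simulation. This is a sketch under stated assumptions, not how Pacemaker or DRBD work internally: `Node`, `Pair`, and `maybe_failover` are hypothetical names, fencing is reduced to a flag (a real cluster would use STONITH or similar), and time is passed in explicitly rather than read from a clock.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    log: list = field(default_factory=list)
    fenced: bool = False  # stand-in for real fencing (e.g. STONITH)


class Pair:
    def __init__(self, synchronous: bool, heartbeat_timeout: float = 3.0):
        self.active = Node("node-a")
        self.standby = Node("node-b")
        self.synchronous = synchronous
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = 0.0
        self.pending = []  # acknowledged but not yet on the standby (async only)
        self.acked = []    # every write acknowledged to a client, in order

    def heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def write(self, value: str) -> None:
        if self.active.fenced:
            raise RuntimeError("active is fenced and cannot accept writes")
        self.active.log.append(value)
        if self.synchronous:
            self.standby.log.append(value)  # on both nodes before the ack
        else:
            self.pending.append(value)      # shipped later by ship()
        self.acked.append(value)

    def ship(self) -> None:
        """Drain the asynchronous backlog to the standby."""
        self.standby.log.extend(self.pending)
        self.pending.clear()


def maybe_failover(pair: Pair, now: float):
    """Promote the standby if heartbeats stopped; return the lost acked writes."""
    if now - pair.last_heartbeat <= pair.heartbeat_timeout:
        return None  # active still considered healthy
    pair.active.fenced = True  # fence first, so two primaries never coexist
    lost = pair.acked[len(pair.standby.log):]  # standby log is a prefix of acked
    pair.active, pair.standby = pair.standby, pair.active
    return lost


def run(synchronous: bool):
    pair = Pair(synchronous)
    pair.heartbeat(now=0.0)
    pair.write("w1")
    pair.ship()       # no-op in synchronous mode
    pair.write("w2")  # the active dies before w2 is shipped
    return maybe_failover(pair, now=10.0)


lost_async = run(synchronous=False)
lost_sync = run(synchronous=True)
```

In the asynchronous run the promoted standby is missing `w2`, which a client had already been told was durable; in the synchronous run nothing is lost. The `heartbeat_timeout` value is the detection half of failover latency: lowering it shortens the outage but makes false failovers more likely.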

Seen in

Last updated · 517 distilled / 1,221 read