
Leader-based partition replication

Pattern

Replicate an append-only log across N replicas, with exactly one leader per replica set at a time. Writes go only to the leader, which asynchronously replicates them to the N-1 followers. On leader failure, a fresh leader is elected from the in-sync subset of followers (the ISR) by an external coordinator (in Kafka, the Controller).

Canonical production instance: Apache Kafka partition replicas. From Kozlovski's Kafka 101:

"The replication is leader-based, which is to say that only a single broker leads a certain partition at a time. […] Writes can only go to that leader, which then asynchronously replicates the data to the N-1 followers." (Source: sources/2024-05-09-highscalability-kafka-101)
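The write path in that quote can be sketched as a toy simulation (plain Python, not the Kafka protocol; the class and method names here are invented for illustration):

```python
import queue
import threading

class Replica:
    """A toy replica holding an append-only log."""
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []

class Partition:
    """Leader-based replication: one leader, N-1 followers.

    Writes append to the leader's log only; a background thread
    ships them to the followers asynchronously, off the hot path.
    """
    def __init__(self, replicas, leader_id):
        self.replicas = {r.broker_id: r for r in replicas}
        self.leader_id = leader_id
        self._pending = queue.Queue()
        self._shipper = threading.Thread(target=self._replicate, daemon=True)
        self._shipper.start()

    def produce(self, record):
        # Writes go only to the leader.
        leader = self.replicas[self.leader_id]
        leader.log.append(record)
        offset = len(leader.log) - 1
        self._pending.put((offset, record))  # replicated later, asynchronously
        return offset

    def _replicate(self):
        # Asynchronous replication to the N-1 followers.
        while True:
            offset, record = self._pending.get()
            for broker_id, replica in self.replicas.items():
                if broker_id != self.leader_id:
                    replica.log.append(record)
            self._pending.task_done()
```

Usage: `Partition([Replica(0), Replica(1), Replica(2)], leader_id=0).produce(b"event-1")` returns offset 0 immediately; the followers catch up in the background.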

Composition shape

  • Writes go to the leader only (no multi-master conflict).
  • Reads can be served from any replica: since Kafka 2.4 (KIP-392), consumers can fetch from the replica closest to them in the network topology.
  • Durability is a function of the ISR, selected via producer acks:
      • acks=0 — fire-and-forget; the producer does not wait for any acknowledgement.
      • acks=1 — ack once the leader has appended the record to its local log.
      • acks=all — ack once every replica in the ISR has the record.
  • min.insync.replicas — fail-closed threshold below which acks=all writes are rejected rather than silently downgraded.
  • Leader election is owned by the cluster's active Controller (pre-3.3: ZooKeeper-elected; 3.3+: KRaft-elected via the __cluster_metadata Raft quorum).
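The acks / min.insync.replicas interaction above can be captured in a small decision function (a sketch in plain Python; the function name and return values are mine, not Kafka's):

```python
def ack_decision(acks, isr_size, min_insync_replicas):
    """Decide how a produce request is acknowledged, given the ISR.

    acks: "0", "1", or "all", as in the producer config.
    Returns "fire-and-forget", "ack", or "rejected".
    """
    if acks == "0":
        return "fire-and-forget"   # no acknowledgement at all
    if acks == "1":
        return "ack"               # leader append alone is enough
    # acks == "all": fail closed if the ISR has shrunk too far,
    # rather than silently downgrading durability.
    if isr_size < min_insync_replicas:
        return "rejected"          # Kafka surfaces this as NotEnoughReplicas
    return "ack"                   # the whole ISR has the record
```

For example, with min.insync.replicas=2, an acks=all write is rejected once the ISR shrinks to a single replica, while an acks=1 write still succeeds.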

Why this pattern beats symmetric replication

  • Simpler than multi-master / quorum-write protocols (no conflict resolution).
  • Cheaper at write time — only one replica processes the write hot path.
  • Strict ordering per partition — there's a single writer, so per-partition log offsets remain monotone and consumers observe records in produce order.
  • Clear durability semantics — composable with acks + min.insync.replicas.
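The strict-ordering point above comes down to: the leader alone assigns offsets, so they are monotone per partition and consumers replay in produce order. A minimal sketch (invented names):

```python
class LeaderLog:
    """Single writer per partition: offsets are assigned in one place,
    so they are strictly increasing and consumers see produce order."""
    def __init__(self):
        self._log = []

    def append(self, record):
        # Only the leader calls this, so no coordination is needed.
        self._log.append(record)
        return len(self._log) - 1   # the record's offset

    def read_from(self, offset):
        # Consumers replay the log in offset order from any position.
        return list(enumerate(self._log))[offset:]
```

Because there is exactly one writer, the offset counter never needs a lock across brokers; multi-master designs give up exactly this property.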

Trade-offs

  • Write throughput limited by the leader. The leader is the hot path; scaling writes requires more partitions (more leaders), not bigger leader brokers.
  • Leader failover has a latency tail. The Controller must detect the failure and run an election, and clients must retry against the new leader. Kozlovski: "the Controller reacts by gracefully switching the partition leadership to another broker in the replica set."
  • Availability vs durability dial. acks=all + min.insync.replicas=N can refuse writes when the ISR is too small — intentional trade-off (fail-closed on durability).
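The failover step in the trade-offs above can be sketched as: the Controller promotes a member of the ISR, or takes the partition offline if none survives (a toy model; the real Controller also bumps the leader epoch, fences stale leaders, and can be configured for unclean election):

```python
def elect_leader(replica_set, isr, failed_leader):
    """Pick a new leader from the in-sync subset of followers.

    replica_set: preferred ordering of broker ids for this partition.
    isr: broker ids currently caught up with the old leader.
    Returns the new leader's broker id, or None if no in-sync
    replica survives -- the partition goes offline rather than
    electing a stale replica (fail closed on durability).
    """
    candidates = [b for b in replica_set if b in isr and b != failed_leader]
    return candidates[0] if candidates else None
```

For example, `elect_leader([1, 2, 3], isr={1, 2}, failed_leader=1)` promotes broker 2; if the ISR had shrunk to just the failed leader, the function returns None instead of electing an out-of-date follower.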

Seen in

  • sources/2024-05-09-highscalability-kafka-101 — canonical wiki statement of the leader-based-partition-replication pattern in Kafka, including the writes-only-to-leader / async-replicate / Controller-elects-leader shape.