CONCEPT
Multi-region stretch cluster¶
Definition¶
A multi-region stretch cluster is a deployment topology in which a single streaming broker cluster (or database cluster) spans two or more geographic regions, with per-partition replica groups distributed across regions and replicated synchronously via a quorum-based consensus protocol — typically Raft. Unlike running two independent clusters with asynchronous cross-cluster replication (MirrorMaker2, logical DR replication, etc.), a stretch cluster presents as one logical cluster with one control plane and one client-facing bootstrap endpoint.
The canonical property is RPO = 0 against a region-level outage: a write is only acknowledged after a quorum of replicas — spanning multiple regions — has persisted it, so any acknowledged write survives the loss of a minority region. Leader election from a surviving in-sync replica is automatic via the consensus protocol.
Canonical Redpanda framing¶
"A multi-region Redpanda cluster is a deployment topology that allows customers to run a single Redpanda cluster across multiple data centers or multiple cloud regions. It's often referred to as a stretch cluster, where a single cluster stretches across multiple geographic regions with data distributed across all deployment regions. Data is replicated synchronously via raft protocol between brokers distributed across multiple regions and also accessible from various points globally."
"Unlike in asynchronous replication, where you have two separate clusters with MirrorMaker2 replication between them and a non-zero RPO, multi-region clusters have RPO=0 and very low RTO when there is a region-level outage. This is because new leaders are automatically elected — as part of the Raft protocol — in the surviving regions when any region goes down. The replication factor on the cluster or topics tells you how many region failures can be tolerated for the cluster to continue to serve the application layer."
(Source: sources/2025-02-11-redpanda-high-availability-deployment-multi-region-stretch-clusters)
Trade-offs¶
A stretch cluster pays cross-region RTT on every quorum write — this is the per-write cost of RPO = 0. The producer-side dial is acks (concepts/acks-producer-durability): acks=all on a stretch cluster means every ack requires cross-region quorum; acks=1 collapses the wait to leader-only at the cost of durability-on-region-loss.
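The ack-latency arithmetic can be sketched numerically. The RTT and processing figures below are illustrative assumptions, not measurements from the post:

```python
# Illustrative acks trade-off arithmetic for a stretch cluster with one
# replica per region. Both timing constants are assumed placeholders.
inter_region_rtt_ms = 60.0   # assumed one cross-region round trip
local_write_ms = 2.0         # assumed leader-local persist + processing

def ack_latency_ms(acks: str) -> float:
    """Rough floor on producer-observed ack latency for a single write."""
    if acks == "all":
        # Leader must hear from a quorum; with one replica per region,
        # at least one follower ack crosses a region boundary.
        return local_write_ms + inter_region_rtt_ms
    if acks == "1":
        # Leader-only ack: no cross-region wait, but an unreplicated
        # write is lost if the leader's region fails before replication.
        return local_write_ms
    raise ValueError(f"unhandled acks setting: {acks}")

print(ack_latency_ms("all"))  # 62.0
print(ack_latency_ms("1"))    # 2.0
```

The gap between the two numbers is exactly the cross-region RTT, which is why acks is the first dial operators reach for.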
| Dimension | Stretch cluster (single cluster, sync) | MM2 async (two clusters) |
|---|---|---|
| RPO (region loss) | 0 | Non-zero (= replication lag at outage) |
| RTO (region loss) | Low (automatic Raft re-election) | Seconds–minutes (client reconnect to other cluster + optional manual promotion) |
| Write latency | Cross-region RTT per acks=all write | Local-region RTT |
| Cross-region bandwidth | Full replication factor × write volume per region pair | One replication stream per mirrored topic |
| Control plane | Single — one rpk, one config surface | Two — independent per cluster |
| Operational complexity | Lower (one cluster to run) | Higher (two clusters + MM2 infra) |
| Consistency | Strong (Raft quorum) | Eventual (per-cluster strong, cross-cluster eventual) |
The Redpanda post frames this as the canonical consistency-availability axis:
"Raft ensures strong consistency in achieving quorum during writes in a multi-region setup. This ensures the maximum replicas across all regions have the same data simultaneously, which can increase latency. If strong consistency is not an absolute requirement but availability is, at the expense of slightly older data, multiple independent Redpanda clusters across different regions with MM2 replication can be set up."
Composing with optimisations¶
The stretch-cluster shape composes with four operator knobs that exist because cross-region quorum is expensive:
- Leader pinning — pin partition leadership to the client-proximal region; eliminates the cross-region write-side hop when clients are regionally localised. Enterprise feature on Redpanda.
- acks=1 — skip the quorum wait; durability degrades to leader-only. Tunable per producer session.
- Follower fetching — consumers read from the closest replica rather than the leader; eliminates cross-region read-side hops. Kafka-API KIP-392 equivalent.
- Remote read replica topic — read-only mirror served from object storage via a separate cluster; scales read fan-out without loading the origin cluster's brokers.
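On the Kafka API, two of these knobs surface as ordinary client configuration. A hedged sketch — the property names (acks, client.rack) are standard Kafka client configs, but the endpoint and region values are made-up illustrations:

```python
# Illustrative Kafka-API client settings for the two client-side knobs.
# Property names are standard Kafka configs; addresses/regions are made up.
producer_config = {
    "bootstrap.servers": "broker.example.internal:9092",  # hypothetical
    # acks=1: leader-only ack, skipping the cross-region quorum wait.
    # acks=all would restore RPO=0 at cross-region RTT per write.
    "acks": "1",
}

consumer_config = {
    "bootstrap.servers": "broker.example.internal:9092",  # hypothetical
    # KIP-392 follower fetching: declare which "rack" (here, region)
    # this consumer sits in so it is served by the closest replica.
    "client.rack": "us-east-1",
}
```

Leader pinning and remote read replicas, by contrast, are cluster-side configuration, not client settings.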
Rack awareness — region as rack¶
Redpanda's stretch-cluster deployment uses the rack-awareness machinery from the multi-AZ shape, with rack set to the region identifier rather than the AZ identifier. Three-broker example from the post:
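The post's exact snippet is not reproduced here; the following is a minimal illustrative sketch, assuming Redpanda's standard rack node property and the enable_rack_awareness cluster property, with example region identifiers:

```yaml
# Illustrative redpanda.yaml fragment, one per broker.
# rack carries the REGION identifier, not the AZ identifier.

# Broker in us-east-1:
redpanda:
  rack: "us-east-1"

# The other two brokers would set rack: "us-west-2" and rack: "eu-west-1".
# Cluster-wide, rack-aware placement is switched on with:
#   rpk cluster config set enable_rack_awareness true
```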
Same rack-placement algorithm as multi-AZ (concepts/availability-zone-balance), different cardinality of the rack dimension.
Replication-factor tolerance-calibration¶
On a 3-region stretch cluster with replication factor 3 (one replica per region), the cluster tolerates loss of one region while remaining writable (Raft quorum = 2 of 3). Replication factor 5 on 3 regions means multiple replicas per region and more complex region-loss tolerance (depends on how the scheduler places replicas across regions and whether any one region has a majority of replicas). The post names this calibration dial verbatim but does not enumerate the placement algorithm.
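The tolerance calibration can be made concrete with a short worked calculation. This sketch assumes the scheduler spreads replicas as evenly as possible across regions — an assumption, since the post does not enumerate the placement algorithm:

```python
# Region-loss tolerance from replication factor, assuming an even spread
# of replicas across regions (the placement algorithm is not specified
# in the source post; even spread is an illustrative assumption).
def region_loss_tolerance(replication_factor: int, regions: int) -> int:
    """Max regions that can fail while a Raft write quorum survives."""
    quorum = replication_factor // 2 + 1
    # Even spread: each region holds floor(rf/regions) replicas, with
    # the remainder distributed one-per-region.
    per_region = [replication_factor // regions] * regions
    for i in range(replication_factor % regions):
        per_region[i] += 1
    # Worst case: the largest regions fail first.
    per_region.sort(reverse=True)
    lost, survivors = 0, replication_factor
    for replicas in per_region:
        if survivors - replicas < quorum:
            break
        survivors -= replicas
        lost += 1
    return lost

print(region_loss_tolerance(3, 3))  # 1: one replica/region, quorum = 2 of 3
print(region_loss_tolerance(5, 3))  # 1: worst-case region holds 2 of 5
```

Note that under this assumption, replication factor 5 on 3 regions still tolerates only one region loss in the worst case (losing a 2-replica region leaves exactly the 3-of-5 quorum), which is why the post flags the rf-greater-than-regions case as more complex.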
Where stretch clusters do not apply¶
- Transoceanic single-cluster stretch is usually impractical: 150+ ms round-trip latency (Tokyo ↔ London, Sydney ↔ us-east-1) on every acks=all write breaks most OLTP write SLAs. Regional stretch (e.g., us-east-1 ↔ us-west-2 ↔ eu-west-1 in a Raft quorum) is typical; transoceanic stretch is rare and usually reserved for very-low-write-rate state or non-latency-sensitive archival paths.
- Redpanda K8s deployment gap (as of the post's publication): "Self-Managed on K8s currently supports only multi-AZ deployments in all the cloud providers." Multi-region stretch is available on VM / bare-metal / cloud-compute instances and on Redpanda Cloud Dedicated + BYOC only.
Seen in¶
- sources/2025-02-11-redpanda-high-availability-deployment-multi-region-stretch-clusters — canonical wiki statement of the multi-region-stretch-cluster shape; RPO=0 + Raft re-election property; five-hazard catalogue (latency, replication overhead, cross-region bandwidth, routing, consistency-availability); four mitigations (leader pinning, acks=1, follower fetching, remote read replica); Ansible region-as-rack deployment template; OMB + tc simulation technique.
Related¶
- systems/redpanda, systems/kafka
- concepts/rpo-rto — RPO=0 is the load-bearing DR property.
- concepts/strong-consistency — the consistency side of the trade-off.
- concepts/in-sync-replica-set — the ISR spans regions on a stretch cluster.
- concepts/leader-follower-replication
- concepts/acks-producer-durability — the producer durability dial evaluated against cross-region quorum.
- concepts/leader-pinning, concepts/follower-fetching, concepts/remote-read-replica-topic — the three canonical latency-and-cost mitigations.
- concepts/mirrormaker2-async-replication — the async-mirror alternative shape.
- concepts/cross-region-bandwidth-cost — the operational cost hazard.
- patterns/multi-region-raft-quorum — the replication pattern this concept names.
- patterns/async-replication-for-cross-region — the MM2 alternative pattern.