
Reciprocal active-passive clusters

Definition

Reciprocal active-passive clustering is a multi-region streaming architecture in which two clusters each act simultaneously as the source cluster for their own data and as the shadow cluster for the other's data, achieved via two parallel unidirectional shadow links (one in each direction). At any point in time:

  • Cluster A writes to topics A owns (e.g. a_* prefix).
  • Cluster B writes to topics B owns (e.g. b_* prefix).
  • Cluster A shadows all of cluster B's topics (read-only until failover).
  • Cluster B shadows all of cluster A's topics (read-only until failover).

Each topic is still active-passive at the topic granularity — exactly one cluster writes to it, the other read-replicates it. The aggregate workload is bidirectional because each cluster produces and consumes for its own topic family while mirroring the other's.

This is not active-active replication: the two clusters never accept writes to the same logical topic. There is no write-conflict problem because each topic has a single writer by construction.
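The single-writer-per-topic invariant can be made concrete as a small routing guard. This is a hypothetical sketch, not Redpanda's API: the prefixes, cluster names, and helper functions are all illustrative.

```python
# Illustrative enforcement of the single-writer-per-topic invariant:
# each topic's name prefix identifies the one cluster allowed to write it.
# Prefixes and cluster names are made up for this sketch.

OWNER_BY_PREFIX = {
    "a_": "cluster-a",
    "b_": "cluster-b",
}

def owning_cluster(topic: str) -> str:
    """Return the only cluster allowed to accept writes for `topic`."""
    for prefix, cluster in OWNER_BY_PREFIX.items():
        if topic.startswith(prefix):
            return cluster
    raise ValueError(f"topic {topic!r} has no owner prefix")

def assert_write_allowed(local_cluster: str, topic: str) -> None:
    """Reject writes to topics this cluster merely shadows."""
    owner = owning_cluster(topic)
    if owner != local_cluster:
        raise PermissionError(
            f"{local_cluster} only shadows {topic!r}; writes belong on {owner}"
        )
```

With this guard, `assert_write_allowed("cluster-a", "a_orders")` passes, while attempting to write `b_orders` on Cluster A raises, which is exactly the "read-only until failover" behavior of a shadow topic.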

Canonical wiki source

Introduced by the 2026-04-21 Redpanda Shadow Linking deep-dive:

"Some organizations may wish to distribute risk by having shadow clusters serve primary roles to get the most out of their infrastructure. […] This kind of reciprocal active-passive architecture, in which both clusters are active and usable, can still be achieved with parallel shadow links."

"Running a reciprocal active-passive cluster pair is as simple as configuring two shadow links — one on each cluster. This design benefits from using a consistent prefix to name topics and consumer groups, identifying their source site."

"In this deployment architecture, each cluster acts as both a source cluster and a shadow at the same time."

"As you might expect with this architecture, both sites support simultaneous producing and consuming. Failover in either direction (from A to B, or B to A) is possible, making this design fault-tolerant to an outage in either location."

Architectural diagram (conceptual)

   Region A                       Region B
   ─────────                      ─────────
   Cluster A                      Cluster B
   ─────────                      ─────────
   a_topic_1  ──── shadow-link ──→  a_topic_1 (shadow)
   a_topic_2  ──── shadow-link ──→  a_topic_2 (shadow)
   b_topic_1 (shadow) ←── shadow-link ──── b_topic_1
   b_topic_2 (shadow) ←── shadow-link ──── b_topic_2

Two shadow links operate in parallel — one configured on Cluster B (reading from A), one configured on Cluster A (reading from B). Each shadow link is unidirectional; the reciprocal is achieved by running two of them in opposite directions simultaneously.
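The pair of links can be pictured as two configuration records, one held by each cluster. The field names below are invented for clarity; they are not Redpanda's actual shadow-link configuration surface.

```python
# Sketch of the two parallel, unidirectional shadow links.
# Field names are illustrative; consult the Redpanda docs for the
# real shadow-link configuration.

shadow_links = [
    {   # configured on Cluster B: B read-replicates A's topic family
        "configured_on": "cluster-b",
        "source": "cluster-a",
        "topic_filter": "a_*",
    },
    {   # configured on Cluster A: A read-replicates B's topic family
        "configured_on": "cluster-a",
        "source": "cluster-b",
        "topic_filter": "b_*",
    },
]

# Each link is unidirectional; reciprocity comes from running both at once,
# in opposite directions, over disjoint topic families.
assert all(link["configured_on"] != link["source"] for link in shadow_links)
```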

Why reciprocal (not separate-DR-cluster)?

The motivation from the launch post:

"Some organizations may wish to distribute risk by having shadow clusters serve primary roles to get the most out of their infrastructure."

Compared to the baseline shape — single source cluster in one region + idle hot-standby shadow in another — reciprocal active-passive gets both clusters doing real work:

  • Hardware utilisation — no idle DR cluster.
  • Lower local write latency per region — producers in each region write to their region-local cluster.
  • Failover in either direction — not just "DR region saves primary region" but "either region saves the other".
  • Workload distribution by region — consumers / producers that logically belong to region A write to cluster A; region B workloads write to cluster B.

The cost: operational complexity and schema-registry primary-site selection (see caveats below).

Consumer-side semantics

When running reciprocal active-passive, consumers that logically need to see all data (A's + B's topics) must subscribe to both local topics and the shadow copies of the remote cluster's topics. The 2026-04-21 post:

"Consuming messages is conceptually a little more complex, in that there are now two topics that need to be read by the same consumer group (local and shadow). In practice, this just means a little more configuration of the consuming client."

This is the load-bearing consumer-side consequence: a logical-global consumer group spans two topic families on its local cluster (the locally-written topics + the shadow copies of remote-written topics), not two clusters.
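A minimal sketch of that subscription shape, with hypothetical topic names: the consumer group stays on one cluster and simply subscribes to both topic families there, often via a single pattern subscription in the Kafka client.

```python
import re

# A "logical-global" consumer on Cluster A must read both the locally
# written a_* topics and the shadowed b_* copies. All on one cluster —
# one client, one bootstrap address. Topic names are illustrative.

LOCAL_TOPICS = ["a_orders", "a_payments"]    # written on Cluster A
SHADOW_TOPICS = ["b_orders", "b_payments"]   # shadowed from Cluster B

def global_subscription(local, shadow):
    """Topics one consumer group subscribes to on its local cluster."""
    return sorted(local + shadow)

# Clients that support pattern subscriptions can express the same thing
# as a single regex covering both owner prefixes:
PATTERN = re.compile(r"^(a|b)_.*")
assert all(PATTERN.match(t) for t in global_subscription(LOCAL_TOPICS, SHADOW_TOPICS))
```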

Schema-registry constraint

The 2026-04-21 post flags an explicit asymmetry:

"In addition, a primary site for schema registry would need to be chosen (since both sites will use _schemas)."

Redpanda stores its schema registry in a single Kafka topic named _schemas. Because both sites would otherwise want to own _schemas for their local schema registrations, only one site can be the primary for schema registry. The other site's schema registry becomes a read-replica via Shadow Linking of the _schemas topic.

Consequence: the reciprocal architecture is symmetric at the topic layer but asymmetric at the schema-registry layer. If the schema-registry primary site fails, failing over the schema registry is a distinct operation from topic failover.
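The asymmetry can be sketched as a routing rule: schema registrations (writes to _schemas) always go to the chosen primary site, while schema reads can stay local. Endpoint URLs and the routing helper are placeholders, not real Redpanda endpoints.

```python
# Sketch of the schema-registry asymmetry: one site owns _schemas, the
# other only read-replicates it via Shadow Linking. URLs are placeholders.

SCHEMA_PRIMARY = "cluster-a"   # the one chosen primary site for _schemas

def schema_endpoint(local_cluster: str, operation: str) -> str:
    """Route schema-registry traffic: writes to the primary, reads local."""
    if operation == "register":  # write path: must hit the _schemas owner
        return f"https://{SCHEMA_PRIMARY}.example/schema-registry"
    # read path: the local (possibly read-replica) registry suffices
    return f"https://{local_cluster}.example/schema-registry"
```

Note that a client on Cluster B registering a schema still crosses regions to Cluster A, even though its topic writes stay local; that is the asymmetry the post flags.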

Relationship to active-active

Reciprocal active-passive is not active-active multi-writer replication. The distinction:

   Property                         Reciprocal active-passive       True active-active (multi-writer)
   ───────────────────────────────  ──────────────────────────────  ──────────────────────────────────────────────
   Writer count per logical topic   1 (the owning cluster)          2+ (both clusters)
   Write-conflict possibility       None                            Requires conflict resolution (CRDT, LWW, etc.)
   Topic naming scheme              Prefixed by owner (a_*, b_*)    Same name on both clusters
   Offset space per topic           Single-writer monotonic         Interleaved or segregated
   Shadow-link direction per topic  Unidirectional                  N/A (or bidirectional with loop prevention)

Reciprocal active-passive achieves aggregate-workload bidirectionality without ever needing the two clusters to agree on a write order for the same topic. The topic-naming convention is what enforces the single-writer-per-topic invariant.

Failover mechanics

Failover in reciprocal active-passive is symmetric: if Cluster A goes down, Cluster B fails over the a_* topics (they become writable on B), and producers/consumers for those topics reconfigure to B. The b_* topics stay on B unchanged. The same applies in reverse if Cluster B goes down.
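The ownership change is a pure reassignment of the failed cluster's topic family to the survivor, which a sketch makes easy to see. Names are illustrative.

```python
# Sketch of symmetric failover: the survivor promotes the failed
# cluster's shadow topics to writable; its own topics are untouched.
# Family patterns and cluster names are made up for this example.

OWNERSHIP = {"a_*": "cluster-a", "b_*": "cluster-b"}

def fail_over(ownership: dict, failed: str, survivor: str) -> dict:
    """Reassign every topic family the failed cluster owned to the survivor."""
    return {
        family: (survivor if owner == failed else owner)
        for family, owner in ownership.items()
    }

after = fail_over(OWNERSHIP, failed="cluster-a", survivor="cluster-b")
# a_* is now writable on cluster-b; b_* never changed hands.
```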

Because each shadow link is separately failable, the operator can also fail over specific topics — see concepts/per-topic-granularity-failover for the sub-link failover primitive — without affecting the other direction's flow.
