
Hot-standby cluster for DR

Pattern

Run a continuously running secondary cluster in a different region or failure domain, receiving async replication from the primary. The secondary is "a fully functional, hot-standby clone," ready to accept reads and writes within seconds of failover.

Hot-standby is the high-availability / low-RPO / low-RTO end of the DR tier ladder. It sits between:

  • Warm standby — secondary is running but scaled down; requires scale-up before full traffic can fail over.
  • Active-active / stretch cluster — two datacenters serving the same workload with sync replication; RPO=0 at the cost of per-write cross-region RTT.

A hot-standby is async-replicated and full-scale — it can take over immediately (within client timeouts), with a bounded RPO (= replication lag).
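The RPO bound above can be made concrete in a short sketch. This is illustrative code, not any vendor's API: with async replication, whatever hasn't reached the standby at failover time is lost, so worst-case RPO equals the current replication lag; sync replication acknowledges on both sides, so RPO is zero.

```python
from dataclasses import dataclass

@dataclass
class DrTier:
    name: str
    replication: str      # "async" or "sync"
    standby_full_scale: bool

def worst_case_rpo_seconds(tier: DrTier, replication_lag_s: float) -> float:
    """Async tiers can lose up to one replication-lag's worth of writes
    on failover; sync tiers acknowledge on both sides, so RPO is 0."""
    return 0.0 if tier.replication == "sync" else replication_lag_s

hot_standby = DrTier("hot-standby", "async", standby_full_scale=True)
stretch = DrTier("active-active", "sync", standby_full_scale=True)

assert worst_case_rpo_seconds(hot_standby, 3.5) == 3.5
assert worst_case_rpo_seconds(stretch, 3.5) == 0.0
```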

Canonical instance

Redpanda Shadowing (25.3, 2025-11-06) is the first wiki instance of the pattern on the streaming-broker substrate.

Verbatim from the sources/2025-11-06-redpanda-253-delivers-near-instant-disaster-recovery-and-more|25.3 launch post:

"Shadowing creates a fully functional, hot-standby clone of your entire Redpanda cluster — topics, configs, consumer group offsets, ACLs, schemas — the works!"

"When disaster strikes, you're not restoring from a day-old backup. You're failing over to a clone that's seconds behind production."

Shadowing composes hot-standby with offset preservation and broker-internal replication to deliver seconds-RPO / seconds-RTO without a Kafka Connect operational layer.

Why hot-standby over other DR shapes

Hot-standby is the right answer when all three hold:

  1. RPO budget is seconds, not minutes. Backup/restore and pilot-light fail this test.
  2. RTO budget is seconds, not minutes. Warm standby with scale-up fails this test (scale-up takes minutes).
  3. Latency-critical writes preclude sync replication. Stretch clusters fail this test when cross-region RTT exceeds the write SLA.

Hot-standby pays roughly 2× cluster cost (a full-scale secondary running alongside the primary) plus replication bandwidth, in exchange for continuous readiness. It's the most expensive DR shape short of active-active.
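The three tests above can be sketched as a decision function. All thresholds and names here are illustrative assumptions, not prescriptions: "seconds" is modeled as under a minute, and sync replication is assumed viable only when cross-region RTT fits inside the per-write SLA.

```python
def choose_dr_shape(rpo_budget_s: float, rto_budget_s: float,
                    write_sla_ms: float, cross_region_rtt_ms: float) -> str:
    """Illustrative DR-shape selection following the three tests:
    RPO budget, RTO budget, and whether sync replication fits the write SLA."""
    sync_fits_sla = cross_region_rtt_ms <= write_sla_ms
    if sync_fits_sla and rpo_budget_s == 0:
        return "active-active"   # RPO=0, paying cross-region RTT per write
    if rpo_budget_s < 60 and rto_budget_s < 60:
        return "hot-standby"     # seconds RPO/RTO: async, full-scale standby
    if rto_budget_s < 3600:
        return "warm-standby"    # minutes RTO: scale up at failover time
    return "backup-restore"      # hours RPO/RTO: cheapest tier

# Seconds-level budgets with an RTT that blows the write SLA -> hot-standby.
assert choose_dr_shape(5, 5, write_sla_ms=10, cross_region_rtt_ms=60) == "hot-standby"
```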

Critical mechanics

A hot-standby that's merely running is not enough. The pattern works when the standby is:

  • Fully replicated. All data, all configs, all ACLs, all schema registrations, all consumer-group offsets — not a data-only clone. At failover you don't want to wait for config recreation.
  • Functional. The standby must be able to accept traffic immediately — same bootstrap-URL shape, same auth flow, same client API surface. No "promote the standby" intermediate step that adds minutes.
  • Lag-monitored. Because the replication is async, the operator must continuously watch replication lag to know the current RPO. "Monitoring a shadow cluster in Redpanda Console" is how Shadowing surfaces this.
  • DR-drill-capable. Test failover regularly so the switchover procedure is known good. Verbatim from the launch post: "create a shadow link, monitor lag and throughput on your shadow cluster, and run a DR drill."
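The lag-monitored requirement reduces to a simple invariant: the current RPO is the age of the newest record the shadow has applied, and an alert should fire when that age eats the whole RPO budget. A minimal sketch, with illustrative function names (this is not the Redpanda Console API):

```python
import time
from typing import Optional

def current_rpo_seconds(last_replicated_ts: float,
                        now: Optional[float] = None) -> float:
    """Async replication: current RPO = age of the newest record the
    standby has applied. This is what continuous lag monitoring surfaces."""
    now = time.time() if now is None else now
    return max(0.0, now - last_replicated_ts)

def lag_alert(last_replicated_ts: float, rpo_budget_s: float,
              now: float) -> bool:
    """True when replication lag has consumed the entire RPO budget."""
    return current_rpo_seconds(last_replicated_ts, now) > rpo_budget_s

assert current_rpo_seconds(1000.0, now=1005.0) == 5.0      # 5 s behind
assert lag_alert(1000.0, rpo_budget_s=30.0, now=1045.0)    # 45 s > 30 s budget
```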

Anti-patterns

  • Hot standby without config replication — you have the data but not the ACLs / schemas / consumer-group offsets. Failover means hours of reconfiguration. Shadowing specifically names "topics, configs, consumer group offsets, ACLs, schemas — the works!" as a full hot-standby.
  • Hot standby without offset preservation — consumers can't resume at the same offsets; the failover mechanics add an offset-translation step. See concepts/offset-preserving-replication.
  • Hot standby without regular drills — the standby drifts from the primary configuration over time; the assumed seconds-RTO silently becomes an hours-RTO.
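The offset-preservation anti-pattern can be shown with toy in-memory logs (plain dicts, no broker involved). When the standby assigns its own offsets, a consumer's committed offset from the primary points at the wrong record and must pass through a translation table; with preserved offsets, the consumer resumes verbatim.

```python
# offset -> record; the consumer has committed through offset 2 on the primary.
primary_log = {0: "a", 1: "b", 2: "c", 3: "d"}
committed = 2

# Offset-preserving replica: identical offsets, so resume directly.
preserving_replica = dict(primary_log)
assert preserving_replica[committed] == primary_log[committed]

# Non-preserving replica: same records, re-numbered on the standby side.
non_preserving_replica = {10: "a", 11: "b", 12: "c", 13: "d"}

# Without preservation, failover needs an extra offset-translation step.
translation = dict(zip(primary_log, non_preserving_replica))
assert non_preserving_replica[translation[committed]] == primary_log[committed]
```

The second half is exactly the step that offset-preserving replication removes from the failover path.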

Seen in
