REDPANDA 2026-04-21

Redpanda — Me and my shadow (link!): Disaster recovery replication made easy

Summary

Redpanda (unsigned, 2026-04-21) publishes the mechanism + performance + reciprocal-architecture deep-dive on Shadow Linking — the feature the 25.3 launch post introduced at preview altitude, which this post walks at mechanism altitude. Five load-bearing new disclosures the 25.3 post did not make: (1) the shadow cluster runs per-broker replication tasks that read directly from source brokers over the standard Kafka API in a fully distributed shape — canonicalising parallel broker replication tasks as a shared-nothing property that scales replication throughput linearly with broker count up to the network limit; (2) production-grade scale numbers: scale-tested at 2.5 GiB/s source throughput / 2.5 million msg/s with total-topic lag consistently under 10,000 messages → an effective RPO of ~4 ms on average — the first disclosed per-feature RPO number for Shadowing; (3) reciprocal active-passive architecture via parallel shadow links on both clusters, with the a_ / b_ topic-name-prefix convention to signal the source cluster and avoid naming collisions — canonicalising concepts/reciprocal-active-passive-clusters and patterns/reciprocal-active-passive-via-parallel-shadow-links; (4) failover granularity — you can fail over "by topic or entirely", so an app-level outage fails over only individual topics while a region-level outage fails over the whole link; canonical concepts/per-topic-granularity-failover + patterns/topic-level-granular-dr-failover; (5) link-deletion semantics: "You can only delete a shadow link once all of the flows are failed over and there are no active replication flows. This is A Good Thing™." — a safety property absent from the 25.3 post. Also makes explicit the MirrorMaker2 hardware-cost contrast the 25.3 post only implied: MM2 needs "another cluster to host the replication workload" on top of source + sink; Shadow Linking runs inside the existing brokers.
Tier-3 clear — this is substrate-mechanism content, not launch PR, with genuine novel scale + architecture disclosures.

The post is the canonical Shadow Linking mechanism reference on the wiki — the 25.3 launch post is the what + why; this post is the how, the how fast, and the reciprocal active-passive architecture. Paired reading.

Key takeaways

  • Replication runs as per-broker tasks reading the source over the Kafka API. Verbatim: "A shadow link is defined within the shadow cluster and creates tasks internal to the broker that read data from the source cluster and write it locally. These tasks read data from the source using the standard Kafka API." And later: "Each broker in the shadow cluster runs replication tasks that read directly from the brokers in the source cluster, enabling massively parallel data transfer. This fully distributed approach provides excellent throughput and allows you to scale replication capacity simply by adding more brokers, up to the limit of your network." Canonical wiki concept: concepts/parallel-broker-replication-tasks — the replication mechanism inherits Redpanda's thread-per-core shared-nothing scaling property: throughput scales linearly with broker count until network saturates. This is the structural advantage over MirrorMaker2, whose Connect cluster can be scaled independently but is a separate system with its own capacity-planning surface.
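The linear-until-network-saturation scaling claim can be sketched as a simple capacity model. This is illustrative only (not Redpanda code); the per-broker rate and network limit are invented parameters:

```python
# Hypothetical capacity model: shadow-link replication throughput scales
# linearly with broker count (shared-nothing, per-broker tasks) until the
# network limit caps it.
def replication_throughput(brokers: int, per_broker_gib_s: float,
                           network_limit_gib_s: float) -> float:
    """Aggregate replication throughput under shared-nothing scaling."""
    return min(brokers * per_broker_gib_s, network_limit_gib_s)

# Doubling brokers doubles throughput...
assert replication_throughput(4, 0.5, 10.0) == 2.0
assert replication_throughput(8, 0.5, 10.0) == 4.0
# ...until the network saturates.
assert replication_throughput(40, 0.5, 10.0) == 10.0
```

The contrast with MM2 is that the knob here is the broker count you already capacity-plan for, not a separate Connect cluster with its own sizing exercise.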

  • Production scale: 2.5 GiB/s / 2.5M msg/s → ~4 ms effective RPO. Verbatim: "As an illustration of the performance, I recently scale-tested shadowing, driving the source cluster at 2.5 GiB/s. During that experiment, I was able to replicate with a total lag (across all topics) that was consistently lower than 10,000 messages — on a workload producing 2.5 million messages per second — giving us an effective RPO of around 4 milliseconds on average." Canonical wiki datum: Shadow Linking sustains 2.5 GiB/s with <10k msg total-cluster lag → 4 ms RPO on average. This is the first disclosed per-feature RPO number — the 25.3 post said only "RPOs measured in a few seconds", which is the SLA ceiling; the measured case is two orders of magnitude better. Canonicalises replication-lag-in-messages as an RPO measurement dimension complementary to the time-based view: (message lag) / (message throughput) = RPO in wall-clock is the computation. At 2.5M msg/s, every 10k-msg reduction in lag shrinks RPO by 4 ms.
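The lag-to-RPO conversion the bullet describes is a one-liner; this sketch reproduces the post's own numbers:

```python
def effective_rpo_ms(lag_messages: int, messages_per_second: float) -> float:
    """Convert replication lag in messages to wall-clock RPO:
    (message lag) / (message throughput), expressed in milliseconds."""
    return lag_messages / messages_per_second * 1000.0

# The post's scale-test: <10,000 msg total lag at 2.5M msg/s -> ~4 ms RPO.
assert effective_rpo_ms(10_000, 2_500_000) == 4.0
```

The same formula shows why lag-in-messages and lag-in-time are complementary views: at 2.5M msg/s, every 10k-message reduction in lag buys 4 ms of RPO.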

  • Scales vertically AND horizontally inside the broker. Verbatim: "Shadow linking also scales naturally with the cluster, both vertically and horizontally. If you use bigger nodes with more cores, Redpanda's internal shared-nothing architecture can use that to its fullest. If you scale out the cluster and add more nodes, we will use them to increase the shadowing parallelism, all without you needing to tune anything out of the box." Two-axis scaling property: vertical via thread-per-core inside bigger nodes; horizontal via adding brokers. No tuning required — the shadow-link configuration stays the same, the broker fleet's Seastar runtime absorbs the extra cores/nodes into the existing parallelism framework. This is the structural reason "simplicity always has an outsized payoff" (verbatim) — the operator doesn't have to plan replication capacity separately from cluster capacity.

  • Reciprocal active-passive via two parallel shadow links. Verbatim: "Running a reciprocal active-passive cluster pair is as simple as configuring two shadow links — one on each cluster. This design benefits from using a consistent prefix to name topics and consumer groups, identifying their source site. In the example above, the prefixes of a_ and b_ in the topic names indicate which cluster they originate in." Canonical wiki concept: concepts/reciprocal-active-passive-clusters — two clusters where each is simultaneously a source cluster (for its own data) and a shadow (for the other's data), achieved by configuring one shadow link on each side. The post frames this as "This kind of reciprocal active-passive architecture, in which both clusters are active and usable, can still be achieved with parallel shadow links." Canonical pattern: patterns/reciprocal-active-passive-via-parallel-shadow-links. Distinction from true active-active (multi-writer on the same topic): each topic still has a single writer — the cluster that owns it — so there's no write-conflict problem. The "active-active-ish" behaviour at the aggregate workload level comes from each cluster accepting local writes for its own prefixed topics while replicating the other cluster's topics as read-only shadows.
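The two-link reciprocal shape can be sketched as configuration data. All names here (the config keys, cluster names) are invented for illustration — the real shadow-link configuration schema lives in the Redpanda docs:

```python
# Hypothetical config sketch: a reciprocal active-passive pair is two shadow
# links, one defined on each cluster, each pulling the other's prefixed topics.
link_on_cluster_a = {"source": "cluster-b", "topic_filter_prefix": "b_"}
link_on_cluster_b = {"source": "cluster-a", "topic_filter_prefix": "a_"}

# Each topic still has exactly one writer: the cluster whose prefix it carries.
# This is what keeps the shape active-passive per topic (no write conflicts).
def writer_cluster(topic: str) -> str:
    return {"a_": "cluster-a", "b_": "cluster-b"}[topic[:2]]

assert writer_cluster("a_orders") == "cluster-a"
assert writer_cluster("b_payments") == "cluster-b"
```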

  • Topic-name-prefix convention as namespace-and-operator-signal. Verbatim: "While not strictly necessary, the name prefixing is helpful for multiple reasons: reduces the likelihood of topic/group naming clashes between sites; simplifies shadow link configuration (topics and groups can be selected for replication on the basis of the prefix rather than needing a static list of topics and groups); helps operators know at a glance which site a topic originates from." Canonical wiki concept: concepts/topic-prefix-namespacing-convention — the a_ / b_ (or equivalent) prefix on topic names and consumer-group names that encodes origin cluster in the name itself. Three load-bearing benefits: (1) collision avoidance (same logical topic on both clusters without ambiguity), (2) prefix-based shadow-link configuration (the link config is a prefix match, not a topic enumeration — which means new topics are automatically included as they are created), (3) operator observability (a topic name alone tells you its origin cluster).
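Benefit (2) — prefix-match rather than topic enumeration — is the load-bearing one for operations, because it makes the link config forward-compatible with topics that don't exist yet. A minimal sketch of the selection semantics (not the actual link-config syntax):

```python
def topics_to_shadow(source_topics: list[str], prefix: str) -> list[str]:
    """Select topics for replication by prefix. Because the config is a
    prefix match, newly created matching topics are picked up automatically;
    there is no static topic list to maintain."""
    return [t for t in source_topics if t.startswith(prefix)]

assert topics_to_shadow(["b_payments", "b_audit", "internal"], "b_") == \
    ["b_payments", "b_audit"]
# A topic created later matches with no config change:
assert "b_refunds" in topics_to_shadow(["b_payments", "b_refunds"], "b_")
```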

  • Schema-registry replication via _schemas topic. Verbatim: "The _schemas topic can be replicated when the feature is enabled, allowing schemas (and schema settings, such as compatibility) to be replicated." And in the reciprocal architecture: "a primary site for schema registry would need to be chosen (since both sites will use _schemas)." Canonical disclosure: Redpanda's schema registry is stored in a Kafka topic named _schemas, and Shadow Linking replicates that topic when replicate_schemas (or equivalent flag) is enabled — which makes schema registry replication just-another-topic rather than a separate subsystem. In the reciprocal active-passive shape, only one cluster can own _schemas at a time because both sites would write to the same topic name; the post names this explicitly as a constraint on the reciprocal architecture. Contrast with MM2 where schema-registry replication is a separate connector that must be operated independently.

  • Failover granularity: by-topic OR by-whole-link. Verbatim: "Keep in mind that if you have an app-level outage, you don't need to failover the whole link — just failover individual topics as needed." Canonical wiki concept: concepts/per-topic-granularity-failover — the DR primitive is not just failover(link) but also failover(topic, link). The operational consequence is significant: an app-level outage (one service's topic family breaks but the cluster is fine) can be failed over at topic granularity while leaving the rest of the link intact; a region-level outage (whole source cluster gone) is still a whole-link failover. This gives operators two tools matched to two outage types rather than forcing every DR event through a single mechanism. Canonical pattern: patterns/topic-level-granular-dr-failover — a DR pattern that specifically provides sub-cluster failover granularity, compared to traditional stretch-cluster DR where the minimum failover unit is the whole cluster.
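The two-primitive shape — failover(topic, link) and failover(link) — can be sketched as state transitions. This is an illustrative model, not the Redpanda API:

```python
# Illustrative sketch: the DR primitive exposes both per-topic failover
# (app-level outage) and whole-link failover (region-level outage).
class ShadowLink:
    def __init__(self, topics: list[str]):
        self.failed_over = {t: False for t in topics}

    def failover_topic(self, topic: str) -> None:
        # App-level outage: fail over one topic, leave the rest replicating.
        self.failed_over[topic] = True

    def failover_all(self) -> None:
        # Region-level outage: fail over the whole link.
        for t in self.failed_over:
            self.failed_over[t] = True

link = ShadowLink(["a_orders", "a_payments", "a_audit"])
link.failover_topic("a_orders")  # rest of the link keeps replicating
assert link.failed_over == {"a_orders": True, "a_payments": False,
                            "a_audit": False}
```

This is also what makes small scheduled DR drills cheap: a drill can be a single failover_topic, not a whole-link event.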

  • Failover unlocks writability; consumer resumption is automatic. Verbatim: "When you failover a link, either by topic or entirely, the replication flows stop and the linked topics will become writable to regular producers. At this point, you can migrate your consumers and producers by reconfiguring them to point directly to the shadow cluster instead of the source cluster and continue where you left off." Offset preservation (canonicalised by the 25.3 post at concepts/offset-preserving-replication) makes "continue where you left off" literal: consumers resume at the same committed offset they held on the source. The post also references the Redpanda failover runbook as the companion operational document — "Definitely one to keep bookmarked!".

  • Link deletion requires all flows failed-over or drained. Verbatim: "You can only delete a shadow link once all of the flows are failed over and there are no active replication flows. This is A Good Thing™." Canonical safety property: the system prevents destructive link-delete on an active replication by making it a pre-condition that all flows be either failed-over (writable-on-shadow) or inactive. This is a guardrail against the operator-error shape where someone deletes a link thinking it's unused, leaving consumers suddenly pointed at a cluster that's no longer being fed data. No such guardrail is named in the 25.3 post.
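The invariant itself is simple to state precisely; a sketch of the pre-condition (the flow states and their names are assumptions, the post doesn't name them):

```python
# Sketch of the stated safety invariant: a shadow link is deletable only
# when every replication flow is either failed over or inactive.
def can_delete_link(flows: dict[str, str]) -> bool:
    """flows maps topic -> flow state; any 'active' flow blocks deletion."""
    return all(state in ("failed_over", "inactive") for state in flows.values())

assert can_delete_link({"a_orders": "failed_over", "a_audit": "inactive"})
assert not can_delete_link({"a_orders": "failed_over", "a_audit": "active"})
```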

  • Source cluster is unaware of the replication. Verbatim: "A shadow link is configured on the destination cluster only. The source cluster is completely unaware of the link, aside from the additional read workload it sees." Architectural consequence: the source cluster needs no config change and no operator action — a shadow cluster can be stood up unilaterally by the DR operator. The source sees Shadow Linking as just another Kafka consumer (with the additional fetch load that implies). This is a stronger isolation property than MM2: MM2's source-side access is also one-sided (per-connector auth against the source), but its source-connector configuration is a named component that lives in the separate Connect cluster. Shadow Linking's configuration lives entirely on the shadow.

  • Shadow topics are read-only to regular producers until failover. Verbatim: "While the client of a shadow link is writing to a topic, that topic is read-only to all other producers, ensuring that the topic stays in sync with the source and doesn't diverge in contents. It will only become writable once failed over." Canonical safety property: the broker enforces the invariant that only one writer can produce to a shadow topic at a time (the shadow-link task), preventing the split-brain shape where a misconfigured producer writes to a shadow topic while it's still being replicated. The enforcement is at the broker/topic level, not client-side discipline — you can't accidentally write to a shadow topic.
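The single-writer invariant can be sketched as a broker-side produce check. All names here are invented; the point is that enforcement sits at the broker, not in client discipline:

```python
# Sketch of the broker-side invariant: while a topic is a shadow, only the
# shadow-link task may produce to it; regular producers are rejected at the
# broker. After failover the topic accepts any producer.
def authorize_produce(topic_state: str, producer: str) -> bool:
    if topic_state == "shadow":
        return producer == "shadow-link-task"  # sole writer until failover
    return True                                # failed-over: writable to all

assert authorize_produce("shadow", "app-producer") is False
assert authorize_produce("shadow", "shadow-link-task") is True
assert authorize_produce("failed_over", "app-producer") is True
```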

  • MirrorMaker2 needs a separate cluster; Shadow Linking does not. Verbatim: "Consider replicating a stream of messages at 1GiB/s using an external tool such as MirrorMaker: In addition to the source and sink clusters, you would need another cluster to host the replication workload. In contrast, when using shadowing, no additional hardware is needed." Canonical hardware-cost disclosure: MM2 is a three-cluster architecture (source + sink + Connect cluster); Shadow Linking is a two-cluster architecture (source + shadow). At 1 GiB/s this is the difference between 2× and 3× the single-cluster hardware budget — a ~50% additional infrastructure cost for the connector-based shape. The 25.3 post only said MM2 was "an external service" without quantifying the third-cluster cost; this post does.
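The ~50% figure follows directly from the cluster counts; spelling out the arithmetic (the uniform-cluster-cost assumption is ours, not the post's):

```python
# Assuming roughly uniform per-cluster cost: MM2 is source + sink + Connect
# (3 clusters); Shadow Linking is source + shadow (2 clusters).
mm2_clusters, shadow_clusters = 3, 2
extra_cost_pct = (mm2_clusters - shadow_clusters) / shadow_clusters * 100
assert extra_cost_pct == 50.0  # ~50% more infrastructure for the MM2 shape
```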

  • MM2 can't guarantee replicated-data fidelity — duplicates are not uncommon. Verbatim: "Worse still, external replication tools can't guarantee the fidelity of the replicated data; it's not uncommon for duplicate messages to be introduced by the replication layer. In other words, not only is the external tool approach more expensive, but it also yields a worse outcome. (Lose-lose, anyone?)" Canonical fidelity distinction: because MM2 consumes-and-re-produces via the public Kafka API, the at-least-once semantics of its consumer-side checkpointing can produce duplicate writes on the destination after a task restart. Shadow Linking, running at the broker's log layer, doesn't have this re-produce-after-restart shape — offset-preserving replication is byte-for-byte.

  • Performance via C++ / no JVM tuning. Verbatim: "Just like the rest of the Redpanda broker, the Shadowing components are written in high-performance C++, which means that not only do you get great replication performance, but there's also no Kafka Connect and no JVM tuning in sight. Woohoo!" Re-statement of Redpanda's core differentiation applied to the replication substrate: same C++/Seastar runtime, same no-GC-pause throughput profile, now on the DR data path too.

  • Failover-as-routine — practicable DR drills. Verbatim: "This simplicity means that failover isn't something to fear, but something that can become routine. By practicing failover, teams can provide verifiable evidence of their disaster recovery readiness. Having high confidence in your preparedness (based on demonstrated capability) is infinitely more useful than the usual hopeful assumptions." Links Shadow Linking's failover primitives (particularly per-topic granularity) to the broader always-be-failing-over drill discipline canonicalised by PlanetScale's extreme-fault-tolerance corpus — DR readiness as demonstrated capability rather than hypothetical. Topic-level failover granularity makes small scheduled drills feasible (fail over one topic, exercise the consumer-reconfiguration playbook, fail it back) without the "whole cluster goes down" blast radius of a full-link drill.

  • Observability via Prometheus-compatible metrics and Redpanda Console. Verbatim: "Prometheus-compatible metrics to see the link status, including replication lag, are published by the broker, so your existing monitoring will automatically pick them up." And: "The state of the link and the replication flows it is handling can be viewed in console, via rpk and via REST, allowing easy understanding and integration." Three-surface observability: Prometheus (automated), Redpanda Console (interactive GUI), rpk + REST (scripting). The 25.3 post named "Monitoring a shadow cluster in Redpanda Console" only — this post expands to the full multi-surface observability story.

  • Five replication axes in-cluster. Verbatim breakdown: "Topic data: All records are replicated byte-for-byte, preserving offsets, timestamps, headers, compression, and batching." + "Topic configurations: This includes the partition count and topic properties such as retention, compression, and cleanup policy. Not all properties are replicated." + "Consumer group data: Committed offsets and group membership, enabling failover of consumers." + "ACLs / security policies: Access control lists are replicated to ensure consistent authorization across clusters." + "Schema registry data: The _schemas topic can be replicated when the feature is enabled." Five axes: data / configs / consumer-groups / ACLs / schemas — what the 25.3 post called "topics, configs, consumer group offsets, ACLs, schemas — the works!" at list altitude, this post enumerates with one-sentence mechanism each.

Operational numbers

  • 2.5 GiB/s — source cluster throughput under scale-test.
  • 2,500,000 msg/s — source cluster message rate under scale-test.
  • <10,000 messages — total-cluster replication lag across all topics under the above load.
  • ~4 ms — effective RPO on average at the above lag + throughput.
  • 1 GiB/s — the MirrorMaker2 comparison case: three clusters (source + sink + Connect) vs two for Shadow Linking at the same workload.
  • 2 clusters — minimum Shadow Linking architecture (source + shadow).
  • 3 clusters — minimum MirrorMaker2 architecture (source + sink + Connect).
  • 5 replication axes — data + configs + consumer-groups + ACLs + schemas — enumerated per-axis.
  • 2 shadow links — the count needed for a reciprocal active-passive pair (one on each cluster).

Caveats

  • No mechanism-altitude walkthrough of the replication-task internals. The post says "tasks internal to the broker that read data from the source cluster and write it locally" and "Each broker in the shadow cluster runs replication tasks that read directly from the brokers in the source cluster" but does not walk: the task scheduling model, per-task-per-partition mapping, how the task writes records into the log layer while bypassing the public producer API (which would re-assign offsets), or the backpressure / retry / catch-up semantics when the source is faster than the shadow. These are broker-internal implementation details the post defers to Redpanda documentation.

  • The 2.5 GiB/s scale-test is unsigned. The post says "I recently scale-tested shadowing" in first-person singular but the post carries no author attribution. The benchmark result is a single data point without hardware configuration, network topology, region separation, or cluster-size disclosure. For a production-capacity planning exercise this number is a qualitative floor (Shadow Linking can go this fast on some hardware) rather than a quantitative SLA (here's what you'll get on your hardware).

  • Reciprocal active-passive is the closest thing to active-active the post discusses — but it's still active-passive at the per-topic level. The post's "both sites support simultaneous producing and consuming" is true at the aggregate workload level (each site produces to its own prefixed topics + consumes from the other's shadowed topics), but each topic still has a single writer — the cluster that owns it. This is not the same as multi-writer active-active (where two clusters both accept writes to the same logical topic and reconcile conflicts), which Redpanda does not support. The post doesn't explicitly disambiguate this — operators reading quickly may conflate reciprocal active-passive with true active-active.

  • Schema-registry replication in reciprocal architecture requires picking a primary site. Verbatim: "a primary site for schema registry would need to be chosen (since both sites will use _schemas)." This means the reciprocal architecture has an asymmetry at the schema-registry layer that the topic layer does not have — both sites can own prefixed topics simultaneously, but only one can own _schemas. The post flags this as a constraint without walking the operational consequences (e.g. what happens if the primary-schema-registry site is the one that fails over — does the _schemas topic need to be failed over by itself to the other site?).

  • "Not all properties are replicated" — topic configurations are partially replicated, with the excluded set deferred to the documentation. The post links to the docs page but names no excluded properties in the body. Operators need to read the docs to know what gaps exist in the config-replication story.

  • Schemas are off by default. Verbatim: "Schemas aren't replicated by default, but this is easily enabled when configuring a link." A default-off DR-critical property is a footgun if the operator doesn't know to enable it — the first time they know schema registry didn't replicate is during DR failover when their consumers can't deserialize records.

  • Failover-runbook link is an external docs reference. The operational playbook for a real DR event lives at docs.redpanda.com; the blog post names it but doesn't walk its contents. This is appropriate (the runbook's detail doesn't belong in a marketing post) but means the blog is insufficient on its own for a team implementing DR.

  • Link deletion safety is stated but not walked. "You can only delete a shadow link once all of the flows are failed over and there are no active replication flows" is the invariant, but the post doesn't describe what error the operator sees on an attempted delete of an active link, or whether there's a force-delete path for emergency scenarios. Operators running cleanup automation need to know the API contract.

  • acks semantics on replicated data are implicit. The post says "Replication is asynchronous: As your upstream producers write messages to the source cluster, the acknowledgments they receive only indicate that those messages are durably written to that source cluster — not that the messages are also replicated to the shadow cluster." This is a crucial property for RPO reasoning — producer acknowledgment does not imply shadow-cluster durability — but the post doesn't walk the producer-side framing (no acks=all-and-replicated option, because such a thing would defeat the async-replication latency win).

  • Tier-3 launch-adjacent post. The post closes with "we have options for you: demo / documentation / community Slack" — the marketing close is restrained (one paragraph) and the architecture content is ~60-70% of the body, so it clears AGENTS.md's "architecture content ≥20% of the body" bar decisively. But it is Redpanda promoting its own feature, not a post-incident or third-party validation, and the 2.5 GiB/s benchmark is a Redpanda-run test.

  • No contradiction with 25.3 launch post. All disclosures are strictly additive or mechanism-detail extensions of the 25.3 preview framing. The 25.3 post's "RPO … measured in a few seconds" is the SLA; the 2.5 GiB/s scale test's 4 ms average RPO is the measured-case — consistent but two orders of magnitude better. No load-bearing claim from 25.3 is contradicted.

Source
