
Remote read replica topic

Definition

A remote read replica topic is a read-only topic on a separate cluster that mirrors a topic on an origin cluster by reading the origin's tiered-storage / archival-storage segments directly from object storage (S3, GCS, Azure Blob) — bypassing the origin cluster's brokers entirely. Consumers subscribe to the remote cluster's topic; the remote cluster fetches segments from the shared object store; the origin cluster's broker fleet experiences no read load from the remote consumers.

This is the primitive for decoupling read fan-out from the origin cluster on tiered-storage-capable streaming brokers such as Redpanda.

Canonical Redpanda framing

"A Remote Read Replica topic is a read-only topic that mirrors a topic on a different cluster. It works with both Tiered Storage and archival storage."

"Remote Read Replicas allow you to create a separate remote cluster for consumers of a specific topic, populating its topics from remote storage. This can serve consumers without increasing the load on the origin cluster. These read-only topics access data directly from object storage instead of the topics' origin cluster, which means there's no impact on the performance of the original cluster. Topic data can be consumed within a region of your choice, regardless of where it was produced."

(Source: sources/2025-02-11-redpanda-high-availability-deployment-multi-region-stretch-clusters)
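As a concrete sketch of the workflow, assuming Redpanda's `rpk` CLI and its documented Tiered Storage topic properties (topic, bucket, and broker names here are illustrative):

```shell
# On the origin cluster: enable archiving of this topic's segments to object
# storage. (Tiered Storage must already be configured cluster-wide.)
rpk topic create clickstream -c redpanda.remote.write=true

# On the remote cluster: create the read-only replica topic, pointing it at
# the origin's bucket. No connection to the origin's brokers is configured.
rpk topic create clickstream -c redpanda.remote.readreplica=origin-bucket

# Consumers connect only to the remote cluster; reads are served from
# segments fetched out of object storage, never from the origin's brokers.
rpk topic consume clickstream --brokers remote-cluster:9092
```

The replica topic is read-only by construction: produces against it are rejected, and its contents track whatever the origin has uploaded to the bucket.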

How it differs from follower fetching

concepts/follower-fetching optimises read-path locality by letting consumers read from one of the origin cluster's follower brokers rather than the leader. A remote read replica topic goes one step further: the consumer reads from a separate cluster entirely, backed by the shared object store.

|                          | Follower fetching                  | Remote read replica                                  |
|--------------------------|------------------------------------|------------------------------------------------------|
| Cluster                  | Same origin cluster                | Separate remote cluster                              |
| Data source              | Origin follower broker             | Object storage (S3/GCS)                              |
| Origin broker load       | Reduced (reads go to followers)    | Zero                                                 |
| Staleness                | Replica lag (ms)                   | Object-storage upload interval (seconds)             |
| Scale-out ceiling        | Bounded by replication factor      | Unbounded: more read clusters, more read throughput  |
| Cross-region application | Yes, but still loads origin cluster | Yes, and isolates origin from remote reads entirely |

The architectural difference is read-load scale-up versus scale-out. Follower fetching scales reads across the origin's existing follower brokers but cannot add read throughput beyond the cluster's own capacity. A remote read replica adds a separate read cluster in the read-heavy region, scaling read fan-out without adding origin brokers.

Architectural substrate: tiered storage

Remote read replica is built on tiered storage: the origin cluster already offloads historical log segments to object storage (S3/GCS) for cost and retention reasons. A remote read replica cluster reads those same segments from the object store directly, without needing to communicate with the origin's brokers. The segments become a de facto shared read substrate between origin and remote clusters.

Because object-storage uploads happen on a segment-close cadence (segments are written to local NVMe first, then uploaded when closed or compacted), the remote read replica lags the origin by roughly one segment interval, typically seconds. This makes remote read replicas unsuitable for real-time consumers, but well suited to read-heavy archival and analytical workloads that shouldn't load the origin cluster.
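That staleness bound is tunable. A hedged sketch of two knobs that tighten it, assuming the documented Redpanda property names (`segment.bytes`, `cloud_storage_segment_max_upload_interval_sec`); the values shown are illustrative:

```shell
# Smaller segments close (and therefore upload) sooner, tightening the
# staleness bound at the cost of more objects in the store. 128 MiB here.
rpk topic alter-config clickstream --set segment.bytes=134217728

# Upload at least this often even if the active segment hasn't closed,
# putting a hard ceiling on replica lag (illustrative value: 60 s).
rpk cluster config set cloud_storage_segment_max_upload_interval_sec 60
```

Either way, the floor is the upload cadence itself: a remote read replica can never be fresher than the most recent segment the origin has pushed to the bucket.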

Distinction from MirrorMaker2

MirrorMaker2 runs a pull-based replication process between two independent clusters, consuming from the source cluster's brokers and re-producing to the destination cluster's brokers. This double-handles every record and loads both clusters' broker fleets.

Remote read replica avoids the broker-to-broker copy entirely — the object-storage segments are the mirror, and the remote cluster only needs to read them. The origin cluster's broker fleet is untouched. This is a substantially cheaper fan-out mechanism when tiered storage is already deployed.
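The contrast shows up in configuration: a MirrorMaker2 pipeline needs network reachability to the source brokers, while a read-replica cluster needs only object-store access. A hedged sketch of the remote cluster's cloud-storage settings, using Redpanda's documented cluster property names (region, bucket, and credentials are illustrative):

```shell
# The remote cluster is configured with access to the shared bucket only;
# no bootstrap address for the origin cluster appears anywhere.
rpk cluster config set cloud_storage_enabled true
rpk cluster config set cloud_storage_region us-east-1
rpk cluster config set cloud_storage_bucket origin-bucket
rpk cluster config set cloud_storage_access_key <key>      # static keys shown for illustration
rpk cluster config set cloud_storage_secret_key <secret>   # instance/IAM credentials also work
```

Because the only shared dependency is the bucket, the origin cluster can be scaled, upgraded, or even briefly unavailable without interrupting reads on the replica side.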
