
CONCEPT Cited by 1 source

Cross-cluster offset translation map

Definition

A cross-cluster offset translation map is an external per-consumer-group data structure that records, for every partition replicated between two Kafka-API clusters, the correspondence between the source offset and the destination offset of the same record. Consumers that fail over from source to destination consult the map at failover time to find the destination offset equivalent to the source offset they last committed, and resume from there without gap or duplication.

This concept entered the wiki through the offset translation feature that Redpanda Migrator gained in the 24.3 release (announced 2024-12-03):

"Redpanda Migrator now supports offset translation for consumer applications that need to switch between reading data from the source cluster and the target cluster without losing their place in the stream."

(Source: sources/2024-12-03-redpanda-redpanda-243-extends-lakehouses-with-streaming-data-cdc)

Why the map exists — the underlying problem

Cross-cluster asynchronous replication substrates that rewrite records into the destination cluster cannot guarantee offset identity across clusters. Examples as of 2024 are MirrorMaker 2, which runs on Kafka Connect, and Redpanda Migrator, which runs on Redpanda Connect; both are connector pipelines that sit outside the broker. The destination cluster's offset for a record depends on the destination partition's own write sequence, which may differ from the source's because of differing partition topology, replication ordering, or previous destination writes.

If a consumer group committed offset N on the source and fails over to the destination, offset N on the destination generally refers to a different record, or does not exist yet. A consumer that resumed at N would either skip records or replay records it had already processed.
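A small simulation makes the mismatch concrete. It is only an illustrative sketch: the record values, the two pre-existing destination records, and the helper name `replicate` are invented for the example and do not come from Redpanda Migrator.

```python
def replicate(source_log, destination_log):
    """Append every source record to the destination log.

    Returns a dict of source offset -> destination offset for each record.
    (Hypothetical helper; real replication is asynchronous and batched.)
    """
    mapping = {}
    for source_offset, record in enumerate(source_log):
        destination_log.append(record)
        mapping[source_offset] = len(destination_log) - 1
    return mapping


source_log = ["a", "b", "c", "d", "e"]
# Assumption: the destination partition already held two unrelated records.
destination_log = ["x", "y"]

offset_map = replicate(source_log, destination_log)

# The group committed offset 3 on the source: the next record to read is "d".
committed = 3

naive_record = destination_log[committed]                    # reuse N as-is
translated_record = destination_log[offset_map[committed]]   # translate N first

print(naive_record, translated_record)  # prints "b d"
```

Reusing the source offset unchanged lands on "b", which the consumer already processed, while the translated offset lands on "d", the record it was about to read.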

The translation map records, per partition:

(consumer_group, topic, partition, source_offset) → destination_offset

Failover consumers consult the map to find the destination offset corresponding to their last-committed source offset and resume from there.
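The lookup can be sketched as follows. This is a minimal in-memory illustration, not Redpanda Migrator's actual storage format or API, which the source announcement does not describe. The class name `OffsetTranslationMap`, its methods, and the assumption that the map holds sparse checkpoints rather than one entry per record are all hypothetical. With sparse checkpoints the lookup returns the newest checkpoint at or below the committed offset, so a consumer may re-read a few records but never skips any; exact resumption would need a checkpoint at the committed offset itself.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (consumer_group, topic, partition)
Key = Tuple[str, str, int]


class OffsetTranslationMap:
    """Hypothetical in-memory translation map built from sparse checkpoints."""

    def __init__(self) -> None:
        # Parallel lists per key, kept sorted by source offset.
        self._source: Dict[Key, List[int]] = defaultdict(list)
        self._destination: Dict[Key, List[int]] = defaultdict(list)

    def record(self, group: str, topic: str, partition: int,
               source_offset: int, destination_offset: int) -> None:
        """Add one checkpoint; checkpoints must arrive in increasing order."""
        key = (group, topic, partition)
        source = self._source[key]
        if source and source_offset <= source[-1]:
            raise ValueError("checkpoints must have increasing source offsets")
        source.append(source_offset)
        self._destination[key].append(destination_offset)

    def translate(self, group: str, topic: str, partition: int,
                  committed_source_offset: int) -> Optional[int]:
        """Return the destination offset of the newest checkpoint whose
        source offset is <= the committed offset, or None if there is none."""
        key = (group, topic, partition)
        source = self._source.get(key)
        if not source:
            return None
        index = bisect_right(source, committed_source_offset) - 1
        if index < 0:
            return None
        return self._destination[key][index]


offsets = OffsetTranslationMap()
offsets.record("orders-app", "orders", 0, source_offset=100, destination_offset=40)
offsets.record("orders-app", "orders", 0, source_offset=200, destination_offset=140)

resume_at = offsets.translate("orders-app", "orders", 0, committed_source_offset=150)
print(resume_at)  # prints 40
```

Returning `None` rather than guessing an offset leaves the decision about what to do without a usable entry (block the failover, or fall back to the earliest or latest offset) to the caller.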

Contrast with offset-preserving replication

The 2025-11-06 Redpanda Shadowing feature takes the opposite architectural approach:

| Axis | Translation map (Migrator, 2024-12) | Offset preservation (Shadowing, 2025-11) |
| --- | --- | --- |
| Offset identity across clusters | Different | Identical (byte-for-byte) |
| Failover mechanism | Lookup in external map | Resume at same offset |
| Replication substrate | Connector-based (outside the broker) | Broker-internal |
| Critical-path dependency on failover | Translation map must be in sync and queryable | None |
| Extra subsystem to operate | Map storage and refresh pipeline | None |

See concepts/offset-preserving-replication for the canonical description of the preservation approach. The translation map approach is cheaper to bolt onto an existing connector-based replication stream; the preservation approach requires broker-internal replication, which in turn requires both clusters to run the same broker implementation.

Operational properties

  • Per-consumer-group state. The map is indexed by consumer group; different groups can fail over independently and each has its own map entries.
  • Must be kept in sync with replication. If the map falls behind replication, consumers that fail over resume from stale destination offsets. Refresh cadence and failure modes are substrate-specific.
  • Failover-only critical path. Steady-state production / consumption doesn't consult the map; only consumer-group failover does. An unavailable map blocks failover but doesn't affect live traffic.
  • Destructive re-partition compatibility is hard. If the destination cluster re-partitions the topic (a different partition count or a different key-partitioning scheme), the map has to capture record-level correspondence, not just offset correspondence. Redpanda Migrator's offset-translation announcement does not address this case.
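The "must be kept in sync" and "failover-only critical path" properties can be expressed as a guard that runs once, at failover time. The sketch below is a hypothetical policy, not behaviour documented for Redpanda Migrator: the names `MapEntry`, `FailoverDecision`, `plan_failover`, and the `max_replay` threshold are all assumptions, as is the idea that the caller has already fetched the newest map entry at or below the group's committed source offset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MapEntry:
    source_offset: int
    destination_offset: int


@dataclass(frozen=True)
class FailoverDecision:
    resume_at: Optional[int]   # destination offset, or None if blocked
    replayed_records: int      # source records the consumer would re-read
    reason: str


def plan_failover(committed_source_offset: int,
                  entry: Optional[MapEntry],
                  max_replay: int) -> FailoverDecision:
    """Decide whether a consumer group may fail over right now.

    `entry` is the newest map entry at or below the committed source offset,
    or None when the map is unavailable or has no usable entry.
    """
    if entry is None:
        return FailoverDecision(None, 0, "map unavailable: failover blocked")
    if entry.source_offset > committed_source_offset:
        raise ValueError("entry is ahead of the committed offset")

    # How far the map lags behind what the group had already processed.
    staleness = committed_source_offset - entry.source_offset
    if staleness > max_replay:
        return FailoverDecision(None, staleness, "map too stale: failover blocked")
    return FailoverDecision(entry.destination_offset, staleness, "ok")


decision = plan_failover(150, MapEntry(source_offset=100, destination_offset=40),
                         max_replay=100)
print(decision.resume_at, decision.replayed_records)  # prints "40 50"
```

Nothing in this guard runs during steady-state production or consumption, which matches the point that an unavailable map blocks failover without affecting live traffic.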

Seen in
