PATTERN
tc-latency injection for geo-distributed simulation¶
Pattern¶
To benchmark a multi-region cluster without the expense of actually
deploying across regions, deploy all brokers in a single AZ and
use Linux tc (traffic control, typically tc qdisc ... netem
delay) to selectively inject network latency between broker
pairs — simulating cross-region RTT on the replication-side
links while leaving other links untouched. Run the usual benchmark
(e.g., OpenMessaging
Benchmark) against the cluster and measure produce/consume
latency as if the cluster were geographically stretched.
The technique's load-bearing property is isolation: inter-broker latency can be injected independently of client-to-broker latency, so you can study the cross-region-quorum-RTT dimension without paying real cross-region cloud bandwidth or operating a genuine multi-region deployment during testing.
Canonical framing — Redpanda¶
"To simulate a multi-region Redpanda cluster, we set up a 3-node Redpanda cluster with
i3en.xlarge VMs. These VMs have four cores per node with 32 GB of memory each, and simulate a Tier-2 Redpanda Cloud cluster. We used tc to only add network latency between Redpanda brokers. No network latency was added between the OMB worker nodes and Redpanda broker nodes to simulate leader pinning."(Source: sources/2025-02-11-redpanda-high-availability-deployment-multi-region-stretch-clusters)
Why isolate broker-to-broker from client-to-broker¶
On a real multi-region stretch cluster, client-to-broker and broker-to-broker latencies are different variables:
- With leader pinning applied, the client-to-leader hop is intra-region — effectively zero relative to cross-region latency.
- The broker-to-broker replication links are cross-region — the expensive dimension.
If a simulation injects latency uniformly on every network path, it measures the wrong configuration: producers and consumers pay cross-region RTT to the leader they're talking to, as well as replication-side cross-region RTT. That's a no-leader-pinning + no-follower-fetching worst-case baseline, not the optimised shape operators actually run.
By restricting tc netem delay to inter-broker interfaces, the
simulation matches the leader-pinned deployment profile — the
configuration operators actually use in production.
Mechanism sketch¶
On Linux, tc qdisc add dev <iface> root netem delay <time>
inserts a latency queue on the egress of <iface>. To restrict
the latency to a specific peer (other broker), combine with
tc filter matching on the peer IP — or apply the qdisc on a
dedicated interface/VLAN used only for inter-broker traffic. The
Redpanda post does not publish the exact tc invocation; the
substance of the pattern is the topology decision (inter-broker
only), not the exact command.
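A minimal sketch of the filter-based variant, assuming a three-broker cluster (the interface name, peer IPs, and delay value are illustrative assumptions, not values from the Redpanda post). The `echo` prefix makes it a dry run that prints the commands; drop it to apply for real, which requires root:

```shell
# Dry run: TC prints each tc command instead of executing it.
TC="echo tc"
IFACE=eth0                      # assumed broker-facing interface
PEERS="10.0.1.12 10.0.1.13"     # assumed IPs of the other two brokers
DELAY=60ms                      # simulated cross-region RTT

# prio qdisc with an extra 4th band; the default priomap never sends
# traffic there, so only filtered traffic sees the netem delay.
$TC qdisc add dev "$IFACE" root handle 1: prio bands 4
$TC qdisc add dev "$IFACE" parent 1:4 handle 40: netem delay "$DELAY"

# Steer only traffic destined for peer brokers into the delayed band;
# client-to-broker traffic keeps the default (undelayed) path.
for ip in $PEERS; do
  $TC filter add dev "$IFACE" parent 1: protocol ip prio 1 \
      u32 match ip dst "$ip/32" flowid 1:4
done
```

A dedicated inter-broker interface or VLAN avoids the per-peer filters entirely: attach `netem` as the root qdisc of that interface and leave the client-facing interface untouched.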
Sample approximation of the topology:

```
          [tc delay +δ]            [tc delay +δ]
Broker A <──────────────> Broker B <──────────────> Broker C
    ↑                         ↑                         ↑
    │ no delay                │ no delay                │ no delay
    ▼                         ▼                         ▼
OMB worker 1             OMB worker 2             OMB worker 3
```

Where δ is the simulated cross-region RTT (e.g., 1-10 ms cross-AZ, 60 ms for us-east ↔ us-west, 150 ms transoceanic).
Calibration¶
The technique's main degree of freedom is δ — the injected inter-broker delay. The Redpanda post does not publish the δ values used for the different "stretch configurations" charted in the published publish-latency graph. Practitioners calibrate δ from canonical denominators:
| Topology | Typical RTT | Reference |
|---|---|---|
| Same-AZ | < 1 ms | baseline |
| Cross-AZ in region | 1-10 ms | AWS single-digit-ms guidance |
| us-east-1 ↔ us-west-1 | 60+ ms | cloudping.co |
| Transoceanic (US ↔ EU / Asia) | 100-200 ms+ | cloudping.co |
Running the benchmark at multiple δ values produces the latency-vs-stretch-configuration curve the Redpanda post references.
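The δ sweep can be scripted. The sketch below assumes a `netem` qdisc is already installed at `parent 1:4 handle 40:` on the inter-broker path, uses illustrative δ values, and leaves the benchmark invocation as a placeholder since the post does not publish its exact OMB command. The `echo` prefix keeps it a dry run:

```shell
# Dry run: drop the "echo" to execute for real (tc needs root).
TC="echo tc"
IFACE=eth0                            # assumed inter-broker interface

for delta in 1ms 10ms 60ms 150ms; do  # same-AZ .. transoceanic
  # "change" swaps the delay in place on the installed netem qdisc,
  # so the cluster need not be rebuilt between runs.
  $TC qdisc change dev "$IFACE" parent 1:4 handle 40: netem delay "$delta"
  echo "run OMB workload here at delta=$delta"   # placeholder, not a real command
done
```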
What the technique doesn't capture¶
- Cross-region bandwidth constraints: `tc netem delay` injects latency but not a bandwidth cap. Real cross-region links have both; add `tc tbf` / `tc htb` to simulate both dimensions.
- Packet loss / jitter: `tc netem` supports `loss N%` and a jitter term on `delay`, but the Redpanda setup only used delay. Real WAN links have tail-latency jitter and occasional loss; benchmarking latency tails on pure delay will understate real-world variability.
- Asymmetric topology: the baseline 3-broker topology with one OMB worker per broker doesn't capture asymmetric replica placement (e.g., RF=5 across 3 regions where one region holds 2 replicas). Realistic multi-region deployments often have this asymmetry.
- Control-plane latency: metadata operations (`rpk` admin commands, leader-election RPCs) pay the same injected latency as data-plane replication. Real control planes are often separately latency-sensitive.
- Cost dimension: `tc`-simulated clusters are all intra-AZ in cloud billing; the real cross-region bandwidth cost of a production deployment is invisible to the benchmark.
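The bandwidth limitation can be addressed by chaining a token-bucket filter beneath `netem` (modern kernels allow one child qdisc under `netem` at `parent 1:1`). A minimal sketch with assumed numbers; the rate, burst, and queue-latency values are illustrative, not from the post:

```shell
# Dry run: drop the "echo" to apply (requires root).
TC="echo tc"
IFACE=eth0

# netem supplies the cross-region delay; the tbf child caps throughput,
# approximating a WAN link that has both latency and limited bandwidth.
$TC qdisc add dev "$IFACE" root handle 1: netem delay 60ms
$TC qdisc add dev "$IFACE" parent 1:1 handle 10: \
    tbf rate 500mbit burst 64kb latency 400ms
```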
When to use¶
- Pre-production performance evaluation for a new region topology before committing cloud spend.
- Comparing producer config (`acks`, `linger.ms`, `batch.size`) latency behaviour at different simulated stretch configurations.
- Regression-testing broker changes against a standard set of cross-region latency points.
When not to use¶
- Go-live validation: real cross-region cloud networks behave differently (asymmetric bandwidth, BGP path changes, CSP network-maintenance events). Always do some real-region canary before full production rollout.
- Cost estimation: `tc` tells you nothing about cross-region bandwidth cost.
Composes with¶
- systems/openmessaging-benchmark: the canonical benchmark runner; the Redpanda post uses OMB on this topology.
- concepts/leader-pinning: the deployment configuration this technique simulates (broker-only latency, no client-to-broker latency).
- patterns/multi-region-raft-quorum: the quorum shape whose latency behaviour this technique characterises.
Seen in¶
- sources/2025-02-11-redpanda-high-availability-deployment-multi-region-stretch-clusters — canonical wiki introduction of the technique for multi-region Redpanda performance testing; explicit framing as a leader-pinning-equivalent topology.