
CONCEPT

Latency-critical vs latency-tolerant workload

Definition

Latency-critical vs latency-tolerant workload is the workload-class distinction between streams whose business value depends on low end-to-end latency (tight p99 / p99.9 targets) and streams whose business value is unchanged by latency anywhere in the 100-ms-to-minutes range (the latter need to be durable, compliant, and complete, not fast).

This distinction is the motivating framing for per-topic storage tiering (patterns/per-topic-storage-tier-within-one-cluster) — different workloads have different latency-vs-cost frontiers and a one-size-fits-all cluster over-pays for one or under-serves the other.

Canonical wiki source

Introduced in the Redpanda 25.3 launch post with an explicit categorisation:

"Some data sets are latency-critical (e.g., payments, trading, cybersecurity), and others are latency-tolerant (e.g., observability, model training, compliance reporting). Treating those workloads the same is inefficient."

"We all have this type of data — you know the kind: compliance logs, debug streams, raw events for that AI project you'll start someday. It's important, but does it really need instantaneous replication across AZ boundaries, or to live on the same screaming-fast, high-performance SSDs as your mission-critical event data? No."

Canonical workload tiers

The Redpanda post gives explicit examples that map to the two classes:

  • Latency-critical: payments, trading, cybersecurity, real-time agents, fraud detection, user-facing UX pipelines. Dominant concern: p99 latency < 100 ms.
  • Latency-tolerant: observability / debug streams, model-training data, compliance / audit logs, batch-analytics raw-event archives, "that AI project you'll start someday". Dominant concern: durability + completeness + cost.

The threshold between them is not a hard ms cutoff — it's whether a 10× increase in end-to-end latency (say, from 50 ms to 500 ms, or from 500 ms to 5 s) materially affects business outcomes:

  • Payments / trading: an order that arrives 500 ms over budget may miss a market window and destroy value. Latency-critical.
  • Compliance audit log: a record that arrives 5 s late has the same regulatory value as one that arrives in real time. Latency-tolerant.
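The 10× test above can be sketched as a toy classifier: given a curve mapping end-to-end latency to business value, a workload is latency-critical if a 10× latency increase materially erodes that value. The value curves and the 10% materiality threshold below are illustrative assumptions, not numbers from the source.

```python
def is_latency_critical(value_at, latency_s, factor=10, materiality=0.10):
    """Apply the 10x test: a workload is latency-critical if multiplying
    its end-to-end latency by `factor` erodes business value by more than
    `materiality` (a 10% threshold, chosen here for illustration)."""
    base = value_at(latency_s)
    degraded = value_at(latency_s * factor)
    return (base - degraded) / base > materiality

# Illustrative value curves -- assumptions, not from the source:
# an order loses most of its value once it misses a ~300 ms market window,
# while an audit record's regulatory value is flat in latency.
payments = lambda t: 1.0 if t < 0.3 else 0.1
audit_log = lambda t: 1.0

print(is_latency_critical(payments, 0.05))   # 50 ms -> 500 ms: True
print(is_latency_critical(audit_log, 0.5))   # 500 ms -> 5 s: False
```

The classification falls out of the shape of the value curve, not out of any fixed millisecond cutoff, which mirrors the point above that the threshold is not a hard number.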

Why this distinction is architecturally load-bearing

The two classes have opposite cost drivers:

  • Latency-critical workloads pay for predictable low latency: NVMe drives, RAM-resident hot paths, minimal cross-tier hops, aggressive read-path pre-fetching, and multi-AZ replication to keep tail latency bounded across broker restarts.
  • Latency-tolerant workloads pay for durability at low cost per GB: object storage (S3 / GCS / ADLS) at roughly $0.02/GB-month vs NVMe at $0.20+/GB-month, no cross-AZ replication cost (durability is inherited from the object-store service), and read latencies of seconds are acceptable.

A cluster that runs both classes on the same substrate over-pays by 10× on the latency-tolerant data (paying NVMe + cross-AZ costs on workloads that don't need them) or under-serves the latency-critical data (using the cheap substrate means accepting object-store write latency on payments traffic).
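The over-pay claim can be made concrete with back-of-envelope arithmetic using the price points above ($0.20/GB-month NVMe vs $0.02/GB-month object storage). The 3× replication factor is an illustrative assumption; with it, the gap already exceeds the 10× cited before any cross-AZ transfer charges are added.

```python
def monthly_cost_nvme(gb, price_per_gb=0.20, replicas=3):
    """NVMe substrate: every byte is stored on `replicas` brokers,
    so the raw $/GB-month is multiplied by the replication factor."""
    return gb * price_per_gb * replicas

def monthly_cost_object(gb, price_per_gb=0.02):
    """Object-store substrate: durability is the provider's job,
    so only one logical copy is billed."""
    return gb * price_per_gb

gb = 10_000  # 10 TB of latency-tolerant data
nvme = monthly_cost_nvme(gb)    # 10_000 * 0.20 * 3 = $6,000/month
obj = monthly_cost_object(gb)   # 10_000 * 0.02     =   $200/month
print(f"NVMe: ${nvme:,.0f}/mo, object store: ${obj:,.0f}/mo, "
      f"ratio: {nvme / obj:.0f}x")
```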

Operational instance

Redpanda Cloud Topics exposes the per-topic choice directly: a latency-critical topic is a traditional NVMe-backed topic (optionally with write caching enabled for ultra-low latency); a latency-tolerant topic is a Cloud Topic that writes straight to object storage. Both live in the same cluster and share the same Kafka-API endpoint, the same IAM, and the same GitOps pipeline.
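As a sketch of how that per-topic choice might look with `rpk` (Redpanda's CLI): `rpk topic create -c key=value` sets topic-level properties, and `write.caching` is Redpanda's write-caching topic property. The Cloud Topic property name below is a placeholder assumption; the source does not name one, so check the Redpanda docs for the real key.

```shell
# Latency-critical topic: NVMe-backed, write caching on for ultra-low latency
rpk topic create payments --partitions 12 --replicas 3 \
  -c write.caching=true

# Latency-tolerant topic: a Cloud Topic writing straight to object storage.
# NOTE: `cloud.topic.enabled` is a placeholder property name, not confirmed.
rpk topic create audit-log --partitions 12 \
  -c cloud.topic.enabled=true
```

Both commands target the same cluster and the same Kafka-API endpoint; the substrate decision lives entirely in the topic configuration.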


Historical note

The latency-critical vs latency-tolerant framing has long existed in the database world (OLTP vs OLAP / data warehouse), the messaging world (synchronous RPC vs event log), and the object-storage world (S3 Standard vs Glacier). Redpanda's 25.3 framing is the first wiki instance of applying it per-topic within a single streaming cluster, making the trade-off a topic-level configuration choice rather than a cluster-level commitment.
