
CONCEPT

Latency-critical vs latency-tolerant workload

Definition

Latency-critical vs latency-tolerant workload is the workload-class distinction between streams whose business value depends on low end-to-end latency (tight p99 / p99.9 targets) and streams whose business value is unchanged by latency anywhere in the 100-ms-to-minutes range (the latter need to be durable, compliant, and complete, not fast).

This distinction is the motivating framing for per-topic storage tiering (patterns/per-topic-storage-tier-within-one-cluster) — different workloads have different latency-vs-cost frontiers and a one-size-fits-all cluster over-pays for one or under-serves the other.

Canonical wiki source

Introduced in the Redpanda 25.3 launch post with an explicit categorisation:

"Some data sets are latency-critical (e.g., payments, trading, cybersecurity), and others are latency-tolerant (e.g., observability, model training, compliance reporting). Treating those workloads the same is inefficient."

"We all have this type of data — you know the kind: compliance logs, debug streams, raw events for that AI project you'll start someday. It's important, but does it really need instantaneous replication across AZ boundaries, or to live on the same screaming-fast, high-performance SSDs as your mission-critical event data? No."

Canonical workload tiers

The Redpanda post gives explicit examples that map to the two classes:

  • Latency-critical: payments, trading, cybersecurity, real-time agents, fraud detection, user-facing UX pipelines. Dominant concern: p99 latency < 100 ms.
  • Latency-tolerant: observability / debug streams, model-training data, compliance / audit logs, batch-analytics raw-event archives, "that AI project you'll start someday". Dominant concern: durability + completeness + cost.

The threshold between them is not a hard ms cutoff — it's whether a 10× increase in end-to-end latency (say, from 50 ms to 500 ms, or from 500 ms to 5 s) materially affects business outcomes:

  • Payments / trading: an order that arrives 500 ms over budget may miss a market window and destroy value. Latency-critical.
  • Compliance audit log: a record that arrives 5 s late has the same regulatory value as one that arrives in real time. Latency-tolerant.
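The 10× test above can be sketched as a toy classifier: given a curve mapping end-to-end latency to business value, a workload is latency-critical if a 10× latency increase materially erodes that value. The value curves and the 10% materiality threshold below are illustrative assumptions, not numbers from the source.

```python
def is_latency_critical(value_at, latency_s, factor=10, materiality=0.10):
    """Apply the 10x test: a workload is latency-critical if multiplying
    its end-to-end latency by `factor` erodes business value by more than
    `materiality` (a 10% threshold, chosen here for illustration)."""
    base = value_at(latency_s)
    degraded = value_at(latency_s * factor)
    return (base - degraded) / base > materiality

# Illustrative value curves -- assumptions, not from the source:
# an order loses most of its value once it misses a ~300 ms market window,
# while an audit record's regulatory value is flat in latency.
payments = lambda t: 1.0 if t < 0.3 else 0.1
audit_log = lambda t: 1.0

print(is_latency_critical(payments, 0.05))   # 50 ms -> 500 ms: True
print(is_latency_critical(audit_log, 0.5))   # 500 ms -> 5 s: False
```

The classification falls out of the shape of the value curve, not out of any fixed millisecond cutoff, which mirrors the point above that the threshold is not a hard number.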

Why this distinction is architecturally load-bearing

The two classes have opposite cost drivers:

  • Latency-critical workloads pay for predictable low latency: NVMe drives, RAM-resident hot paths, minimal cross-tier hops, aggressive read-path pre-fetching, and multi-AZ replication to keep tail latency bounded across broker restarts.
  • Latency-tolerant workloads pay for durability at low cost per GB: object storage (S3 / GCS / ADLS) at roughly $0.02/GB-month vs NVMe at $0.20+/GB-month, no cross-AZ replication cost (durability is inherited from the object-store service), and read latencies of seconds are acceptable.

A cluster that runs both classes on the same substrate over-pays by 10× on the latency-tolerant data (paying NVMe + cross-AZ costs on workloads that don't need them) or under-serves the latency-critical data (using the cheap substrate means accepting object-store write latency on payments traffic).
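The over-pay claim can be made concrete with back-of-envelope arithmetic using the price points above ($0.20/GB-month NVMe vs $0.02/GB-month object storage). The 3× replication factor is an illustrative assumption; with it, the gap already exceeds the 10× cited before any cross-AZ transfer charges are added.

```python
def monthly_cost_nvme(gb, price_per_gb=0.20, replicas=3):
    """NVMe substrate: every byte is stored on `replicas` brokers,
    so the raw $/GB-month is multiplied by the replication factor."""
    return gb * price_per_gb * replicas

def monthly_cost_object(gb, price_per_gb=0.02):
    """Object-store substrate: durability is the provider's job,
    so only one logical copy is billed."""
    return gb * price_per_gb

gb = 10_000  # 10 TB of latency-tolerant data
nvme = monthly_cost_nvme(gb)    # 10_000 * 0.20 * 3 = $6,000/month
obj = monthly_cost_object(gb)   # 10_000 * 0.02     =   $200/month
print(f"NVMe: ${nvme:,.0f}/mo, object store: ${obj:,.0f}/mo, "
      f"ratio: {nvme / obj:.0f}x")
```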

Operational instance

Redpanda Cloud Topics exposes the per-topic choice directly: a latency-critical topic is a traditional NVMe-backed topic (optionally with write caching enabled for ultra-low latency); a latency-tolerant topic is a Cloud Topic that writes straight to object storage. Both live in the same cluster and share the same Kafka-API endpoint, the same IAM, and the same GitOps pipeline.
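As a sketch of how that per-topic choice might look with `rpk` (Redpanda's CLI): `rpk topic create -c key=value` sets topic-level properties, and `write.caching` is Redpanda's write-caching topic property. The Cloud Topic property name below is a placeholder assumption; the source does not name one, so check the Redpanda docs for the real key.

```shell
# Latency-critical topic: NVMe-backed, write caching on for ultra-low latency
rpk topic create payments --partitions 12 --replicas 3 \
  -c write.caching=true

# Latency-tolerant topic: a Cloud Topic writing straight to object storage.
# NOTE: `cloud.topic.enabled` is a placeholder property name, not confirmed.
rpk topic create audit-log --partitions 12 \
  -c cloud.topic.enabled=true
```

Both commands target the same cluster and the same Kafka-API endpoint; the substrate decision lives entirely in the topic configuration.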


Historical note

The latency-critical vs latency-tolerant framing has long existed in the database world (OLTP vs OLAP / data warehouse), the messaging world (synchronous RPC vs event log), and the object-storage world (S3 Standard vs Glacier). Redpanda's 25.3 framing is the first wiki instance of applying it per-topic within a single streaming cluster, making the trade-off a topic-level configuration choice rather than a cluster-level commitment.
