Network round-trip cost

The round-trip-time (RTT) floor between an application process and a remote database or RPC service is the unit cost that dominates batch-job throughput whenever a loop does one operation per record. Even with 0.5–2 ms intra-AZ RTT, a sequential loop over N records pays at least N × RTT, regardless of how fast the database itself executes each op; the RTT itself is bounded below by the speed of light.

This is the primary force behind bulk operations, push-transform-into-the-warehouse (ELT), pipelining, connection multiplexing, and the general design instinct of "every trip to the DB should carry as much work as it can."

The arithmetic

  • Per-record loop: throughput ≤ 1 / RTT. At 1 ms RTT, that's 1,000 records/sec before any CPU work. Adding more CPU or more database capacity doesn't help — the wire is the bottleneck.
  • Batched loop at batch size C: throughput ≤ C / RTT. One round trip amortizes C records; a tenfold increase in C is a tenfold throughput gain for free (until other limits bite: memory pressure, server-side transaction limits, tail latency of the batch).
  • Amdahl ceiling. Even after parallelization, the serial dependency chain through the application-to-database round trip sets a hard ceiling. Adding threads reduces wall-clock at the cost of concurrent connections but can't break the per-request RTT floor.
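
The two ceilings above can be written down directly as a back-of-envelope sketch (the 1 ms RTT figure is the intra-AZ assumption from the text, not a measurement):

```python
# Throughput ceilings implied by the RTT floor. Illustrative arithmetic only:
# real throughput also depends on server-side work, concurrency, and tails.

def per_record_ceiling(rtt_s: float) -> float:
    """Sequential loop, one round trip per record: throughput <= 1 / RTT."""
    return 1.0 / rtt_s

def batched_ceiling(rtt_s: float, batch_size: int) -> float:
    """One round trip carries batch_size records: throughput <= C / RTT."""
    return batch_size / rtt_s

# At an assumed 1 ms RTT: ~1,000 records/sec per-record,
# ~1,000,000 records/sec at a batch size of 1,000.
```

The point of writing it out is that both ceilings are independent of CPU: only a larger C, or a smaller RTT, moves them.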

Where it shows up

  • PL/SQL → application-layer migrations. PL/SQL executes set-based operations inside the database engine — no RTT between the logic and the data. Reimplementing the same workload as a Java / Python service doing for each row: fetch, transform, write introduces N× RTTs that didn't exist in the legacy model. MongoDB's 2025-09-18 post reports 25–30× batch-job slowdown from exactly this shape (Source: sources/2025-09-18-mongodb-modernizing-core-insurance-systems-breaking-the-batch-bottleneck).
  • OLTP-style counting over billions of records. Canva's Creators payment pipeline hit the same wall in MySQL RDS: one DB round-trip per record, single-threaded sequential scan, stuck events delay everything. The fix was architectural — move to ELT in Snowflake (Source: sources/2024-04-29-canva-scaling-to-count-billions).
  • ETL / data-migration jobs. Any pipeline that fetches, transforms, and writes per record is a candidate. Chunking into batches of 1k–10k typically wins two to three orders of magnitude of throughput.
  • Microservice chatter. The same arithmetic applies when one service calls another in a loop. Cap'n Web's promise pipelining (Cloudflare 2025-09-22) is the symmetric fix at the RPC layer: "chain three calls in one round trip instead of three."
  • Dashboards that issue per-widget DB queries. Classic N+1 query problem; same underlying force.
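
The N+1 shape and its batched fix can be sketched with a toy client. `CountingDB` and its `query` method are stand-ins invented for illustration, not any real driver API; the only thing being counted is round trips:

```python
class CountingDB:
    """Toy stand-in for a database client; counts round trips instead of
    talking to a server."""

    def __init__(self):
        self.round_trips = 0

    def query(self, sql, *params):
        self.round_trips += 1  # each call models one RTT on the wire
        return params

def fetch_n_plus_one(db, ids):
    # One round trip per id: N x RTT total.
    return [db.query("SELECT * FROM widgets WHERE id = ?", i) for i in ids]

def fetch_batched(db, ids):
    # One round trip for all ids via an IN clause: 1 x RTT total.
    marks = ",".join("?" * len(ids))
    return db.query(f"SELECT * FROM widgets WHERE id IN ({marks})", *ids)
```

Running both over the same id list makes the arithmetic concrete: the first function's round-trip count scales with N, the second's is constant.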

Fixes by layer

| Layer | Fix |
| --- | --- |
| Application code | Batch requests, avoid per-record loops, use bulkWrite / multi-key fetches / IN clauses / executor frameworks |
| Client library | Connection pooling, pipelining, HTTP/2 multiplexing |
| Protocol | Promise pipelining, request coalescing, batch-aware RPC |
| Data model | Denormalization to reduce lookup fan-out; embedded documents |
| Architecture | Push the transform to the data (concepts/elt-vs-etl), or to the edge (stored procedures, UDFs, serverless triggers) |
| Infrastructure | Co-locate app + DB (same AZ → sub-ms RTT); cache reference data in-process |
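
The application-code fix, sketched minimally: chunk the record stream and issue one bulk request per chunk. The commented `bulk_write` call is pymongo-style shorthand to show where a chunk would go; only the `chunks` helper below is actually defined here:

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

# Instead of one round trip per record:
#     for rec in records: collection.update_one(...)   # N x RTT
# issue one bulk request per chunk:
#     for batch in chunks(records, 1000):
#         collection.bulk_write([to_update_op(r) for r in batch])  # N/1000 x RTT
```

Batch sizes of 1k–10k are the usual starting point; past that, the other limits from the arithmetic section (memory, transaction size, batch tail latency) start to bite.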

The prefetching corollary

When a batch needs to look up reference data (rate tables, policy config, enums), the naïve shape is one lookup per record. The bulk analogue is intelligent prefetching: load the reference table once into an in-process map before the loop, then each record's lookup is in-memory. MongoDB's batch-optimization framework names this explicitly ("Reducing repeated lookups by pre-loading and caching reference data in memory-friendly structures"). Trade-off: balance memory footprint against lookup frequency — broader prefetch wins throughput but costs heap.
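
A minimal sketch of the prefetch shape, assuming a hypothetical `StubDB` reference-data source (the class, its `query` method, and the rate-table schema are all invented for illustration):

```python
class StubDB:
    """Toy source of reference rows; stands in for a real driver."""

    def query(self, sql):
        return [{"code": "A", "rate": 2.0}, {"code": "B", "rate": 3.0}]

def process_batch(records, db):
    # One round trip loads the whole rate table into an in-process map...
    rates = {row["code"]: row["rate"]
             for row in db.query("SELECT code, rate FROM rates")}
    # ...so every per-record lookup is a dict hit, not an RTT.
    return [rec["amount"] * rates[rec["code"]] for rec in records]
```

The loop body now pays zero round trips per record; the memory cost is the size of the prefetched map, which is the trade-off the corollary describes.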

Tail-latency interaction

Batch operations amortize RTT for the median record but concentrate it at the batch boundary: one slow bulk write delays the whole batch's downstream. Pairing with tail-latency discipline (timeouts, retries, hedged requests) matters at scale. Over-parallelization also risks overloading the server — MongoDB's post lists thread-pool sizing as one of five required tuning dimensions.
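
One of those tail disciplines, hedged requests, can be sketched with the standard library. This is an assumption-laden toy, not a production pattern: it presumes `call` is idempotent, the 50 ms hedge delay is arbitrary, and the slower attempt is never cancelled (the executor's shutdown simply waits for it):

```python
import concurrent.futures as cf

def hedged(call, hedge_after_s=0.05):
    """Send a request; if no reply within hedge_after_s, send a duplicate
    and return whichever attempt finishes first."""
    with cf.ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(call)
        done, _ = cf.wait([first], timeout=hedge_after_s)
        if done:
            return first.result()  # fast path: no hedge needed
        second = pool.submit(call)  # hedge: duplicate the request
        done, _ = cf.wait([first, second], return_when=cf.FIRST_COMPLETED)
        return done.pop().result()
```

Applied to bulk writes, the hedge caps the damage a single slow batch does to everything queued behind it, at the cost of extra load on the server, which is why thread-pool and hedge sizing need tuning together.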
