Network round-trip cost

The round-trip-time (RTT) floor between an application process and a remote database or RPC service is the unit cost that dominates batch-job throughput whenever a loop does one operation per record. Even with 0.5–2 ms intra-AZ RTT, a sequential loop over N records pays at least N × RTT, regardless of how fast the database itself executes each op; the RTT itself is bounded below by the speed of light.

This is the primary force behind bulk operations, push-transform-into-the-warehouse (ELT), pipelining, connection multiplexing, and the general design instinct of "every trip to the DB should carry as much work as it can."

The arithmetic

  • Per-record loop: throughput ≤ 1 / RTT. At 1 ms RTT, that's 1,000 records/sec before any CPU work. Adding more CPU or more database capacity doesn't help — the wire is the bottleneck.
  • Batched loop at batch size C: throughput ≤ C / RTT. One round trip amortizes C records; a tenfold increase in C is a tenfold throughput gain for free (until other limits bite: memory pressure, server-side transaction limits, tail latency of the batch).
  • Amdahl ceiling. Even after parallelization, the serial dependency chain through the application-to-database round trip sets a hard ceiling. Adding threads reduces wall-clock at the cost of concurrent connections but can't break the per-request RTT floor.
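
The two ceilings above can be written down directly as a back-of-envelope sketch (the 1 ms RTT figure is the intra-AZ assumption from the text, not a measurement):

```python
# Throughput ceilings implied by the RTT floor. Illustrative arithmetic only:
# real throughput also depends on server-side work, concurrency, and tails.

def per_record_ceiling(rtt_s: float) -> float:
    """Sequential loop, one round trip per record: throughput <= 1 / RTT."""
    return 1.0 / rtt_s

def batched_ceiling(rtt_s: float, batch_size: int) -> float:
    """One round trip carries batch_size records: throughput <= C / RTT."""
    return batch_size / rtt_s

# At an assumed 1 ms RTT: ~1,000 records/sec per-record,
# ~1,000,000 records/sec at a batch size of 1,000.
```

The point of writing it out is that both ceilings are independent of CPU: only a larger C, or a smaller RTT, moves them.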

Where it shows up

  • PL/SQL → application-layer migrations. PL/SQL executes set-based operations inside the database engine — no RTT between the logic and the data. Reimplementing the same workload as a Java / Python service doing for each row: fetch, transform, write introduces N× RTTs that didn't exist in the legacy model. MongoDB's 2025-09-18 post reports 25–30× batch-job slowdown from exactly this shape (Source: sources/2025-09-18-mongodb-modernizing-core-insurance-systems-breaking-the-batch-bottleneck).
  • OLTP-style counting over billions of records. Canva's Creators payment pipeline hit the same wall in MySQL RDS: one DB round-trip per record, single-threaded sequential scan, stuck events delay everything. The fix was architectural — move to ELT in Snowflake (Source: sources/2024-04-29-canva-scaling-to-count-billions).
  • ETL / data-migration jobs. Any pipeline that fetches, transforms, and writes per record is a candidate. Chunking into batches of 1k–10k typically wins two to three orders of magnitude of throughput.
  • Microservice chatter. The same arithmetic applies when one service calls another in a loop. Cap'n Web's promise pipelining (Cloudflare 2025-09-22) is the symmetric fix at the RPC layer: "chain three calls in one round trip instead of three."
  • Dashboards that issue per-widget DB queries. Classic N+1 query problem; same underlying force.
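
The N+1 shape and its batched fix can be sketched with a toy client. `CountingDB` and its `query` method are stand-ins invented for illustration, not any real driver API; the only thing being counted is round trips:

```python
class CountingDB:
    """Toy stand-in for a database client; counts round trips instead of
    talking to a server."""

    def __init__(self):
        self.round_trips = 0

    def query(self, sql, *params):
        self.round_trips += 1  # each call models one RTT on the wire
        return params

def fetch_n_plus_one(db, ids):
    # One round trip per id: N x RTT total.
    return [db.query("SELECT * FROM widgets WHERE id = ?", i) for i in ids]

def fetch_batched(db, ids):
    # One round trip for all ids via an IN clause: 1 x RTT total.
    marks = ",".join("?" * len(ids))
    return db.query(f"SELECT * FROM widgets WHERE id IN ({marks})", *ids)
```

Running both over the same id list makes the arithmetic concrete: the first function's round-trip count scales with N, the second's is constant.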

Fixes by layer

| Layer | Fix |
| --- | --- |
| Application code | Batch requests, avoid per-record loops, use bulkWrite / multi-key fetches / IN clauses / executor frameworks |
| Client library | Connection pooling, pipelining, HTTP/2 multiplexing |
| Protocol | Promise pipelining, request coalescing, batch-aware RPC |
| Data model | Denormalization to reduce lookup fan-out; embedded documents |
| Architecture | Push the transform to the data (concepts/elt-vs-etl), or to the edge (stored procedures, UDFs, serverless triggers) |
| Infrastructure | Co-locate app + DB (same AZ → sub-ms RTT); cache reference data in-process |
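
The application-code fix, sketched minimally: chunk the record stream and issue one bulk request per chunk. The commented `bulk_write` call is pymongo-style shorthand to show where a chunk would go; only the `chunks` helper below is actually defined here:

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

# Instead of one round trip per record:
#     for rec in records: collection.update_one(...)   # N x RTT
# issue one bulk request per chunk:
#     for batch in chunks(records, 1000):
#         collection.bulk_write([to_update_op(r) for r in batch])  # N/1000 x RTT
```

Batch sizes of 1k–10k are the usual starting point; past that, the other limits from the arithmetic section (memory, transaction size, batch tail latency) start to bite.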

The prefetching corollary

When a batch needs to look up reference data (rate tables, policy config, enums), the naïve shape is one lookup per record. The bulk analogue is intelligent prefetching: load the reference table once into an in-process map before the loop, then each record's lookup is in-memory. MongoDB's batch-optimization framework names this explicitly ("Reducing repeated lookups by pre-loading and caching reference data in memory-friendly structures"). Trade-off: balance memory footprint against lookup frequency — broader prefetch wins throughput but costs heap.
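
A minimal sketch of the prefetch shape, assuming a hypothetical `StubDB` reference-data source (the class, its `query` method, and the rate-table schema are all invented for illustration):

```python
class StubDB:
    """Toy source of reference rows; stands in for a real driver."""

    def query(self, sql):
        return [{"code": "A", "rate": 2.0}, {"code": "B", "rate": 3.0}]

def process_batch(records, db):
    # One round trip loads the whole rate table into an in-process map...
    rates = {row["code"]: row["rate"]
             for row in db.query("SELECT code, rate FROM rates")}
    # ...so every per-record lookup is a dict hit, not an RTT.
    return [rec["amount"] * rates[rec["code"]] for rec in records]
```

The loop body now pays zero round trips per record; the memory cost is the size of the prefetched map, which is the trade-off the corollary describes.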

Tail-latency interaction

Batch operations amortize RTT for the median record but concentrate it at the batch boundary: one slow bulk write delays the whole batch's downstream. Pairing with tail-latency discipline (timeouts, retries, hedged requests) matters at scale. Over-parallelization also risks overloading the server — MongoDB's post lists thread-pool sizing as one of five required tuning dimensions.
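
One of those tail disciplines, hedged requests, can be sketched with the standard library. This is an assumption-laden toy, not a production pattern: it presumes `call` is idempotent, the 50 ms hedge delay is arbitrary, and the slower attempt is never cancelled (the executor's shutdown simply waits for it):

```python
import concurrent.futures as cf

def hedged(call, hedge_after_s=0.05):
    """Send a request; if no reply within hedge_after_s, send a duplicate
    and return whichever attempt finishes first."""
    with cf.ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(call)
        done, _ = cf.wait([first], timeout=hedge_after_s)
        if done:
            return first.result()  # fast path: no hedge needed
        second = pool.submit(call)  # hedge: duplicate the request
        done, _ = cf.wait([first, second], return_when=cf.FIRST_COMPLETED)
        return done.pop().result()
```

Applied to bulk writes, the hedge caps the damage a single slow batch does to everything queued behind it, at the cost of extra load on the server, which is why thread-pool and hedge sizing need tuning together.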
