Network round-trip cost¶
The round-trip time (RTT) floor between an application process and a remote database or RPC service is the unit cost that dominates batch-job throughput whenever a loop does one operation per record. Even with 0.5–2 ms intra-AZ RTT, a sequential loop over N records pays at least N × RTT (a floor ultimately set by the speed of light), regardless of how fast the database itself executes each operation.
This is the primary force behind bulk operations, push-transform-into-the-warehouse (ELT), pipelining, connection multiplexing, and the general design instinct of "every trip to the DB should carry as much work as it can."
The arithmetic¶
- Per-record loop: throughput ≤ 1 / RTT. At 1 ms RTT, that's 1,000 records/sec before any CPU work. Adding more CPU or more database capacity doesn't help — the wire is the bottleneck.
- Batched loop at batch size C: throughput ≤ C / RTT. One round trip amortizes C records; a tenfold increase in C is a tenfold throughput gain for free (until other limits bite: memory pressure, server-side transaction limits, tail latency of the batch).
- Amdahl ceiling. Even after parallelization, the serial dependency chain through the application-to-database round trip sets a hard ceiling. Adding threads reduces wall-clock time at the cost of concurrent connections but can't break the per-request RTT floor.
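The bounds above are easy to check numerically. A minimal sketch (the 1 ms RTT and batch size are illustrative values, not measurements):

```python
# Throughput ceilings from the RTT arithmetic: batch_size records amortize
# one round trip, so the upper bound is batch_size / RTT records per second.

def max_throughput(rtt_s: float, batch_size: int = 1) -> float:
    """Upper bound on records/sec when each round trip carries batch_size records."""
    return batch_size / rtt_s

rtt = 0.001  # 1 ms intra-AZ round trip (illustrative)

per_record = max_throughput(rtt)                 # ceiling with one record per trip
batched = max_throughput(rtt, batch_size=1000)   # ceiling with 1,000 records per trip

print(per_record)  # ~1,000 records/sec
print(batched)     # ~1,000,000 records/sec
```

Note that both ceilings ignore server-side execution time entirely; the wire alone imposes them.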
Where it shows up¶
- PL/SQL → application-layer migrations. PL/SQL executes set-based operations inside the database engine — no RTT between the logic and the data. Reimplementing the same workload as a Java / Python service doing `for each row: fetch, transform, write` introduces N× RTTs that didn't exist in the legacy model. MongoDB's 2025-09-18 post reports 25–30× batch-job slowdown from exactly this shape (Source: sources/2025-09-18-mongodb-modernizing-core-insurance-systems-breaking-the-batch-bottleneck).
- OLTP-style counting over billions of records. Canva's Creators payment pipeline hit the same wall in MySQL RDS: one DB round trip per record, single-threaded sequential scan, stuck events delaying everything. The fix was architectural — move to ELT in Snowflake (Source: sources/2024-04-29-canva-scaling-to-count-billions).
- ETL / data-migration jobs. Any pipeline that fetches, transforms, and writes per record is a candidate. Chunking into batches of 1k–10k typically wins two to three orders of magnitude.
- Microservice chatter. The same arithmetic applies when one service calls another in a loop. Cap'n Web's promise pipelining (Cloudflare 2025-09-22) is the symmetric fix at the RPC layer: "chain three calls in one round trip instead of three."
- Dashboards that issue per-widget DB queries. Classic N+1 query problem; same underlying force.
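All of these cases reduce to the same two loop shapes. A minimal sketch that counts round trips, with a simulated client standing in for a real driver (the `FakeDB` class and its method names are illustrative, not any real API; in practice the batched path is a call like MongoDB's `bulk_write` or SQL's `executemany`):

```python
# Contrast the per-record loop (N round trips) with the batched loop
# (ceil(N / C) round trips). The database is simulated so the example is
# self-contained; only round trips are counted.

class FakeDB:
    def __init__(self):
        self.round_trips = 0

    def write_one(self, record):
        self.round_trips += 1   # one RTT per record

    def write_many(self, records):
        self.round_trips += 1   # one RTT per batch, regardless of batch size

def per_record(db, records):
    for r in records:
        db.write_one(r)         # N round trips total

def batched(db, records, batch_size=1000):
    for i in range(0, len(records), batch_size):
        db.write_many(records[i:i + batch_size])   # ceil(N / C) round trips

records = list(range(10_000))
a, b = FakeDB(), FakeDB()
per_record(a, records)
batched(b, records)
print(a.round_trips, b.round_trips)  # 10000 vs 10
```

At 1 ms RTT the first shape pays at least 10 seconds on the wire; the second pays 10 ms.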
Fixes by layer¶
| Layer | Fix |
|---|---|
| Application code | Batch requests, avoid per-record loops, use bulkWrite / multi-key fetches / IN clauses / executor frameworks |
| Client library | Connection pooling, pipelining, HTTP/2 multiplexing |
| Protocol | Promise pipelining, request coalescing, batch-aware RPC |
| Data model | Denormalization to reduce lookup fan-out; embedded documents |
| Architecture | Push the transform to the data (concepts/elt-vs-etl), or to the edge (stored procedures, UDFs, serverless triggers) |
| Infrastructure | Co-locate app + DB (same AZ → sub-ms RTT); cache reference data in-process |
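As a concrete instance of the application-code row (multi-key fetches / IN clauses), a hedged sketch using the stdlib `sqlite3` as a stand-in for a remote database; the `users` table and its contents are illustrative:

```python
# Replace N point lookups with one IN-clause query: same result set,
# one round trip instead of len(ids).

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(100)])

ids = [3, 7, 42]

# Per-record shape: one query (one round trip, on a remote DB) per id.
rows_slow = [conn.execute("SELECT name FROM users WHERE id = ?", (i,)).fetchone()[0]
             for i in ids]

# Batched shape: one query for the whole key set.
placeholders = ",".join("?" * len(ids))
rows_fast = [r[0] for r in conn.execute(
    f"SELECT name FROM users WHERE id IN ({placeholders}) ORDER BY id", ids)]

print(rows_fast)  # ['user3', 'user7', 'user42']
```

Against an in-process SQLite file the difference is negligible; against a database one AZ away, the batched shape is where the RTT arithmetic above pays off.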
The prefetching corollary¶
When a batch needs to look up reference data (rate tables, policy config, enums), the naïve shape is one lookup per record. The bulk analogue is intelligent prefetching: load the reference table once into an in-process map before the loop, then each record's lookup is in-memory. MongoDB's batch-optimization framework names this explicitly ("Reducing repeated lookups by pre-loading and caching reference data in memory-friendly structures"). Trade-off: balance memory footprint against lookup frequency — broader prefetch wins throughput but costs heap.
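A minimal sketch of that shape, assuming a hypothetical client with a `query` method and an illustrative rate-table schema (neither is a real API):

```python
# Prefetching corollary: fetch the whole reference table in one round trip,
# then resolve every record's lookup from an in-process dict.

class FakeDB:
    """Stand-in for a remote database; one query() call = one round trip."""
    def query(self, _sql):
        return [{"code": "A", "rate": 1.5}, {"code": "B", "rate": 2.0}]

def load_rates(db):
    # One RTT loads the entire rate table up front.
    return {row["code"]: row["rate"] for row in db.query("SELECT code, rate FROM rates")}

def process(records, db):
    rates = load_rates(db)   # 1 RTT, not one per record
    # Each record's lookup is now a dict access, not a round trip.
    return [r["amount"] * rates[r["code"]] for r in records]

records = [{"code": "A", "amount": 10}, {"code": "B", "amount": 4}]
print(process(records, FakeDB()))  # [15.0, 8.0]
```

The memory trade-off from the paragraph above is visible here: `rates` must fit on the heap for the duration of the loop.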
Tail-latency interaction¶
Batch operations amortize RTT for the median record but concentrate it at the batch boundary: one slow bulk write delays the whole batch's downstream. Pairing with tail-latency discipline (timeouts, retries, hedged requests) matters at scale. Over-parallelization also risks overloading the server — MongoDB's post lists thread-pool sizing as one of five required tuning dimensions.
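One piece of that discipline, sketched under assumptions: a timeout-raising batch sender wrapped in bounded retries with exponential backoff (the `send_batch` callable and its `TimeoutError` failure mode are illustrative, not a specific driver's behavior; hedged requests and thread-pool sizing are out of scope here):

```python
# Bound the damage of a slow batch: retry a timed-out bulk write a fixed
# number of times with exponential backoff, then surface the failure.

import time

def write_with_retry(send_batch, batch, retries=3, backoff_s=0.05):
    """Call send_batch(batch), retrying on TimeoutError up to `retries` attempts."""
    for attempt in range(retries):
        try:
            return send_batch(batch)
        except TimeoutError:
            if attempt == retries - 1:
                raise                            # exhausted: let the caller decide
            time.sleep(backoff_s * (2 ** attempt))  # 50 ms, 100 ms, ...
```

Retries interact with the batch-size choice: a failed batch of 10k records re-sends 10k records, which is one reason tail latency caps how large C can usefully grow.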
Seen in¶
- sources/2025-09-18-mongodb-modernizing-core-insurance-systems-breaking-the-batch-bottleneck — the canonical wiki framing: PL/SQL → Java + MongoDB migration pays 25–30× batch regression from per-record RTT; framework (patterns/bulk-write-batch-optimization) collapses the round-trip cost and in some cases outperforms the legacy baseline by 10–15×.
- sources/2024-04-29-canva-scaling-to-count-billions — same force, different answer: Canva walked away from the application-layer loop entirely and moved the transform into Snowflake + DBT (concepts/elt-vs-etl).