PATTERN
Bulk-write + prefetch + parallel — the batch-optimization framework¶
Pattern¶
When batch ETL jobs regress after migrating from an in-engine set-based substrate (PL/SQL inside an RDBMS) to an application-layer loop against a remote database (Java / Python / Go service + MongoDB / DynamoDB / any networked store), the fix is a composite framework of four cooperating techniques, none of which is novel on its own but whose joint application is required to recover the legacy baseline:
- Bulk operations at scale. Collapse per-record writes into native batch primitives (bulkWrite, multi-collection transactions, COPY streams, IN-clause fan-ins). One server round-trip per batch, not per record.
- Intelligent prefetching. Reference data (lookup tables, rate cards, rule configs) is loaded once per batch into memory-friendly in-process structures. Each record's transformation then does an in-memory map lookup rather than a DB query.
- Parallel processing. Partition workloads across threads or event processors (the LMAX Disruptor's ring-buffer concurrency model is one option). This overlaps the CPU-bound transform stage with the I/O-bound fetch and write stages.
- Configurable batch sizes. Batch chunk size is not a constant. Too-large batches cause memory pressure and hit server-side transaction-size / op-count limits; too-small batches reverse the round-trip win. The framework exposes this as a per-job tunable.
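As a minimal sketch of the first and last techniques together (all names hypothetical, not from the source), chunking records and issuing one bulk call per chunk turns N round-trips into N / batch_size, with the chunk size left as a per-job tunable:

```python
from itertools import islice

def chunked(records, batch_size):
    """Yield successive batches of at most batch_size records."""
    it = iter(records)
    while batch := list(islice(it, batch_size)):
        yield batch

def run_bulk_writes(records, batch_size, bulk_write):
    """Issue one bulk_write call (one round-trip) per batch instead of per record."""
    round_trips = 0
    for batch in chunked(records, batch_size):
        bulk_write(batch)  # in a real job: collection.bulk_write(ops) or equivalent
        round_trips += 1
    return round_trips

# 10,000 records at batch_size=1,000 -> 10 round-trips, not 10,000
trips = run_bulk_writes(range(10_000), 1_000, bulk_write=lambda batch: None)
print(trips)  # 10
```

The `batch_size` argument is exactly the knob the framework exposes per job: raising it cuts round-trips but grows the per-call payload toward server-side transaction limits.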
Motivation¶
The underlying force is concepts/network-round-trip-cost. A
per-record loop between an application process and a remote database
pays N × RTT regardless of database speed — a per-record 1 ms RTT
caps throughput at ~1,000 records/sec before any CPU work. Legacy
PL/SQL didn't pay this because the logic and the data lived inside
the same process. Post-migration code that preserves the per-record
shape inherits the round-trip bill.
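The round-trip arithmetic can be checked directly; the sketch below assumes a flat 1 ms RTT and ignores all CPU and server-side time:

```python
RTT_S = 0.001        # 1 ms application <-> database round-trip
records = 1_000_000

# Per-record loop: one round-trip per record.
per_record_wait_s = records * RTT_S
per_record_ceiling = records / per_record_wait_s   # records/sec throughput cap

# Batched: one round-trip per 1,000-record chunk.
batch_size = 1_000
batched_wait_s = (records / batch_size) * RTT_S

print(per_record_ceiling)               # 1000.0 records/sec, before any CPU work
print(per_record_wait_s, batched_wait_s)  # 1000.0 s vs 1.0 s of pure network wait
```

The ~1,000 records/sec ceiling matches the figure above; batching at 1,000 shrinks a million records' network wait from about seventeen minutes to one second.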
MongoDB's 2025-09-18 retrospective reports customers seeing 25–30× batch-job slowdown after like-for-like PL/SQL → Java + MongoDB Atlas migrations, with the four regression classes named explicitly:
- high network round-trips between application and database
- inefficient per-record operations replacing set-based logic
- under-utilization of database bulk capabilities
- application-layer computation overhead when transforming large datasets
Applying the four techniques brought the same jobs back to parity with the legacy RDBMS, and in several cases made them 10–15× faster (Source: sources/2025-09-18-mongodb-modernizing-core-insurance-systems-breaking-the-batch-bottleneck).
Architecture shape (MongoDB's published pipeline)¶
Trigger (cron / user action)
│
▼
Spring Boot controller
│ fetch input records + split into batches
▼
Executor framework
│ partition batches across threads / event processors
▼
ETL task (per batch)
│ 1. prefetch reference data → in-memory cache
│ 2. transform records in parallel
│ 3. bulkWrite back to MongoDB
│ (optionally a multi-collection transaction in MongoDB 8)
▼
Completion + write-back
Every layer is off-the-shelf: Spring Boot for the controller, a generic
executor framework for parallelism, MongoDB's native bulkWrite /
multi-collection transactions for the batch write. The framework's
contribution is organizing these four techniques as one composable
whole with pluggable transformation modules.
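A compressed sketch of the per-batch ETL task (steps 1–3 above), with the database mocked out and every name hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def prefetch_reference_data():
    # Step 1: one query loads the whole lookup table into process memory;
    # a real job would issue something like collection.find({}) here.
    return {"RATE_A": 1.5, "RATE_B": 2.0}

def transform(record, rates):
    # In-memory map lookup instead of a per-record DB query.
    return {**record, "amount": record["amount"] * rates[record["rate_code"]]}

def process_batch(batch, bulk_write, workers=4):
    rates = prefetch_reference_data()                       # 1. prefetch
    with ThreadPoolExecutor(max_workers=workers) as pool:   # 2. parallel transform
        out = list(pool.map(lambda r: transform(r, rates), batch))
    bulk_write(out)                                         # 3. one bulk round-trip
    return out

batch = [{"id": 1, "amount": 100.0, "rate_code": "RATE_A"},
         {"id": 2, "amount": 200.0, "rate_code": "RATE_B"}]
written = []
result = process_batch(batch, bulk_write=written.extend)
print(result[0]["amount"], result[1]["amount"])  # 150.0 400.0
```

In the published pipeline the transform stage is the pluggable module; the prefetch and bulk-write stages stay fixed across jobs.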
Tuning dimensions¶
MongoDB lists five knobs that any workload needs to set explicitly:
| Knob | Too low | Too high |
|---|---|---|
| Batch size | Round-trip win eroded, throughput flat | Memory pressure; hits MongoDB transaction limits (document size, total op count) |
| Thread-pool size | Under-utilized CPU; I/O stalls | Overloads the DB or network; tail latency spikes |
| Prefetch scope | Cache misses → per-record DB lookups | Heap pressure; large in-memory hashes |
| Index strategy | bulkWrite slows to per-op server-side cost | Write amplification; maintenance overhead |
| Transaction boundaries | Retry thrash on conflict | Long transactions block other writers |
The framework is explicit that "it's not one size fits all" — each workload is a separate tuning exercise.
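Because every knob is per-job, one plausible shape for the tunable surface (a sketch with illustrative defaults, not the framework's actual API) is a config object passed to each job rather than constants in code:

```python
from dataclasses import dataclass

@dataclass
class BatchJobConfig:
    """The five knobs from the table above; all defaults are illustrative only."""
    batch_size: int = 1_000            # records per bulk-write round-trip
    thread_pool_size: int = 8          # parallel transform workers
    prefetch_collections: tuple = ()   # reference data cached once per batch
    hint_indexes: tuple = ()           # indexes the bulk write should rely on
    txn_per_batch: bool = False        # wrap each batch in one transaction

# A nightly reference-heavy job tunes up batch size and parallelism:
nightly = BatchJobConfig(batch_size=5_000, thread_pool_size=16,
                         prefetch_collections=("rate_cards",))
print(nightly.batch_size)  # 5000
```

Separating the knobs this way makes "each workload is a separate tuning exercise" concrete: two jobs share the pipeline code and differ only in their config objects.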
When this pattern fits¶
- Post-migration batch regressions, especially PL/SQL → application layer + MongoDB / DynamoDB / Cassandra / any remote document store.
- Multi-collection ETL where records need to be written to several collections atomically (benefits from MongoDB 8's multi-collection bulk transactions).
- Reference-heavy transforms (insurance rate tables, tax brackets, routing policies, pricing tiers) where the same lookup table is consulted thousands of times per batch.
- Monthly / nightly ETL jobs where total throughput matters more than per-record latency.
When this pattern is not the right answer¶
- Aggregation-heavy pipelines over billions of records. Canva's 2024-04-29 Creators-payment pipeline faced the same per-record RTT trap and chose the sibling architectural answer: move the transform into the warehouse (Snowflake + DBT). See concepts/elt-vs-etl / patterns/end-to-end-recompute / patterns/warehouse-unload-bridge. ELT eliminates application-layer round-trips entirely; bulk-write + prefetch + parallel reduces them. If the workload is mostly aggregation, ELT wins; if it's per-record enrichment with side effects, the bulk-write framework wins.
- Streaming / event-driven workloads where records arrive one at a time with tight latency budgets. Batching trades latency for throughput; stream processors need different primitives.
- Small batch sizes. If the batch is only hundreds of records, a single non-batched query often beats the framework-setup overhead.
Related patterns¶
- patterns/end-to-end-recompute — warehouse-side answer to the same per-record-RTT trap.
- patterns/warehouse-unload-bridge — serving-side bridge from OLAP back to OLTP after ELT.
- concepts/elt-vs-etl — the architectural choice: bulk-write in the app vs. push-into-warehouse.
- concepts/network-round-trip-cost — the force this pattern counters.
Seen in¶
- sources/2025-09-18-mongodb-modernizing-core-insurance-systems-breaking-the-batch-bottleneck — canonical wiki instance. Insurance customers migrating PL/SQL cores to Java + MongoDB Atlas; 25–30× regression on like-for-like migration; framework restores parity and in some cases beats the legacy baseline 10–15×. Framework-overview depth — no per-customer numbers or benchmark methodology disclosed.