Bulk-write + prefetch + parallel — the batch-optimization framework

Pattern

When batch ETL jobs regress after migrating from an in-engine, set-based substrate (PL/SQL running inside an RDBMS) to an application-layer loop against a remote database (a Java / Python / Go service talking to MongoDB, DynamoDB, or any networked store), the fix is a composite framework of four cooperating techniques. None of them is novel on its own; their joint application is what recovers the legacy baseline:

  1. Bulk operations at scale. Collapse per-record writes into native batch primitives (bulkWrite, multi-collection transactions, COPY streams, IN-clause fan-ins). One server round-trip per batch, not per record.
  2. Intelligent prefetching. Reference data (lookup tables, rate cards, rule configs) is loaded once per batch into memory-efficient in-process structures. Each record's transformation then does an in-memory map lookup rather than a database query.
  3. Parallel processing. Partition workloads across threads or event processors (the LMAX Disruptor's ring-buffer concurrency model is one option). Partitioning overlaps the CPU-bound transform stage with the I/O-bound fetch and write stages.
  4. Configurable batch sizes. Batch chunk size is not a constant. Too-large batches cause memory pressure and hit server-side transaction-size / op-count limits; too-small batches reverse the round-trip win. The framework exposes this as a per-job tunable.
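Techniques 1 and 4 together can be sketched in a few lines of Python. This is a minimal illustration, not MongoDB's implementation: `write_batch` is a hypothetical stand-in for the store's native bulk primitive (e.g. pymongo's `Collection.bulk_write`), so the round-trip count can be verified without a live database.

```python
from itertools import islice

def chunked(records, batch_size):
    """Yield successive batches of at most batch_size records."""
    it = iter(records)
    while batch := list(islice(it, batch_size)):
        yield batch

def bulk_load(records, write_batch, batch_size=1000):
    """Collapse per-record writes into one driver call per batch.

    batch_size is the per-job tunable from technique 4: large enough to
    amortize the round-trip, small enough to stay under memory and
    server-side transaction limits.
    """
    round_trips = 0
    for batch in chunked(records, batch_size):
        write_batch(batch)   # one server round-trip for the whole batch
        round_trips += 1
    return round_trips

# 10,000 records at batch_size=1000 → 10 round-trips instead of 10,000.
```

In the real pipeline, `write_batch` would be the driver's bulk call; everything else stays the same.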

Motivation

The underlying force is concepts/network-round-trip-cost. A per-record loop between an application process and a remote database pays N × RTT regardless of database speed — a per-record 1 ms RTT caps throughput at ~1,000 records/sec before any CPU work. Legacy PL/SQL didn't pay this because the logic and the data lived inside the same process. Post-migration code that preserves the per-record shape inherits the round-trip bill.
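The throughput ceiling in the paragraph above is pure arithmetic, so it can be checked directly. A small sketch (the function name is illustrative):

```python
def max_throughput(rtt_s, batch_size=1):
    """Upper bound on records/sec when every batch pays one round-trip,
    ignoring all CPU and server-side time."""
    return batch_size / rtt_s

# Per-record loop at 1 ms RTT: capped at 1,000 records/sec.
per_record = max_throughput(0.001)                 # 1000.0
# Batching 1,000 records per round-trip lifts the same ceiling 1000x.
batched = max_throughput(0.001, batch_size=1000)   # 1_000_000.0
```

The point of the model: no amount of database-side speed changes `per_record`, because the bound comes from the wire, not the store.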

MongoDB's 2025-09-18 retrospective reports customers seeing 25–30× batch-job slowdown after like-for-like PL/SQL → Java + MongoDB Atlas migrations, with the four regression classes named explicitly:

  • high network round-trips between application and database
  • inefficient per-record operations replacing set-based logic
  • under-utilization of database bulk capabilities
  • application-layer computation overhead when transforming large datasets

Applying the four techniques brought the same jobs back on par with the legacy RDBMS, and in several cases made them 10–15× faster (Source: sources/2025-09-18-mongodb-modernizing-core-insurance-systems-breaking-the-batch-bottleneck).

Architecture shape (MongoDB's published pipeline)

Trigger (cron / user action)
Spring Boot controller
    │  fetch input records + split into batches
Executor framework
    │  partition batches across threads / event processors
ETL task (per batch)
    │  1. prefetch reference data → in-memory cache
    │  2. transform records in parallel
    │  3. bulkWrite back to MongoDB
    │     (optionally a multi-collection transaction in MongoDB 8)
Completion + write-back

Every layer is off-the-shelf: Spring Boot for the controller, a generic executor framework for parallelism, MongoDB's native bulkWrite / multi-collection transactions for the batch write. The framework's contribution is organizing these four techniques into one composable whole with pluggable transformation modules.
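The per-batch ETL task (steps 1–3 of the pipeline above) can be sketched with standard-library primitives. This is a hedged sketch, not MongoDB's published code: `load_reference`, `transform`, and `write_batch` are hypothetical callables standing in for the reference-data query, the pluggable transformation module, and bulkWrite respectively.

```python
from concurrent.futures import ThreadPoolExecutor

def run_etl(batches, load_reference, transform, write_batch, workers=4):
    """Per-batch ETL task: prefetch once, transform in parallel,
    bulk-write each batch back. Returns the total records processed."""
    ref = load_reference()   # 1. prefetch: one query per job, not per record

    def process(batch):
        # 2. transform: in-memory lookups against ref, no DB calls here
        out = [transform(rec, ref) for rec in batch]
        write_batch(out)     # 3. one bulk write (one round-trip) per batch
        return len(out)

    # partition batches across threads, overlapping transform with I/O
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process, batches))
```

A thread pool stands in for the executor framework; the same shape holds for an event-processor model like the Disruptor, with batches published to a ring buffer instead of mapped over a pool.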

Tuning dimensions

MongoDB lists five knobs that any workload needs to set explicitly:

| Knob | Too low | Too high |
| --- | --- | --- |
| Batch size | Round-trip win eroded; throughput flat | Memory pressure; hits MongoDB transaction limits (document size, total op count) |
| Thread-pool size | Under-utilized CPU; I/O stalls | Overloads the DB or network; tail-latency spikes |
| Prefetch scope | Cache misses → per-record DB lookups | Heap pressure; large in-memory hashes |
| Index strategy | bulkWrite slows to per-op server-side cost | Write amplification; maintenance overhead |
| Transaction boundaries | Retry thrash on conflict | Long transactions block other writers |

The framework is explicit that "it's not one size fits all" — each workload is a separate tuning exercise.
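One way to make "per-job tunable" concrete is a plain config object, one instance per workload. A minimal sketch — the field names are illustrative, not MongoDB's API:

```python
from dataclasses import dataclass

@dataclass
class JobTuning:
    """Per-job knobs from the table above; defaults are placeholders,
    each workload is tuned separately."""
    batch_size: int = 1000          # round-trip win vs. memory / txn limits
    workers: int = 4                # CPU utilization vs. DB/network overload
    prefetch_tables: tuple = ()     # cache-hit rate vs. heap pressure
    use_transaction: bool = False   # atomicity vs. blocking other writers

# Two workloads, two tuning exercises:
nightly_etl = JobTuning(batch_size=5000, workers=8,
                        prefetch_tables=("rate_cards", "tax_brackets"))
small_backfill = JobTuning(batch_size=200, workers=2)
```

Index strategy and transaction boundaries don't fit in a config object — they live in the schema and the job's write logic — which is part of why each workload is its own exercise.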

When this pattern fits

  • Post-migration batch regressions, especially PL/SQL → application layer + MongoDB / DynamoDB / Cassandra / any remote document store.
  • Multi-collection ETL where records need to be written to several collections atomically (benefits from MongoDB 8's multi-collection bulk transactions).
  • Reference-heavy transforms (insurance rate tables, tax brackets, routing policies, pricing tiers) where the same lookup table is consulted thousands of times per batch.
  • Monthly / nightly ETL jobs where total throughput matters more than per-record latency.

When this pattern is not the right answer

  • Aggregation-heavy pipelines over billions of records. Canva's 2024-04-29 Creators-payment pipeline faced the same per-record RTT trap and chose the sibling architectural answer: move the transform into the warehouse (Snowflake + DBT). See concepts/elt-vs-etl / patterns/end-to-end-recompute / patterns/warehouse-unload-bridge. ELT eliminates application-layer round-trips entirely; bulk-write + prefetch + parallel reduces them. If the workload is mostly aggregation, ELT wins; if it's per-record enrichment with side effects, the bulk-write framework wins.
  • Streaming / event-driven workloads where records arrive one at a time with tight latency budgets. Batching trades latency for throughput; stream processors need different primitives.
  • Small batch sizes. If the batch is only hundreds of records, a single non-batched query often beats the framework-setup overhead.
