Bulk-write + prefetch + parallel — the batch-optimization framework

Pattern

When batch ETL jobs regress after migrating from an in-engine, set-based substrate (PL/SQL running inside an RDBMS) to an application-layer loop against a remote database (a Java / Python / Go service talking to MongoDB, DynamoDB, or any networked store), the fix is a composite framework of four cooperating techniques. None of them is novel on its own; their joint application is what recovers the legacy baseline:

  1. Bulk operations at scale. Collapse per-record writes into native batch primitives (bulkWrite, multi-collection transactions, COPY streams, IN-clause fan-ins). One server round-trip per batch, not per record.
  2. Intelligent prefetching. Reference data (lookup tables, rate cards, rule configs) is loaded once per batch into memory-efficient in-process structures. Each record's transformation then does an in-memory map lookup rather than a database query.
  3. Parallel processing. Partition workloads across threads or event processors (the LMAX Disruptor's ring-buffer concurrency model is one option). Partitioning overlaps the CPU-bound transform stage with the I/O-bound fetch and write stages.
  4. Configurable batch sizes. Batch chunk size is not a constant. Too-large batches cause memory pressure and hit server-side transaction-size / op-count limits; too-small batches reverse the round-trip win. The framework exposes this as a per-job tunable.
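Techniques 1 and 4 together can be sketched in a few lines of Python. This is a minimal illustration, not MongoDB's implementation: `write_batch` is a hypothetical stand-in for the store's native bulk primitive (e.g. pymongo's `Collection.bulk_write`), so the round-trip count can be verified without a live database.

```python
from itertools import islice

def chunked(records, batch_size):
    """Yield successive batches of at most batch_size records."""
    it = iter(records)
    while batch := list(islice(it, batch_size)):
        yield batch

def bulk_load(records, write_batch, batch_size=1000):
    """Collapse per-record writes into one driver call per batch.

    batch_size is the per-job tunable from technique 4: large enough to
    amortize the round-trip, small enough to stay under memory and
    server-side transaction limits.
    """
    round_trips = 0
    for batch in chunked(records, batch_size):
        write_batch(batch)   # one server round-trip for the whole batch
        round_trips += 1
    return round_trips

# 10,000 records at batch_size=1000 → 10 round-trips instead of 10,000.
```

In the real pipeline, `write_batch` would be the driver's bulk call; everything else stays the same.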

Motivation

The underlying force is concepts/network-round-trip-cost. A per-record loop between an application process and a remote database pays N × RTT regardless of database speed — a per-record 1 ms RTT caps throughput at ~1,000 records/sec before any CPU work. Legacy PL/SQL didn't pay this because the logic and the data lived inside the same process. Post-migration code that preserves the per-record shape inherits the round-trip bill.
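The throughput ceiling in the paragraph above is pure arithmetic, so it can be checked directly. A small sketch (the function name is illustrative):

```python
def max_throughput(rtt_s, batch_size=1):
    """Upper bound on records/sec when every batch pays one round-trip,
    ignoring all CPU and server-side time."""
    return batch_size / rtt_s

# Per-record loop at 1 ms RTT: capped at 1,000 records/sec.
per_record = max_throughput(0.001)                 # 1000.0
# Batching 1,000 records per round-trip lifts the same ceiling 1000x.
batched = max_throughput(0.001, batch_size=1000)   # 1_000_000.0
```

The point of the model: no amount of database-side speed changes `per_record`, because the bound comes from the wire, not the store.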

MongoDB's 2025-09-18 retrospective reports customers seeing 25–30× batch-job slowdown after like-for-like PL/SQL → Java + MongoDB Atlas migrations, with the four regression classes named explicitly:

  • high network round-trips between application and database
  • inefficient per-record operations replacing set-based logic
  • under-utilization of database bulk capabilities
  • application-layer computation overhead when transforming large datasets

Applying the four techniques brought the same jobs back on par with the legacy RDBMS, and in several cases made them 10–15× faster (Source: sources/2025-09-18-mongodb-modernizing-core-insurance-systems-breaking-the-batch-bottleneck).

Architecture shape (MongoDB's published pipeline)

Trigger (cron / user action)
Spring Boot controller
    │  fetch input records + split into batches
Executor framework
    │  partition batches across threads / event processors
ETL task (per batch)
    │  1. prefetch reference data → in-memory cache
    │  2. transform records in parallel
    │  3. bulkWrite back to MongoDB
    │     (optionally a multi-collection transaction in MongoDB 8)
Completion + write-back

Every layer is off-the-shelf: Spring Boot for the controller, a generic executor framework for parallelism, MongoDB's native bulkWrite / multi-collection transactions for the batch write. The framework's contribution is organizing these four techniques into one composable whole with pluggable transformation modules.
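The per-batch ETL task (steps 1–3 of the pipeline above) can be sketched with standard-library primitives. This is a hedged sketch, not MongoDB's published code: `load_reference`, `transform`, and `write_batch` are hypothetical callables standing in for the reference-data query, the pluggable transformation module, and bulkWrite respectively.

```python
from concurrent.futures import ThreadPoolExecutor

def run_etl(batches, load_reference, transform, write_batch, workers=4):
    """Per-batch ETL task: prefetch once, transform in parallel,
    bulk-write each batch back. Returns the total records processed."""
    ref = load_reference()   # 1. prefetch: one query per job, not per record

    def process(batch):
        # 2. transform: in-memory lookups against ref, no DB calls here
        out = [transform(rec, ref) for rec in batch]
        write_batch(out)     # 3. one bulk write (one round-trip) per batch
        return len(out)

    # partition batches across threads, overlapping transform with I/O
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process, batches))
```

A thread pool stands in for the executor framework; the same shape holds for an event-processor model like the Disruptor, with batches published to a ring buffer instead of mapped over a pool.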

Tuning dimensions

MongoDB lists five knobs that any workload needs to set explicitly:

| Knob | Too low | Too high |
| --- | --- | --- |
| Batch size | Round-trip win eroded; throughput flat | Memory pressure; hits MongoDB transaction limits (document size, total op count) |
| Thread-pool size | Under-utilized CPU; I/O stalls | Overloads the DB or network; tail-latency spikes |
| Prefetch scope | Cache misses → per-record DB lookups | Heap pressure; large in-memory hashes |
| Index strategy | bulkWrite slows to per-op server-side cost | Write amplification; maintenance overhead |
| Transaction boundaries | Retry thrash on conflict | Long transactions block other writers |

The framework is explicit that "it's not one size fits all" — each workload is a separate tuning exercise.
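One way to make "per-job tunable" concrete is a plain config object, one instance per workload. A minimal sketch — the field names are illustrative, not MongoDB's API:

```python
from dataclasses import dataclass

@dataclass
class JobTuning:
    """Per-job knobs from the table above; defaults are placeholders,
    each workload is tuned separately."""
    batch_size: int = 1000          # round-trip win vs. memory / txn limits
    workers: int = 4                # CPU utilization vs. DB/network overload
    prefetch_tables: tuple = ()     # cache-hit rate vs. heap pressure
    use_transaction: bool = False   # atomicity vs. blocking other writers

# Two workloads, two tuning exercises:
nightly_etl = JobTuning(batch_size=5000, workers=8,
                        prefetch_tables=("rate_cards", "tax_brackets"))
small_backfill = JobTuning(batch_size=200, workers=2)
```

Index strategy and transaction boundaries don't fit in a config object — they live in the schema and the job's write logic — which is part of why each workload is its own exercise.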

When this pattern fits

  • Post-migration batch regressions, especially PL/SQL → application layer + MongoDB / DynamoDB / Cassandra / any remote document store.
  • Multi-collection ETL where records need to be written to several collections atomically (benefits from MongoDB 8's multi-collection bulk transactions).
  • Reference-heavy transforms (insurance rate tables, tax brackets, routing policies, pricing tiers) where the same lookup table is consulted thousands of times per batch.
  • Monthly / nightly ETL jobs where total throughput matters more than per-record latency.

When this pattern is not the right answer

  • Aggregation-heavy pipelines over billions of records. Canva's 2024-04-29 Creators-payment pipeline faced the same per-record RTT trap and chose the sibling architectural answer: move the transform into the warehouse (Snowflake + DBT). See concepts/elt-vs-etl / patterns/end-to-end-recompute / patterns/warehouse-unload-bridge. ELT eliminates application-layer round-trips entirely; bulk-write + prefetch + parallel reduces them. If the workload is mostly aggregation, ELT wins; if it's per-record enrichment with side effects, the bulk-write framework wins.
  • Streaming / event-driven workloads where records arrive one at a time with tight latency budgets. Batching trades latency for throughput; stream processors need different primitives.
  • Small batch sizes. If the batch is only hundreds of records, a single non-batched query often beats the framework-setup overhead.
