
PATTERN

LLM batch processing service

Intent

Consolidate bulk LLM inference workloads — jobs of millions of prompts run offline against an LLM provider's batch API — into a single internal service that exposes a file-in / file-out RPC. The service hides the provider's batch-API workflow (encode → upload → poll → download → parse → retry-failed-in-new-batch), letting internal teams ship new LLM-driven pipelines without becoming LLM infrastructure experts.

The pattern replaces the common prior approach — every team writes its own Python script that calls the batch API — with a shared platform that owns the reliability story, the failure-handling policy, the storage shape, and the cost accounting.

When to use

  • Organisation is running ≥10M-prompt LLM jobs (or aggregate across teams) offline — catalog cleaning, attribute extraction, ML training data gen, ranking-model training, classification at scale.
  • Cost matters — batch APIs cost roughly half of real-time pricing; at 10M-prompt scale that's hundreds of thousands of dollars.
  • Multiple internal teams with similar-shape workloads are each writing their own batch-API-plumbing code — duplication signal.
  • Workloads are heterogeneous in provider choice — different teams prefer different models / providers for their use case; the service can mask provider-specific batch API quirks.
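The cost arithmetic behind the second bullet, with illustrative per-token prices and token counts (assumed for the sketch, not from the source):

```python
# Rough batch-discount arithmetic. All prices/volumes are illustrative assumptions.
PROMPTS = 10_000_000          # 10M-prompt job
TOKENS_PER_PROMPT = 1_500     # assumed average input + output tokens
REALTIME_PER_MTOK = 10.0      # assumed real-time price, $ per 1M tokens
BATCH_DISCOUNT = 0.5          # batch APIs commonly charge ~50% of real-time

realtime_cost = PROMPTS * TOKENS_PER_PROMPT / 1e6 * REALTIME_PER_MTOK
batch_cost = realtime_cost * BATCH_DISCOUNT
print(f"real-time: ${realtime_cost:,.0f}  batch: ${batch_cost:,.0f}  "
      f"saved: ${realtime_cost - batch_cost:,.0f}")
```

Even at these modest assumed prices the delta per job is tens of thousands of dollars, which is how recurring jobs reach the "hundreds of thousands per year" range.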

When not to use

  • Workload is entirely real-time / interactive — batch APIs don't help (latency is wrong).
  • Single team with a single workload — pay the platform cost on day 2, not day 1.
  • Jobs small enough that the 50K-prompt batch ceiling never bites (< 100K prompts per job, no team duplication).

Mechanics

The canonical realisation (Maple at Instacart, sources/2025-08-27-instacart-simplifying-large-scale-llm-processing-with-maple|2025-08-27):

  1. Single RPC API — caller submits a CSV or Parquet file plus a prompt template; gets back a job ID.
  2. Split + encode — input is streamed from S3 (or similar blob store), split into batches respecting the provider's 50K-prompt / 200 MB limit, encoded into the provider's batch file format. Intermediate batches stored as Parquet for compression + random-access reads.
  3. Submit + poll — each batch uploaded to the provider through the AI Gateway; polled for completion (hours to 24h).
  4. Download + merge — per-batch results downloaded as they complete, matched to input rows by task ID, written as per-batch result Parquets. All per-batch results merged into a single output file mirroring the input format.
  5. Per-class retry — task-level failures classified by concepts/provider-failure-taxonomy and retried per patterns/infinite-retry-by-failure-class.
  6. Durable workflow substrate — entire pipeline runs as a Temporal workflow so crashes, deploys, and platform restarts don't lose work or re-spend on already-submitted batches (concepts/durable-execution).
  7. Stream-based processing throughout — large inputs never fully materialised in RAM (concepts/stream-based-file-processing).
  8. Cost tracking — every provider call routed through the AI Gateway, which logs per-team cost attribution.
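Step 2 above can be sketched as a streaming splitter that closes a batch whenever either provider ceiling would be exceeded (encoding shape and field names are assumptions; the real format is provider-specific):

```python
import json
from typing import Iterable, Iterator

MAX_TASKS = 50_000               # provider's per-batch prompt ceiling
MAX_BYTES = 200 * 1024 * 1024    # provider's per-batch file-size ceiling

def encode_batches(rows: Iterable[dict], template: str) -> Iterator[list[bytes]]:
    """Stream input rows into provider-format batches, never materialising
    the whole input: a batch is closed when adding the next line would
    exceed the task-count or byte ceiling."""
    batch: list[bytes] = []
    size = 0
    for i, row in enumerate(rows):
        line = json.dumps({
            "custom_id": f"task-{i}",  # later used to match results back to input rows
            "body": {"prompt": template.format(**row)},
        }).encode() + b"\n"
        if batch and (len(batch) >= MAX_TASKS or size + len(line) > MAX_BYTES):
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += len(line)
    if batch:
        yield batch
```

Because the splitter is a generator over a row iterator, it composes directly with streamed reads from S3/Parquet (steps 2 and 7) — RAM stays bounded by one batch, not the input file.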

Extensions

  • patterns/batch-then-real-time-fallback — wrap real-time-only providers behind the same CSV interface with auto-parallelisation and exponential backoff; when the provider eventually ships a batch API, switch at the platform layer without user-visible change.
  • Prompt-template library — share few-shot exemplars across teams (patterns/prompt-template-library).
  • Concurrent batch submission — within a single job, submit multiple batches in parallel to cut end-to-end completion time (bounded by provider's concurrent-batch limit).
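The concurrent-submission extension amounts to bounded parallelism over per-batch submit-and-poll work. A sketch using an asyncio semaphore (the `submit_and_poll` coroutine and the concurrency limit of 4 are assumptions; real limits come from the provider):

```python
import asyncio

MAX_CONCURRENT_BATCHES = 4  # assumed provider concurrent-batch limit

async def submit_all(batches, submit_and_poll):
    """Run submit_and_poll(batch) for every batch, at most
    MAX_CONCURRENT_BATCHES in flight at once; results keep input order."""
    sem = asyncio.Semaphore(MAX_CONCURRENT_BATCHES)

    async def bounded(batch):
        async with sem:
            return await submit_and_poll(batch)

    return await asyncio.gather(*(bounded(b) for b in batches))
```

`asyncio.gather` preserves ordering, so merged results still line up with input batches even though completion order varies.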

Outcomes (reported by Maple)

  • Scale: 10M+ prompt jobs routinely handled.
  • Cost: "Many processes have been reduced from hundreds of thousands of dollars per year to just thousands of dollars per year."
  • Platform leverage: Catalog, Fulfillment, Search, and ML-training teams all share one service.

Contrast

Related to and distinct from:

  • patterns/llm-attribute-extraction-platform (Instacart PARSE) — extraction-specific platform above LLM inference. Plausibly a caller of Maple when extraction scales to full-catalog jobs.
  • patterns/unified-image-generation-platform (Instacart PIXEL) — image-gen-specific platform with unified-parameter-protocol + VLM-evaluator quality gate. Same company, same consolidation stance, different modality.
  • patterns/centralized-embedding-platform (Expedia) — embedding-specific platform with similar "stop every team from DIY'ing this" framing at a different layer of the ML stack.
  • patterns/ai-gateway-provider-abstraction (Cloudflare AI Gateway, Databricks Unity AI Gateway) — the tier below an LLM batch processing service. AI Gateway handles provider routing, key injection, cost tracking. LLM batch service handles workflow orchestration, retry policy, file I/O.

The pattern stack, top to bottom:

Caller team (Catalog, Fulfillment, etc.)
LLM batch processing service (Maple)     ← this pattern
    │  CSV/Parquet in, CSV/Parquet out
AI Gateway (provider abstraction)        ← patterns/ai-gateway-provider-abstraction
    │  unified endpoint, key injection, cost tracking
External LLM provider (OpenAI, Anthropic, etc.)
    │  native batch API (50K/200MB/24h)

Seen in
