PATTERN
LLM batch processing service¶
Intent¶
Consolidate bulk LLM inference workloads — jobs of millions of prompts run offline against an LLM provider's batch API — into a single internal service that exposes a file-in / file-out RPC, hiding the provider's batch-API workflow (encode → upload → poll → download → parse → retry-failed-in-new-batch) and letting internal teams ship new LLM-driven pipelines without becoming LLM infrastructure experts.
The pattern replaces the common prior approach — every team writes its own Python script that calls the batch API — with a shared platform that owns the reliability story, the failure-handling policy, the storage shape, and the cost accounting.
When to use¶
- Organisation is running ≥10M-prompt LLM jobs (or aggregate across teams) offline — catalog cleaning, attribute extraction, ML training data gen, ranking-model training, classification at scale.
- Cost matters — batch APIs give a ~50% discount vs real-time pricing; at 10M-prompt scale that's hundreds of thousands of dollars.
- Multiple internal teams with similar-shape workloads are each writing their own batch-API-plumbing code — duplication signal.
- Workloads are heterogeneous in provider choice — different teams prefer different models / providers for their use case; the service can mask provider-specific batch API quirks.
When not to use¶
- Workload is entirely real-time / interactive — batch APIs don't help (latency is wrong).
- Single team with a single workload — pay the platform cost on day 2, not day 1.
- Jobs are small enough that the 50K-prompt batch ceiling never bites (< 100K prompts per job, no team duplication).
Mechanics¶
The canonical realisation (Maple at Instacart, sources/2025-08-27-instacart-simplifying-large-scale-llm-processing-with-maple|2025-08-27):
- Single RPC API — caller submits a CSV or Parquet file plus a prompt template; gets back a job ID.
- Split + encode — input is streamed from S3 (or similar blob store), split into batches respecting the provider's 50K-prompt / 200 MB limit, encoded into the provider's batch file format. Intermediate batches stored as Parquet for compression + random-access reads.
- Submit + poll — each batch uploaded to the provider through the AI Gateway; polled for completion (hours to 24h).
- Download + merge — per-batch results downloaded as they complete, matched to input rows by task ID, written as per-batch result Parquets. All per-batch results merged into a single output file mirroring the input format.
- Per-class retry — task-level failures classified by concepts/provider-failure-taxonomy and retried per patterns/infinite-retry-by-failure-class.
- Durable workflow substrate — entire pipeline runs as a Temporal workflow so crashes, deploys, and platform restarts don't lose work or re-spend on already-submitted batches (concepts/durable-execution).
- Stream-based processing throughout — large inputs never fully materialised in RAM (concepts/stream-based-file-processing).
- Cost tracking — every provider call routed through the AI Gateway, which logs per-team cost attribution.
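The split + encode step above can be sketched as a streaming generator that cuts batches at the provider's caps. This is a hypothetical illustration, not Maple's code: the JSONL encoding, the `custom_id` field name, and the exact limits (50K prompts / 200 MB) are assumptions based on the limits quoted in Mechanics.

```python
# Hypothetical sketch of the "split + encode" step: stream input rows and
# cut a new batch whenever the assumed provider caps (50K prompts or
# 200 MB of encoded JSONL) would be exceeded. Never materialises the
# whole input in memory — rows is any iterable, e.g. a Parquet reader.
import json

MAX_PROMPTS_PER_BATCH = 50_000
MAX_BATCH_BYTES = 200 * 1024 * 1024  # 200 MB

def split_into_batches(rows, max_prompts=MAX_PROMPTS_PER_BATCH,
                       max_bytes=MAX_BATCH_BYTES):
    """Yield lists of encoded JSONL lines, each list one provider batch.

    `rows` is an iterable of (task_id, prompt) pairs; the task_id is what
    lets results be matched back to input rows after download.
    """
    batch, batch_bytes = [], 0
    for task_id, prompt in rows:
        line = json.dumps({"custom_id": task_id, "prompt": prompt})
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the newline
        if batch and (len(batch) >= max_prompts
                      or batch_bytes + line_bytes > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(line)
        batch_bytes += line_bytes
    if batch:
        yield batch
```

Because the generator only holds one batch at a time, it composes naturally with the stream-based processing stance above.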
Extensions¶
- patterns/batch-then-real-time-fallback — wrap real-time-only providers behind the same CSV interface with auto-parallelisation and exponential backoff; when the provider eventually ships a batch API, switch at the platform layer without user-visible change.
- Prompt-template library — share few-shot exemplars across teams (patterns/prompt-template-library).
- Concurrent batch submission — within a single job, submit multiple batches in parallel to cut end-to-end completion time (bounded by provider's concurrent-batch limit).
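The concurrent-batch-submission extension can be sketched with a semaphore bounding in-flight batches. A minimal sketch, assuming an async provider client; `submit_and_poll` and the concurrency limit of 5 are illustrative stand-ins, not a real API.

```python
# Illustrative sketch of concurrent batch submission within one job:
# submit up to `max_concurrent` batches in parallel, bounded by the
# provider's assumed concurrent-batch limit. `submit_and_poll` stands
# in for the real upload / poll / download cycle per batch.
import asyncio

async def run_job(batches, submit_and_poll, max_concurrent=5):
    """Run all batches, at most `max_concurrent` in flight at once."""
    sem = asyncio.Semaphore(max_concurrent)

    async def run_one(index, batch):
        async with sem:  # blocks while the limit is reached
            result = await submit_and_poll(batch)
            return index, result

    done = await asyncio.gather(
        *(run_one(i, b) for i, b in enumerate(batches)))
    # Re-order by input index so the merged output mirrors the input.
    return [result for _, result in sorted(done)]
```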
Outcomes (reported by Maple)¶
- Scale: 10M+ prompt jobs routinely handled.
- Cost: "Many processes have been reduced from hundreds of thousands of dollars per year to just thousands of dollars per year."
- Platform leverage: Catalog, Fulfillment, Search, and ML-training teams all share one service.
Contrast¶
Related to and distinct from:
- patterns/llm-attribute-extraction-platform (Instacart PARSE) — extraction-specific platform above LLM inference. Plausibly a caller of Maple when extraction scales to full-catalog jobs.
- patterns/unified-image-generation-platform (Instacart PIXEL) — image-gen-specific platform with unified-parameter-protocol + VLM-evaluator quality gate. Same company, same consolidation stance, different modality.
- patterns/centralized-embedding-platform (Expedia) — embedding-specific platform with similar "stop every team from DIY'ing this" framing at a different layer of the ML stack.
- patterns/ai-gateway-provider-abstraction (Cloudflare AI Gateway, Databricks Unity AI Gateway) — the tier below an LLM batch processing service. AI Gateway handles provider routing, key injection, cost tracking. LLM batch service handles workflow orchestration, retry policy, file I/O.
The pattern stack, top to bottom:
Caller team (Catalog, Fulfillment, etc.)
│
▼
LLM batch processing service (Maple) ← this pattern
│ CSV/Parquet in, CSV/Parquet out
▼
AI Gateway (provider abstraction) ← patterns/ai-gateway-provider-abstraction
│ unified endpoint, key injection, cost tracking
▼
External LLM provider (OpenAI, Anthropic, etc.)
│ native batch API (50K/200MB/24h)
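The per-class retry step from Mechanics can be sketched as a partition over failed tasks. The failure classes and policies below are hypothetical placeholders for the concepts/provider-failure-taxonomy, not its actual contents.

```python
# Minimal sketch of per-class retry (hypothetical classes): each
# task-level failure is mapped to a class, and only transient classes
# are re-queued into the next batch; permanent ones go back to the caller.
from enum import Enum

class FailureClass(Enum):
    RATE_LIMIT = "rate_limit"        # transient: retry in a new batch
    TRANSIENT_ERROR = "transient"    # transient: retry in a new batch
    CONTENT_POLICY = "content"       # permanent: surface to caller
    MALFORMED_INPUT = "malformed"    # permanent: surface to caller

RETRYABLE = {FailureClass.RATE_LIMIT, FailureClass.TRANSIENT_ERROR}

def partition_failures(failed_tasks, classify):
    """Split failed tasks into (retry_in_next_batch, report_to_caller)."""
    retry, report = [], []
    for task in failed_tasks:
        (retry if classify(task) in RETRYABLE else report).append(task)
    return retry, report
```

The retry list is fed into a fresh batch on the next workflow iteration, which is what makes the policy composable with the durable-workflow substrate.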
Seen in¶
- sources/2025-08-27-instacart-simplifying-large-scale-llm-processing-with-maple — canonical wiki instance. Maple at Instacart, backed by Temporal + S3-Parquet + PyArrow + orjson, proxying through Instacart's AI Gateway + Cost Tracker.
Related¶
- patterns/ai-gateway-provider-abstraction — tier below.
- patterns/batch-then-real-time-fallback — unified-interface extension.
- patterns/infinite-retry-by-failure-class — retry-policy primitive.
- patterns/csv-in-parquet-intermediate-output-merge — storage shape.
- patterns/llm-attribute-extraction-platform — application-specific sibling.
- patterns/unified-image-generation-platform — image-gen sibling.
- patterns/centralized-embedding-platform — embedding sibling.
- concepts/llm-batch-api — the provider API this pattern abstracts.
- concepts/durable-execution — Temporal's motivating property.
- concepts/provider-failure-taxonomy — retry-class framework.
- concepts/stream-based-file-processing — memory-safety primitive.
- concepts/cost-tracking-per-team — governance primitive.
- concepts/model-agnostic-ml-platform — architectural stance.
- systems/maple-instacart — canonical system.
- systems/temporal — substrate.
- systems/aws-s3 — storage.
- systems/apache-parquet — intermediate format.
- companies/instacart — operator.