Maple (Instacart batch LLM service)¶
Maple is Instacart's internal LLM batch-processing service. It accepts a CSV or Parquet file plus a prompt template and returns an output file that merges each input row with its LLM response, at scales of 10M+ prompts per job. It hides the LLM provider's batch API (50K-prompt / 200 MB per-batch ceiling, 24 h SLA, and a multi-step encode → upload → poll → download → parse → retry workflow) behind a single RPC. Reported savings: ~50% vs real-time LLM calls on compatible workloads.
(Source: Maple post)
What Maple automates¶
- Batching — splits large input files into ≤ 50K-prompt / ≤ 200 MB batches.
- Encoding/decoding — converts to and from the LLM provider's batch file format.
- File management — uploads, status polling, result downloads.
- Retries — per-failure-class retry policy (see below).
- Cost tracking — detailed per-team usage accounting (via systems/instacart-cost-tracker).
- Merging — per-batch result Parquets fan back into a single output file mirroring the input format.
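The batching step above can be sketched in plain Python. The 50K-prompt and 200 MB ceilings come from the post; the function name and the per-row size estimate are illustrative assumptions (Maple's real splitter works on Parquet via PyArrow):

```python
# Illustrative sketch of Maple-style batch splitting (names are hypothetical).
# A new batch starts whenever either provider ceiling would be exceeded:
# 50K prompts per batch, or 200 MB of encoded payload.

MAX_PROMPTS_PER_BATCH = 50_000
MAX_BATCH_BYTES = 200 * 1024 * 1024  # 200 MB

def split_into_batches(rows, encode=lambda r: str(r).encode()):
    """Yield lists of rows, each under both per-batch ceilings."""
    batch, batch_bytes = [], 0
    for row in rows:
        size = len(encode(row))  # rough per-row payload estimate
        if batch and (len(batch) >= MAX_PROMPTS_PER_BATCH
                      or batch_bytes + size > MAX_BATCH_BYTES):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += size
    if batch:
        yield batch

batches = list(split_into_batches(range(120_000)))
```

A 120K-row input splits into three batches under the prompt-count cap alone; in practice the byte cap can trigger first for image-heavy prompts.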
Implementation stack¶
- Language: Python.
- Durable execution: Temporal — every activity (encode, upload, poll, download, decode, merge) is a Temporal activity; the overall job is a Temporal workflow. Canonical instance of concepts/durable-execution: "Even if exceptions occur, Temporal's fault tolerance safeguards data integrity and guarantees job completion." Protects against data loss and avoids paying for partially completed jobs (LLM batches are paid on submit, not on completion).
- RPC API: streamlined interface for submitting jobs + tracking progress.
- Storage: S3 for inputs / intermediate batches / outputs. "Avoiding costly database operations. This approach is not only cheaper but also allows handling large datasets."
- File format: Parquet for intermediate storage — claimed up to 25× smaller than CSV (per-column compression) + random-access reads into the file used at merge time.
- File I/O: PyArrow for efficient columnar processing.
- JSON parsing: orjson replaced the Python stdlib json — "faster and more memory-efficient alternative."
- Memory discipline: stream-based processing end-to-end (concepts/stream-based-file-processing).
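The memory discipline above can be illustrated with a stdlib-only sketch. Maple itself streams Parquet record batches via PyArrow; this CSV version just shows the bounded-memory shape, with all names illustrative:

```python
import csv
import io

def stream_rows(fileobj, chunk_size=10_000):
    """Yield fixed-size chunks of rows so peak memory is bounded by
    chunk_size regardless of total file size (stream-based processing)."""
    reader = csv.DictReader(fileobj)
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) >= chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

# Usage: process a large CSV without ever materializing it fully.
data = io.StringIO("id,prompt\n" + "\n".join(f"{i},p{i}" for i in range(25)))
chunks = list(stream_rows(data, chunk_size=10))
```

The same shape applies at every stage (encode, decode, merge): each stage consumes one chunk at a time and writes results out before reading the next.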
Pipeline shape¶
CSV/Parquet in ──┐
│ split + encode per batch
▼
batch-N Parquet (≤ 50K prompts)
│ upload → provider batch API
│ poll for completion (hours to 24h)
│ download results
▼
per-batch result Parquet
│ match responses to inputs by task ID
▼
merged output file (CSV or Parquet)
All intermediate artifacts land on S3 — Maple never holds full-job data in memory.
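The match-responses-to-inputs step can be sketched as a dict join. Field names like `task_id` and `response` are illustrative; the post only says responses are matched back to inputs by task ID:

```python
def merge_results(input_rows, result_rows):
    """Join provider results back onto input rows by task ID, so the
    output file mirrors the input format plus a response column."""
    responses = {r["task_id"]: r["response"] for r in result_rows}
    merged = []
    for row in input_rows:
        out = dict(row)
        out["response"] = responses.get(row["task_id"])  # None if the task failed
        merged.append(out)
    return merged

inputs = [{"task_id": "t1", "prompt": "a"}, {"task_id": "t2", "prompt": "b"}]
results = [{"task_id": "t2", "response": "B"}]
merged = merge_results(inputs, results)
```

In Maple this join runs per-batch against Parquet files on S3 (random-access column reads), not in-memory dicts, but the keying logic is the same.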
Where Maple fits in Instacart's AI stack¶
"Maple sits at the center of Instacart's large-scale LLM processing pipeline, serving as the orchestration layer between internal teams and external model providers." Routing:
- Internal client (Catalog / Fulfillment / Search teams) calls Maple RPC with a CSV/Parquet + prompt template.
- Maple handles batching, workflow, retries, result merging.
- AI Gateway — Maple proxies all prompts through this internal service; it is the provider-abstraction + cost-tracking layer.
- Cost Tracker logs detailed per-job + per-team spend via the AI Gateway.
- External LLM provider actually runs the inference.
This is classical patterns/ai-gateway-provider-abstraction: Maple sits above a unified AI-Gateway tier, with each concern built once at the layer where it is easiest to build.
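A minimal sketch of the job-submission surface this routing implies. All field names here are hypothetical; the post only describes the RPC as taking "a CSV/Parquet + prompt template":

```python
from dataclasses import dataclass

@dataclass
class BatchJobRequest:
    """Hypothetical shape of a Maple RPC submission: an input file on S3
    plus a prompt template with per-row placeholders."""
    input_uri: str          # s3:// path to CSV or Parquet input
    prompt_template: str    # e.g. "Classify this product: {name}"
    output_format: str = "parquet"
    team: str = ""          # attributed to per-team spend via the cost tracker

def render_prompt(template: str, row: dict) -> str:
    """Fill the template from one input row (one prompt per row)."""
    return template.format(**row)

req = BatchJobRequest("s3://bucket/products.parquet",
                      "Classify this product: {name}", team="catalog")
prompt = render_prompt(req.prompt_template, {"name": "oat milk"})
```

The client never names a model or provider; that choice lives in the AI Gateway tier below Maple.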
Failure handling¶
Task-level failures get a class-specific retry policy (patterns/infinite-retry-by-failure-class):
| Failure class | Default retry |
|---|---|
| Expired (provider doesn't return in 24 h) | Infinite (new batch) |
| Rate-limited (provider token limit hit) | Infinite |
| Refused (bad params, content-filtered prompt/image) | Max 2× |
| Invalid image (URL dead or unreachable) | Optional — on retry, check image exists before resubmitting |
"We don't do this [image existence check] the first time around because checking each image in a large batch can add significant overhead." Canonical instance of concepts/provider-failure-taxonomy.
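The table's retry policy can be sketched as a small classifier. The class names and the `attempt` counter are illustrative, and the invalid-image limit is an assumption (the post calls that retry "optional"); Maple implements this inside Temporal's retry machinery:

```python
INFINITE = float("inf")

# Max retries per failure class, per the table above.
RETRY_LIMITS = {
    "expired": INFINITE,       # provider missed the 24 h SLA: resubmit as a new batch
    "rate_limited": INFINITE,  # provider token limit: retry until it clears
    "refused": 2,              # bad params / content-filtered: retry at most twice
    "invalid_image": 1,        # assumed single optional retry, after an existence check
}

def should_retry(failure_class: str, attempt: int) -> bool:
    """Return True if a task that failed with this class should be retried.
    `attempt` counts retries already performed; unknown classes never retry."""
    return attempt < RETRY_LIMITS.get(failure_class, 0)
```

Deferring the image existence check to the retry path matches the quoted rationale: the check is only paid for the (small) failing subset, not the whole batch.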
Extension: real-time-fallback for non-batch providers¶
Not all LLM providers support batch APIs. Maple extends the same CSV/Parquet interface to real-time-only providers with:
- Automatic parallelisation across concurrent requests.
- Exponential backoff on rate-limited responses.
- Intelligent retry policies.
- Failure tracking.
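The real-time extension above (bounded concurrency plus exponential backoff on rate limits) can be sketched with asyncio. The client and error type are stand-ins, not Maple's actual API:

```python
import asyncio
import random

class RateLimited(Exception):
    """Stand-in for a provider rate-limit (HTTP 429) response."""

async def call_with_backoff(client, prompt, max_retries=5, base_delay=0.01):
    """Retry one real-time call with exponential backoff + jitter on rate limits."""
    for attempt in range(max_retries + 1):
        try:
            return await client(prompt)
        except RateLimited:
            if attempt == max_retries:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt * random.random())

async def run_job(client, prompts, concurrency=8):
    """Fan a batch of prompts across at most `concurrency` in-flight calls."""
    sem = asyncio.Semaphore(concurrency)

    async def one(p):
        async with sem:
            return await call_with_backoff(client, p)

    return await asyncio.gather(*(one(p) for p in prompts))

# Usage with a fake client that rate-limits every prompt's first attempt.
calls = {}
async def fake_client(prompt):
    calls[prompt] = calls.get(prompt, 0) + 1
    if calls[prompt] == 1:
        raise RateLimited
    return prompt.upper()

results = asyncio.run(run_job(fake_client, [f"p{i}" for i in range(20)]))
```

Because the caller still hands over a file of prompts and gets a merged file back, swapping this path for a true batch API later is invisible to users, as the quote below notes.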
Benefits:
- Teams use the same interface regardless of which provider handles the call.
- "If a provider starts offering a batch interface, we can switch it over seamlessly without our users needing to do anything."
- Small batches complete faster under real-time routing — "important for ops-related tasks when they are iterating on a problem."
Canonical patterns/batch-then-real-time-fallback.
Production batch performance¶
From a ~580-batch sample at 40–50K tasks/batch:
- Mean throughput: ~2.6 prompts/sec per batch.
- Distribution: clustered 1–4 prompts/sec.
- Completion time: most batches complete in < 12 h; some approach the 24 h SLA.
- Job-size vs completion-time: positive slope (larger jobs take longer, log-y-axis scatter plot).
Image content in prompts is noted as a latency factor; image-bearing prompts are common at Instacart and slower to process.
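A quick sanity check on those numbers: at the reported mean throughput, a full-size batch finishes well inside the observed < 12 h window.

```python
# Reported mean throughput (~2.6 prompts/sec) applied to a typical batch.
tasks_per_batch = 45_000          # midpoint of the 40-50K range
throughput = 2.6                  # prompts/sec per batch
hours = tasks_per_batch / throughput / 3600
# ≈ 4.8 h, consistent with "most batches complete in < 12 h"
```

Batches near the bottom of the 1–4 prompts/sec distribution are the ones that approach the 24 h SLA.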
Scale optimisations (forced by 10M+ prompt jobs)¶
As input files grew, Maple hit storage, memory, and processing limits. Named fixes:
- DB → S3 Parquet for task-data storage (load/save speed + cost).
- Stream-based processing to bound memory on large files.
- orjson instead of the Python stdlib json.
"These optimizations allowed Maple to scale efficiently to 10M+ prompt jobs."
Outcomes¶
- Scale: routinely handles 10M+ prompt jobs.
- Cost: "Many processes have been reduced from hundreds of thousands of dollars per year to just thousands of dollars per year."
- Savings ceiling: "saving up to 50% on LLM costs compared to standard real-time calls."
- Adoption: Catalog, Fulfillment, Search, and ML-training teams use Maple as a shared universal-bulk-LLM primitive.
Context within Instacart's platform cluster¶
Maple is the batch-inference leg of Instacart's platform-consolidation story, alongside:
- PIXEL — unified image-generation platform (2025-07-17 post).
- PARSE — multi-modal LLM attribute-extraction platform (2025-08-01 post). PARSE is a plausible upstream caller of Maple when extraction runs at full-catalog scale.
All three share the concepts/model-agnostic-ml-platform architectural stance — one internal platform fronting multiple providers / models; callers don't pick or manage the model. Maple specialises that stance to the batch dimension.
Caveats¶
- Specific LLM provider is not named (50K-prompt / 200 MB / 24 h SLA is consistent with OpenAI's Batch API; Anthropic Message Batches has similar constraints).
- AI Gateway internals are not disclosed (fleet shape, caching, semantic caching, provider fallback, region strategy).
- Maple's own scaling model (worker count, Temporal worker fleet, horizontal scale knobs) not disclosed.
- Concurrency across batches (serial vs parallel submission strategy) not disclosed.
- Error-rate breakdown across the four failure classes not quantified.
- Temporal failure modes (what happens if Temporal is unavailable) not addressed.
Seen in¶
- sources/2025-08-27-instacart-simplifying-large-scale-llm-processing-with-maple — canonical introduction.
Related¶
- systems/instacart-ai-gateway — the provider-abstraction tier Maple sits above.
- systems/instacart-cost-tracker — per-team usage accounting.
- systems/temporal — durable-execution substrate.
- systems/aws-s3 — intermediate + final storage.
- systems/apache-parquet — intermediate file format.
- systems/instacart-pixel — image-generation sibling platform.
- systems/instacart-parse — attribute-extraction sibling platform.
- concepts/durable-execution — Temporal's motivating property.
- concepts/llm-batch-api — the provider API Maple abstracts.
- concepts/provider-failure-taxonomy — the four-class retry model.
- concepts/stream-based-file-processing — memory-bound input handling.
- concepts/cost-tracking-per-team — AI-Gateway-level accounting.
- patterns/llm-batch-processing-service — the pattern Maple canonicalises.
- patterns/batch-then-real-time-fallback — unified-interface across provider capability.
- patterns/infinite-retry-by-failure-class — class-dependent retry.
- patterns/ai-gateway-provider-abstraction — the tier below.
- companies/instacart — operator.