PATTERN
Batch-then-real-time fallback¶
Intent¶
Present a single CSV/Parquet-in, CSV/Parquet-out batch interface to callers regardless of whether the underlying LLM provider offers a native batch API. When a provider has a batch API, use it (for the ~50% cost discount). When a provider is real-time-only, wrap its real-time API behind the same interface with automatic parallelisation + exponential backoff + intelligent retry — so callers never have to know or care.
The pattern is a platform-layer masking of provider-capability heterogeneity. Provider gets a batch API later? Switch at the platform layer; callers don't change.
When to use¶
- Running a shared LLM batch processing service that routes to multiple providers.
- Internal clients want provider choice (different models / vendors for different workloads) but should not have to pick based on "does this vendor support batch?"
- Small batches are common enough that real-time parallelisation is actually faster than batch APIs for iterating on ops tasks.
Mechanics¶
The canonical realisation (Maple at Instacart, sources/2025-08-27-instacart-simplifying-large-scale-llm-processing-with-maple|2025-08-27):
- Single caller interface — CSV or Parquet file plus prompt template. No mode flag; no batch-vs-real-time option surfaced upward.
- Provider-capability table internal to the service — each provider is tagged with its batch-API availability.
- Batch-capable providers — Maple's usual pipeline (encode → upload → poll → download → merge).
- Real-time-only providers — Maple's real-time wrapper runs:
- Automatic parallelisation across concurrent requests (bounded by provider rate limit).
- Exponential backoff on rate-limited responses.
- Intelligent retry policies.
- Failure tracking.
- Seamless upgrade path — if a provider later ships a batch API, Maple switches the routing at the platform layer; "users don't need to do anything."
- Small-batch latency benefit — real-time routing "made small batches complete more quickly, which is important for ops-related tasks when they are iterating on a problem."
From the post:
"Teams no longer need to write custom scripts or pipelines to handle bulk real-time calls. Instead, they can use the same Maple interface, and the underlying platform will handle the complexities of interacting with real-time APIs at scale." (Source: sources/2025-08-27-instacart-simplifying-large-scale-llm-processing-with-maple)
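The routing mechanics above can be sketched as follows. This is a minimal illustration with hypothetical names, not Maple's actual code: `_send` and `_native_batch` are stand-ins for the real-time transport and the encode → upload → poll → download → merge pipeline.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

class RateLimitedError(Exception):
    """Raised by the transport when the provider returns a rate-limit response."""

def _send(provider: str, prompt: str) -> str:
    # Stand-in transport; a real implementation calls the provider's real-time API.
    return f"{provider}:{prompt}"

def _native_batch(provider: str, prompts: list[str]) -> list[str]:
    # Stand-in for the batch pipeline: encode -> upload -> poll -> download -> merge.
    return [f"{provider}:{p}" for p in prompts]

# Provider-capability table internal to the service; no caller ever sees it.
PROVIDERS = {
    "provider_a": {"has_batch_api": True},
    "provider_b": {"has_batch_api": False},
}

def call_realtime(provider: str, prompt: str, max_retries: int = 5) -> str:
    """One real-time call with exponential backoff (plus jitter) on rate limiting."""
    for attempt in range(max_retries):
        try:
            return _send(provider, prompt)
        except RateLimitedError:
            time.sleep(2 ** attempt + random.random())  # 1s, 2s, 4s, ... + jitter
    raise RuntimeError(f"{provider}: retries exhausted")

def run_job(provider: str, prompts: list[str], concurrency: int = 8) -> list[str]:
    """Single batch-shaped entry point; the mode is a routing decision, not a caller flag."""
    if PROVIDERS[provider]["has_batch_api"]:
        return _native_batch(provider, prompts)
    # Real-time-only provider: bounded parallel fan-out behind the same interface.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda p: call_realtime(provider, p), prompts))
```

If provider_b later ships a batch API, flipping its has_batch_api entry reroutes jobs at the platform layer; caller code is untouched.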
Why this pattern¶
The alternatives, all worse:
- Force callers to pick based on batch availability — couples application choice to operational detail; forces caller code change every time a provider ships / drops batch.
- Only support batch-capable providers — narrows the provider roster; blocks teams from using best-performing models for their workload if those models happen to be real-time-only.
- Offer two separate interfaces (batch + real-time) — every caller writes two code paths; code duplication of the sort this pattern is supposed to prevent.
The pattern preserves unified-interface semantics at the batch-processing layer — sibling of patterns/unified-inference-binding (Cloudflare Workers AI at SDK level) and patterns/automatic-provider-failover (Cloudflare AI Gateway at request level). Each operates at a different layer:
- Unified-inference-binding — one SDK surface; the model string selects the provider (@cf/meta/llama-3.1-8b-instruct → the model string IS the provider selector).
- Automatic-provider-failover — gateway reroutes on upstream failure across providers that share a model.
- Batch-then-real-time-fallback — the platform routes based on provider capability (does it support batch?), not availability.
Contrast with "batch discount when possible"¶
A weaker pattern — "try batch; fall back to real-time if batch fails or times out" — exists but is different:
- Weaker pattern: the caller is still aware of both modes; fallback is an error-handling clause.
- This pattern: the caller sees one interface; the mode choice is a provider-capability-based routing decision at the platform layer, invisible upward.
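The caller-visibility difference can be sketched as follows; all class and method names here are hypothetical stand-ins, not a real client API.

```python
class BatchUnsupported(Exception):
    pass

class WeakerClient:
    """Weaker pattern: the caller still knows about both modes."""
    def submit_batch(self, rows, prompt):
        raise BatchUnsupported("provider is real-time-only")
    def submit_realtime(self, rows, prompt):
        return [f"{prompt}:{r}" for r in rows]

class PlatformClient:
    """This pattern: one interface; capability routing is hidden inside."""
    def __init__(self, has_batch_api: bool):
        self._has_batch_api = has_batch_api
    def process(self, rows, prompt):
        # Routing decision is internal; callers never pass a mode flag.
        backend = self._batch if self._has_batch_api else self._realtime
        return backend(rows, prompt)
    def _batch(self, rows, prompt):
        return [f"{prompt}:{r}" for r in rows]
    def _realtime(self, rows, prompt):
        return [f"{prompt}:{r}" for r in rows]

# Weaker pattern: fallback is a caller-side error-handling clause.
weak = WeakerClient()
try:
    out = weak.submit_batch(["a"], "p")
except BatchUnsupported:
    out = weak.submit_realtime(["a"], "p")

# This pattern: identical caller code regardless of provider capability.
out2 = PlatformClient(has_batch_api=False).process(["a"], "p")
```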
Caveats¶
- Real-time wrapper costs more than batch — you pay full real-time rates to deliver the batch-shaped interface when the provider doesn't support batch. The platform chooses to absorb that cost in exchange for interface uniformity.
- Real-time rate limits are lower than batch throughput; a 10M-prompt job via real-time-only wrapping will be far slower than via batch (minutes per 1K prompts on the real-time path vs 50K prompts per batch at provider throughput on the batch path). The "small batches complete more quickly" property is genuine only for small jobs.
- Provider rate limits are the real-time wrapper's scaling ceiling; the platform ends up owning the back-off / parallelism strategy.
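A back-of-envelope illustration of the throughput caveat, with assumed figures: the 1,000-requests/minute real-time ceiling is hypothetical, and actual limits vary by provider and tier.

```python
prompts = 10_000_000

# Real-time path: wall clock is capped by the provider's request rate limit.
realtime_rpm = 1_000  # assumed requests/minute ceiling (hypothetical)
realtime_days = prompts / realtime_rpm / 60 / 24
print(f"real-time path: ~{realtime_days:.0f} days")  # ~7 days at this limit

# Batch path: the same job is 200 batch files run at provider-side throughput.
batch_files = prompts // 50_000  # 50K prompts per batch, per the caveat above
print(f"batch path: {batch_files} files")
```

The asymmetry is why the "small batches complete more quickly" property holds only for small jobs: real-time fan-out wins on latency per prompt, but the rate limit dominates at scale.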
Seen in¶
- sources/2025-08-27-instacart-simplifying-large-scale-llm-processing-with-maple — canonical wiki instance. Maple extends the same CSV/Parquet interface to real-time-only providers with auto-parallelisation + exponential backoff; future batch-API availability at a provider becomes a platform-level config change.
Related¶
- patterns/llm-batch-processing-service — the parent platform pattern.
- patterns/ai-gateway-provider-abstraction — provider-routing tier this composes with.
- patterns/automatic-provider-failover — sibling routing-at-platform-layer pattern (different axis: availability, not capability).
- patterns/unified-inference-binding — SDK-level sibling.
- concepts/llm-batch-api — the provider API surface this pattern masks.
- concepts/unified-parameter-protocol — same architectural stance, parameter-level (PIXEL) vs capability-level (Maple).
- systems/maple-instacart — canonical system.
- companies/instacart — operator.