
PATTERN

Batch-then-real-time fallback

Intent

Present a single CSV/Parquet-in, CSV/Parquet-out batch interface to callers regardless of whether the underlying LLM provider offers a native batch API. When a provider has a batch API, use it (for the ~50% cost discount). When a provider is real-time-only, wrap its real-time API behind the same interface with automatic parallelisation + exponential backoff + intelligent retry — so callers never have to know or care.

The pattern is a platform-layer masking of provider-capability heterogeneity. Provider gets a batch API later? Switch at the platform layer; callers don't change.

When to use

  • Running a shared LLM batch processing service that routes to multiple providers.
  • Internal clients want provider choice (different models / vendors for different workloads) but should not have to pick based on "does this vendor support batch?"
  • Small batches are common enough that real-time parallelisation is actually faster than batch APIs for iterating on ops tasks.

Mechanics

The canonical realisation (Maple at Instacart, sources/2025-08-27-instacart-simplifying-large-scale-llm-processing-with-maple|2025-08-27):

  • Single caller interface — CSV or Parquet file plus prompt template. No mode flag; no batch-vs-real-time option surfaced upward.
  • Provider-capability table internal to the service — each provider is tagged with its batch-API availability.
  • Batch-capable providers — Maple's usual pipeline (encode → upload → poll → download → merge).
  • Real-time-only providers — Maple's real-time wrapper runs:
      • Automatic parallelisation across concurrent requests (bounded by provider rate limit).
      • Exponential backoff on rate-limited responses.
      • Intelligent retry policies.
      • Failure tracking.
  • Seamless upgrade path — if a provider later ships a batch API, Maple switches the routing at the platform layer; "users don't need to do anything."
  • Small-batch latency benefit — real-time routing "made small batches complete more quickly, which is important for ops-related tasks when they are iterating on a problem."
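The routing decision above can be sketched minimally. All names here (`PROVIDER_SUPPORTS_BATCH`, `submit`, the stub pipelines) are hypothetical illustrations of the mechanics, not Maple's actual API:

```python
from dataclasses import dataclass

# Illustrative provider-capability table, internal to the service:
# each provider is tagged with its batch-API availability.
PROVIDER_SUPPORTS_BATCH = {
    "provider_a": True,   # native batch API -> discounted batch pipeline
    "provider_b": False,  # real-time only -> wrapped behind the same interface
}

@dataclass
class BatchJob:
    provider: str
    input_path: str       # CSV or Parquet file of rows to process
    prompt_template: str

def submit(job: BatchJob) -> str:
    """Single caller-facing entry point: no batch-vs-real-time flag."""
    if PROVIDER_SUPPORTS_BATCH[job.provider]:
        return run_batch_pipeline(job)   # encode -> upload -> poll -> download -> merge
    return run_realtime_wrapper(job)     # parallel real-time calls + backoff + retry

def run_batch_pipeline(job: BatchJob) -> str:
    return f"batch:{job.provider}"       # stub for the native batch path

def run_realtime_wrapper(job: BatchJob) -> str:
    return f"realtime:{job.provider}"    # stub for the real-time wrapper
```

The seamless-upgrade property falls out of this shape: flipping `provider_b` to `True` in the capability table changes the routing without any caller-side change.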

From the post:

"Teams no longer need to write custom scripts or pipelines to handle bulk real-time calls. Instead, they can use the same Maple interface, and the underlying platform will handle the complexities of interacting with real-time APIs at scale." (Source: sources/2025-08-27-instacart-simplifying-large-scale-llm-processing-with-maple)

Why this pattern

The alternatives, all worse:

  1. Force callers to pick based on batch availability — couples application choice to operational detail; forces caller code change every time a provider ships / drops batch.
  2. Only support batch-capable providers — narrows the provider roster; blocks teams from using best-performing models for their workload if those models happen to be real-time-only.
  3. Offer two separate interfaces (batch + real-time) — every caller writes two code paths; code duplication of the sort this pattern is supposed to prevent.

The pattern preserves unified-interface semantics at the batch-processing layer — sibling of patterns/unified-inference-binding (Cloudflare Workers AI at SDK level) and patterns/automatic-provider-failover (Cloudflare AI Gateway at request level). Each operates at a different layer:

  • Unified-inference-binding — one SDK surface, model string selects provider (@cf/meta/llama-3.1-8b-instruct → model string IS the provider selector).
  • Automatic-provider-failover — gateway reroutes on upstream failure across providers that share a model.
  • Batch-then-real-time-fallback — the batch platform routes based on provider capability (does it support batch?), not on availability.

Contrast with "batch discount when possible"

A weaker pattern — "try batch; fall back to real-time if batch fails or times out" — exists but is different:

  • Weaker pattern: the caller is still aware of both modes; fallback is an error-handling clause.
  • This pattern: the caller sees one interface; the mode choice is a provider-capability-based routing decision at the platform layer, invisible upward.

Caveats

  • Real-time wrapper costs more than batch — you pay full real-time rates to deliver the batch-shaped interface when the provider doesn't support batch. Platform choice to absorb that cost for interface uniformity.
  • Real-time rate limits are lower than batch throughput; a 10M-prompt job via real-time-only wrapping will be far slower than via batch (minutes per 1K prompts on the real-time path vs 50K prompts per batch at provider throughput on the batch path). The "small batches complete more quickly" property is genuine only for small jobs.
  • Provider rate limits are the real-time wrapper's scaling ceiling; the platform ends up owning the back-off / parallelism strategy.
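A minimal sketch of the backoff/parallelism strategy the platform ends up owning — bounded concurrency plus exponential backoff on rate-limited responses. `RateLimited`, the retry counts, and the delays are assumptions for illustration, not Maple's actual tuning:

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

class RateLimited(Exception):
    """Illustrative stand-in for a provider's 429-style response."""

def call_with_backoff(call, prompt, max_retries=5, base_delay=0.01):
    for attempt in range(max_retries):
        try:
            return call(prompt)
        except RateLimited:
            # Exponential backoff with jitter before retrying.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
    # A real wrapper would record this in failure tracking.
    raise RuntimeError("exhausted retries")

def run_parallel(call, prompts, max_workers=8):
    # Concurrency bound chosen to stay under the provider's rate limit.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: call_with_backoff(call, p), prompts))
```

The ceiling noted above shows up directly here: `max_workers` cannot exceed what the provider's real-time rate limit allows, so large jobs serialize behind it.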

Seen in
