PATTERN Cited by 2 sources

Complexity-Tiered Model Selection

Definition

Complexity-tiered model selection is the pattern of routing each input to a different model stack based on an estimate of its complexity, rather than running a single one-size-fits-all pipeline on every input. Simple inputs go through a cheap, fast path; complex inputs go through an expensive, accurate path. The routing decision is typically made early, based on a cheap-to-compute complexity signal.

This is a routing pattern: the input determines the pipeline. Contrast it with ensemble gating (where every input runs through all paths and the outputs are combined), with LLM cascades (where every input starts on the cheap model and escalates on low confidence), and with cheap-approximator-with-expensive-fallback (the same escalation idea under a different name).

Why it matters

Most production ML problems have a long tail of difficulty: most inputs are easy and solvable with cheap models; a minority require the heavyweight pipeline. Running the heavyweight pipeline on every input wastes cost on the easy majority.

Examples on the wiki:

  • Instacart flyer digitization: simple flyers (few, well-separated boxes) use iterative-grid multimodal-LLM probing (~90% accuracy). Complex flyers (overlapping, varied layouts) use SAM + four-stage post-processing ensemble.
  • Instacart PARSE: simple attributes use a cheap LLM (70% lower cost). Hard attributes use a more expensive LLM, because the cheap LLM loses 60% accuracy on them.
  • concepts/llm-cascade generalises this to LLM-only pipelines where the tier-signal is the cheap model's own confidence.

The core insight: there's no single right model for all inputs, and pre-committing to the worst-case model over-provisions.
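
To make the over-provisioning argument concrete, here is a back-of-the-envelope sketch. The 80/20 split and the per-input costs are hypothetical numbers chosen for illustration, not figures from any cited source, and routing is assumed to be perfect.

```python
def expected_cost(easy_fraction: float, cheap_cost: float, expensive_cost: float) -> float:
    """Average per-input cost when easy inputs take the cheap path and
    all other inputs take the expensive path (perfect routing assumed)."""
    if not 0.0 <= easy_fraction <= 1.0:
        raise ValueError("easy_fraction must be in [0, 1]")
    return easy_fraction * cheap_cost + (1.0 - easy_fraction) * expensive_cost


# Hypothetical numbers, for illustration only.
EASY_FRACTION = 0.8    # share of inputs the cheap path can handle
CHEAP_COST = 1.0       # arbitrary cost units per input
EXPENSIVE_COST = 10.0

tiered = expected_cost(EASY_FRACTION, CHEAP_COST, EXPENSIVE_COST)
always_heavy = expected_cost(0.0, CHEAP_COST, EXPENSIVE_COST)
savings = 1.0 - tiered / always_heavy
print(f"tiered={tiered:.2f} always_heavy={always_heavy:.2f} savings={savings:.0%}")
```

With these assumed numbers the tiered setup costs roughly 2.8 units per input against 10 for the always-heavyweight pipeline. The saving shrinks as the easy fraction falls or as the two tiers' costs converge.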

Mechanism

The general shape:

       input
  ┌─ cheap tier ───────┐
  │ complexity signal  │
  │ + easy-path model  │
  └────┬───────────────┘
    cheap-path OK ? ── yes ──▶ output
       no
  ┌─ expensive tier ──┐
  │ heavyweight model │
  │ + post-processing │
  └────┬──────────────┘
      output
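
The diagram above can be read as a cascade: run the cheap tier, check whether its result is acceptable, and escalate otherwise. A minimal sketch follows, assuming the cheap tier returns a confidence score alongside its output. The names `run_cascade`, `toy_cheap`, `toy_expensive`, and the 0.8 cut-off are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class TierResult:
    output: str
    tier: str  # "cheap" or "expensive"


def run_cascade(
    item: str,
    cheap_model: Callable[[str], Tuple[str, float]],
    expensive_model: Callable[[str], str],
    min_confidence: float = 0.8,
) -> TierResult:
    """Run the cheap tier first; escalate only if its confidence is too low."""
    output, confidence = cheap_model(item)
    if confidence >= min_confidence:
        return TierResult(output, "cheap")
    return TierResult(expensive_model(item), "expensive")


# Toy stand-ins for real models (hypothetical): the cheap model is
# confident only on short inputs.
def toy_cheap(item: str) -> Tuple[str, float]:
    confidence = 0.95 if len(item) < 10 else 0.40
    return item.upper(), confidence


def toy_expensive(item: str) -> str:
    return item.upper() + " (heavy)"


print(run_cascade("milk", toy_cheap, toy_expensive))
print(run_cascade("family-size cereal box", toy_cheap, toy_expensive))
```

The short input stays on the cheap tier; the long one is escalated. In a real system the "cheap-path OK?" check could be any validation of the cheap output, not only a confidence score.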

Key design questions:

  • What's the complexity signal? Often a cheap heuristic (image density, string length, input size) or the cheap tier's own confidence.
  • Is routing static or learned? Heuristic rules are simple; a learned routing classifier is more robust but adds a training dependency.
  • Is the pipeline a cascade (escalation) or a router (upfront split)? Cascade pays for the cheap tier on every input; router skips the cheap tier for predicted-hard inputs. Cascade is simpler; router is cheaper when the complexity signal is cheap-but-informative.
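
The router-versus-cascade question can be sketched as follows. The complexity signal (`word_count`), the threshold, the toy pipelines, and all cost figures are hypothetical; the cost functions assume perfect routing and ignore the cost of wrong answers.

```python
from typing import Callable, Dict


def word_count(item: str) -> float:
    """Hypothetical cheap complexity signal: number of whitespace-separated tokens."""
    return float(len(item.split()))


def route(
    item: str,
    signal: Callable[[str], float],
    pipelines: Dict[str, Callable[[str], str]],
    threshold: float,
) -> str:
    """Choose one pipeline up front; predicted-hard inputs skip the cheap tier."""
    tier = "expensive" if signal(item) > threshold else "cheap"
    return pipelines[tier](item)


def cascade_cost(hard_fraction: float, cheap: float, expensive: float) -> float:
    """Every input pays for the cheap tier; hard inputs also pay for the expensive one."""
    return cheap + hard_fraction * expensive


def router_cost(hard_fraction: float, cheap: float, expensive: float, signal: float) -> float:
    """Every input pays for the signal, then for exactly one tier."""
    return signal + (1.0 - hard_fraction) * cheap + hard_fraction * expensive


# Toy pipelines (hypothetical stand-ins for real model stacks).
PIPELINES = {
    "cheap": lambda item: f"cheap:{item}",
    "expensive": lambda item: f"expensive:{item}",
}

print(route("milk 2L", word_count, PIPELINES, threshold=3))
print(route("buy one get one free on all cereal", word_count, PIPELINES, threshold=3))

# Illustrative numbers only: 20% hard inputs, costs in arbitrary units.
print(cascade_cost(0.2, cheap=1.0, expensive=10.0))
print(router_cost(0.2, cheap=1.0, expensive=10.0, signal=0.05))
```

Under these simplifying assumptions the router beats the cascade exactly when the signal costs less than `hard_fraction * cheap`, which is one way to read "cheap-but-informative". A noisy signal would erode that advantage through misrouted inputs.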

Instacart's flyer-digitization routing

The Instacart team's version of this pattern routes on two different axes simultaneously:

  1. Per-flyer complexity axis (simple vs. complex flyer) → determines whether Phase 1 uses iterative-grid VLM probing or the SAM stack.
  2. Per-retailer density axis (dense vs. sparse flyer layout) → determines whether the contour-detection model is included in the Phase-1 ensemble.

Neither axis is a learned classifier; the post describes them as observed-per-retailer and observed-per-flyer heuristics. The unifying rule is: don't commit globally; match pipeline cost to input difficulty.
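
The two-axis decision could be expressed roughly as below. This is an illustrative sketch, not Instacart's code: the type names, strategy labels, and the `RetailerConfig` flag are hypothetical. The source does not spell out how the two axes combine, nor which density setting enables contour detection, so the sketch treats them as independent and leaves the flag as an explicit per-retailer setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetailerConfig:
    # Set per retailer from observed flyer density (hypothetical field).
    include_contour_detection: bool


@dataclass(frozen=True)
class Phase1Plan:
    strategy: str                # "iterative_grid_vlm" or "sam_stack"
    use_contour_detection: bool


def plan_phase1(flyer_is_complex: bool, retailer: RetailerConfig) -> Phase1Plan:
    """Combine the per-flyer and per-retailer heuristics into one plan."""
    strategy = "sam_stack" if flyer_is_complex else "iterative_grid_vlm"
    return Phase1Plan(strategy, retailer.include_contour_detection)


# Hypothetical retailers, for illustration only.
retailer_a = RetailerConfig(include_contour_detection=True)
retailer_b = RetailerConfig(include_contour_detection=False)

print(plan_phase1(flyer_is_complex=False, retailer=retailer_a))
print(plan_phase1(flyer_is_complex=True, retailer=retailer_b))
```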

Tradeoffs / gotchas

  • Routing errors cost both ways. A hard input mistakenly routed to the cheap tier ships a wrong answer; an easy input mistakenly routed to the expensive tier wastes compute. The router's own error rate is a load-bearing contributor to end-to-end quality.
  • Operational complexity. Multiple pipelines to maintain, evaluate, and version. Monitoring must cover the routing decision itself (drift in the complexity distribution shifts cost).
  • Stateful complexity signals are hard. Per-retailer tuning works when the retailer set is small and stable. At scale (dozens or hundreds of partners), manual per-partner tuning stops scaling and a learned per-partner model is needed.
  • Tier boundaries drift. The cheap tier's capability improves over time (e.g. VLMs keep improving); the optimal routing boundary moves. Tier thresholds need periodic re-calibration.
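
One way to approach periodic re-calibration is to sweep candidate thresholds over a labelled holdout set and keep the one that sends the most traffic to the cheap tier while end-to-end accuracy stays above a target. The sketch below assumes a single scalar complexity signal and a holdout labelled with both tiers' correctness; the names and data are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Sample(NamedTuple):
    complexity: float        # routing signal for this input
    cheap_correct: bool      # did the cheap tier get it right?
    expensive_correct: bool  # did the expensive tier get it right?


def evaluate(samples: List[Sample], threshold: float) -> Tuple[float, float]:
    """Route complexity <= threshold to the cheap tier, the rest to the expensive
    tier. Returns (end-to-end accuracy, share of inputs sent to the cheap tier)."""
    if not samples:
        raise ValueError("need at least one sample")
    correct = cheap = 0
    for s in samples:
        if s.complexity <= threshold:
            cheap += 1
            correct += s.cheap_correct
        else:
            correct += s.expensive_correct
    return correct / len(samples), cheap / len(samples)


def recalibrate(samples: List[Sample], candidates: List[float],
                target_accuracy: float) -> Optional[float]:
    """Return the candidate with the largest cheap share that still meets the
    accuracy target, or None if no candidate qualifies."""
    best, best_share = None, -1.0
    for t in candidates:
        accuracy, share = evaluate(samples, t)
        if accuracy >= target_accuracy and share > best_share:
            best, best_share = t, share
    return best


# Hypothetical labelled holdout set.
holdout = [
    Sample(0.1, True, True), Sample(0.2, True, True), Sample(0.4, True, True),
    Sample(0.6, False, True), Sample(0.8, False, True), Sample(0.9, False, False),
]
print(recalibrate(holdout, [0.0, 0.3, 0.5, 0.7, 1.0], target_accuracy=0.8))
```

Re-running the sweep after the cheap tier is upgraded would show whether the boundary can move toward the cheap side. The same `evaluate` output also quantifies how routing errors in either direction trade accuracy against cost.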

Seen in

  • sources/2026-02-09-instacart-from-print-to-digital-making-weekly-flyers-shoppable: canonical wiki instance. Instacart's flyer-digitization pipeline routes simple flyers to iterative-grid multimodal-LLM probing (~90% accuracy) and complex flyers to the SAM-based post-processed stack. Additionally, the contour-detection model in Phase 1's ensemble is gated per retailer on flyer density.
  • sources/2026-04-21-vercel-build-knowledge-agents-without-embeddings: chatbot-router altitude instance. Vercel's Knowledge Agent Template classifies every incoming question by complexity and dispatches simple questions to fast/cheap models and hard questions to slow/powerful ones. Routing happens per question (always-on classification), not via a per-input heuristic like Instacart's per-retailer tuning. The transport layer is Vercel AI Gateway, making the tier → provider mapping a config change rather than a code change: "Cost optimization happens automatically, with no manual rules. Compatible with any AI SDK model provider via Vercel AI Gateway." Classifier mechanism not disclosed (heuristic / fine-tuned model / prompt / embedding).
  • Closely related wiki instances: systems/instacart-parse applies the same pattern to attribute extraction (cheap LLM for simple attributes, expensive LLM for hard ones). concepts/llm-cascade is the cascade-flavour of this pattern on pure-LLM pipelines.