
PATTERN

Complexity-Tiered Model Selection

Definition

Complexity-tiered model selection is the pattern of routing each input to a different model stack based on an estimate of its complexity, rather than running a single one-size-fits-all pipeline on every input. Simple inputs go through a cheap, fast path; complex inputs go through an expensive, accurate path. The routing decision is typically made early, based on a cheap-to-compute complexity signal.

This is a routing pattern: the input determines the pipeline. Contrast with ensemble gating (where every input runs through all paths and outputs are combined), with LLM cascades (where every input starts cheap and escalates on low confidence), and with cheap-approximator-with-expensive-fallback (same escalation idea, different vocabulary).
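The definition above can be sketched in a few lines. Everything here is hypothetical for illustration: the function names, the length-based complexity signal, and the threshold are assumptions, not anything from the source.

```python
def estimate_complexity(x: str) -> float:
    # Cheap-to-compute complexity signal. Normalized input length is used
    # here purely for illustration; real signals might be image density,
    # bounding-box counts, or a cheap model's confidence.
    return min(len(x) / 100.0, 1.0)

def cheap_model(x: str) -> str:
    # Stand-in for the fast, inexpensive tier.
    return f"cheap:{x}"

def expensive_model(x: str) -> str:
    # Stand-in for the heavyweight, accurate tier.
    return f"expensive:{x}"

def route(x: str, threshold: float = 0.5) -> str:
    """Upfront split: the complexity signal alone picks the tier."""
    if estimate_complexity(x) < threshold:
        return cheap_model(x)
    return expensive_model(x)
```

The routing decision happens once, before any model runs, which is what distinguishes this pattern from a cascade.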

Why it matters

Most production ML problems have a long tail of difficulty: most inputs are easy and solvable with cheap models; a minority require the heavyweight pipeline. Running the heavyweight pipeline on every input wastes cost on the easy majority.

Examples on the wiki:

  • Instacart flyer digitization: simple flyers (few, well-separated boxes) use iterative-grid multimodal-LLM probing (~90% accuracy). Complex flyers (overlapping, varied layouts) use SAM + four-stage post-processing ensemble.
  • Instacart PARSE: simple attributes use a cheap LLM (−70% cost). Hard attributes use a more expensive LLM (because the cheap LLM's accuracy drops by 60% on them).
  • concepts/llm-cascade generalises this to LLM-only pipelines where the tier-signal is the cheap model's own confidence.

The core insight: there's no single right model for all inputs, and pre-committing to the worst-case model over-provisions.

Mechanism

The general shape:

       input
         │
  ┌─ cheap tier ───────┐
  │ complexity signal  │
  │ + easy-path model  │
  └────┬───────────────┘
       │
  cheap path OK? ── yes ──▶ output
       │ no
  ┌─ expensive tier ───┐
  │ heavyweight model  │
  │ + post-processing  │
  └────┬───────────────┘
       │
     output

Key design questions:

  • What's the complexity signal? Often a cheap heuristic (image density, string length, input size) or the cheap tier's own confidence.
  • Is routing static or learned? Heuristic rules are simple; a learned routing classifier is more robust but adds a training dependency.
  • Is the pipeline a cascade (escalation) or a router (upfront split)? Cascade pays for the cheap tier on every input; router skips the cheap tier for predicted-hard inputs. Cascade is simpler; router is cheaper when the complexity signal is cheap-but-informative.
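The cascade-vs-router distinction in the last bullet is easiest to see side by side. A minimal sketch, assuming generic callables for the two tiers (none of these names come from the source):

```python
from typing import Callable, Any

def cascade(x: Any, cheap: Callable, expensive: Callable,
            confident: Callable[[Any], bool]) -> Any:
    """Cascade: every input pays for the cheap tier first,
    escalating only when the cheap output looks unreliable."""
    y = cheap(x)
    if confident(y):
        return y
    return expensive(x)

def router(x: Any, cheap: Callable, expensive: Callable,
           looks_hard: Callable[[Any], bool]) -> Any:
    """Router: an upfront complexity signal skips the cheap tier
    entirely for predicted-hard inputs."""
    if looks_hard(x):
        return expensive(x)
    return cheap(x)
```

Note the cost asymmetry: in `cascade`, a hard input pays for both tiers; in `router`, it pays only for the expensive one, provided `looks_hard` is itself cheap to compute.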

Instacart's flyer-digitization routing

The Instacart team's version of this pattern routes on two different axes simultaneously:

  1. Per-flyer complexity axis (simple vs. complex flyer) → determines whether Phase 1 uses iterative-grid VLM probing or the SAM stack.
  2. Per-retailer density axis (dense vs. sparse flyer layout) → determines whether the contour-detection model is included in the Phase-1 ensemble.

Neither axis is a learned classifier; the post describes them as observed-per-retailer and observed-per-flyer heuristics. The unifying rule is: don't commit globally; match pipeline cost to input difficulty.
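The two-axis decision could be sketched as follows. The post describes heuristics, not code, so the structure, stage names, and especially the direction of the density rule (whether contour detection joins the ensemble for dense or for sparse layouts) are assumptions here:

```python
def pick_phase1_pipeline(flyer_is_complex: bool,
                         layout_is_dense: bool) -> dict:
    """Two independent routing axes, per the Instacart description:
    per-flyer complexity picks the detection stack; per-retailer
    density decides the ensemble's membership."""
    pipeline = {
        # Axis 1: complex flyers get the SAM stack, simple ones the
        # iterative-grid VLM probing path.
        "detector": "sam_stack" if flyer_is_complex else "iterative_grid_vlm",
        "ensemble": ["base_detector"],
    }
    # Axis 2: hypothetical direction of the rule (an assumption) —
    # dense layouts add the contour-detection model to the ensemble.
    if layout_is_dense:
        pipeline["ensemble"].append("contour_detection")
    return pipeline
```

Because the axes are independent, four pipeline variants exist, but only two decisions are maintained.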

Tradeoffs / gotchas

  • Routing errors cost both ways. A hard input mistakenly routed to the cheap tier ships a wrong answer; an easy input mistakenly routed to the expensive tier wastes compute. The router's own error rate is a load-bearing contributor to end-to-end quality.
  • Operational complexity. Multiple pipelines to maintain, evaluate, and version. Monitoring must cover the routing decision itself (drift in the complexity distribution shifts cost).
  • Stateful complexity signals are hard. Per-retailer tuning works when the retailer set is small and stable. At scale (dozens or hundreds of partners), per-partner manual tuning stops scaling and a learned per-partner model becomes necessary.
  • Tier boundaries drift. The cheap tier's capability improves over time (e.g. VLMs keep improving); the optimal routing boundary moves. Tier thresholds need periodic re-calibration.
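The first two gotchas can be made concrete with a back-of-envelope expected-cost model. All symbols and numbers are hypothetical, not from the source:

```python
def expected_cost(p_hard: float, c_cheap: float, c_exp: float,
                  fnr: float, fpr: float, penalty: float) -> float:
    """Expected per-input cost of a router.

    p_hard:  fraction of inputs that genuinely need the expensive tier
    fnr:     fraction of hard inputs mis-routed to the cheap tier
             (each ships a wrong answer, costed as `penalty`)
    fpr:     fraction of easy inputs mis-routed to the expensive tier
             (each wastes compute paying c_exp instead of c_cheap)
    """
    p_easy = 1.0 - p_hard
    return (
        p_easy * ((1 - fpr) * c_cheap + fpr * c_exp)
        + p_hard * ((1 - fnr) * c_exp + fnr * (c_cheap + penalty))
    )
```

With a perfect router (`fnr = fpr = 0`) this reduces to the ideal blended cost; any asymmetric `penalty` for shipped wrong answers shows why false negatives usually dominate the tuning of the routing threshold.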

Seen in
