
PATTERN

AI review risk tiering

Intent

Classify every incoming change into a small number of risk tiers before any AI reviewer runs, and use the tier to decide how many agents to spawn, which model tier each runs on, and what severity of finding blocks a merge. Scale firepower to evident risk; don't send frontier models to review typo fixes.

When to reach for it

  • AI review cost scales linearly with fan-out × model tier × re-run frequency; without tiering, the cost function is flat over risk.
  • The change-size distribution is skewed — most MRs are small, a few are huge — so there is meaningful headroom in downscaling the common case.
  • Security / privacy / compliance paths exist where cheaper tiers are unacceptable regardless of diff size.

Mechanism

  1. Preprocess the diff. Strip noise files (lock files, minified assets, generated headers — except migrations). See concepts/diff-noise-filtering.
  2. Compute classification inputs. Cheap metrics over the filtered diff: total lines changed, file count, path matches against a security-sensitive allow-list.
  3. Map to tier. A small decision tree — see concepts/risk-tier-assessment. Cloudflare's tree:

     if fileCount > 50 || hasSecurityFiles: return "full"
     if totalLines <= 10 && fileCount <= 20: return "trivial"
     if totalLines <= 100 && fileCount <= 20: return "lite"
     return "full"
  4. Bind tier to agent set + model tier.

| Tier | Agents | Coordinator model |
| --- | --- | --- |
| Trivial | Coordinator + 1 generalised reviewer | Sonnet (downgraded from Opus) |
| Lite | Coordinator + code quality + documentation + 1 more | Opus / GPT-5.4 |
| Full | Coordinator + 7+ specialists (security, perf, quality, docs, release, codex, AGENTS.md) | Opus / GPT-5.4 |
  5. Short-circuit to the expensive tier for sensitive paths. Any auth/ / crypto/ / security-sounding path match → full tier regardless of diff size. "We'd rather spend a bit extra on tokens than potentially miss a security vulnerability."
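The five steps above can be sketched end-to-end. This is a hypothetical reconstruction: the `ChangedFile` shape and the noise/security patterns are illustrative assumptions, and only the thresholds come from Cloudflare's published tree.

```typescript
type Tier = "trivial" | "lite" | "full";

interface ChangedFile {
  path: string;
  linesChanged: number;
}

// Illustrative patterns — a real deployment would maintain these lists.
const NOISE = [/package-lock\.json$/, /\.min\.(js|css)$/, /_generated\./];
const SECURITY = [/(^|\/)auth\//, /(^|\/)crypto\//, /security/i];

function classify(files: ChangedFile[]): Tier {
  // Step 1: strip noise files, but keep migrations (per the article).
  const filtered = files.filter(
    (f) => f.path.includes("migrations/") || !NOISE.some((re) => re.test(f.path))
  );
  // Step 2: cheap metrics over the filtered diff.
  const totalLines = filtered.reduce((n, f) => n + f.linesChanged, 0);
  const fileCount = filtered.length;
  const hasSecurityFiles = filtered.some((f) =>
    SECURITY.some((re) => re.test(f.path))
  );
  // Steps 3 and 5: the decision tree, with the security short-circuit first.
  if (fileCount > 50 || hasSecurityFiles) return "full";
  if (totalLines <= 10 && fileCount <= 20) return "trivial";
  if (totalLines <= 100 && fileCount <= 20) return "lite";
  return "full";
}
```

Step 4 then keys the agent set and model tier off the returned `Tier` value.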

Observed economics

Cloudflare's first 30 days:

| Tier | Reviews | Avg cost | P99 |
| --- | --- | --- | --- |
| Trivial | 24,529 | $0.20 | $0.74 |
| Lite | 27,558 | $0.67 | $1.95 |
| Full | 78,611 | $1.68 | $5.05 |

Running all 130,698 reviews at the full-tier average of $1.68 would have cost ≈ $220 k over the 30 days. The actual bill implied by the table is ≈ $155 k, so tiering saved roughly $64 k (24,529 trivial reviews × $1.48 avoided + 27,558 lite reviews × $1.01 avoided) — a back-of-envelope estimate that ignores re-runs. Tiering's value accrues even though the full tier dominates both volume and spend, because the trivial-to-full cost ratio is ~8×.
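A back-of-envelope recomputation directly from the observed table (no re-run modelling, no MR-level figures):

```typescript
// Figures taken verbatim from the 30-day economics table above.
const tiers = [
  { name: "trivial", reviews: 24_529, avgCost: 0.2 },
  { name: "lite", reviews: 27_558, avgCost: 0.67 },
  { name: "full", reviews: 78_611, avgCost: 1.68 },
];

// Actual spend: each tier's volume at its own average cost.
const actual = tiers.reduce((s, t) => s + t.reviews * t.avgCost, 0); // ≈ $155 k

// Counterfactual: every review billed at the full-tier average.
const totalReviews = tiers.reduce((s, t) => s + t.reviews, 0); // 130,698
const uniformFull = totalReviews * 1.68; // ≈ $220 k

const saved = uniformFull - actual; // ≈ $64 k
```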

Sibling patterns

  • vs. patterns/rollout-escape-hatch — both gate expensive/risky behaviour on cheap criteria. Tiering gates AI cost on diff characteristics; the rollout-escape-hatch gates deployment risk on manual override.
  • vs. patterns/ci-regression-budget-gate — both shape how CI responds to change shape; regression-budget-gate governs what gets run based on perf budgets, tiering governs what gets reviewed based on risk.
  • vs. concepts/blast-radius — same philosophy applied at different layers. Blast-radius shapes deployment rollout; AI review tiering shapes pre-merge review investment.

Tradeoffs

  • Wrong tier on edge-case diffs. A one-line change in auth/login.ts is trivial by size but must be full-tier — handled by the path allow-list.
  • Tiering logic must be cheap. Classification cost is rolled into the orchestrator's startup phase; if classification itself is expensive, the savings are eaten.
  • Tier-boundary cliffs. A 99-line diff gets lite; a 101-line diff gets full. Smooth-gradient alternatives exist (fractional-agent spawn) but Cloudflare's instance uses hard thresholds for operational simplicity.
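The tier-boundary cliff and the smooth-gradient alternative can be contrasted in a few lines. The agent counts and the linear interpolation below are illustrative assumptions, not something the article specifies:

```typescript
// Hard threshold, as in Cloudflare's instance: a 99-line diff and a
// 101-line diff get very different amounts of review firepower.
function agentsHard(totalLines: number): number {
  return totalLines <= 100 ? 3 : 8; // lite-ish vs full-ish agent counts
}

// Hypothetical smooth gradient: interpolate the agent count between 3 and 8
// across 10–500 changed lines, then round. No cliff, but harder to reason
// about operationally — which is why hard thresholds won here.
function agentsSmooth(totalLines: number): number {
  const t = Math.min(Math.max((totalLines - 10) / 490, 0), 1);
  return Math.round(3 + t * 5);
}
```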

Seen in
