
PATTERN Cited by 1 source

Shadow application readiness

Use live production traffic as the test oracle to pick the supported subset of an API surface, before committing application code to it. Run the planner / analyzer on real queries in a shadow path, log plans to a warehouse, do offline analysis, and let the distribution of real traffic pick the subset that covers the common case and excludes the worst cases.

Shape

  1. Define candidate API restrictions (e.g. "which shard-key schemes could work for this table?", "which SQL subset do we support in the sharded router?").
  2. Run the planner / analyzer for each candidate against live production traffic, without affecting the serving path. Log each query + resulting plan (supported / needs-fallback / impossible) to an analytics store.
  3. Offline, classify the distribution:
     • Queries that route cleanly to a single shard / single plan.
     • Queries that need scatter-gather or other complex execution.
     • Queries that are worst-case-complex.
  4. Pick the subset that covers the 90% common case while explicitly excluding worst-case complexity.
  5. Publish the subset as the supported API; flag the outliers as product code that must be rewritten before onboarding the table.
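The shadow-planning and logging steps above can be sketched as a minimal hook off the serving path. This is an illustrative sketch, not any team's implementation: `shadow_plan`, `PlanRecord`, and the `classify`/`log` callables are all invented names; in practice `log` would write to the analytics store.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class PlanClass(Enum):
    SINGLE_SHARD = "single_shard"      # routes cleanly to one shard / one plan
    SCATTER_GATHER = "scatter_gather"  # needs fan-out and merge
    UNSUPPORTED = "unsupported"        # worst-case complexity or planner failure

@dataclass
class PlanRecord:
    candidate: str    # which candidate API restriction / shard-key scheme
    query: str
    plan_class: PlanClass

def shadow_plan(query: str,
                candidates: dict,
                classify: Callable,
                log: Callable) -> None:
    """Run each candidate planner against one live query, off the serving path.

    Any failure is recorded as UNSUPPORTED and swallowed: the shadow path
    must never affect production serving.
    """
    for name, planner in candidates.items():
        try:
            plan_class = classify(planner(query))
        except Exception:
            plan_class = PlanClass.UNSUPPORTED
        log(PlanRecord(name, query, plan_class))
```

The key design choice is that the shadow path only observes and logs; the serving path never waits on it, and planner bugs surface as `UNSUPPORTED` rows in the warehouse rather than as production errors.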

Why this pattern

The alternative is to pick a subset by reading code, writing RFCs, and negotiating with teams — which is expensive, slow, and biased by recency / vocal subsystems. Live traffic is the true distribution. Shadow application readiness replaces "what do we think queries look like?" with "what do queries look like, right now, at our actual traffic mix?"

Two classes of trade-off become concrete:

  • Coverage vs engine complexity. Each additional query shape supported in the router grows its implementation. Live-traffic data shows exactly which shapes are worth the engine-complexity cost.
  • Product refactor scope. Unsupported queries become application-layer rewrite tasks. Live data shows how many lines of product code each exclusion touches, letting the team draw the line where it maximizes coverage per unit of rewrite.
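One way to make "coverage per unit of cost" operational is a greedy cut over the logged plan counts. The sketch below is an assumption-laden illustration — the shape names, counts, and per-shape cost numbers are invented — but it shows the mechanic: rank shapes by traffic share per unit of cost (engine complexity or rewrite scope), admit them until the coverage target is met, and everything below the line becomes a product rewrite task.

```python
def pick_supported_subset(shape_stats, coverage_target=0.90):
    """Greedily admit query shapes by traffic share per unit of cost,
    stopping once the coverage target is met.

    shape_stats: {shape_name: (query_count, cost)}
    Returns (supported_shapes, achieved_coverage).
    """
    total = sum(count for count, _ in shape_stats.values())
    ranked = sorted(shape_stats.items(),
                    key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
    supported, covered = [], 0
    for shape, (count, _) in ranked:
        if covered / total >= coverage_target:
            break  # target met; remaining shapes are excluded
        supported.append(shape)
        covered += count
    return supported, covered / total

# Invented example counts: heavy point-query traffic, a long tail of
# expensive shapes that never earn their engine-complexity cost.
stats = {
    "point_query": (70_000, 1),
    "range_scan": (18_000, 2),
    "same_colo_join_on_shard_key": (5_000, 4),
    "cross_colo_join": (400, 50),
    "nested_sql": (100, 80),
}
```

With these illustrative numbers the cut lands after `range_scan` (about 94% coverage), and the expensive shapes at the bottom of the ranking are excluded regardless of how loudly their subsystems advocate for them — the distribution, not the negotiation, draws the line.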

Figma's instantiation

Building DBProxy's sharded-query language (Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale):

  • Users defined candidate sharding schemes for their tables.
  • DBProxy's logical planning phase ran in shadow against live traffic, producing per-query plan classifications.
  • Plans were logged to Snowflake for offline analysis.
  • Output: a query language covering 90% of queries (all range-scans and point queries, joins only across same-colo tables on the shard key) that avoided the worst-case query-engine complexity (cross-colo joins, joins off the shard key, arbitrary nested SQL).

Related patterns
  • patterns/shadow-migration — old and new engines both execute; outputs compared. Oriented to correctness of the new engine under realistic load.
  • patterns/shadow-validation-dependency-graph — new derived structure emits errors whenever the authoritative path does something it didn't predict. Oriented to completeness of a prediction.
  • Shadow application readiness (this page) — no new engine yet; shadow the planner to pick what the engine will support. Oriented to API scoping.
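The supported subset described under Figma's instantiation can be restated as a toy predicate. The `Query` type and its fields are invented for illustration; a real router classifies parsed plans, not flags.

```python
from dataclasses import dataclass

@dataclass
class Query:
    kind: str                      # "point", "range", "join", "nested"
    on_shard_key: bool = False
    tables_same_colo: bool = True

def is_supported(q: Query) -> bool:
    """Toy version of the published subset: all point queries and range
    scans; joins only across same-colo tables on the shard key; nested
    SQL and everything else excluded (rewrite in product code)."""
    if q.kind in ("point", "range"):
        return True
    if q.kind == "join":
        return q.on_shard_key and q.tables_same_colo
    return False
```

Queries that fail the predicate are exactly the outliers flagged for application-layer rewrite before the table can be onboarded.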

Seen in

  • sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale
