PATTERN Cited by 1 source
Phase-gated planner rewriter¶
Intent¶
When two otherwise-correct query-planner rewriters interfere — one wedges the tree into a shape that blocks the other — resolve the interference by gating one rewriter behind a later planner phase, so that it only fires after the blocked-peer rewriter has had a chance to run. The gated rewriter's logic is unchanged; only its activation time moves.
Shipped in Vitess as vitessio/vitess #16278, fixing an OOM at VTGate caused by "push ordering under aggregation" firing before "split aggregation across a join" could fire.
Context¶
Query planners using fixed-point tree rewriting within sequential phases face a characteristic failure mode: two rewriters that are each individually correct produce, in combination, a pathological tree shape. The bug manifests as:
- Rewriter A fires first, transforming the tree into shape T'.
- Rewriter B's precondition, which was satisfied on the original tree T, is not satisfied on T'.
- B can't fire.
- The plan is correct but suboptimal — sometimes catastrophically (OOM, full scatter-fetch, wrong join order).
Canonical case (Source: sources/2026-04-21-planetscale-optimizing-aggregation-in-the-vitess-query-planner): a query SELECT sum(user.type) FROM user JOIN user_extra ON user.team_id = user_extra.id GROUP BY user_extra.id ORDER BY user_extra.id. The planner's "push ordering under aggregation" rewriter fires in the initial phase and moves Ordering below the Aggregator.
Ordering now sits between Aggregator and ApplyJoin and can't be pushed further down: "Ordering can only be pushed down to the left hand side. Ordering [here] is blocking the aggregator from being pushed down." When the "split aggregation" phase later runs, it sees an Ordering node wedged in the middle and can't decompose the aggregation into local halves pushable under each side of the join. VTGate pulls the full joined row set into memory and OOMs.
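The interference can be reproduced in a few lines. The sketch below is a toy model, not Vitess code: a plan is a top-down list of operator names, and the two functions (names chosen here for illustration) stand in for the two rewriters. Each returns a rewritten plan, or None when its precondition does not hold.

```python
# Toy model (hypothetical): a plan is a top-down list of operator names.

def push_ordering_under_aggregation(plan):
    """Move an Ordering that sits directly above an Aggregator below it."""
    for i in range(len(plan) - 1):
        if plan[i] == "Ordering" and plan[i + 1] == "Aggregator":
            return plan[:i] + ["Aggregator", "Ordering"] + plan[i + 2:]
    return None  # precondition not met


def split_aggregation(plan):
    """Split an Aggregator that sits directly above an ApplyJoin."""
    for i in range(len(plan) - 1):
        if plan[i] == "Aggregator" and plan[i + 1] == "ApplyJoin":
            return plan[:i] + ["Aggregator", "ApplyJoin[local aggregators]"] + plan[i + 2:]
    return None  # precondition not met


original = ["Ordering", "Aggregator", "ApplyJoin"]

# On the original tree, the split precondition holds.
print(split_aggregation(original) is not None)

# Once the ordering rewriter has fired, Ordering is wedged between
# Aggregator and ApplyJoin, and the split precondition no longer holds.
wedged = push_ordering_under_aggregation(original)
print(wedged)
print(split_aggregation(wedged))
```

Both functions are individually correct on their own inputs; only the order in which they get to run decides whether the split can ever happen.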
Solution¶
Move the blocking rewriter ("push ordering under aggregation") from the initial phase to a later phase (here, the split-aggregation phase). Taylor verbatim:
"By delaying the 'ordering under aggregation' rewriter until the 'split aggregation' phase, we can push down the aggregation under the join. This doesn't stop the 'ordering under aggregation' rewriter from doing its job, it just has to wait a bit before doing it."
Effect on the plan tree:
Before the fix, initial phase: "split aggregation" rewriter gated off (it only runs in a later phase); "push ordering under aggregation" active → ordering pushed, wedged between aggregator and join, blocks split-aggregation forever.
After the fix, initial phase: "split aggregation" gated off; "push ordering under aggregation" also gated off → neither fires yet.
Next phase (split aggregation): both enabled → split-aggregation fires first (the fixed-point loop reaches it first because the Ordering is still at the top). Aggregation decomposes into local halves and pushes under the join. Then "push ordering under aggregation" fires on the now-decomposed tree, producing the final plan:
Aggregator (sum group by user_extra.col)
└── Projection (sum × count(*), user_extra.col)
    └── Ordering (user_extra.col)
        └── ApplyJoin
            ├── Route (scatter on user)
            │   └── Aggregator (sum(type) group by team_id) ← pushed to MySQL
            └── Route (scatter on user)
                └── Aggregator (count(*) group by col) ← pushed to MySQL
Most of the aggregation now runs inside MySQL; VTGate only sums the sums. No OOM.
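A minimal sketch of the gating mechanism, under stated assumptions: the `Phase` enum, the `Rewriter` record, `rewrite_pair`, and `plan_query` are hypothetical names invented for this example and do not mirror Vitess's actual planner API. Plans are simplified to top-down lists of operator names, the Projection node is omitted, and the split rewriter is listed first so that it gets the first chance to fire once both are enabled.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, List, Optional

Plan = List[str]


class Phase(IntEnum):
    INITIAL = 0
    SPLIT_AGGREGATION = 1


@dataclass
class Rewriter:
    name: str
    active_from: Phase  # the gate: the rewriter is skipped in earlier phases
    apply: Callable[[Plan], Optional[Plan]]


def rewrite_pair(upper: str, lower: str, replacement: Plan):
    """Build a rewriter that replaces an adjacent (upper, lower) operator pair."""
    def apply(plan: Plan) -> Optional[Plan]:
        for i in range(len(plan) - 1):
            if plan[i] == upper and plan[i + 1] == lower:
                return plan[:i] + replacement + plan[i + 2:]
        return None
    return apply


push_ordering = rewrite_pair("Ordering", "Aggregator", ["Aggregator", "Ordering"])
split_aggregation = rewrite_pair(
    "Aggregator", "ApplyJoin", ["Aggregator", "ApplyJoin[local aggregators]"]
)


def plan_query(plan: Plan, rewriters: List[Rewriter]) -> Plan:
    """Run every phase in order; within a phase, rewrite until a fixed point."""
    for phase in Phase:
        changed = True
        while changed:
            changed = False
            for rw in rewriters:
                if rw.active_from > phase:
                    continue  # still gated off in this phase
                result = rw.apply(plan)
                if result is not None:
                    plan, changed = result, True
                    break  # restart the scan from the first rewriter
    return plan


def make_rewriters(ordering_phase: Phase) -> List[Rewriter]:
    return [
        Rewriter("split aggregation", Phase.SPLIT_AGGREGATION, split_aggregation),
        Rewriter("push ordering under aggregation", ordering_phase, push_ordering),
    ]


query = ["Ordering", "Aggregator", "ApplyJoin"]
before_fix = make_rewriters(Phase.INITIAL)
after_fix = make_rewriters(Phase.SPLIT_AGGREGATION)  # the only change

print(plan_query(query, before_fix))
print(plan_query(query, after_fix))
```

In this model the whole fix is the single `ordering_phase` argument. With the ordering rewriter active from the initial phase, the aggregation is never split. With it gated to the later phase, the split fires first and the ordering is then pushed under the top-level Aggregator.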
Forces¶
- Rewriters should be simple and self-contained. Teaching each rewriter about every peer rewriter's effects creates an O(n²) knowledge graph that resists maintenance.
- Phase ordering is cheap to change. Moving a rewriter from phase N to phase N+1 is a one-line enum change in most planner architectures.
- The algebraic validity of each rewriter is unchanged. The fix doesn't alter the rewrite logic — only when it activates.
- Phase gates compose. A rewriter can be gated behind multiple phases (e.g., only after phases 2 and 4 have completed). The gate mechanism scales to arbitrarily complex activation conditions.
- Discovering the right phase is empirical. The fix is driven by observing the pathological plan tree and identifying which rewriter wedged the blocking shape; no formal dependency graph over rewriters is typically maintained.
Canonical instance¶
- Vitess VTGate planner: vitessio/vitess #16278, shipped 2024. Andres Taylor's 2024-07-22 blog post is the canonical wiki disclosure. The fix moved "push ordering under aggregation" from the initial phase to the split-aggregation phase, unblocking patterns/aggregation-pushdown-under-join for queries with ORDER BY on the RHS of a cross-shard join.
Consequences¶
- + Cheap fix for rewriter interference. One-line enum change rather than a deep rewrite or algorithm change.
- + Rewriters stay self-contained. No peer-awareness logic added; each rewriter still only understands its own local transformation.
- + Preserves the rewriter's intended behaviour. "It just has to wait a bit before doing it" — the transformation still happens, just in a different tree.
- − Requires diagnostic depth to find. The bug presents as a downstream symptom (OOM, wrong plan), not as a rewriter-ordering violation. Diagnosis needs plan-tree traces across phase boundaries.
- − Phase list grows over time. Each fix of this shape pins one more rewriter to a specific phase; over the years, the phase enum accumulates bug-specific positions, which adds a documentation burden to the phase list.
- − Doesn't prevent future interference. A new rewriter added to phase N can re-introduce interference with an existing phase-N rewriter. Phase ordering is a disciplined rule list, not a guaranteed-correct structure.
Extensions¶
- Phase gates as dependency declarations. Rewriters could declare "run only after rewriter X" / "run only before rewriter Y" explicitly rather than by numeric phase. The planner then topologically sorts. Vitess hasn't disclosed this; the 2024 post uses numeric / named phases.
- Rule priority within phase. A secondary mechanism: within a single phase, give some rules higher priority so that they fire first (Calcite uses this). Vitess uses a fixed-point loop, and the post does not disclose any priority mechanism.
- Auto-detect rewriter interference via property-based testing. Generate random SQL, run the planner, and verify that the plan is no worse than the naive plan. A case where the phase-gated planner produces a better plan than the ungated one implies that interference exists.
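The dependency-declaration extension can be sketched with Python's standard `graphlib` module. This is an illustration of the idea only: the `run_after` mapping and both helper functions are hypothetical, and nothing here reflects how Vitess orders its rewriters.

```python
from graphlib import CycleError, TopologicalSorter


def activation_order(run_after):
    """Return an order in which each rewriter comes after all rewriters it waits for.

    run_after maps a rewriter name to the set of rewriters that must get a
    chance to run before it.
    """
    return list(TopologicalSorter(run_after).static_order())


def is_satisfiable(run_after):
    """Contradictory declarations show up as a cycle in the dependency graph."""
    try:
        activation_order(run_after)
        return True
    except CycleError:
        return False


# Hypothetical declaration equivalent to the phase move described above.
run_after = {
    "split aggregation": set(),
    "push ordering under aggregation": {"split aggregation"},
}
order = activation_order(run_after)
print(order)

# Two rewriters that each insist on running after the other cannot be ordered.
contradictory = {"a": {"b"}, "b": {"a"}}
print(is_satisfiable(contradictory))
```

Compared with numeric phases, explicit declarations record why a rewriter runs late, and a cycle check catches contradictory constraints before planning starts.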
Related patterns¶
- patterns/aggregation-pushdown-under-join — the specific rewrite the 2024 fix was protecting. The fix doesn't change this pattern's algorithm; it ensures the planner can reach it.
Seen in¶
- sources/2026-04-21-planetscale-optimizing-aggregation-in-the-vitess-query-planner — canonical wiki introduction. Andres Taylor (PlanetScale / Vitess core, 2024-07-22) canonicalises phase-gated rewriter positioning as the cheap-fix recipe for rewriter interference in a fixed-point-rewriting query planner. Bug: OOM in VTGate due to "push ordering under aggregation" firing before "split aggregation" could decompose the aggregator. Fix: delay the ordering rewriter to the split-aggregation phase. Shipped as vitessio/vitess #16278. Sequel to the 2022-06-24 Grouping and aggregations on Vitess post — same author, same mechanism, two years of production experience later.