CONCEPT Cited by 1 source
Nested-loop join¶
Definition¶
A nested-loop join (NL join) is the join algorithm that iterates over the left-hand side (LHS) rows one at a time and, for each LHS row, issues a lookup against the right-hand side (RHS) using the row's join-key value as a bound parameter. It is the conceptually simplest join algorithm:
When the inner loop is supported by an index on R.join_key, each lookup is O(log N) on a B-tree, making the total NL join cost O(|LHS| × log |RHS|). Without such an index, it degrades to O(|LHS| × |RHS|) — the full cartesian product.
NL join vs other join algorithms¶
Single-node database engines have three classical join algorithms:
- Nested-loop join — iterative; best when LHS is small and RHS has an index on the join key.
- Sort-merge join — sort both sides by join key + merge; best when both sides are already sorted or can be sorted cheaply.
- Hash join — build a hash table on the smaller side, probe with the larger side; best when both sides are large and no join-key index exists.
MySQL's single-node planner picks among these based on cardinalities and index availability. The planner prefers NL joins when there's a usable index on the inner side's join key.
NL join in sharded / distributed systems¶
The VTGate query planner uses cross-shard NL joins as the default strategy when the two sides of a join cannot be colocated on the same shard:
"The join is a nested loop join, which means that we'll execute the query on the left-hand side (LHS) of the Join, and using that result we'll issue queries on the right-hand side (RHS), one query per row." (Source: sources/2026-04-21-planetscale-grouping-and-aggregations-on-vitess)
Each LHS row triggers one RHS query over the network to the appropriate shard. The N+1 queries shape is the canonical cost: for an LHS of size N, the join issues N RHS queries.
Why NL is the canonical cross-shard join¶
Alternatives are harder:
- Cross-shard sort-merge requires ordered scans of both sides, which for a sharded table means scatter + per-shard sort + merge at the coordinator — expensive for large tables, and unavailable for queries with arbitrary
WHEREclauses. - Cross-shard hash join requires either (a) materialising one side's build table at the coordinator (memory pressure) or (b) a distributed hash-build + partitioned probe protocol (complex, expensive network shuffle).
- NL join requires only per-row RHS queries, each of which is a normal single-shard lookup. The network cost is linear in LHS size × per-query RTT.
For workloads where LHS is small (e.g. point-lookup-style), cross-shard NL join is efficient. For workloads where LHS is large, cross-shard NL join's N+1-query shape becomes a latency and QPS bottleneck — this is where planner rewrites like aggregation-pushdown-under-join pay off: pre-aggregation reduces LHS cardinality by the group-collapsing factor, reducing the RHS-query count proportionally.
The N+1 problem reframed¶
Application-tier ORM anti-patterns often produce the same shape: a parent-table query followed by one child-table query per parent row. The database community has deprecated this as the N+1 problem. Cross-shard NL joins are structurally identical — one parent query plus N child queries — but happen inside the query engine rather than at the application tier. The mitigation is different (planner rewrites that batch the RHS queries, colocate the tables, or pre-aggregate to reduce N) but the cost structure is the same.
Cost dominants¶
- Per-RHS-query network RTT — cross-shard: ~1-10 ms; single-node: microseconds.
- Per-RHS-query parse + plan + execute — cross-shard: an LHS with 1M rows triggers 1M MySQL query-parse events.
- Total wall-clock time — serial NL: N × per-query latency; parallel NL: N × (per-query latency / concurrency) bounded by RHS shard capacity.
Vitess supports parallel RHS query issuance within the NL join (up to a configured concurrency), which helps latency but not total QPS on the RHS shards.
When NL is the right shape¶
Cross-shard NL is the right answer when:
- LHS is small (e.g. thousands of rows, not millions).
- RHS has an index on the join key (cheap per-lookup).
- The two sides cannot be colocated (shard keys don't align).
- Pre-aggregation via concepts/push-aggregation-under-join can collapse LHS sufficiently.
Cross-shard NL is the wrong answer when:
- LHS is large and cannot be reduced via pre-aggregation.
- RHS lacks a join-key index.
- The join is many-to-many (cartesian-product risk).
In those cases, the query either needs to be rewritten (denormalise, colocate, batch application-side) or accepted as expensive.
Seen in¶
- sources/2026-04-21-planetscale-grouping-and-aggregations-on-vitess — Andres Taylor (2022-06-24) canonicalises the VTGate NL join as the cross-shard-join default: "we'll execute the query on the left-hand side (LHS) of the Join, and using that result we'll issue queries on the right-hand side (RHS), one query per row." The post's aggregation-pushdown-under-join rewrite is specifically motivated by reducing the LHS cardinality that drives NL join cost.
Related¶
- concepts/cross-shard-query — the regime where NL joins become expensive.
- concepts/scatter-gather-query — the alternative query shape when no join is needed.
- concepts/vtgate-query-planner — the component that picks NL as the cross-shard join shape.
- concepts/push-aggregation-under-join — the rewrite that reduces NL join cost.
- patterns/aggregation-pushdown-under-join — the production pattern using this rewrite.
- systems/vitess
- systems/mysql