Skip to content

CONCEPT Cited by 1 source

Nested-loop join

Definition

A nested-loop join (NL join) is the join algorithm that iterates over the left-hand side (LHS) rows one at a time and, for each LHS row, issues a lookup against the right-hand side (RHS) using the row's join-key value as a bound parameter. It is the conceptually simplest join algorithm:

for each row L in LHS:
    for each row R in RHS where R.join_key = L.join_key:
        emit (L, R)

When the inner loop is supported by an index on R.join_key, each lookup is O(log N) on a B-tree, making the total NL join cost O(|LHS| × log |RHS|). Without such an index, it degrades to O(|LHS| × |RHS|) — the full cartesian product.

NL join vs other join algorithms

Single-node database engines have three classical join algorithms:

  • Nested-loop join — iterative; best when LHS is small and RHS has an index on the join key.
  • Sort-merge join — sort both sides by join key + merge; best when both sides are already sorted or can be sorted cheaply.
  • Hash join — build a hash table on the smaller side, probe with the larger side; best when both sides are large and no join-key index exists.

MySQL's single-node planner picks among these based on cardinalities and index availability. The planner prefers NL joins when there's a usable index on the inner side's join key.

NL join in sharded / distributed systems

The VTGate query planner uses cross-shard NL joins as the default strategy when the two sides of a join cannot be colocated on the same shard:

"The join is a nested loop join, which means that we'll execute the query on the left-hand side (LHS) of the Join, and using that result we'll issue queries on the right-hand side (RHS), one query per row." (Source: sources/2026-04-21-planetscale-grouping-and-aggregations-on-vitess)

Each LHS row triggers one RHS query over the network to the appropriate shard. The N+1 queries shape is the canonical cost: for an LHS of size N, the join issues N RHS queries.

Why NL is the canonical cross-shard join

Alternatives are harder:

  • Cross-shard sort-merge requires ordered scans of both sides, which for a sharded table means scatter + per-shard sort + merge at the coordinator — expensive for large tables, and unavailable for queries with arbitrary WHERE clauses.
  • Cross-shard hash join requires either (a) materialising one side's build table at the coordinator (memory pressure) or (b) a distributed hash-build + partitioned probe protocol (complex, expensive network shuffle).
  • NL join requires only per-row RHS queries, each of which is a normal single-shard lookup. The network cost is linear in LHS size × per-query RTT.

For workloads where LHS is small (e.g. point-lookup-style), cross-shard NL join is efficient. For workloads where LHS is large, cross-shard NL join's N+1-query shape becomes a latency and QPS bottleneck — this is where planner rewrites like aggregation-pushdown-under-join pay off: pre-aggregation reduces LHS cardinality by the group-collapsing factor, reducing the RHS-query count proportionally.

The N+1 problem reframed

Application-tier ORM anti-patterns often produce the same shape: a parent-table query followed by one child-table query per parent row. The database community has deprecated this as the N+1 problem. Cross-shard NL joins are structurally identical — one parent query plus N child queries — but happen inside the query engine rather than at the application tier. The mitigation is different (planner rewrites that batch the RHS queries, colocate the tables, or pre-aggregate to reduce N) but the cost structure is the same.

Cost dominants

  • Per-RHS-query network RTT — cross-shard: ~1-10 ms; single-node: microseconds.
  • Per-RHS-query parse + plan + execute — cross-shard: an LHS with 1M rows triggers 1M MySQL query-parse events.
  • Total wall-clock time — serial NL: N × per-query latency; parallel NL: N × (per-query latency / concurrency) bounded by RHS shard capacity.

Vitess supports parallel RHS query issuance within the NL join (up to a configured concurrency), which helps latency but not total QPS on the RHS shards.

When NL is the right shape

Cross-shard NL is the right answer when:

  • LHS is small (e.g. thousands of rows, not millions).
  • RHS has an index on the join key (cheap per-lookup).
  • The two sides cannot be colocated (shard keys don't align).
  • Pre-aggregation via concepts/push-aggregation-under-join can collapse LHS sufficiently.

Cross-shard NL is the wrong answer when:

  • LHS is large and cannot be reduced via pre-aggregation.
  • RHS lacks a join-key index.
  • The join is many-to-many (cartesian-product risk).

In those cases, the query either needs to be rewritten (denormalise, colocate, batch application-side) or accepted as expensive.

Seen in

Last updated · 347 distilled / 1,201 read