
CONCEPT Cited by 2 sources

Cross-shard query

A cross-shard query is any query whose execution requires data from more than one physical shard in a horizontally sharded database. The router must contact each involved shard and combine their results. Ben Dicken's framing (Source: sources/2026-04-21-planetscale-database-sharding):

"Cross-shard queries are bad for system performance, and should be avoided whenever possible. When multiple shards need to fulfill a single query, this adds excessive network and CPU overhead to the database cluster."

Cross-shard vs scatter-gather

Cross-shard is the general phenomenon; scatter-gather is a specific shape of it:

  • Cross-shard query — involves ≥2 shards. Could be 2 (a pointed two-shard join), could be all of them.
  • Scatter-gather query — specifically fans out to every shard, then aggregates the results back. A worst-case cross-shard query.

A query that routes to 2 of 128 shards is cross-shard but not scatter-gather; a query without a shard-key predicate becomes scatter-gather. The terminology on this wiki reflects the two sources of the framing: Dicken names the general case (cross-shard); Figma / the Vitess community name the fan-out shape (scatter-gather). Same underlying cost structure at the extreme.
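The distinction can be sketched with a toy router (hypothetical names; assumes hash sharding on `user_id` across 128 shards, with CPython's `hash()` standing in for a real consistent hash):

```python
NUM_SHARDS = 128

def shard_for(user_id: int) -> int:
    # CPython's hash() stands in for a real consistent hash function.
    return hash(user_id) % NUM_SHARDS

def shards_for_query(user_ids):
    """Shards the router must touch, given the user_ids the query's
    predicate pins down; None means no shard-key predicate at all."""
    if user_ids is None:
        return set(range(NUM_SHARDS))        # scatter-gather: every shard
    return {shard_for(u) for u in user_ids}  # cross-shard iff len > 1

single = shards_for_query([42])       # single-shard route
two    = shards_for_query([42, 7])    # cross-shard, not scatter-gather
full   = shards_for_query(None)       # scatter-gather: all 128 shards
```

A predicate that pins down one key routes to one shard; two keys touch at most two shards; no shard-key predicate forces the full fan-out.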

Why it's a scale cap, not just latency

Any query that touches every shard occupies the cluster the way the same query would occupy an unsharded database: all N shards work on it at once, the router aggregates, and the total capacity for that query class is single-node capacity, not N × single-node.

So scale-out works for the dominant query path only if that path routes to a single shard. Every cross-shard fraction of the workload gives back part of the horizontal-sharding win: if cross-shard queries are 10% of the workload, that 10% of cluster capacity behaves as if the database were unsharded, and it becomes the scale cap for cross-shard-heavy workloads.
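As a back-of-envelope model (an Amdahl-style sketch, not from the source), treat each scatter-gather query as consuming single-node-equivalent work on every shard; the effective speedup over one node then collapses quickly with the scatter fraction:

```python
def effective_speedup(n_shards: int, scatter_fraction: float) -> float:
    """Throughput multiplier vs. one node, assuming a scatter-gather
    query costs n_shards units of work and a single-shard query costs 1."""
    per_query_cost = scatter_fraction * n_shards + (1 - scatter_fraction)
    return n_shards / per_query_cost

effective_speedup(128, 0.0)   # 128.0: full scale-out
effective_speedup(128, 0.10)  # roughly 9x: 10% scatter-gather caps the cluster
```

Under this model, 128 shards with a 10% scatter-gather fraction deliver roughly 9× a single node rather than 128×.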

Four canonical sources of cross-shard queries

  1. Shard-key-less predicates. SELECT ... FROM users WHERE email = ? on a table sharded by user_id — no shard-key filter, every shard is a candidate.
  2. Range scans on a hashed shard key. WHERE user_id BETWEEN 1 AND 1000 under hash sharding — sequential keys hash to different shards, so the range scatters.
  3. Joins across shard-key boundaries. Joining users (sharded by user_id) to orgs (sharded by org_id) without co-location means the proxy fans out the join.
  4. Global aggregations. SELECT COUNT(*) FROM orders — every shard must contribute.
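Source (2) is the least intuitive of the four; a quick sketch (again with CPython's `hash()` standing in for a real hash function) shows why a range over sequential keys scatters under hash sharding:

```python
NUM_SHARDS = 8

def shard_for(user_id: int) -> int:
    # Hash sharding: a stable hash of the key picks the shard.
    return hash(user_id) % NUM_SHARDS

# A point lookup on the shard key touches exactly one shard...
point_shards = {shard_for(42)}

# ...but WHERE user_id BETWEEN 1 AND 1000 hashes sequential keys
# onto different shards, so the range scatters across all of them.
range_shards = {shard_for(u) for u in range(1, 1001)}
```

Here `point_shards` has one member while `range_shards` covers every shard, which is exactly why range scans on a hashed key fan out.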

Dicken's worked example for (1) uses the steps table:

"What would happen if we had done range sharding on the step_count column. Try inserting some rows and then click select where id == 1 to get all of the entries for user with id = 1. As you can see, the SELECT query may have to spread across multiple shards to make this work!" (Source: sources/2026-04-21-planetscale-database-sharding)

The dominant query is "all steps for user X"; sharding steps by step_count (a low-cardinality, high-volatility column) forces that query across every shard. Re-sharding by user_id (query-pattern-aligned choice) collapses it to a single-shard query.
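A toy version of that worked example (hypothetical rows, seeded random step counts, CPython's `hash()` standing in for real placement) makes the contrast concrete:

```python
import random
random.seed(0)  # deterministic toy data

NUM_SHARDS = 4
# Hypothetical steps rows: (user_id, day, step_count)
rows = [(uid, day, random.randint(0, 20000))
        for uid in range(50) for day in range(30)]

def shards_touched(shard_key):
    """Shards hit by 'all steps for user 1' when rows are placed by shard_key."""
    return {hash(shard_key(r)) % NUM_SHARDS for r in rows if r[0] == 1}

by_step_count = shards_touched(lambda r: r[2])  # scatters across shards
by_user_id    = shards_touched(lambda r: r[0])  # collapses to one shard
```

Sharding by `step_count` spreads one user's rows across the cluster; sharding by `user_id` pins them to a single shard, so the dominant query routes to exactly one place.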

Mitigation — pick the shard key for the dominant query

The discipline for avoiding cross-shard queries is shard-key-aligned-with-query-pattern: choose the shard key so that the most common query's predicate contains it, routing that query to exactly one shard. Every query that does not filter on the shard key becomes cross-shard; that is the trade-off.

For queries that must be cross-shard, the system accepts the cost deliberately: a point lookup by email gets a lookup Vindex; global aggregations are pre-computed or answered by an OLAP replica; cross-shard joins are either co-located onto the same shard or assembled manually at the application layer.
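The lookup-Vindex idea can be sketched as a two-hop route (a minimal illustration with a hypothetical in-memory table; in Vitess the lookup table is itself stored in the database and maintained alongside the main table):

```python
NUM_SHARDS = 128
# Hypothetical side table mapping the non-shard-key column to the shard key.
email_to_user_id = {"ben@example.com": 42}

def route_by_email(email: str) -> int:
    user_id = email_to_user_id[email]  # hop 1: consult the lookup table
    return hash(user_id) % NUM_SHARDS  # hop 2: ordinary shard-key routing

route_by_email("ben@example.com")  # touches 1 shard instead of all 128
```

Two single-shard hops replace a 128-shard scatter, which is the whole point of paying for the extra table.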
