CONCEPT Cited by 2 sources
Scatter-gather query¶
A scatter-gather query is a query executed in a sharded system that cannot be routed to a single shard (because its predicate doesn't include the shard key or because it needs data from multiple shards). The router fans it out to every shard (scatter phase), then aggregates the results back (gather phase).
Figma's DBProxy framing (Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale):
Think of scatter-gather like a database-wide game of hide-and-seek: You send out your query to every shard (scatter), then piece together answers from each (gather). Fun, but overdo it, and your speedy database starts feeling more like a snail, especially with complex queries.
When scatter-gather happens¶
- Shard-key-less predicates —
WHERE status = 'active'on a table sharded byuser_id. - Range scans on the shard key when the shard key is hashed —
hash(shard_key)routing scatters sequential keys across all shards. - Global aggregations —
COUNT(*)over a whole table,SUMacross orgs. - Joins across colos or joins not on the shard key.
- Nested SQL / correlated subqueries whose outer and inner queries don't share the same shard-key filter.
Why it's a scale cap, not just a latency cost¶
Every scatter-gather touches every shard → the load profile of one SG call is identical to running the same query on an unsharded database. More shards don't help. If scatter-gather is a large fraction of traffic, horizontal sharding's core promise — more shards = more capacity — stops holding. The sharded database asymptotes to single-node performance as the SG fraction grows.
This is the reason Figma's DBProxy deliberately restricts the sharded-query language rather than supporting full SQL compatibility. Three kinds of costs pushed that decision:
- Query-engine complexity. Aggregations, cross-colo joins, and nested SQL with SG execution plans all require custom implementation in the router — DBProxy would "begin to look a lot like the Postgres database query engine."
- Scale cap. Too many SGs invalidate the shard count as a scalability knob.
- Application refactor burden. Every unsupported query is product code to rewrite.
Choosing the supported-query subset¶
Figma's answer — patterns/shadow-application-readiness — logged query plans against live production traffic to empirically identify the subset covering 90% of queries while excluding the worst-case engine-complexity patterns. Resulting restriction:
- All range scans and point queries allowed.
- Joins only when joining two tables in the same colo on the shard key.
Queries outside the subset must be rewritten — usually by adding a shard-key predicate, denormalizing across colos, or moving the aggregation to an offline / asynchronous path.
Relation to the Redis-proxy flavor¶
FigCache (systems/figcache) implements a structurally similar "fanout filter engine" that transparently scatters read-only multi-shard pipelines across Redis Cluster shards to sidestep CROSSSLOT errors. Same primitive — fan-out + aggregate — at the wire-protocol-proxy level instead of the SQL query-engine level.
Mitigations¶
- Push down predicates aggressively — any predicate the planner can resolve per-shard reduces network return size.
- Parallel scatter — fan out concurrently to overlap per-shard latencies; end-to-end latency ≈ slowest shard.
- Hedged requests against slow shards (implemented in DBProxy — Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale).
- Load-shedding on SG-heavy paths to protect non-SG traffic.
- Pre-aggregation into a summary table (often sharded the same way as the facts) to convert a common SG into a single-shard lookup.
Seen in¶
- — Andres Taylor canonicalises the local-global aggregation decomposition as Vitess's baseline scatter-gather mitigation: each shard runs local aggregation; VTGate merges partials. The worked example
SELECT count(*) FROM usersendscount(*)to every shard andSUMs the results. This is the primitive that makes scatter-gather aggregation queries tractable at scale — without it, everyCOUNT/SUMover a sharded table would ship O(rows) data to the coordinator; with it, each shard ships one integer. Extends to the aggregation-pushdown-under-join rewrite for queries with joins above the aggregation. - sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale — DBProxy's logical planner distinguishes single-shard from scatter-gather plans; the sharded-query subset is explicitly chosen to avoid worst-case SG complexity; shadow-application-readiness picks the subset from live traffic.
- sources/2026-04-21-figma-figcache-next-generation-data-caching-platform — FigCache's fanout filter engine scatters read-only multi-shard Redis pipelines in parallel.
- sources/2025-12-16-zalando-the-day-our-own-queries-dosed-us-inside-zalando-search
— canonical wiki instance of scatter-gather as the pathology
amplifier in a self-inflicted DoS. Elasticsearch facet /
termsaggregation queries inherit the scatter-gather execution shape: a coordinating node scatters to all relevant shard copies (shard selection by Adaptive Replica Selection), each shard runs per-shard aggregation on thesearchthread pool, the coordinator does partial reductions in batches. On a high-cardinality field (SKU, ~100M distinct values), each shard's per-query cost explodes; the scatter-gather shape means every shard copy pays the cost in parallel, so even a small number of pathological queries saturates the cluster. Contrast with concepts/high-cardinality-aggregation-overload for the per-shard mechanics and concepts/self-inflicted-dos for the system-level failure mode. - sources/2025-05-08-yelp-nrtsearch-100-incremental-backups-lucene-10 — Yelp Nrtsearch uses scatter-gather at the Nrtsearch Coordinator to fan out search across virtually sharded clusters for large indices. Nrtsearch 1.0.0 roadmap flags Lucene 10's intra-single-segment parallel search as the planned in-process replacement — demonstrating that scatter-gather across clusters is often a workaround for absent intra-node parallelism rather than a fundamental architectural choice.
Related¶
- concepts/horizontal-sharding
- concepts/shard-key
- concepts/adaptive-replica-selection-elasticsearch — the Elasticsearch shard-copy-selection variant
- concepts/high-cardinality-aggregation-overload — the per-query pathology that scatter-gather amplifies in search
- patterns/shadow-application-readiness
- systems/dbproxy-figma
- systems/figcache
- systems/elasticsearch — another canonical scatter-gather substrate