

Local-global aggregation decomposition

Definition

Local-global aggregation decomposition is the algebraic split of an aggregate function into two parts:

  • A local part that can be computed independently on each shard (or partition) of the data.
  • A global part that runs on the coordinator / proxy / VTGate and merges the local partial results into the final answer.

The split exists because most common SQL aggregates are algebraically decomposable — the final result can be recovered from per-shard partial results without needing the raw data at the coordinator. This is what makes horizontal sharding tractable for aggregation queries: the aggregation's compute scales with the number of rows, but the data sent to the coordinator scales only with the number of groups × shards.

The canonical decompositions

For each SQL aggregate, the local + global split is:

| Aggregate | Local (per-shard) | Global (at coordinator) |
| --- | --- | --- |
| COUNT(*) | COUNT(*) | SUM of per-shard counts |
| SUM(x) | SUM(x) | SUM of per-shard sums |
| MIN(x) | MIN(x) | MIN of per-shard mins |
| MAX(x) | MAX(x) | MAX of per-shard maxes |
| AVG(x) | SUM(x), COUNT(x) | SUM(SUM) / SUM(COUNT) |
| COUNT(DISTINCT x) | either the set of x values or a compact sketch | set union + count, or sketch merge |

COUNT, SUM, MIN, MAX are "straightforward" — the local and global steps use either the same operation (MIN/MAX) or a tightly related one (COUNT → SUM). AVG carries two values forward (sum + count). COUNT(DISTINCT) is the hardest: exact execution requires shipping every distinct value (linear in the distinct cardinality), while approximate execution uses sketches (e.g. HyperLogLog) that merge in constant size.
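The decompositions above can be sketched in plain Python, treating each shard as a list of values — a minimal illustration of the local/global split, not Vitess code:

```python
from functools import reduce

# Three "shards" of a sharded table; each holds some rows' values of x.
shards = [[3, 1, 4], [1, 5], [9, 2, 6]]

# Local step: each shard computes its partials independently.
partials = [
    {"count": len(s), "sum": sum(s), "min": min(s), "max": max(s),
     "distinct": set(s)}          # exact COUNT(DISTINCT) must ship the set
    for s in shards
]

# Global step: the coordinator merges partials, never seeing raw rows.
count = sum(p["count"] for p in partials)   # COUNT(*): SUM of counts
total = sum(p["sum"] for p in partials)     # SUM(x):   SUM of sums
lo    = min(p["min"] for p in partials)     # MIN(x):   MIN of mins
hi    = max(p["max"] for p in partials)     # MAX(x):   MAX of maxes
avg   = total / count                       # AVG(x):   SUM(SUM) / SUM(COUNT)
ndv   = len(reduce(set.union, (p["distinct"] for p in partials)))  # union + count

print(count, total, lo, hi, avg, ndv)   # 8 31 1 9 3.875 7
```

Note that AVG never merges per-shard averages directly — averaging averages is wrong when shards hold different row counts, which is exactly why it carries (sum, count) forward.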

The canonical Vitess framing

From Andres Taylor's 2022-06-24 post (Source: sources/2026-04-21-planetscale-grouping-and-aggregations-on-vitess):

"Back to aggregations across shards. Let's say you have a user table that is too large to fit into a single database, so you have sharded it. Now a Vitess user asks for the number of users in the whole logical, sharded database. We could fetch all the rows and just count them, but that would be slow and inefficient. So instead we break aggregation into local and global aggregation. The local part is what we can send down to MySQL, and the global aggregation is aggregating the aggregates. So, if the user asked for SELECT count(*) FROM user, Vitess will send down a count(*) to each shard, and then do a SUM on all incoming answers."

This is the canonical wiki naming of the primitive. The decomposition has been present in Vitess since its early days (the post mentions "This is something that Vitess has been able to do for a long time") — it's the baseline every distributed-query engine implements.

Why it's foundational

Without local-global decomposition, a simple SELECT count(*) FROM user on a 256-shard cluster with 10B rows would require shipping 10B rows to the coordinator. With the decomposition, each shard ships one integer; the coordinator sums 256 integers. The data-movement asymptote drops from O(rows) to O(shards × groups).

Every horizontally-scalable query engine depends on this decomposition:

  • Vitess / VTGate — this post is canonical.
  • Apache Spark — the reduceByKey / combineByKey primitives are local-global aggregation.
  • Presto / Trino — partial-aggregation + final-aggregation stages.
  • BigQuery / Dremel — the same split at the leaf-aggregator + root-aggregator tiers.
  • ClickHouse Distributed tables — shard-local aggregation + distributed_aggregation_memory_efficient merge at the initiator.
  • Apache Druid — segment-local aggregation + broker merge.

With GROUP BY

When the query has a GROUP BY, the decomposition extends to per-group. Each shard emits (group-key, local-aggregate) tuples. The coordinator groups these by group-key again and applies the global aggregate per group. The shard-local output can be much smaller than the raw data (one row per distinct group-key × shard).
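For a query like SELECT k, COUNT(*), SUM(x) … GROUP BY k, the per-group extension can be sketched as follows (an illustrative simulation, with shards as lists of (key, x) pairs):

```python
from collections import defaultdict

# Each shard's rows as (group_key, x) pairs; keys may span shards.
shards = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("c", 5)],
]

# Local step: per-shard GROUP BY, emitting (key, count, sum) tuples.
def local_agg(rows):
    acc = defaultdict(lambda: [0, 0])
    for k, x in rows:
        acc[k][0] += 1
        acc[k][1] += x
    return [(k, c, s) for k, (c, s) in acc.items()]

# Global step: the coordinator groups the partials by key again and merges.
merged = defaultdict(lambda: [0, 0])
for shard in shards:
    for k, c, s in local_agg(shard):
        merged[k][0] += c   # COUNT merges as SUM of counts
        merged[k][1] += s   # SUM merges as SUM of sums

result = {k: tuple(v) for k, v in sorted(merged.items())}
print(result)   # {'a': (2, 4), 'b': (2, 6), 'c': (1, 5)}
```

The coordinator's input here is one tuple per distinct (group-key, shard) pair — the groups × shards cardinality the surrounding text describes — regardless of how many raw rows each shard scanned.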

Special case: if the GROUP BY key is the shard key, the decomposition degenerates — each shard owns its groups completely, so no global merge is needed, and the query reduces to a pure scatter-gather where each shard emits final answers directly.

Why the global step is cheaper than the local step

The global step runs on pre-aggregated data whose cardinality is groups × shards. The local step runs on raw data whose cardinality is rows. For rows ≫ groups × shards, the local aggregation saves most of the work. This is why Vitess's planner aggressively pushes aggregation down to MySQL: "We want to delegate as much work as possible to MySQL — it is much faster than Vitess at doing all the database operations, including grouping and aggregation" (Source: sources/2026-04-21-planetscale-grouping-and-aggregations-on-vitess).

Limitations

  • Holistic aggregates (median, percentile) are not naively decomposable. Exact computation requires either shipping sorted shard partials + merge-sort at coordinator (linear in data) or approximate algorithms (t-digest, q-digest, GK-quantiles) that merge at constant size.
  • User-defined aggregates are only decomposable if the user supplies the split — engines that support this typically require the aggregate to declare an explicit combine/merge operator.
  • When aggregation interacts with joins that can't be pushed down, the decomposition alone isn't enough — see concepts/push-aggregation-under-join for the extension that handles this case.
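The first limitation is easy to demonstrate: merging per-shard medians the way per-shard counts merge gives a wrong answer. A minimal counterexample with two hypothetical shards:

```python
from statistics import median

shard_a = [1, 2, 100]
shard_b = [3, 4, 5]

# Naive "local-global" attempt: take the median of the per-shard medians.
naive = median([median(shard_a), median(shard_b)])   # median([2, 4]) -> 3.0

# The exact answer needs the raw (or sorted) data at the coordinator.
exact = median(sorted(shard_a + shard_b))   # median of [1,2,3,4,5,100] -> 3.5

print(naive, exact)   # 3.0 3.5; the naive merge is wrong
```

Unlike SUM or MIN, the median of a union cannot be recovered from per-shard medians alone, which is why exact percentiles need sorted partials and approximate ones need mergeable sketches such as t-digest.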

Seen in

  • sources/2026-04-21-planetscale-grouping-and-aggregations-on-vitess — canonical wiki introduction. Andres Taylor's 2022-06-24 post frames the decomposition as Vitess's baseline approach for cross-shard aggregation: "we break aggregation into local and global aggregation. The local part is what we can send down to MySQL, and the global aggregation is aggregating the aggregates."