Skip to content

PATTERN Cited by 1 source

Buddy allocator adaptive shard scheduling

Problem

In a thread-per-core system writing to object storage, upload parallelism must adapt to load: too many streams produce tiny payloads (wasting S3 PUTs and money), too few streams bottleneck throughput under heavy load. A centralized coordinator that decides for all shards requires cross-shard locks and becomes itself a bottleneck.

Solution

Organize shards into power-of-two-sized groups using the buddy allocator algorithm from OS memory management. Each group uploads independently via round-robin among its member shards. Split and merge decisions are local:

  1. Split: A group leader checks its batcher backlog. If it exceeds a threshold proportional to group size, the group splits in half. Each half becomes a new independent group.
  2. Merge: When both a group and its buddy have empty backlogs, the lower-numbered group absorbs its buddy.
  3. Race prevention: Only the lower-numbered group can initiate a merge.
  4. Hysteresis: A short window between decisions prevents oscillation.

The only shared mutable state is a per-group mutex held briefly during the request-pulling phase — never during the S3 upload. Backlog checks use cache-line-padded atomic counters.

Resulting context

  • Under light load: system converges to a single group → maximum batching, minimum PUT cost.
  • Under heavy load: groups split until batcher keeps up → parallelism scales to number of shards.
  • No operator tuning required; a cluster at 50 MB/s and one at 5 GB/s both find the right behavior automatically.
  • Pipeline stages (batcher, scheduler, uploader) evolve independently.

Known uses

Last updated · 542 distilled / 1,571 read