PATTERN Cited by 1 source
Buddy allocator adaptive shard scheduling¶
Problem¶
In a thread-per-core system writing to object storage, upload parallelism must adapt to load: too many streams produce tiny payloads (wasting S3 PUTs and money), too few streams bottleneck throughput under heavy load. A centralized coordinator that decides for all shards requires cross-shard locks and becomes itself a bottleneck.
Solution¶
Organize shards into power-of-two-sized groups using the buddy allocator algorithm from OS memory management. Each group uploads independently via round-robin among its member shards. Split and merge decisions are local:
- Split: A group leader checks its batcher backlog. If it exceeds a threshold proportional to group size, the group splits in half. Each half becomes a new independent group.
- Merge: When both a group and its buddy have empty backlogs, the lower-numbered group absorbs its buddy.
- Race prevention: Only the lower-numbered group can initiate a merge.
- Hysteresis: A short window between decisions prevents oscillation.
The only shared mutable state is a per-group mutex held briefly during the request-pulling phase — never during the S3 upload. Backlog checks use cache-line-padded atomic counters.
Resulting context¶
- Under light load: system converges to a single group → maximum batching, minimum PUT cost.
- Under heavy load: groups split until batcher keeps up → parallelism scales to number of shards.
- No operator tuning required; a cluster at 50 MB/s and one at 5 GB/s both find the right behavior automatically.
- Pipeline stages (batcher, scheduler, uploader) evolve independently.
Known uses¶
- systems/redpanda-cloud-topics — write-request scheduler (Source: sources/2026-06-18-redpanda-adaptive-write-request-scheduling)