PATTERN Cited by 1 source

Buddy allocator adaptive shard scheduling¶

Problem¶

In a thread-per-core system writing to object storage, upload parallelism must adapt to load: too many streams produce tiny payloads (wasting S3 PUTs and money), too few streams bottleneck throughput under heavy load. A centralized coordinator that decides for all shards requires cross-shard locks and becomes itself a bottleneck.

Solution¶

Organize shards into power-of-two-sized groups using the buddy allocator algorithm from OS memory management. Each group uploads independently via round-robin among its member shards. Split and merge decisions are local:

Split: A group leader checks its batcher backlog. If it exceeds a threshold proportional to group size, the group splits in half. Each half becomes a new independent group.
Merge: When both a group and its buddy have empty backlogs, the lower-numbered group absorbs its buddy.
Race prevention: Only the lower-numbered group can initiate a merge.
Hysteresis: A short window between decisions prevents oscillation.

The only shared mutable state is a per-group mutex held briefly during the request-pulling phase — never during the S3 upload. Backlog checks use cache-line-padded atomic counters.

Resulting context¶

Under light load: system converges to a single group → maximum batching, minimum PUT cost.
Under heavy load: groups split until batcher keeps up → parallelism scales to number of shards.
No operator tuning required; a cluster at 50 MB/s and one at 5 GB/s both find the right behavior automatically.
Pipeline stages (batcher, scheduler, uploader) evolve independently.

Known uses¶

systems/redpanda-cloud-topics — write-request scheduler (Source: sources/2026-06-18-redpanda-adaptive-write-request-scheduling)