PATTERN Cited by 1 source

Pull-based compaction scheduling

Context

In Redpanda's shared-nothing architecture, each partition is pinned to a specific core (shard), so two heavily compactable partitions may land on the same shard. A naive push-based scheduler, in which each shard compacts only the partitions it hosts, would starve those partitions of compaction: un-compacted data would keep filling the disk while other shards sit idle.

Solution

Cloud Topics decouples compaction scheduling from partition placement via a pull-based model:

A compaction scheduler (running on shard 0 of each compaction node) maintains a priority queue of partitions eligible for compaction.
The priority heuristic is based on Kafka-native concepts:
Dirty ratio: dirty bytes / total bytes, compared against min.cleanable.dirty.ratio
Time elapsed: time since data became eligible for compaction, compared against max.compaction.lag.ms
Compaction workers run on every shard of every compaction node.
Each worker polls the scheduler for the next-highest-priority partition, executes the compaction pipeline, and uploads results.
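The scheduler/worker split above can be sketched as follows. This is a minimal, hypothetical Python model, not Redpanda's implementation: the source names the two inputs but not how they are combined, so the sketch assumes a score equal to the larger of two normalized terms (dirty ratio divided by min.cleanable.dirty.ratio, and time since eligibility divided by max.compaction.lag.ms), queuing a partition once either term reaches 1.0. `PartitionStats`, `priority`, `CompactionScheduler`, `next_partition`, `run_workers`, the `compact` hook, and the partition names are all illustrative; a single in-process heap stands in for the shard-0 scheduler that real workers would reach across shards and nodes.

```python
from __future__ import annotations

import heapq
import itertools
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PartitionStats:
    """Hypothetical per-partition inputs to the priority heuristic."""
    name: str
    dirty_bytes: int
    total_bytes: int
    eligible_since_ms: int  # when the oldest not-yet-compacted data became eligible


def priority(p: PartitionStats, now_ms: int,
             min_cleanable_dirty_ratio: float,
             max_compaction_lag_ms: int) -> float:
    """Assumed heuristic: larger of two normalized terms; >= 1.0 means a threshold is crossed."""
    dirty_ratio = p.dirty_bytes / p.total_bytes if p.total_bytes else 0.0
    lag_ms = now_ms - p.eligible_since_ms
    return max(dirty_ratio / min_cleanable_dirty_ratio,
               lag_ms / max_compaction_lag_ms)


class CompactionScheduler:
    """Stand-in for the shard-0 scheduler: a max-priority queue of eligible partitions."""

    def __init__(self, min_cleanable_dirty_ratio: float, max_compaction_lag_ms: int):
        self._ratio = min_cleanable_dirty_ratio
        self._lag_ms = max_compaction_lag_ms
        self._heap: list[tuple[float, int, PartitionStats]] = []
        self._seq = itertools.count()  # tie-breaker so PartitionStats are never compared

    def submit(self, p: PartitionStats, now_ms: int) -> None:
        score = priority(p, now_ms, self._ratio, self._lag_ms)
        if score >= 1.0:  # only partitions past a threshold are queued
            heapq.heappush(self._heap, (-score, next(self._seq), p))  # negate for max-heap

    def next_partition(self) -> Optional[PartitionStats]:
        """What a worker's poll returns: the highest-priority partition, or None."""
        return heapq.heappop(self._heap)[2] if self._heap else None


def run_workers(scheduler: CompactionScheduler, shard_ids: list[int],
                compact: Callable[[int, PartitionStats], None]) -> list[tuple[int, str]]:
    """Simulate one worker per shard, taking turns so the output is deterministic.

    Real workers poll concurrently; `compact` is a hypothetical hook standing in
    for running the compaction pipeline and uploading the results.
    """
    assignments: list[tuple[int, str]] = []
    for shard_id in itertools.cycle(shard_ids):
        p = scheduler.next_partition()
        if p is None:  # queue drained; workers would idle until new work is submitted
            break
        compact(shard_id, p)
        assignments.append((shard_id, p.name))
    return assignments


# Illustrative settings, not recommendations.
sched = CompactionScheduler(min_cleanable_dirty_ratio=0.5, max_compaction_lag_ms=3_600_000)
now = 10_000_000
sched.submit(PartitionStats("orders/0", 80, 100, now - 60_000), now)     # dirty term 1.6
sched.submit(PartitionStats("clicks/3", 10, 100, now - 7_200_000), now)  # lag term 2.0
sched.submit(PartitionStats("users/1", 20, 100, now - 60_000), now)      # 0.4: not queued

order = run_workers(sched, shard_ids=[0, 1], compact=lambda shard, p: None)
print(order)  # [(0, 'clicks/3'), (1, 'orders/0')]
```

In this sketch scores are frozen at submission time; a production scheduler would presumably re-score partitions as they accumulate dirty bytes or age, which the example does not attempt.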

Because compaction operates on data in object storage and metadata in the metastore (not on local replica state), any shard on any node can compact any partition. Work therefore distributes freely across the cluster, regardless of leadership or replica placement.
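To illustrate why placement independence matters, the hypothetical simulation below runs the same workload under two policies: push-style, where each shard compacts only the partitions it hosts, and pull-style, where whichever worker is idle earliest takes the next job from one shared queue. Job costs are made-up units, each partition's compaction is assumed to be indivisible, and the shared queue is ordered largest-job-first as a stand-in for a priority order; nothing here models Redpanda's actual pipeline.

```python
from __future__ import annotations

import heapq
from collections import defaultdict

# Each job: (partition name, shard it is placed on, compaction cost in arbitrary units).
Job = tuple[str, int, int]


def push_makespan(jobs: list[Job], num_shards: int) -> int:
    """Each shard compacts only its own partitions; finish time is the busiest shard's load."""
    load: defaultdict[int, int] = defaultdict(int)
    for _name, home_shard, cost in jobs:
        load[home_shard] += cost
    return max((load[s] for s in range(num_shards)), default=0)


def pull_makespan(jobs: list[Job], num_shards: int) -> int:
    """Idle workers pull the next job from one shared queue, ignoring placement."""
    queue = sorted(jobs, key=lambda j: j[2], reverse=True)  # largest job first
    free_at = [0] * num_shards  # min-heap of times at which each worker becomes idle
    heapq.heapify(free_at)
    for _name, _home_shard, cost in queue:
        start = heapq.heappop(free_at)  # the earliest-idle worker pulls this job
        heapq.heappush(free_at, start + cost)
    return max(free_at)


# Two heavy partitions happen to share shard 0; shards 1-3 host light ones.
jobs: list[Job] = [
    ("hot-a", 0, 40),
    ("hot-b", 0, 40),
    ("cold-c", 1, 5),
    ("cold-d", 2, 5),
    ("cold-e", 3, 5),
]
print(push_makespan(jobs, 4))  # 80
print(pull_makespan(jobs, 4))  # 40
```

Under push, shard 0 works for 80 units while shards 1 to 3 finish after 5 and then idle. Under pull, the two hot partitions run on different workers, halving the finish time; in this model the floor is the largest single job.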

Consequences

Eliminates compaction starvation caused by co-located hot partitions
Decouples the compaction CPU budget from the produce/consume path
Enables horizontal scaling of compaction throughput by adding nodes
Any broker (not just partition leaders) can contribute compaction capacity

Seen in

sources/2026-06-30-redpanda-cloud-topics-compaction