Max-min fairness storage fair-share¶

CONCEPT Cited by 1 source

Max-min fairness storage fair-share is the allocation policy in which a shared multi-tenant storage budget is divided among tenants such that the smallest allocation is maximised — every tenant gets at least its fair share, and any unused fair-share is redistributed to tenants who need more. Applied to multi-tenant analytics or shared- infrastructure storage, it lets the operator run the underlying disks at a high target utilisation (e.g. 90 %) without per-tenant manual quota tuning.

The term max-min fairness comes from network bandwidth allocation literature, where it is a canonical primitive for fair scheduling. The same algorithm transfers cleanly to disk- space allocation across namespaces, queues, or tenants.

The allocation rule¶

Given:

N tenants, each with a demand d_i (how much storage they would use if unconstrained).
A total capacity C.

The max-min-fair allocation a_i for each tenant is computed iteratively:

Compute the equal share s = C / N.
Tenants with d_i ≤ s get a_i = d_i and release s - d_i back to the pool.
The released capacity is divided equally among the tenants with d_i > s, raising their share.
Repeat with the remaining tenants and remaining capacity until all demands are satisfied or all remaining tenants are at their new equal share.

Net effect:

Tenants with low demand get exactly what they ask for.
Tenants with high demand share the remaining capacity equally (subject to further iterations if some are still below the new share).
No tenant is starved: every tenant gets at least the smallest equal share over its needy peers.
No capacity is wasted: total allocation always sums to C (modulo demand < C).

Cloudflare's instance¶

Cloudflare's Ready-Analytics runs its ClickHouse clusters at a target disk utilisation of 90 %, with max-min-fairness as the allocation primitive across namespaces:

"Using the max-min fairness algorithm, we could set a target disk utilization (e.g., 90%) and automatically 'share' available space. Namespaces using less than their fair share would cede their unused capacity to those that needed more. This allowed us to confidently run our clusters at 90% utilization."

(Source: sources/2026-05-14-cloudflare-clickhouse-query-plan-contention)

The architectural prerequisite for this is the per-tenant retention via partitioning key primitive (see concepts/per-tenant-retention-via-partitioning-key): because each namespace's data lives in its own set of partitions, and partitions are the unit of retention, a per-namespace fair-share computation can map directly onto "this namespace's oldest partitions can stay" vs "this namespace's oldest partitions must be dropped" decisions during retention runs. Without per-tenant partition isolation, the fair-share decision can't be enforced because every tenant's data shares the same partition's lifecycle.

Why 90 % utilisation is high¶

Most production storage systems run at substantially below their full capacity — 70–80 % is a common target, sometimes 50–60 % when failure scenarios require headroom for re-replication. Running at 90 % typically requires:

Fast retention enforcement so that growing tenants can be slowed before pushing the cluster over.
Per-tenant isolation of the retention decision so that "shrink namespace A first" can be expressed without affecting namespaces B, C, D.
Burst-absorbing fairness so that short-term tenant spikes don't trigger emergency action — the fair-share allocator just temporarily contracts everyone's share.
Confidence in the operator-controlled architecture — Cloudflare's "confidently" qualifier signals that the team has audited the failure modes and is comfortable the policy holds under burst.

Together these are why max-min-fairness pairs naturally with per-tenant partition isolation: the architectural primitive lets the policy be expressed without operator-in-the-loop intervention.

Adjacent at other altitudes¶

Network bandwidth scheduling — the original setting for max-min-fairness; canonical example is per-flow fair queuing in routers (Stochastic Fair Queueing, Deficit Round Robin).
CPU scheduling — the Linux Completely Fair Scheduler (CFS) approximates max-min-fairness over CPU time across processes.
Kubernetes resource quotas — LimitRange + ResourceQuota provide a fair-share-adjacent primitive per namespace, although enforcement is hard-cap rather than redistributive.
Cloud-provider per-account quotas with auto-increase — AWS / GCP / Azure soft-quotas approximate max-min- fairness across customers when the underlying capacity is shared, with manual operator intervention as the redistributive lever.

Failure modes¶

Demand prediction error — max-min-fair allocation is computed against predicted per-tenant demand. If a tenant's actual demand grows faster than the prediction (e.g., a feature launch), the fair-share contract gets violated until the next iteration.
Slow retention — if retention enforcement is slower than tenant write rate, the cluster fills up regardless of the fair-share computation. Cloudflare's design has fast partition-drop retention precisely to keep this loop tight.
Tenant gaming — at high target utilisation, a tenant that knows the algorithm could theoretically "grow into unused fair-share" — but in single-organisation deployments (Cloudflare's Ready-Analytics is internal, not external SaaS) gaming is not the threat model.

Seen in¶

sources/2026-05-14-cloudflare-clickhouse-query-plan-contention — canonical wiki instance of max-min-fairness applied to ClickHouse cluster storage in Cloudflare's Ready-Analytics multi-tenant analytics platform. Names 90 % target utilisation, ceding-unused-capacity-to-needy-namespaces as the load-bearing mechanism, and the per-tenant partition isolation as the architectural prerequisite. "Allowed us to confidently run our clusters at 90% utilization."

systems/clickhouse — storage substrate.
systems/cloudflare-ready-analytics — canonical instance.
concepts/per-tenant-retention-via-partitioning-key — the architectural prerequisite that makes per-tenant fair-share enforceable.
concepts/tenant-onboarding-time — adjacent multi- tenant capacity-planning primitive.
companies/cloudflare — operator.