

Storage overhead and fragmentation

Definition

Storage overhead is the ratio of raw capacity consumed to live data stored. In an immutable / append-only substrate, overhead is driven by two separable forces:

  1. Durability redundancy — replication factor or erasure-coding (k, m) shape. Chosen once, fleet-wide, traded against fault tolerance. Usually stable over long horizons.
  2. Fragmentation — the per-reclaimable-unit fill factor. How full are the units (SSTables, volumes, extents, blob batches) that allocation is sized by? A half-full unit consumes the same disk allocation as a full one.

Fragmentation is the load-bearing term when durability is stable. Dropbox's framing:

A useful way to think about this is what percentage of a volume contains active data. If a volume is half full of live data, we are effectively using twice the storage needed for that data. If only ten percent is live, we are using about ten times the space required. (Source: sources/2026-04-02-dropbox-magic-pocket-storage-efficiency-compaction)
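The quoted relationship can be sketched directly; a minimal illustration (the function name is mine, not Dropbox's):

```python
# Hypothetical helper: raw bytes consumed per live byte as a function of
# a volume's fill factor, before any durability redundancy is applied.

def effective_overhead(fill_factor: float) -> float:
    """A half-full volume (0.5) consumes 2x the storage its live data
    needs; a ~10%-full volume consumes roughly 10x."""
    if not 0 < fill_factor <= 1:
        raise ValueError("fill factor must be in (0, 1]")
    return 1 / fill_factor

print(effective_overhead(0.5))  # 2.0
print(effective_overhead(0.1))  # ~10
```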

Why fragmentation is structural in immutable stores

In an in-place-mutable filesystem, a delete frees the corresponding bytes immediately. In an immutable store:

  • Writes produce new blobs / objects / rows; old ones stay on disk until explicitly reclaimed.
  • Reclamation can't just zero a range; it has to rewrite remaining live data elsewhere and retire the old unit — the compaction contract in LSM DBs, in Magic Pocket's volume-level compaction, in columnar object-store warehouses like Husky.
  • Any delay between deletes accumulating and compaction running grows fragmentation monotonically.

The split:

  • Garbage collection identifies which blobs are no longer referenced. Marking, not freeing.
  • Compaction physically reclaims: read live blobs from stale units, write them into fresh units, retire the stale ones.

(Source: sources/2026-04-02-dropbox-magic-pocket-storage-efficiency-compaction)
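The marking/reclaiming split can be sketched with a toy data model (the dict-based volume layout is an assumption for illustration, not Magic Pocket's actual representation):

```python
# GC marks unreferenced blobs; compaction rewrites live blobs into a
# fresh unit and retires the stale one. Only retirement frees space.

def gc_mark(volume: dict, referenced: set) -> None:
    """Garbage collection: mark dead blobs, don't free anything."""
    for blob_id in volume["blobs"]:
        if blob_id not in referenced:
            volume["dead"].add(blob_id)

def compact(volume: dict) -> dict:
    """Compaction: copy live blobs to a new volume, retire the old one."""
    live = {b: data for b, data in volume["blobs"].items()
            if b not in volume["dead"]}
    volume["retired"] = True  # space is reclaimed only at this point
    return {"blobs": live, "dead": set(), "retired": False}

vol = {"blobs": {"a": b"x", "b": b"y", "c": b"z"}, "dead": set(), "retired": False}
gc_mark(vol, referenced={"a", "c"})   # "b" is no longer referenced
fresh = compact(vol)
print(sorted(fresh["blobs"]))  # ['a', 'c']
```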

Fill-level distribution as the first-class operational metric

Overhead aggregates the per-unit fill factor across the whole fleet. What drives the aggregate is the shape of the fill-level distribution:

  • Steady state at Magic Pocket: most volumes already highly filled; a small number lightly filled as deletes trickle in. Compaction's job is to continuously consolidate the small tail.
  • After distribution shift (Magic Pocket's Live Coder incident): a long tail of severely under-filled volumes (<5% live data in the worst cases). Same fleet-wide overhead metric, very different distributional shape — and a very different compaction workload.

Because a single aggregate number can hide these shapes, Magic Pocket explicitly started tracking:

  • Live Coder output rate
  • Volume fill distribution (not just mean)
  • Week-over-week overhead change

— to catch distribution shifts before overhead rises too far.
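A toy illustration of why the distribution, not just the mean, is the metric that matters (the fleet sizes and fill levels are invented): two synthetic fleets share the same mean fill, but one hides a long sparse tail that implies a very different compaction workload.

```python
# Two synthetic 100-volume fleets with identical mean fill (~0.904)
# but very different tails.

import statistics

steady_state = [0.95] * 90 + [0.49] * 10  # small lightly-filled tail
post_shift   = [1.00] * 90 + [0.04] * 10  # long tail of <5%-live volumes

for name, fills in [("steady", steady_state), ("shifted", post_shift)]:
    mean = statistics.mean(fills)
    sparse = sum(1 for f in fills if f < 0.05)
    print(f"{name}: mean fill = {mean:.3f}, volumes <5% live = {sparse}")
```

The mean alone reports both fleets as healthy; only the tail count reveals the post-shift reclamation backlog.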

Why overhead matters at exabyte scale

Small overhead percentages translate directly into hardware purchases:

Storage overhead directly determines how much raw capacity we need in order to store the same amount of live user data. Even small changes in overhead materially affect hardware purchases and fleet growth.

For reference, the pre-incident Magic Pocket fleet:

  • Tens of thousands of servers
  • Millions of drives
  • Exabyte scale
  • 99% SMR, 30+ TB/drive on the 7th-gen Sonic platform

A fraction of a percentage-point overhead delta maps to meaningful capex.
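Back-of-envelope sketch with illustrative numbers (1 EB of live data is an assumption; the ~30 TB/drive figure comes from the fleet description above):

```python
# What a small overhead delta means in drives, at exabyte scale.

LIVE_BYTES = 10**18              # 1 EB of live data (assumed round number)
DRIVE_BYTES = 30 * 10**12        # 30+ TB/drive (7th-gen Sonic platform)

def extra_drives(overhead_delta: float) -> int:
    """Additional raw capacity, in drives, for a given fractional
    overhead increase (0.005 = half a percentage point)."""
    return round(LIVE_BYTES * overhead_delta / DRIVE_BYTES)

print(extra_drives(0.005))  # ~167 drives per half point, per EB of live data
```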

The compaction-strategy dependency on distribution

A compaction strategy embeds an implicit assumption about the fill-level distribution it's reclaiming from:

  • L1-style host-plus-donor packing (Magic Pocket's baseline) assumes most volumes are already highly filled; its job is to top off a high-fill host from a small number of donor volumes. Throughput scales with the number of available near-full hosts, not with the number of under-filled donors.
  • L2-style DP packing consolidates moderately under-filled volumes into a brand-new near-full destination; assumes the middle of the distribution is well-populated with combinable-under-filled candidates.
  • L3-style streaming re-encode assumes a long tail of very-sparse volumes whose live data is cheap to rewrite per reclaimed unit.

When the workload shifts the distribution, the steady-state strategy may stay correct but become too slow. The multi-strategy compaction response covers all three ranges concurrently with disjoint eligibility boundaries.
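Disjoint eligibility boundaries can be sketched as a simple dispatch on live fill; the threshold values here are invented for illustration, not Magic Pocket's actual boundaries:

```python
# Each strategy targets one band of the fill-level distribution, with
# disjoint boundaries so no volume is eligible for two strategies.

def pick_strategy(live_fill: float) -> str:
    if live_fill >= 0.75:
        return "L1-host-donor"      # top off an already-high-fill host
    if live_fill >= 0.20:
        return "L2-dp-packing"      # consolidate moderately under-filled volumes
    return "L3-streaming-reencode"  # very sparse: little live data per unit reclaimed

print([pick_strategy(f) for f in (0.92, 0.40, 0.03)])
# ['L1-host-donor', 'L2-dp-packing', 'L3-streaming-reencode']
```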

Interaction with other overhead drivers

  • Durability (replication vs EC) sets the floor. 3× replication stores 3 raw bytes per live byte (200% extra); (10+4) EC stores 1.4× (40% extra); fragmentation multiplies whichever base is chosen.
  • Heat / placement (concepts/heat-management): placement tactics re-distribute reads, not storage cost. Orthogonal to overhead.
  • SMR / drive density: raises capacity per drive but the random-write penalty pushes workloads toward append-only substrates, which raises the fragmentation pressure that compaction must then fight.
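Combining the two drivers: the durability shape sets a multiplicative floor and fragmentation scales it. A minimal sketch (modeling N× replication as (1, N-1) erasure coding):

```python
# Raw bytes per live byte: (k+m)/k redundancy expansion times 1/fill.

def raw_per_live(k: int, m: int, fill_factor: float) -> float:
    return (k + m) / k / fill_factor

print(raw_per_live(10, 4, 1.0))  # 1.4: the (10+4) EC floor, 40% extra
print(raw_per_live(10, 4, 0.5))  # 2.8: half-full volumes double it
print(raw_per_live(1, 2, 1.0))   # 3.0: 3x replication modeled as (1, 2)
```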
