
Streaming re-encoding reclamation

Pattern

Use an existing on-the-fly encoder (typically an erasure-coder) as a streaming reclamation pipeline. Live data from severely under-filled source units is fed continuously into the encoder, which accumulates and emits new, durable destination units over time. Each source unit is reclaimed immediately once drained.

Contrasts with bounded-batch packing compaction (L2-style DP packing), which picks a fixed set of sources that nearly fill one destination in one shot. Streaming re-encoding decouples source-drain timing from destination-emission timing: destinations appear whenever the encoder's accumulated input hits a full unit.

Why it fits the sparse tail

On the sparse end of the fill-level distribution (e.g. <10% live data per volume), bounded-batch DP packing is inefficient:

  • The per-run payoff is small: at best, one new well-filled destination volume.
  • Each run pays the DP source-selection cost, but reclaim per unit of planner work is low when every candidate is nearly empty.
  • Metadata pressure is high per reclaimed byte regardless of strategy.

Streaming re-encoding instead:

  • Per reclaimed source volume, few bytes rewritten — sparse volumes have little live data by construction.
  • No up-front packing decision — the encoder accumulates continuously and emits when full.
  • Reclamation tracks input rate, not planner cadence; adding more sparse sources just feeds the pipeline faster.

The trade: every live blob goes into a new volume with a new identity, so every blob requires a metadata location update. Bounded-batch approaches that top off a host volume or pack under the same volume identity have much lower metadata cost per blob.
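The asymmetry can be made concrete with back-of-envelope numbers. The volume and blob sizes below are illustrative assumptions, not figures from the post:

```python
# Illustrative sizes -- not Dropbox's actual parameters.
VOLUME_BYTES = 1 << 30   # 1 GiB source volume
BLOB_BYTES = 4 << 20     # 4 MiB average live blob

def drain_cost(fill_fraction):
    """Cost of draining one sparse volume through the streaming encoder."""
    live_bytes = int(VOLUME_BYTES * fill_fraction)
    # Every rewritten blob gets a new volume identity, so one metadata
    # location update per live blob.
    metadata_updates = live_bytes // BLOB_BYTES
    reclaimed_bytes = VOLUME_BYTES  # the whole allocation is freed once drained
    return live_bytes, metadata_updates, reclaimed_bytes

live, updates, freed = drain_cost(0.05)  # a 5%-full volume on the sparse tail
```

Bytes rewritten per reclaimed volume are small, but every one of those rewrites lands on the metadata system, which is why the metadata write budget, not storage I/O, becomes the binding constraint.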

Canonical realization — Magic Pocket L3 (Dropbox, 2026-04)

  • Reused component: the Live Coder service — originally written to erasure-code writes directly into EC volumes, bypassing the initial replicated write path.
  • Repurposed role: fed continuously with live blobs drained from sparse source volumes. Accumulates and encodes over time; emits new volumes.
  • Reclaim timing: each source volume reclaimed immediately after its live data is drained.
  • Role in the strategy stack: the sparse-tail sibling of L1 + L2 + L3; L3 is the mechanism here, patterns/multi-strategy-compaction is the orchestration that keeps L1 / L2 / L3 from stepping on each other.

Explicit framing from the post: "Compaction is, in effect, a constrained form of re-encoding: take live data from one set of volumes and produce a new, durable volume." The streaming variant is that re-encoding, with accumulation done inside the encoder.

(Source: sources/2026-04-02-dropbox-magic-pocket-storage-efficiency-compaction)

Structural ingredients

  1. An existing on-the-fly encoder with an input stream and durable output — already owned, tuned for throughput.
  2. A drain path from the storage layer into the encoder — pulls live blobs from sparse source units.
  3. Emit-on-full: encoder decides when to emit (enough accumulated input to fill a destination unit at target fill level).
  4. Immediate source reclaim: once a source unit is drained, the storage system can re-use its allocation without waiting for the next emit.
  5. Metadata-aware rate limiting: because each blob rewrite is a new identity → new metadata write, the metadata system's write budget is the binding constraint, not storage I/O.
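Ingredients 2–4 can be sketched end to end. Class and function names here are invented for illustration, and the actual erasure-encoding is elided to packing blob sizes into fixed-capacity destination units:

```python
# Sketch of the drain -> accumulate -> emit-on-full -> immediate-reclaim loop.
# All names are hypothetical; capacities are arbitrary units.
DEST_CAPACITY = 100

class StreamingReencoder:
    """Accumulates drained live blobs; emits a destination unit when full.

    Assumes every blob is smaller than DEST_CAPACITY."""
    def __init__(self):
        self.buffer = []        # accumulated live blobs (sizes)
        self.buffered_size = 0
        self.emitted = []       # durable destination units produced so far

    def feed(self, blob_size):
        self.buffer.append(blob_size)
        self.buffered_size += blob_size
        while self.buffered_size >= DEST_CAPACITY:  # emit-on-full, not on a timer
            self._emit()

    def _emit(self):
        unit, size = [], 0
        while self.buffer and size + self.buffer[0] <= DEST_CAPACITY:
            blob = self.buffer.pop(0)
            unit.append(blob)
            size += blob
        self.emitted.append(unit)
        self.buffered_size -= size

def drain(source_volumes, encoder, reclaim):
    """Drain each sparse source volume into the encoder, then reclaim it."""
    for vol_id, live_blobs in source_volumes:
        for blob in live_blobs:
            encoder.feed(blob)
        reclaim(vol_id)  # source reclaimed immediately, independent of emits
```

Note how reclaim timing is tied only to the drain finishing, while emission happens whenever the encoder's accumulated input crosses a full unit, exactly the decoupling described above.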

Trade-offs vs bounded-batch packing

|                                   | Streaming re-encoding                  | Bounded-batch DP packing                                 |
|-----------------------------------|----------------------------------------|----------------------------------------------------------|
| Best for                          | Sparse tail                            | Middle of fill distribution                              |
| Per-reclaimed-source rewrite cost | Low                                    | Low–moderate                                             |
| Per-blob metadata cost            | High (new volume identity every blob)  | Low (donor blobs only)                                   |
| Planner cost                      | None (streaming)                       | Per-run DP (bounded by granularity + max-volumes cap)    |
| Reclaim cadence                   | Continuous                             | Per planner run                                          |
| Destination emission              | When encoder accumulates a full unit   | One new volume per run                                   |

The two are complementary: run them concurrently over disjoint fill-level ranges (patterns/multi-strategy-compaction), and route each source to the strategy whose cost profile matches its sparsity.
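A minimal version of that routing policy, with threshold values chosen purely for illustration (the post's only stated boundary is the sparse tail, e.g. <10% fill):

```python
# Hypothetical fill-level thresholds; only the sparse boundary comes from the text.
SPARSE_MAX_FILL = 0.10    # below this, stream through the re-encoder (L3-style)
PACKABLE_MAX_FILL = 0.60  # between the two, bounded-batch DP packing (L2-style)

def route(fill_level):
    """Pick the reclamation strategy whose cost profile matches this volume."""
    if fill_level < SPARSE_MAX_FILL:
        return "streaming-reencode"
    if fill_level < PACKABLE_MAX_FILL:
        return "bounded-batch-pack"
    return "leave-in-place"  # full enough that reclaiming isn't worth the rewrite
```

Because the ranges are disjoint, the two strategies never compete for the same source volume, which is the property the orchestration layer relies on.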

Failure modes

  • Metadata overload: L3-style streaming rewrites dominate the metadata system's write budget; mitigation is per-path rate limit + routing only the sparsest tail through the pipeline.
  • Encoder backpressure: if the encoder can't keep up, the drain queue grows; mitigation is flow-control from encoder to drain path + observability on queue depth.
  • Destination-unit fill quality: if the encoder emits on a pure time budget it can produce under-filled destinations of its own; the emit policy should be fill-driven, not time-driven.
  • Cross-DC bandwidth: if the encoder isn't cell-local, the stream consumes cross-DC traffic — keep the encoder in the same failure domain as the source volumes.
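The first two mitigations amount to flow control against a metadata write budget. A token-bucket sketch of that limiter follows; the rate knob and API are assumptions, not anything the post describes:

```python
import time

class MetadataBudget:
    """Token bucket capping metadata writes/sec consumed by streaming rewrites.

    The rate is a hypothetical knob; a real system would derive it from the
    metadata tier's measured headroom. The clock is injectable for testing."""
    def __init__(self, writes_per_sec, now=time.monotonic):
        self.rate = writes_per_sec
        self.tokens = float(writes_per_sec)  # start with one second of burst
        self.now = now
        self.last = now()

    def try_acquire(self, n=1):
        t = self.now()
        # Refill proportionally to elapsed time, capped at one second of burst.
        self.tokens = min(self.rate, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False  # caller applies backpressure to the drain path
```

On a `False` return the drain path pauses rather than queueing unboundedly, which also addresses the encoder-backpressure mode above.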
