Streaming re-encoding reclamation¶
Pattern¶
Use an existing on-the-fly encoder (typically an erasure-coder) as a streaming reclamation pipeline. Live data from severely under-filled source units is fed continuously into the encoder, which accumulates it and emits new, durable destination units over time. Each source unit is reclaimed immediately once drained.
Contrasts with bounded-batch packing compaction (L2-style DP packing), which picks a fixed set of sources that nearly fill one destination in one shot. Streaming re-encoding decouples source-drain timing from destination-emission timing: destinations appear whenever the encoder's accumulated input hits a full unit.
Why it fits the sparse tail¶
On the sparse end of the fill-level distribution (e.g. <10% live data per volume), bounded-batch DP packing is inefficient:
- Each run yields at most one new destination volume, so the per-run payoff is small.
- Per-run source selection pays DP cost, but reclaim-per-unit-work is low when all candidates are nearly empty.
- Metadata pressure is high per reclaimed byte regardless of strategy.
Streaming re-encoding instead:
- Per reclaimed source volume, few bytes rewritten — sparse volumes have little live data by construction.
- No up-front packing decision — the encoder accumulates continuously and emits when full.
- Reclamation tracks input rate, not planner cadence; adding more sparse sources just feeds the pipeline faster.
The trade: every live blob goes into a new volume with a new identity, so every blob requires a metadata location update. Bounded-batch approaches that top off a host volume or pack under the same volume identity have much lower metadata cost per blob.
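The accumulate-and-emit loop, including the per-blob metadata cost described above, can be sketched as follows. All names here (`StreamingReencoder`, `VOLUME_CAPACITY`, `TARGET_FILL`) are illustrative assumptions, not the Magic Pocket API:

```python
VOLUME_CAPACITY = 1 << 30   # 1 GiB destination unit (illustrative)
TARGET_FILL = 0.9           # emit once accumulation reaches 90% of capacity

class StreamingReencoder:
    """Accumulates live blobs; emits a destination unit on reaching target fill."""
    def __init__(self):
        self.buffer = []            # accumulated (blob_id, size) pairs
        self.buffered_bytes = 0
        self.emitted_volumes = []   # emitted destination units
        self.metadata_updates = 0   # every blob lands under a new volume identity

    def feed(self, blob_id, size):
        self.buffer.append((blob_id, size))
        self.buffered_bytes += size
        self.metadata_updates += 1  # new identity => one metadata location write
        if self.buffered_bytes >= TARGET_FILL * VOLUME_CAPACITY:
            self._emit()

    def _emit(self):
        self.emitted_volumes.append(list(self.buffer))
        self.buffer.clear()
        self.buffered_bytes = 0

def drain(encoder, source_volumes):
    """Feed each sparse source's live blobs, then reclaim it immediately."""
    reclaimed = []
    for vol_id, live_blobs in source_volumes:
        for blob_id, size in live_blobs:
            encoder.feed(blob_id, size)
        reclaimed.append(vol_id)    # reclaim as soon as this source is drained
    return reclaimed
```

Note how source-drain timing and destination-emission timing are decoupled: `drain` reclaims each source as soon as it is empty, while `_emit` fires only when the encoder's accumulated input fills a unit.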
Canonical realization — Magic Pocket L3 (Dropbox, 2026-04)¶
- Reused component: the Live Coder service — originally written to erasure-code writes directly into EC volumes, bypassing the initial replicated write path.
- Repurposed role: fed continuously with live blobs drained from sparse source volumes. Accumulates and encodes over time; emits new volumes.
- Reclaim timing: each source volume reclaimed immediately after its live data is drained.
- Role in the strategy stack: the sparse-tail sibling in the L1 + L2 + L3 stack; L3 is the mechanism here, patterns/multi-strategy-compaction is the orchestration that keeps L1 / L2 / L3 from stepping on each other.
Explicit framing from the post: "Compaction is, in effect, a constrained form of re-encoding: take live data from one set of volumes and produce a new, durable volume." The streaming variant is that re-encoding, with accumulation done inside the encoder.
(Source: sources/2026-04-02-dropbox-magic-pocket-storage-efficiency-compaction)
Structural ingredients¶
- An existing on-the-fly encoder with an input stream and durable output — already owned, tuned for throughput.
- A drain path from the storage layer into the encoder — pulls live blobs from sparse source units.
- Emit-on-full: encoder decides when to emit (enough accumulated input to fill a destination unit at target fill level).
- Immediate source reclaim: once a source unit is drained, the storage system can re-use its allocation without waiting for the next emit.
- Metadata-aware rate limiting: because each blob rewrite is a new identity → new metadata write, the metadata system's write budget is the binding constraint, not storage I/O.
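The last ingredient, gating the drain on the metadata system's write budget rather than on storage I/O, could be a simple token bucket. This is a minimal sketch under assumed names (`MetadataBudget`, tick-based refill), not the Magic Pocket implementation:

```python
class MetadataBudget:
    """Token bucket capping blob rewrites by the metadata write budget."""
    def __init__(self, writes_per_tick, burst):
        self.rate = writes_per_tick
        self.burst = burst
        self.tokens = burst

    def tick(self):
        # Refill once per scheduling tick, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + self.rate)

    def try_rewrite(self, n_blobs=1):
        # Each rewritten blob costs one metadata location update.
        if self.tokens >= n_blobs:
            self.tokens -= n_blobs
            return True
        return False  # defer the drain; metadata, not storage, is the bottleneck
```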
Trade-offs vs bounded-batch packing¶
| | Streaming re-encoding | Bounded-batch DP packing |
|---|---|---|
| Best for | Sparse tail | Middle of fill distribution |
| Per-reclaimed-source rewrite cost | Low | Low–moderate |
| Per-blob metadata cost | High (new volume identity every blob) | Low (donor blobs only) |
| Planner cost | None (streaming) | Per-run DP (bounded by granularity + max-volumes cap) |
| Reclaim cadence | Continuous | Per planner run |
| Destination emission | When encoder accumulates full unit | One new volume per run |
The two are complementary: run them concurrently over disjoint fill-level ranges (patterns/multi-strategy-compaction), and route each source to the strategy whose cost profile matches its sparsity.
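The routing step can be sketched as a threshold function over fill level. The thresholds below are illustrative assumptions (the post's example gives only the <10% sparse boundary), not published values:

```python
SPARSE_MAX = 0.10    # <10% live data: the sparse tail from the post's example
PACKABLE_MAX = 0.60  # middle of the distribution (assumed cutoff)

def route(fill_level):
    """Pick the strategy whose cost profile matches the source's sparsity."""
    if fill_level < SPARSE_MAX:
        return "streaming-reencode"  # few bytes to rewrite, metadata-bound
    if fill_level < PACKABLE_MAX:
        return "dp-pack"             # enough live data to justify planner cost
    return "leave"                   # already dense enough to keep as-is
```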
Failure modes¶
- Metadata overload: L3-style streaming rewrites dominate the metadata system's write budget; mitigation is per-path rate limit + routing only the sparsest tail through the pipeline.
- Encoder backpressure: if the encoder can't keep up, the drain queue grows; mitigation is flow-control from encoder to drain path + observability on queue depth.
- Destination-unit fill quality: if the encoder emits on a pure time budget it can produce under-filled destinations of its own; the emit policy should be fill-driven, not time-driven.
- Cross-DC bandwidth: if the encoder isn't cell-local, the stream consumes cross-DC traffic — keep the encoder in the same failure domain as the source volumes.
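The encoder-backpressure mitigation above amounts to a bounded queue between the drain path and the encoder, with depth exported for observability. A minimal sketch, with `DrainQueue` and its single-threaded shape assumed for illustration:

```python
from collections import deque

class DrainQueue:
    """Bounded queue between the drain path and the encoder."""
    def __init__(self, max_depth):
        self.q = deque()
        self.max_depth = max_depth

    def offer(self, blob):
        # Flow control: refuse new work when the encoder is behind, so the
        # drain path slows down instead of letting the queue grow unboundedly.
        if len(self.q) >= self.max_depth:
            return False
        self.q.append(blob)
        return True

    def depth(self):
        return len(self.q)  # export as a metric for queue-depth observability
```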
Seen in¶
- sources/2026-04-02-dropbox-magic-pocket-storage-efficiency-compaction — L3 as a streaming pipeline into the Live Coder encoder for the sparsest tail of the fill-level distribution.