CONCEPT Cited by 2 sources
Erasure coding¶
Definition¶
Erasure coding is a redundancy scheme that encodes data into more pieces than are needed to read it back, so the data survives the loss of up to a fixed number of pieces. Unlike replication (N identical copies), erasure coding buys each unit of fault tolerance with less extra capacity.
The (k, m) shape used in S3¶
Warfield's description:
We use an algorithm, such as Reed-Solomon, and split our object into a set of k "identity" shards. Then we generate an additional set of m parity shards. As long as k of the (k + m) total shards remain available, we can read the object. This approach lets us reduce capacity overhead while surviving the same number of failures.
- Identity shards: raw pieces of the object.
- Parity shards: computed from the identity shards via Reed-Solomon (or similar MDS code).
- Reconstruction: any k of the k+m suffices.
- Tolerance: can lose m shards without data loss.
- Capacity overhead: (k+m)/k − 1 (e.g., k=10, m=4 → 40% overhead vs. 200% for 3× replication of comparable tolerance).
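A minimal toy illustration of the identity + parity shape, using m = 1 with XOR parity (the simplest special case — the post describes Reed-Solomon, which generalizes to arbitrary m; all names below are illustrative, not S3's actual code):

```python
def encode(data, k):
    """Split data into k identity shards plus one XOR parity shard (m = 1)."""
    shard_len = -(-len(data) // k)                 # ceil(len / k)
    padded = data.ljust(k * shard_len, b"\0")      # pad so shards are equal-length
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = shards[0]
    for s in shards[1:]:
        parity = bytes(a ^ b for a, b in zip(parity, s))
    return shards + [parity]                       # k + 1 total shards

def reconstruct(shards, k, orig_len):
    """Recover the object from any k of the k+1 shards (at most one None)."""
    missing = [i for i, s in enumerate(shards) if s is None]
    assert len(missing) <= 1, "XOR parity (m = 1) tolerates only one lost shard"
    if missing:
        survivors = [s for s in shards if s is not None]
        rec = survivors[0]
        for s in survivors[1:]:
            rec = bytes(a ^ b for a, b in zip(rec, s))
        shards[missing[0]] = rec                   # XOR of survivors = lost shard
    return b"".join(shards[:k])[:orig_len]

data = b"erasure coding survives shard loss"
shards = encode(data, 4)       # 4 identity + 1 parity shards
shards[2] = None               # lose any one shard...
assert reconstruct(shards, 4, len(data)) == data   # ...and still read the object
```

Reed-Solomon replaces the XOR with arithmetic over a finite field, which is what lets m grow past 1 while keeping the "any k of k+m" property.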
Replication vs. erasure coding¶
| | Replication (3×) | Erasure coding (e.g., 10+4) |
|---|---|---|
| Capacity overhead | 200% | 40% |
| Reads per logical read | 1 drive | k drives (e.g., 10) |
| Reconstruction cost on loss | Copy 1 replica | Read k, recompute |
| Placement flexibility | Read from any of 3 | Read from any k of 14 |
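The overhead column follows directly from the formula above; a quick check (illustrative helper names, not from the post):

```python
def ec_overhead(k, m):
    """Capacity overhead of a (k, m) erasure code: parity bytes per data byte."""
    return (k + m) / k - 1

def replication_overhead(n):
    """Overhead of n-way replication: extra full copies per data byte."""
    return n - 1

print(f"{ec_overhead(10, 4):.0%}")        # 40%
print(f"{replication_overhead(3):.0%}")   # 200%
```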
For capacity-dominated stores (S3), erasure coding is a clear cost win. The tradeoff is more drives touched per read, which is part of why concepts/tail-latency-at-scale bites harder on EC stores: if any one of the k drives is hot, the whole read stalls on it.
Erasure coding as a heat-management tool¶
The post emphasizes a subtler use:
Redundancy schemes let us divide our data into more pieces than we need to read in order to access it, and that in turn provides us with the flexibility to avoid sending requests to overloaded disks.
If you need k shards out of k+m, you have m extra degrees of freedom per read — the frontend can choose which k to fetch and so route around hot drives. This makes EC not merely a durability scheme but a component of concepts/heat-management. See also patterns/redundancy-for-heat.
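A sketch of that degree of freedom, assuming a frontend that knows shard placement and which drives are currently hot (the `ShardLocation` shape and hot-drive set are illustrative, not S3's actual metadata):

```python
from dataclasses import dataclass

@dataclass
class ShardLocation:          # illustrative shape, not a real S3 structure
    drive: str
    shard_id: int

def choose_shards(locations, k, hot_drives):
    """Pick k of the k+m shard locations, preferring drives not marked hot."""
    cool = [loc for loc in locations if loc.drive not in hot_drives]
    hot = [loc for loc in locations if loc.drive in hot_drives]
    if len(cool) + len(hot) < k:
        raise IOError("fewer than k shards available")
    return (cool + hot)[:k]   # hot drives used only when unavoidable

locs = [ShardLocation(f"d{i}", i) for i in range(14)]    # a 10+4 layout
picked = choose_shards(locs, k=10, hot_drives={"d1", "d3"})
assert all(p.drive not in {"d1", "d3"} for p in picked)  # routed around heat
```

With m = 4 spare shards, up to four hot drives can be skipped per read before the frontend is forced to wait on one.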
Consequences¶
- Reconstruct amplification on tail. If any one of k shards queues, the whole read queues (see concepts/tail-latency-at-scale). Mitigated by requesting k+ε shards (hedging) or by better placement.
- Metadata lookup on the hot path. To pick the k shards you want, you must first know where all k+m live. That lookup itself becomes a layer whose variance matters.
- Small-object tax. Erasure coding carries fixed per-shard overheads; for tiny objects, replication is often still the right choice.
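The hedging mitigation above can be made concrete with a toy model: issue k+ε fetches and complete the read as soon as any k arrive, so one queued shard no longer sets the read's latency. Latencies here are simulated numbers, not measurements from the post:

```python
import heapq

def read_completion_time(latencies, k, eps):
    """Issue k+eps shard fetches; the read completes when the k-th arrives."""
    issued = latencies[:k + eps]
    return heapq.nsmallest(k, issued)[-1]   # arrival time of the k-th fastest

# One shard sits behind a hot drive (500 ms); the rest answer in 5-7 ms.
lat = [5, 6, 5, 7, 6, 5, 6, 500, 5, 6, 7, 6]
print(read_completion_time(lat, k=10, eps=0))   # 500 -- stalled on the hot drive
print(read_completion_time(lat, k=10, eps=2))   # 7   -- hedge absorbs the straggler
```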
EC as re-encoding substrate (Magic Pocket)¶
Dropbox's 2026-04 storage-efficiency post reframes EC more broadly: "Compaction is, in effect, a constrained form of re-encoding: take live data from one set of volumes and produce a new, durable volume."
This observation underwrites patterns/streaming-re-encoding-reclamation — the L3 compaction strategy inside Magic Pocket re-uses the same Live Coder erasure-coder that caused its fragmentation incident as a streaming reclamation pipeline. Live blobs from severely-under-filled volumes stream into the encoder, which accumulates and emits new near-full EC volumes over time. Each source volume is reclaimed immediately once drained.
Trade-off: per-blob metadata cost is high (every reclaimed blob gets a new volume identity → new Panda location entry). But applied only to the sparse tail, total rewrite work per reclaimed volume stays bounded, and the EC pipeline is throughput-oriented by design. An EC encoder, once built, is a dual-purpose primitive: the fresh-write path and the reclamation path share the same code.
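A minimal sketch of that streaming shape (volume and blob representations are assumptions for illustration, not Magic Pocket's actual types): live blobs from sparse volumes accumulate in the encoder, a new volume is emitted whenever enough bytes are buffered, and drained source volumes are reported as reclaimable:

```python
def stream_reencode(sparse_volumes, target_volume_size):
    """Yield (new_volume_blobs, reclaimed_volume_ids) as sparse volumes drain.

    sparse_volumes: iterable of (volume_id, live_blobs) pairs.
    """
    buffer, buffered_bytes, drained = [], 0, []
    for vol_id, live_blobs in sparse_volumes:
        for blob in live_blobs:
            buffer.append(blob)
            buffered_bytes += len(blob)
            if buffered_bytes >= target_volume_size:   # near-full volume ready
                yield buffer, drained
                buffer, buffered_bytes, drained = [], 0, []
        drained.append(vol_id)    # source volume fully drained -> reclaimable
    if buffer or drained:         # flush whatever remains at end of stream
        yield buffer, drained

vols = [("v1", [b"aa", b"bb"]), ("v2", [b"cc", b"dd"])]
for new_volume, reclaimed in stream_reencode(vols, target_volume_size=4):
    pass  # in a real pipeline: EC-encode new_volume, then free reclaimed volumes
```

The generator keeps only one volume's worth of blobs buffered at a time, which is what makes the rewrite work per reclaimed volume bounded.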
(Source: sources/2026-04-02-dropbox-magic-pocket-storage-efficiency-compaction)
Seen in¶
- sources/2025-02-25-allthingsdistributed-building-and-operating-s3 — Warfield's Reed-Solomon example, and the framing of EC as both durability and heat-steering.
- sources/2026-04-02-dropbox-magic-pocket-storage-efficiency-compaction — EC as a re-encoding substrate reusable in the compaction path; Magic Pocket's L3 strategy streams sparse-volume live blobs into the Live Coder erasure-coder as reclamation.