PATTERN
Redundancy for heat¶
Pattern¶
Treat replicas and erasure-coded shards as I/O-steering degrees of freedom, not only as durability mechanisms. Every read request has multiple valid sources; route each request to the least hot valid source. Redundancy becomes a heat-management tool.
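A minimal sketch of the routing decision, assuming a hypothetical per-drive `heat` signal (e.g. queue depth or recent IOPS); the source describes the idea, not an API:

```python
# Steer each read to the least-hot valid source.
# `heat` maps drive id -> hypothetical load estimate (higher = hotter).
def pick_read_source(replicas, heat):
    """Return the valid source drive with the lowest current heat."""
    return min(replicas, key=lambda drive: heat.get(drive, 0.0))

heat = {"d1": 0.9, "d2": 0.2, "d3": 0.5}
pick_read_source(["d1", "d2", "d3"], heat)  # → "d2"
```

The same selection logic applies whether the valid sources are full replicas or erasure-coded shard sets; only the number of sources needed per read changes.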
Why this works¶
- Replication gives you N valid sources per logical read. You can read from any of them. Cost: capacity. Benefit: flexibility at read time.
- Erasure coding with (k, m) shards gives you the flexibility to pick any k of the k+m shards per read. That's C(k+m, k) valid shard combinations. Cost: more drives touched per read (k). Benefit: both capacity efficiency and steering.
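The combinatorics and the resulting steering choice can be sketched directly. The (5, 4) shape and the heat values are illustrative, not from the source:

```python
from math import comb

# For (k, m) erasure coding, any k of the k+m shards reconstruct the object.
k, m = 5, 4
comb(k + m, k)  # C(9, 5) = 126 valid shard combinations

# Steering: read from the k least-hot of the k+m shard drives.
def pick_shards(shard_drives, heat, k):
    """Return the k shard drives with the lowest current heat."""
    return sorted(shard_drives, key=lambda d: heat.get(d, 0.0))[:k]
```

Replication is the degenerate case k = 1: any single copy suffices, so every replica is a full valid source on its own.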
Warfield (2025):
In storage systems, redundancy schemes are commonly used to protect data from hardware failures, but redundancy also helps manage heat. They spread load out and give you an opportunity to steer request traffic away from hotspots.
As an example, consider replication as a simple approach to encoding and protecting data. Replication protects data if disks fail by just having multiple copies on different disks. But it also gives you the freedom to read from any of the disks. When we think about replication from a capacity perspective it's expensive. However, from an I/O perspective — at least for reading data — replication is very efficient.
The S3 mix¶
S3 uses both techniques, each where it fits best. Warfield is explicit about this:
We obviously don't want to pay a replication overhead for all of the data that we store, so in S3 we also make use of erasure coding.
The pattern isn't "pick one"; it's "pick per tier / per workload, but treat redundancy schemes as dual-purpose from day one."
Consequences for system design¶
- Read scheduling has to know about heat. The frontend must choose, per read, which subset of valid sources to hit; this requires telemetry flowing from the storage fleet back to the frontend.
- Reconstruct amplification is a tax on erasure coding under hot conditions: all k selected shards must return before the read completes. If any one of k is hot, the read stalls. Mitigation: over-request (k + ε) and discard the extras, or choose the k drives more carefully.
- Metadata layer is on the hot path. To pick k out of k+m valid drives, the frontend must first resolve "where are all k+m shards?" That lookup itself adds a queuing layer ahead of every read.
- Writes have less flexibility than reads — you must write all the replicas / shards. Heat management on writes must fall back to patterns/data-placement-spreading at object-creation time.
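The over-request mitigation for reconstruct amplification can be sketched as follows. The `extra` parameter and heat values are hypothetical; the source names the technique but not a specific implementation:

```python
import heapq

# Over-request mitigation: issue k + extra shard reads to the least-hot
# drives; the read completes once any k responses arrive and the
# stragglers are discarded. This bounds the stall caused by one hot drive.
def reads_to_issue(shard_drives, heat, k, extra=1):
    """Pick the k + extra least-hot shard drives to request."""
    return heapq.nsmallest(k + extra, shard_drives,
                           key=lambda d: heat.get(d, 0.0))
```

The trade-off is deliberate: each extra request spends I/O to buy tail-latency protection, so `extra` is a knob between reconstruct amplification and read amplification.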
Interactions¶
- With concepts/erasure-coding: the (k, m) shape directly determines how many steering degrees of freedom you have per read.
- With patterns/data-placement-spreading: spread gets objects onto disjoint drive sets at write time; redundancy-for-heat gives you read-time flexibility within those sets.
- With concepts/heat-management: this pattern is one of the primary levers heat management uses.
Seen in¶
- sources/2025-02-25-allthingsdistributed-building-and-operating-s3 — the S3 framing of replication and erasure coding as dual-purpose (durability + heat) mechanisms.