CONCEPT Cited by 3 sources
Write amplification
Definition
Write amplification (WA) is the ratio of physical bytes written to storage to logical bytes the application intended to write.
Forces that push WA above 1:
- LSM compaction: every byte rewritten ~log(N) / B times as it merges up tiers / levels (concepts/lsm-compaction).
- SSD page-level rewrites: an in-place update of a byte forces rewriting the whole flash page.
- Replication / erasure coding: a single logical write becomes N replicas or (k + m) shards.
- Immutable-store compaction: delete → live-blob rewrite during compaction to free the donor (concepts/storage-overhead-fragmentation).
- Metadata: header / index / WAL updates around every logical write.
Any one of these can dominate; at scale, the WA term is often the first-order capacity / IOPS multiplier.
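The ratio itself, and the two simplest multipliers from the list above, can be sketched in a few lines. This is a minimal illustration: the function names and the example parameters (a 4 KiB page, 3 replicas, a 6 + 3 erasure code) are assumptions chosen for the arithmetic, not values taken from any system discussed on this page.

```python
def write_amplification(physical_bytes: int, logical_bytes: int) -> float:
    """Ratio of bytes that reach storage to bytes the application asked to write."""
    if logical_bytes <= 0:
        raise ValueError("logical_bytes must be positive")
    return physical_bytes / logical_bytes


def replication_multiplier(n_replicas: int) -> float:
    """N full copies: every logical byte is written N times."""
    return float(n_replicas)


def erasure_coding_multiplier(k: int, m: int) -> float:
    """k data shards plus m parity shards: (k + m) / k physical bytes per logical byte."""
    return (k + m) / k


# Hypothetical parameters, for illustration only.
print(write_amplification(physical_bytes=4096, logical_bytes=1))  # 1-byte update, 4 KiB page
print(replication_multiplier(3))
print(erasure_coding_multiplier(6, 3))
```

Under these assumed numbers the sub-page update dominates by orders of magnitude, which is the sense in which any single force can become the first-order term.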
Magic Pocket — background-write WA as the Live Coder's design target
The immediate context from the 2026-04-02 post:
Last year, we rolled out a new service that changed how data is placed across Magic Pocket. The change reduced write amplification for background writes, so each write triggered fewer backend storage operations. (Source: sources/2026-04-02-dropbox-magic-pocket-storage-efficiency-compaction)
Before the Live Coder service, writes went through a replicated path first and were re-encoded into erasure-coded volumes as a background operation — that re-encode is itself a rewrite, so each logical byte written produced multiple physical bytes.
The Live Coder path writes data directly into erasure-coded volumes, skipping the replicated-then-re-encoded intermediate. Background writes fan out to fewer backend storage operations per logical byte.
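A rough way to see why skipping the intermediate tier matters is to count physical bytes per logical byte on each path. The sketch below assumes a simple additive model (replica writes plus one later re-encode) and uses made-up parameters (3 replicas, a 6 + 3 code); the post does not publish Magic Pocket's actual replication or erasure-coding settings, and metadata writes are ignored.

```python
def replicated_then_reencoded(n_replicas: int, k: int, m: int) -> float:
    """Physical bytes per logical byte when data lands on N replicas first
    and is later rewritten into a (k + m) erasure-coded volume."""
    return n_replicas + (k + m) / k


def direct_erasure_coded(k: int, m: int) -> float:
    """Physical bytes per logical byte when data is erasure-coded on first write."""
    return (k + m) / k


# Hypothetical parameters, not Magic Pocket's real configuration.
old_path = replicated_then_reencoded(n_replicas=3, k=6, m=3)
new_path = direct_erasure_coded(k=6, m=3)
print(old_path, new_path)
```

With these assumed values the old path writes 4.5 physical bytes per logical byte and the direct path writes 1.5; the difference is exactly the replica tier that no longer exists.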
The unintended consequence appeared on a different cost axis: the new path produced severely under-filled volumes, driving storage overhead up. The fix was multi-strategy compaction (L1 + L2 + L3) over the new distribution, with L3 itself reusing the Live Coder as a re-encoding pipeline. That last choice is a deliberate trade: L3's rewrite WA is high on a per-blob basis (every re-encoded live blob gets a new identity and metadata entry), but it is applied only to the sparse tail, which keeps the absolute rewrite work per reclaimed volume low.
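The "sparse tail" argument follows from standard log-structured cleaning arithmetic, sketched here under the assumption of equal-sized volumes: compacting a volume that is a fraction `live_fraction` full rewrites that fraction of a volume and nets `1 - live_fraction` of free capacity. The function name and the sample fractions are illustrative, not figures from the post.

```python
def rewrite_per_reclaimed_volume(live_fraction: float) -> float:
    """Volumes' worth of live data rewritten to net one volume of free capacity.

    Compacting a donor rewrites live_fraction of a volume and nets
    (1 - live_fraction) of a volume, so the cost per net volume freed is
    live_fraction / (1 - live_fraction).
    """
    if not 0.0 <= live_fraction < 1.0:
        raise ValueError("live_fraction must be in [0, 1)")
    return live_fraction / (1.0 - live_fraction)


# Illustrative fill levels: the sparse tail is cheap, dense volumes are not.
for f in (0.05, 0.25, 0.50, 0.90):
    print(f"live={f:.2f} rewrite_per_reclaimed={rewrite_per_reclaimed_volume(f):.2f}")
```

At 5% live data the rewrite cost is about 0.05 volumes per volume freed; at 90% it is about 9, which is why an expensive per-blob mechanism is tolerable only when restricted to nearly empty volumes.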
Multi-index / multi-namespace WA — Netflix Graph Abstraction
A different shape of write amplification appears at the logical rather than physical level: a single client write fans out across multiple indexes or namespaces, each requiring its own substrate write.
Netflix Graph Abstraction canonicalises this in the Part-I post:
"A single write on the fronting service can result in multiple writes to the backing durable storage due to the use of multiple indexes."
A single edge write fans out to 3+ KV records in different namespaces:
| Substrate write | Why |
|---|---|
| Forward link namespace | adjacency-list entry at partition source |
| Reverse link namespace | adjacency-list entry at partition target |
| Edge property namespace | property bag at the lex-sorted-concat ID |
| (Optional) write-aside cache | if cached link record needs refresh |
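The fan-out in the table can be sketched as a function that turns one edge write into three key-value writes. Everything here is hypothetical: the namespace names, the `KVWrite` type, and the `|` separator are stand-ins for illustration and are not Netflix's API or key format; only the shape (forward link, reverse link, property record keyed by the sorted, concatenated endpoint IDs) comes from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVWrite:
    namespace: str
    key: str
    value: dict


def edge_id(source: str, target: str) -> str:
    """Lexicographically sorted concatenation of the two endpoint IDs."""
    return "|".join(sorted((source, target)))


def fan_out_edge_write(source: str, target: str, properties: dict) -> list:
    """Expand one logical edge write into its backing-store writes."""
    return [
        # Forward link: adjacency entry partitioned by the source node.
        KVWrite("forward_links", source, {"target": target}),
        # Reverse link: adjacency entry partitioned by the target node.
        KVWrite("reverse_links", target, {"source": source}),
        # Properties live in their own record so link records stay small.
        KVWrite("edge_properties", edge_id(source, target), dict(properties)),
    ]


writes = fan_out_edge_write("user_b", "user_a", {"since": 2020})
print(len(writes), [w.namespace for w in writes])
```

One client call produces three substrate writes before any cache refresh is counted, which is the logical multiplier the quoted post describes.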
Mitigations specific to this shape — distinct from LSM / erasure-coding mitigations:
- Write-aside cache for edge links — suppresses redundant writes when a link already exists; the dominant lever for graph WA reduction.
- Edge links / properties split — keeps each link record small, so write doubling on forward + reverse remains affordable.
- Per-namespace cache invalidation strategy choice — the invalidate-on-write mode pushes additional cache writes; the TTL-driven mode trades staleness for lower write rate (patterns/read-aside-cache-with-dual-invalidation).
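The first mitigation, suppressing link writes that are already known to exist, can be sketched with an in-memory set standing in for the write-aside cache. This is a toy under stated assumptions: `LinkWriter` and its fields are invented for illustration, the backing store is a plain dict, and eviction, TTLs, and invalidation (which a real cache needs) are left out.

```python
class LinkWriter:
    """Skips link writes that a local cache says have already been written."""

    def __init__(self, store: dict):
        self.store = store            # stand-in for the backing KV store
        self.known_links = set()      # write-aside cache of links already written
        self.backend_writes = 0       # counts writes that reached the store

    def write_link(self, namespace: str, source: str, target: str) -> bool:
        """Return True if the write reached the store, False if suppressed."""
        key = (namespace, source, target)
        if key in self.known_links:
            return False
        self.store[key] = True
        self.backend_writes += 1
        self.known_links.add(key)
        return True


writer = LinkWriter(store={})
for _ in range(5):
    # The same edge is written five times by the client.
    writer.write_link("forward_links", "a", "b")
    writer.write_link("reverse_links", "b", "a")
print(writer.backend_writes)
```

In this run ten client-side link writes collapse to two backend writes, because only the first forward and first reverse write miss the cache.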
Multi-namespace WA is structurally different from LSM WA: the multiplier is logical (number of indexes / records per operation), independent of the substrate's compaction behaviour, and stacks on top of any LSM / replication WA the substrate adds.
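The stacking can be made concrete by multiplying the logical fan-out by whatever the substrate adds. The sketch assumes the factors are independent and simply multiply, which is a simplification, and the numbers (3 records per edge, 3 replicas, an LSM factor of 10) are hypothetical.

```python
import math


def stacked_write_amplification(logical_fan_out: int, substrate_multipliers: list) -> float:
    """Logical records per operation times every physical multiplier below it."""
    return logical_fan_out * math.prod(substrate_multipliers)


# Hypothetical: 3 KV records per edge, 3x replication, 10x LSM compaction.
print(stacked_write_amplification(3, [3.0, 10.0]))
```

Under these assumptions a single edge write costs 90 physical writes' worth of bytes, and reducing the logical fan-out scales the whole product.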
Axis summary
| Source of WA | Canonical system | Shape |
|---|---|---|
| LSM merging | Husky, RocksDB | Amortized ~log(N) / B rewrites per byte |
| SSD erase-block | Any flash-backed store | Page-level-rewrite on sub-page update |
| Replication / EC | Magic Pocket, S3 | (k + m) / k multiplier for EC, N× for replication |
| Compaction in immutable stores | Magic Pocket, Husky | Live bytes rewritten once per reclaim of their containing unit |
| Write-path layers | Replicated-then-re-encoded writes | One rewrite per intermediate tier |
| Multi-namespace / multi-index | Netflix Graph Abstraction | 3+ logical writes per edge mutation |
Design knobs that move WA
- Choose a write path that skips a rewrite tier — Magic Pocket's Live Coder → direct EC writes (removed one rewrite); inline EC in S3 ShardStore.
- Size reclaimable units to the workload — bigger units reduce per-byte compaction WA but increase reclamation latency.
- Delay compaction until efficient-enough — Husky's lazy compaction is an order of magnitude cheaper than eager size-tiering.
- Pick reclamation mechanism per segment — L1/L2 keep blobs under the same volume identity (low metadata WA); L3 rewrites into new volumes (high per-blob metadata WA, low per-reclaimed-volume total).
Relation to overhead
WA and storage overhead are orthogonal cost axes:
- WA is a per-write rewrite cost — paid in I/O, compute, flash wear, and metadata writes.
- Overhead is a capacity-holding cost — paid in hardware fleet size.
The two interact at the compaction layer: compaction causes WA (rewrites live data) in order to reduce overhead. Picking a compaction strategy picks a trade between the two.
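The trade can be illustrated with a toy compaction policy that rewrites every volume whose live fraction falls below a threshold. The model is an assumption made for the example: volumes are equal-sized, rewritten data packs densely, and the `compact` function and sample fill levels are invented rather than drawn from any cited system.

```python
def compact(live_fractions: list, threshold: float) -> tuple:
    """Compact every volume whose live fraction is below `threshold`.

    Returns (rewritten, overhead):
      rewritten: volumes' worth of live data rewritten (the WA cost)
      overhead:  physical volumes held per volume of live data afterwards
    """
    live_total = sum(live_fractions)
    donors = [f for f in live_fractions if f < threshold]
    kept = [f for f in live_fractions if f >= threshold]
    rewritten = sum(donors)
    # Kept volumes still occupy a full volume each; rewritten data is
    # assumed to pack densely, so it occupies exactly its own size.
    physical = len(kept) + rewritten
    return rewritten, physical / live_total


volumes = [0.1, 0.2, 0.5, 0.9, 1.0]  # illustrative fill levels
for threshold in (0.0, 0.3, 0.95):
    rewritten, overhead = compact(volumes, threshold)
    print(f"threshold={threshold:.2f} rewritten={rewritten:.2f} overhead={overhead:.2f}")
```

Raising the threshold moves along the trade: with no compaction nothing is rewritten and overhead is about 1.85; compacting only the two sparsest volumes rewrites 0.3 volumes and brings overhead to about 1.22; compacting nearly everything reaches an overhead of 1.0 at the cost of rewriting 1.7 volumes.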
Seen in
- sources/2026-04-02-dropbox-magic-pocket-storage-efficiency-compaction — explicit "reduced write amplification for background writes" as the Live Coder's original design goal; compaction's rewrite WA as the cost side of the reclamation trade.
- sources/2024-03-06-highscalability-behind-aws-s3s-massive-scale — Kozlovski names ShardStore's shard-data-stored-outside-the-tree design choice as an explicit write-amplification optimization: the LSM tree moves small metadata entries during compaction, not the large shard payloads that would otherwise rewrite-amplify proportional to compaction frequency.
- sources/2026-06-01-databricks-debunking-8-data-layout-myths-why-liquid-clustering-outperfo — Names Z-Ordering's "unnecessary rewrites" as an explicit write-amplification failure mode: "each rerun rewrites large amounts of old, possibly already-clustered data to restore clustering quality. With continuous ingestion, the cost of keeping data well-clustered with Z-Order grows along with the table." The cost geometry is structurally pathological — rewrite cost grows with table size, not with new-data volume, so per-cycle cost diverges as the table grows. Incremental clustering on write is the canonicalised mitigation pattern: bound maintenance cost to incremental work (proportional to new data) rather than table state (proportional to table size). This makes Z-Order's structural critique a wiki-canonical instance of write amplification at the lakehouse table-layout altitude — distinct from but architecturally adjacent to the LSM / replication / SSD instances above.
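The cost geometry in the last entry can be illustrated with a toy model of cumulative rewrite volume. The sketch assumes the worst case, in which every re-clustering run rewrites the entire table (the source says only "large amounts"), and a constant ingest batch per cycle; function names and numbers are hypothetical.

```python
def full_recluster_cost(cycles: int, batch: int) -> int:
    """Total bytes rewritten if each cycle rewrites the whole table so far."""
    # After cycle i the table holds i * batch bytes, all of which are rewritten.
    return sum(i * batch for i in range(1, cycles + 1))


def incremental_cluster_cost(cycles: int, batch: int) -> int:
    """Total bytes written if each cycle only clusters the newly ingested batch."""
    return cycles * batch


# Hypothetical workload: 10 ingest cycles of 100 units each.
print(full_recluster_cost(10, 100), incremental_cluster_cost(10, 100))
```

For this workload the full-rewrite model writes 5500 units against 1000 for the incremental one, and the gap widens quadratically with the number of cycles, which is the divergence the entry describes.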