CONCEPT Cited by 3 sources

Write amplification¶

Definition¶

Write amplification (WA) is the ratio of physical bytes written to storage to logical bytes the application intended to write.

WA = (bytes written to disk) / (bytes written by application)
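As a minimal sketch of the ratio above, the following computes WA from two byte counters. The function name and the counter values are hypothetical, chosen only for illustration; real systems expose these counters in different ways.

```python
def write_amplification(physical_bytes: int, logical_bytes: int) -> float:
    """Return physical bytes written to storage per logical byte written."""
    if logical_bytes <= 0:
        raise ValueError("logical_bytes must be positive")
    return physical_bytes / logical_bytes


# Hypothetical counters: the application wrote 1 GiB and the storage
# layer absorbed 3.5 GiB of physical writes over the same window.
GIB = 1024 ** 3
wa = write_amplification(physical_bytes=int(3.5 * GIB), logical_bytes=GIB)
print(f"WA = {wa:.2f}")
```

A value of exactly 1 would mean every physical byte corresponds to one logical byte; the forces listed next are what push the ratio higher.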

Forces that push WA above 1:

LSM compaction: every byte is rewritten ~log(N) / B times as it merges up tiers / levels (concepts/lsm-compaction).
SSD page-level rewrites: an in-place update of a byte forces rewriting the whole flash page.
Replication / erasure coding: a single logical write becomes N replicas or (k + m) shards.
Immutable-store compaction: a delete leaves dead space behind, and compaction must rewrite the remaining live blobs elsewhere before the donor volume can be freed (concepts/storage-overhead-fragmentation).
Metadata: header / index / WAL updates around every logical write.

Any one of these can dominate; at scale, the WA term is often the first-order capacity / IOPS multiplier.
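When several of these forces apply in series, a reasonable first approximation is that their factors multiply, because each layer's output bytes are the next layer's input bytes. The sketch below assumes that simple model; the layer names and factor values are hypothetical and not measurements from any system named on this page.

```python
from math import prod


def stacked_wa(factors: dict[str, float]) -> float:
    """Multiply per-layer WA factors that apply in series."""
    for name, factor in factors.items():
        if factor <= 0:
            raise ValueError(f"factor for {name!r} must be positive")
    return prod(factors.values())


# Hypothetical per-layer factors, for illustration only.
layers = {
    "metadata_and_wal": 1.1,  # headers, index, and WAL around each write
    "lsm_compaction": 4.0,    # rewrites as data merges through levels
    "replication": 3.0,       # three full copies of every byte
}
total = stacked_wa(layers)
print(f"end-to-end WA ~ {total:.1f}")
```

Under this model, halving the largest single factor halves the end-to-end multiplier, which is why the dominant term is usually the first design target.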

Magic Pocket — background-write WA as the Live Coder's design target¶

The immediate context from the 2026-04-02 post:

Last year, we rolled out a new service that changed how data is placed across Magic Pocket. The change reduced write amplification for background writes, so each write triggered fewer backend storage operations. (Source: sources/2026-04-02-dropbox-magic-pocket-storage-efficiency-compaction)

Before the Live Coder service, writes went through a replicated path first and were re-encoded into erasure-coded volumes as a background operation — that re-encode is itself a rewrite, so each logical byte written produced multiple physical bytes.

The Live Coder path writes data directly into erasure-coded volumes, skipping the replicated-then-re-encoded intermediate. Background writes fan out to fewer backend storage operations per logical byte.
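The difference between the two paths can be sketched as lifetime physical bytes per logical byte. The replica count and the (k, m) erasure-coding parameters below are assumptions picked for illustration; the source post does not state Magic Pocket's actual values, and the function names are hypothetical.

```python
def replicated_then_reencoded(logical: int, replicas: int, k: int, m: int) -> float:
    """Physical bytes when data is first replicated, then re-encoded into EC."""
    replicated = logical * replicas      # initial replicated write
    reencoded = logical * (k + m) / k    # background re-encode is a second write
    return replicated + reencoded


def direct_ec(logical: int, k: int, m: int) -> float:
    """Physical bytes when data is written straight into EC volumes."""
    return logical * (k + m) / k


# Assumed parameters: 3 replicas and a 6 + 3 erasure code (not from the source).
logical_bytes = 100
old_path = replicated_then_reencoded(logical_bytes, replicas=3, k=6, m=3)
new_path = direct_ec(logical_bytes, k=6, m=3)
print(f"replicated-then-re-encoded WA: {old_path / logical_bytes:.2f}")
print(f"direct EC WA:                  {new_path / logical_bytes:.2f}")
```

The replicated copies are eventually deleted, so they do not add to steady-state capacity, but the bytes were still written once and count toward WA.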

The unintended consequence landed on a different cost axis: the new path produced severely under-filled volumes, driving storage overhead up. The fix was multi-strategy compaction (L1 + L2 + L3) over the new distribution, with L3 itself reusing the Live Coder as a re-encoding pipeline. Reusing the Live Coder for L3 is a deliberate trade: L3's rewrite WA is high on a per-blob basis (every re-encoded live blob gets a new identity and metadata entry), but L3 is applied only to the sparse tail, which keeps the absolute rewrite work per reclaimed volume low.

Multi-index / multi-namespace WA — Netflix Graph Abstraction¶

A different shape of write amplification appears at the logical rather than physical level: a single client write fans out across multiple indexes or namespaces, each requiring its own substrate write.

Netflix Graph Abstraction canonicalises this in the Part-I post:

"A single write on the fronting service can result in multiple writes to the backing durable storage due to the use of multiple indexes."

A single edge write fans out to 3+ KV records in different namespaces:

Substrate write	Why
Forward link namespace	adjacency-list entry at partition `source`
Reverse link namespace	adjacency-list entry at partition `target`
Edge property namespace	property bag at the lex-sorted-concat ID
(Optional) write-aside cache	if cached link record needs refresh
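The fan-out in the table can be sketched with an in-memory stand-in for the KV substrate. Everything here is hypothetical: the `KVStore` class, the namespace names, the key shapes, and the `|` separator are illustrative choices, not the Graph Abstraction's actual API or schema.

```python
from collections import defaultdict


class KVStore:
    """In-memory stand-in for a namespaced KV substrate that counts writes."""

    def __init__(self) -> None:
        self.namespaces: dict[str, dict] = defaultdict(dict)
        self.write_count = 0

    def put(self, namespace: str, key, value) -> None:
        self.namespaces[namespace][key] = value
        self.write_count += 1


def edge_id(source: str, target: str) -> str:
    """Lex-sorted concatenation of the two node IDs (separator is assumed)."""
    return "|".join(sorted((source, target)))


def write_edge(store: KVStore, source: str, target: str, properties: dict) -> int:
    """Write one logical edge and return how many substrate writes it caused."""
    before = store.write_count
    store.put("forward_links", (source, target), True)   # partitioned by source
    store.put("reverse_links", (target, source), True)   # partitioned by target
    store.put("edge_properties", edge_id(source, target), properties)
    return store.write_count - before


store = KVStore()
fan_out = write_edge(store, "user:42", "title:7", {"weight": 0.9})
print(f"1 logical edge write -> {fan_out} substrate writes")
```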

Mitigations specific to this shape — distinct from LSM / erasure-coding mitigations:

Write-aside cache for edge links: suppresses redundant writes when a link already exists; the dominant lever for graph WA reduction.
Edge links / properties split: keeps each link record small, so write doubling on forward + reverse remains affordable.
Per-namespace cache invalidation strategy choice: the invalidate-on-write mode pushes additional cache writes; the TTL-driven mode trades staleness for a lower write rate (patterns/read-aside-cache-with-dual-invalidation).
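The first mitigation, suppressing link writes when the link is already known, can be sketched as follows. This is an illustration under simplifying assumptions: the cache is modelled as an always-accurate in-process set with no eviction or staleness, property writes are assumed to always reach the substrate, and the `LinkWriter` class is hypothetical.

```python
class LinkWriter:
    """Counts substrate writes, skipping link writes for already-known links."""

    def __init__(self) -> None:
        self.known_links: set[tuple[str, str]] = set()  # stand-in for the cache
        self.substrate_writes = 0

    def write_edge(self, source: str, target: str, properties: dict) -> int:
        """Record one logical edge write; return the substrate writes it caused."""
        writes = 0
        link = (source, target)
        if link not in self.known_links:
            writes += 2  # forward link + reverse link
            self.known_links.add(link)
        writes += 1      # property bag is always written in this sketch
        self.substrate_writes += writes
        return writes


writer = LinkWriter()
first = writer.write_edge("user:42", "title:7", {"weight": 0.1})
repeats = [writer.write_edge("user:42", "title:7", {"weight": w}) for w in (0.2, 0.3, 0.4, 0.5)]
print(f"first write: {first} substrate writes, each repeat: {repeats[0]}")
print(f"total with suppression: {writer.substrate_writes}, without: {3 * 5}")
```

In this toy workload the repeated updates to an existing edge cost one substrate write each instead of three, which is where the reduction comes from.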

Multi-namespace WA is structurally different from LSM WA: the multiplier is logical (number of indexes / records per operation), independent of the substrate's compaction behaviour, and stacks on top of any LSM / replication WA the substrate adds.

Axis summary¶

Source of WA	Canonical system	Shape
LSM merging	Husky, RocksDB	Amortized `~log(N) / B` rewrites per byte
SSD erase-block	Any flash-backed store	Page-level-rewrite on sub-page update
Replication / EC	Magic Pocket, S3	`(k + m) / k` or `N / 1` multiplier
Compaction in immutable stores	Magic Pocket, Husky	Live bytes rewritten once per reclaim of their containing unit
Write-path layers	Replicated-then-re-encoded writes	One rewrite per intermediate tier
Multi-namespace / multi-index	Netflix Graph Abstraction	3+ logical writes per edge mutation

Design knobs that move WA¶

Choose a write path that skips a rewrite tier: Magic Pocket's Live Coder writes directly into EC volumes (removing one rewrite); inline EC in S3 ShardStore.
Size reclaimable units to the workload: bigger units reduce per-byte compaction WA but increase reclamation latency.
Delay compaction until it is efficient enough: Husky's lazy compaction is an order of magnitude cheaper than eager size-tiering.
Pick a reclamation mechanism per segment: L1/L2 keep blobs under the same volume identity (low metadata WA); L3 rewrites into new volumes (high per-blob metadata WA, low per-reclaimed-volume total).

Relation to overhead¶

WA and storage overhead are orthogonal cost axes:

WA is a per-write rewrite cost: it is paid in I/O, CPU, flash wear, and metadata writes.
Overhead is a capacity-holding cost: it is paid in hardware fleet size.

The two interact at the compaction layer: compaction causes WA (rewrites live data) in order to reduce overhead. Picking a compaction strategy picks a trade between the two.
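One way to make this trade concrete is the rewrite cost per byte of capacity reclaimed. Assuming a unit is reclaimed by copying out all of its live bytes and then freeing the whole unit, the cost is live / (1 - live). The function name and the sample fill levels are hypothetical.

```python
def rewrite_bytes_per_reclaimed_byte(live_fraction: float) -> float:
    """Bytes rewritten per byte of capacity freed when compacting one unit."""
    if not 0.0 <= live_fraction < 1.0:
        raise ValueError("live_fraction must be in [0, 1)")
    # Live bytes must be copied out; the dead remainder is what gets reclaimed.
    return live_fraction / (1.0 - live_fraction)


# Illustrative fill levels: a mostly-full unit versus sparse units.
for live in (0.9, 0.5, 0.1):
    cost = rewrite_bytes_per_reclaimed_byte(live)
    print(f"live={live:.0%}: {cost:.2f} bytes rewritten per byte reclaimed")
```

Under this model, compacting sparse units frees capacity cheaply while compacting nearly full units is mostly wasted rewriting, which is consistent with waiting until compaction is efficient enough and with targeting the sparse tail.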

Seen in¶

sources/2026-04-02-dropbox-magic-pocket-storage-efficiency-compaction: explicit "reduced write amplification for background writes" as the Live Coder's original design goal; compaction's rewrite WA as the cost side of the reclamation trade.
sources/2024-03-06-highscalability-behind-aws-s3s-massive-scale: Kozlovski names ShardStore's shard-data-stored-outside-the-tree design choice as an explicit write-amplification optimization: the LSM tree moves small metadata entries during compaction, not the large shard payloads, whose rewrite cost would otherwise grow in proportion to compaction frequency.
sources/2026-06-01-databricks-debunking-8-data-layout-myths-why-liquid-clustering-outperfo: names Z-Ordering's "unnecessary rewrites" as an explicit write-amplification failure mode: "each rerun rewrites large amounts of old, possibly already-clustered data to restore clustering quality. With continuous ingestion, the cost of keeping data well-clustered with Z-Order grows along with the table." Rewrite cost scales with table size rather than with new-data volume, so per-cycle cost keeps growing as the table grows. Incremental clustering on write is the canonicalised mitigation pattern: bound maintenance cost to incremental work (proportional to new data) rather than to table state (proportional to table size). This makes the Z-Order critique a wiki-canonical instance of write amplification at the lakehouse table-layout altitude, distinct from but architecturally adjacent to the LSM / replication / SSD instances above.
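The cost geometry in the last entry can be sketched with a toy model. It assumes the worst case, in which every maintenance cycle rewrites the whole table, and compares that with a cycle that touches only newly ingested data. The function name, the units, and the cycle counts are hypothetical and are not taken from the Databricks post.

```python
def cumulative_rewrite_cost(cycles: int, new_per_cycle: float, full_rewrite: bool) -> float:
    """Total bytes rewritten by maintenance over a number of ingest cycles."""
    table_size = 0.0
    total = 0.0
    for _ in range(cycles):
        table_size += new_per_cycle
        # Full rewrite pays for the whole table; incremental pays for new data only.
        total += table_size if full_rewrite else new_per_cycle
    return total


# Illustrative run: 10 cycles, 1 unit of new data per cycle.
full = cumulative_rewrite_cost(10, 1.0, full_rewrite=True)
incremental = cumulative_rewrite_cost(10, 1.0, full_rewrite=False)
print(f"full-table rewrite each cycle: {full:.0f} units rewritten")
print(f"incremental each cycle:        {incremental:.0f} units rewritten")
```

In this model the full-rewrite total grows quadratically with the number of cycles while the incremental total grows linearly, so the gap widens as the table grows.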