
CONCEPT Cited by 3 sources

Write amplification

Definition

Write amplification (WA) is the ratio of physical bytes written to storage to logical bytes the application intended to write.

WA = (bytes written to disk) / (bytes written by application)

Forces that push WA above 1:

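  • LSM compaction: every byte is rewritten roughly once per tier / level it merges through, so ~O(log N) times (leveled compaction can rewrite it up to ~fanout times per level); in block-I/O terms this amortizes to ~log(N) / B transfers per item (concepts/lsm-compaction).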
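  • SSD page-level rewrites: flash cannot be overwritten in place, so updating even one byte programs a whole new flash page, and garbage collection later copies still-valid pages out of an erase block before erasing it.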
  • Replication / erasure coding: a single logical write becomes N replicas or (k + m) shards.
  • Immutable-store compaction: a delete leaves dead space in an immutable volume, and freeing that donor volume means rewriting its remaining live blobs elsewhere (concepts/storage-overhead-fragmentation).
  • Metadata: header / index / WAL updates around every logical write.
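The LSM bullet can be made concrete with a textbook back-of-envelope model. This is an assumption for illustration, not how Husky or RocksDB actually account for WA: with a write buffer of `buffer_bytes` and a size ratio `fanout` between levels, the number of levels is the smallest L with buffer · fanout^L ≥ dataset size; tiered compaction rewrites each byte about once per level, leveled compaction up to about `fanout` times per level. The function names and numbers are hypothetical.

```python
def lsm_levels(total_bytes: int, buffer_bytes: int, fanout: int) -> int:
    """Smallest L such that buffer_bytes * fanout**L >= total_bytes."""
    levels, capacity = 0, buffer_bytes
    while capacity < total_bytes:
        capacity *= fanout
        levels += 1
    return levels


def lsm_write_amplification(total_bytes: int, buffer_bytes: int,
                            fanout: int, leveled: bool) -> int:
    """Rough upper-bound WA from compaction alone (ignores WAL and flush).

    Tiered: each byte is rewritten about once per level it passes through.
    Leveled: merging into a level also rewrites up to `fanout` times as many
    bytes already resident there, so the per-level cost is ~fanout.
    """
    levels = lsm_levels(total_bytes, buffer_bytes, fanout)
    return levels * fanout if leveled else levels


if __name__ == "__main__":
    TiB, MiB = 2**40, 2**20
    # Hypothetical: 1 TiB dataset, 64 MiB buffer, size ratio 10.
    print(lsm_levels(TiB, 64 * MiB, 10))                              # 5
    print(lsm_write_amplification(TiB, 64 * MiB, 10, leveled=False))  # 5
    print(lsm_write_amplification(TiB, 64 * MiB, 10, leveled=True))   # 50
```

Leveled compaction buys lower read and space amplification with that extra rewrite work, which is why the leveled-versus-tiered choice is largely a WA decision.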

Any one of these can dominate; at scale, the WA term is often the first-order capacity / IOPS multiplier.
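Because each layer writes into the one below it, a common simplification (assumed here, not stated by the sources) is that per-layer WA factors multiply. The layer names and factors below are hypothetical, chosen only to show how quickly the product grows and how it relates to the ratio in the definition.

```python
from math import prod


def write_amplification(physical_bytes: int, logical_bytes: int) -> float:
    """WA = bytes written to disk / bytes written by the application."""
    if logical_bytes <= 0:
        raise ValueError("logical_bytes must be positive")
    return physical_bytes / logical_bytes


def stacked_wa(layer_factors: dict[str, float]) -> float:
    """Composite WA, assuming each layer multiplies the writes of the layer above."""
    return prod(layer_factors.values())


if __name__ == "__main__":
    # Hypothetical per-layer factors for a single write path.
    layers = {
        "metadata/WAL": 1.1,  # header, index and log writes around each payload
        "LSM compaction": 10.0,
        "3-way replication": 3.0,
        "SSD garbage collection": 1.5,
    }
    print(f"composite WA ~ {stacked_wa(layers):.1f}x")  # composite WA ~ 49.5x
    # The same ratio measured from counters: 1 GiB logical -> 49.5 GiB physical.
    print(write_amplification(int(49.5 * 2**30), 2**30))  # 49.5
```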

Magic Pocket — background-write WA as the Live Coder's design target

The immediate context from the 2026-04-02 post:

Last year, we rolled out a new service that changed how data is placed across Magic Pocket. The change reduced write amplification for background writes, so each write triggered fewer backend storage operations. (Source: sources/2026-04-02-dropbox-magic-pocket-storage-efficiency-compaction)

Before the Live Coder service, writes went through a replicated path first and were re-encoded into erasure-coded volumes as a background operation — that re-encode is itself a rewrite, so each logical byte written produced multiple physical bytes.

The Live Coder path writes data directly into erasure-coded volumes, skipping the replicated-then-re-encoded intermediate. Background writes fan out to fewer backend storage operations per logical byte.
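A rough per-logical-byte accounting of the two paths makes the saving concrete. The replication factor and the (k, m) erasure-coding parameters below are hypothetical placeholders, not Magic Pocket's real configuration, and the model counts only payload bytes (no metadata, no later compaction).

```python
from fractions import Fraction


def ec_bytes_per_logical_byte(k: int, m: int) -> Fraction:
    """Erasure coding stores k data shards plus m parity shards per k units of data."""
    return Fraction(k + m, k)


def replicated_then_reencoded(replicas: int, k: int, m: int) -> Fraction:
    """Older-style path: write full replicas first, then re-encode into EC shards.

    Deleting the temporary replicas frees space but is not counted as a write.
    """
    return replicas + ec_bytes_per_logical_byte(k, m)


def direct_ec(k: int, m: int) -> Fraction:
    """Live Coder-style path: write straight into erasure-coded volumes."""
    return ec_bytes_per_logical_byte(k, m)


if __name__ == "__main__":
    # Hypothetical parameters: 2 temporary replicas and a (6, 3) code.
    before = replicated_then_reencoded(replicas=2, k=6, m=3)
    after = direct_ec(k=6, m=3)
    print(float(before), float(after))  # 3.5 1.5
```

Under these assumptions the direct path removes exactly the replica bytes, which is the "one rewrite per intermediate tier" that skipping a tier saves.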

The unintended consequence landed on a different cost axis: the new path produced severely under-filled volumes, driving storage overhead up. The fix was multi-strategy compaction (L1 + L2 + L3) over the new distribution, with L3 itself reusing the Live Coder as a re-encoding pipeline. That latter choice is a deliberate trade: L3's rewrite WA is high on a per-blob basis (every live blob re-encoded gets a new identity + metadata entry), but L3 is applied only to the sparse tail, where each volume holds few live blobs, so the absolute rewrite work per reclaimed volume stays low.
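Why confining expensive rewrites to sparse volumes keeps total work low follows from standard cleaning-cost arithmetic for immutable volumes (a simplification assumed here, not Dropbox's published cost model): reclaiming a volume whose live fraction is f rewrites f of its capacity in order to free the remaining 1 - f, so the rewrite cost per freed byte is f / (1 - f). The function name and fill levels are hypothetical.

```python
def rewrite_per_freed_byte(live_fraction: float) -> float:
    """Bytes rewritten per byte of capacity freed when a volume is compacted.

    Compacting a volume moves its live bytes (live_fraction of its capacity)
    into another volume and then frees the whole donor, so the net capacity
    reclaimed is (1 - live_fraction).
    """
    if not 0.0 <= live_fraction < 1.0:
        raise ValueError("live_fraction must be in [0, 1)")
    return live_fraction / (1.0 - live_fraction)


if __name__ == "__main__":
    # Hypothetical fill levels, from a nearly empty tail volume to a nearly full one.
    for f in (0.05, 0.25, 0.5, 0.75, 0.95):
        print(f"live={f:.2f}  rewrite/freed={rewrite_per_freed_byte(f):.2f}")
```

A volume that is 5% live costs about 0.05 rewritten bytes per freed byte, while one that is 95% live costs about 19, so even a rewrite path that is expensive per blob stays cheap in total when it only sees the sparse tail.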

Multi-index / multi-namespace WA — Netflix Graph Abstraction

A different shape of write amplification appears at the logical rather than physical level: a single client write fans out across multiple indexes or namespaces, each requiring its own substrate write.

Netflix Graph Abstraction canonicalises this in the Part-I post:

"A single write on the fronting service can result in multiple writes to the backing durable storage due to the use of multiple indexes."

A single edge write fans out to 3+ KV records in different namespaces:

Substrate write               | Why
Forward link namespace        | adjacency-list entry at partition source
Reverse link namespace        | adjacency-list entry at partition target
Edge property namespace       | property bag at the lex-sorted-concat ID
(Optional) write-aside cache  | if cached link record needs refresh
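The fan-out in the table can be sketched as a function that turns one edge mutation into its substrate records. The namespace names, key layout, and `|` separator are hypothetical; only the shape (forward entry at the source, reverse entry at the target, property bag at the lex-sorted concatenation of the two IDs, optional cache refresh) comes from the table.

```python
from dataclasses import dataclass


@dataclass
class KVWrite:
    namespace: str
    key: str
    value: object


def edge_write_fanout(src: str, dst: str, props: dict,
                      refresh_cache: bool = False) -> list[KVWrite]:
    """Expand one logical edge write into the KV writes it triggers (hypothetical layout)."""
    # Property bag keyed by the lexicographically sorted, concatenated endpoint IDs.
    edge_id = "|".join(sorted((src, dst)))
    writes = [
        KVWrite("forward_links", src, dst),  # adjacency entry in the source's partition
        KVWrite("reverse_links", dst, src),  # adjacency entry in the target's partition
        KVWrite("edge_props", edge_id, props),
    ]
    if refresh_cache:
        # Optional write-aside cache refresh for the cached link record.
        writes.append(KVWrite("link_cache", src, dst))
    return writes


if __name__ == "__main__":
    writes = edge_write_fanout("user:42", "show:7", {"watched": True})
    for w in writes:
        print(w)
    # Logical multi-index WA for this mutation: 3 substrate writes per client write.
    print(len(writes))
```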

Mitigations specific to this shape — distinct from LSM / erasure-coding mitigations:

Multi-namespace WA is structurally different from LSM WA: the multiplier is logical (number of indexes / records per operation), independent of the substrate's compaction behaviour, and stacks on top of any LSM / replication WA the substrate adds.

Axis summary

Source of WA                    | Canonical system                   | Shape
LSM merging                     | Husky, RocksDB                     | ~log(N) rewrites per byte (× fanout if leveled); ~log(N) / B amortized block I/Os per item
SSD erase-block                 | Any flash-backed store             | Full-page rewrite on sub-page update, plus GC copies
Replication / EC                | Magic Pocket, S3                   | (k + m) / k or N / 1 multiplier
Compaction in immutable stores  | Magic Pocket, Husky                | Live bytes rewritten once per reclaim of their containing unit
Write-path layers               | Replicated-then-re-encoded writes  | One rewrite per intermediate tier
Multi-namespace / multi-index   | Netflix Graph Abstraction          | 3+ logical writes per edge mutation

Design knobs that move WA

  • Choose a write path that skips a rewrite tier — Magic Pocket's Live Coder → direct EC writes (removed one rewrite); inline EC in S3 ShardStore.
  • Size reclaimable units to the workload — bigger units reduce per-byte compaction WA but increase reclamation latency.
  • Delay compaction until it is efficient enough — Husky's lazy compaction is an order of magnitude cheaper than eager size-tiering.
  • Pick reclamation mechanism per segment — L1/L2 keep blobs under the same volume identity (low metadata WA); L3 rewrites into new volumes (high per-blob metadata WA, low per-reclaimed-volume total).

Relation to overhead

WA and storage overhead are orthogonal cost axes:

  • WA is a per-write rewrite cost — pays in I/O, CPU, flash wear, and metadata writes.
  • Overhead is a capacity-holding cost — pays in hardware fleet size.

The two interact at the compaction layer: compaction causes WA (rewrites live data) in order to reduce overhead. Picking a compaction strategy picks a trade between the two.
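The trade can be sketched by sweeping a compaction threshold over a set of volumes: compacting every volume whose live fraction is below the threshold lowers overhead at the price of more bytes rewritten. Overhead is assumed here to mean capacity held divided by live bytes, with compacted data packing perfectly into full volumes; the function name and fill levels are made-up inputs.

```python
from fractions import Fraction


def compaction_tradeoff(live_fractions: list[Fraction],
                        threshold: Fraction) -> tuple[Fraction, Fraction]:
    """Return (bytes_rewritten, overhead) after compacting volumes below `threshold`.

    Each volume has capacity 1. Compacted volumes' live bytes are rewritten and
    assumed to pack perfectly into full volumes; the donors are freed.
    """
    total_live = sum(live_fractions, Fraction(0))
    compacted = [f for f in live_fractions if f < threshold]
    kept = len(live_fractions) - len(compacted)
    rewritten = sum(compacted, Fraction(0))
    capacity = kept + rewritten  # untouched volumes + repacked live data
    return rewritten, capacity / total_live


if __name__ == "__main__":
    volumes = [Fraction(1, 10), Fraction(3, 10), Fraction(6, 10),
               Fraction(9, 10), Fraction(1)]
    for t in (Fraction(0), Fraction(1, 2), Fraction(1)):
        rewritten, overhead = compaction_tradeoff(volumes, t)
        print(f"threshold={float(t):.1f}  rewritten={float(rewritten):.1f}  "
              f"overhead={float(overhead):.2f}x")
```

With these inputs, moving from no compaction to compacting every non-full volume cuts overhead from about 1.72x to 1.00x while raising the rewrite bill from 0 to 1.9 volumes' worth of bytes; intermediate thresholds land in between.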

Seen in

  • sources/2026-04-02-dropbox-magic-pocket-storage-efficiency-compaction — explicit "reduced write amplification for background writes" as the Live Coder's original design goal; compaction's rewrite WA as the cost side of the reclamation trade.
  • sources/2024-03-06-highscalability-behind-aws-s3s-massive-scale — Kozlovski names ShardStore's shard-data-stored-outside-the-tree design choice as an explicit write-amplification optimization: the LSM tree moves small metadata entries during compaction, not the large shard payloads that would otherwise rewrite-amplify proportional to compaction frequency.
  • sources/2026-06-01-databricks-debunking-8-data-layout-myths-why-liquid-clustering-outperfo — Names Z-Ordering's "unnecessary rewrites" as an explicit write-amplification failure mode: "each rerun rewrites large amounts of old, possibly already-clustered data to restore clustering quality. With continuous ingestion, the cost of keeping data well-clustered with Z-Order grows along with the table." The cost geometry is structurally pathological — rewrite cost grows with table size, not with new-data volume, so per-cycle cost diverges as the table grows. Incremental clustering on write is the canonicalised mitigation pattern: bound maintenance cost to incremental work (proportional to new data) rather than table state (proportional to table size). This makes Z-Order's structural critique a wiki-canonical instance of write amplification at the lakehouse table-layout altitude — distinct from but architecturally adjacent to the LSM / replication / SSD instances above.
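The Databricks cost geometry can be sketched with a toy model, assuming the worst case where each full re-clustering pass rewrites the entire table while incremental clustering rewrites only the newly ingested batch. The batch size, cycle count, and function name are hypothetical, and real Z-Order reruns may touch less than the whole table.

```python
def cumulative_rewrite(batch_bytes: int, cycles: int, full_recluster: bool) -> list[int]:
    """Cumulative bytes rewritten by clustering maintenance after each ingest cycle."""
    table_size, total, history = 0, 0, []
    for _ in range(cycles):
        table_size += batch_bytes
        # Full re-clustering rewrites the whole (growing) table; incremental
        # clustering rewrites only the batch that just arrived.
        total += table_size if full_recluster else batch_bytes
        history.append(total)
    return history


if __name__ == "__main__":
    GiB = 2**30
    full = cumulative_rewrite(100 * GiB, cycles=10, full_recluster=True)
    incr = cumulative_rewrite(100 * GiB, cycles=10, full_recluster=False)
    print(full[-1] // GiB, incr[-1] // GiB)  # 5500 1000
```

Under this model the per-cycle cost of full re-clustering grows linearly with table size, so its cumulative cost grows quadratically with the number of cycles, while the incremental variant grows linearly.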