CONCEPT Cited by 1 source
Garbage collection (storage)
Definition
In immutable / append-only storage systems, garbage collection (GC) is the stage that identifies which blobs, rows, or objects are no longer referenced and marks them safe to remove. GC does not free disk space on its own — that's the job of compaction.
The separation is explicit in Magic Pocket's framing:
"Garbage collection identifies blobs that are no longer referenced and marks them as safe to remove, but it does not free space on its own. Compaction performs the physical reclamation. Because volumes cannot be modified once closed, we gather the live blobs from volumes, write them into new volumes, and retire the old ones. This is how deletes eventually translate into reusable space." (Source: sources/2026-04-02-dropbox-magic-pocket-storage-efficiency-compaction)
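The mark-then-rewrite flow in the quote can be sketched in a few lines. This is a toy in-memory model, not Magic Pocket's implementation: `Volume`, `Store`, `garbage_collect`, and `compact` are hypothetical names chosen for illustration, and a real system would persist the deleted-set and volumes durably.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """A closed, immutable volume mapping blob id -> payload (hypothetical)."""
    volume_id: int
    blobs: dict[str, bytes]


@dataclass
class Store:
    volumes: dict[int, Volume] = field(default_factory=dict)
    deleted: set[str] = field(default_factory=set)  # GC output: ids safe to remove
    next_volume_id: int = 0

    def add_volume(self, blobs: dict[str, bytes]) -> int:
        vid = self.next_volume_id
        self.next_volume_id += 1
        self.volumes[vid] = Volume(vid, dict(blobs))
        return vid

    def used_bytes(self) -> int:
        return sum(len(p) for v in self.volumes.values() for p in v.blobs.values())


def garbage_collect(store: Store, referenced: set[str]) -> None:
    """Stage 1: mark unreferenced blobs. Metadata only; no bytes are freed."""
    for volume in store.volumes.values():
        for blob_id in volume.blobs:
            if blob_id not in referenced:
                store.deleted.add(blob_id)


def compact(store: Store, volume_ids: list[int]) -> int:
    """Stage 2: copy live blobs into a new volume, then retire the old ones."""
    live: dict[str, bytes] = {}
    for vid in volume_ids:
        for blob_id, payload in store.volumes[vid].blobs.items():
            if blob_id not in store.deleted:
                live[blob_id] = payload
    new_vid = store.add_volume(live)
    for vid in volume_ids:
        # The marks for these blobs are no longer needed once the volume is gone.
        for blob_id in store.volumes[vid].blobs:
            store.deleted.discard(blob_id)
        del store.volumes[vid]
    return new_vid
```

Calling `garbage_collect` leaves `used_bytes()` unchanged; only the later `compact` call shrinks it, which is the point of the split.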
Why the two-stage split
- Reference tracking (GC) is a metadata-layer question — often requires traversing catalog state, user-facing reference tables, retention policies, snapshot holds, etc. Slow, correctness-critical, happens asynchronously.
- Physical reclamation (compaction) is a storage-layer question — rewrite live data, retire the old unit. Expensive in I/O, throttled, batched.
Coupling them would force every delete to pay the full reclamation cost synchronously, which is infeasible at the delete rates real systems see (Magic Pocket: millions per day). Decoupling lets both stages be tuned independently:
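- GC cadence tracks how stale the reference graph is allowed to become before dead data is marked.
- Compaction cadence tracks how much storage overhead (space still held by marked-deleted data) the system tolerates.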
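One way to picture the compaction-side knob is a selection policy that only rewrites volumes whose dead fraction exceeds a threshold, capped by a per-cycle budget. The sketch below is an assumption for illustration, not a policy described in the source: `dead_fraction`, `select_for_compaction`, `threshold`, and `budget` are all hypothetical names.

```python
def dead_fraction(live_bytes: int, total_bytes: int) -> float:
    """Share of a volume's bytes that GC has marked as no longer live."""
    return 0.0 if total_bytes == 0 else 1.0 - live_bytes / total_bytes


def select_for_compaction(
    volumes: dict[str, tuple[int, int]],  # volume id -> (live_bytes, total_bytes)
    threshold: float,                     # minimum dead fraction worth rewriting
    budget: int,                          # max volumes to rewrite this cycle (I/O throttle)
) -> list[str]:
    """Return up to `budget` volume ids at or above `threshold`, most-dead first."""
    scored = [(dead_fraction(live, total), vid) for vid, (live, total) in volumes.items()]
    eligible = sorted((s for s in scored if s[0] >= threshold), reverse=True)
    return [vid for _, vid in eligible[:budget]]
```

Lowering `threshold` reclaims space sooner at the cost of more rewrite I/O; neither setting affects how often GC marks blobs.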
Canonical instantiations
- Magic Pocket — GC marks blobs deleted; compaction (L1/L2/L3) physically reclaims volumes once enough marked-deleted blobs accumulate.
- LSM databases — tombstones mark deleted keys; compaction drops them during merges.
- Open table formats (Iceberg, Delta Lake) — commits drop old Parquet files from table metadata (logical removal); physical deletion is a separate maintenance step, such as snapshot expiration in Iceberg or VACUUM in Delta Lake.
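The LSM case is the easiest of the three to show in miniature. The sketch below assumes each sorted run is a plain dict and uses a hypothetical `TOMBSTONE` sentinel and `merge_runs` helper; real engines merge sorted files on disk and track sequence numbers rather than relying on list order.

```python
TOMBSTONE = object()  # hypothetical sentinel marking a deleted key


def merge_runs(runs: list[dict], drop_tombstones: bool) -> dict:
    """Merge runs ordered newest first; the newest entry for each key wins.

    Tombstones may only be dropped when the merge includes every older run
    that could still hold the key (for example, a merge into the bottom
    level); otherwise the tombstone must be kept so it keeps shadowing
    older values.
    """
    merged: dict = {}
    for run in runs:  # newest first, so the first value seen for a key is kept
        for key, value in run.items():
            merged.setdefault(key, value)
    if drop_tombstones:
        merged = {k: v for k, v in merged.items() if v is not TOMBSTONE}
    return {k: merged[k] for k in sorted(merged)}
```

With `drop_tombstones=False` the deleted key still occupies an entry in the output, which mirrors "marked but not yet reclaimed".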
Seen in
- sources/2026-04-02-dropbox-magic-pocket-storage-efficiency-compaction — canonical two-stage statement (GC marks, compaction frees).