
CONCEPT Cited by 4 sources

Immutable Object Storage

Immutable object storage is a model in which the stored unit — the object — cannot be partially modified after it is written. Writes produce new versions (or new objects); reads see whole-object values; updates are whole-object replacements, not in-place mutations.

This is the low-level data model that S3 and most cloud object stores expose. "An HTTP-based storage system for immutable objects with four core verbs (PUT, GET, DELETE and LIST)" — Warfield.

(Source: sources/2025-03-14-allthingsdistributed-s3-simplicity-is-table-stakes)
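The four-verb model can be illustrated with a toy in-memory store. This is a minimal sketch, not S3's implementation: the class name `ObjectStore` and its method names are hypothetical, and it ignores durability, HTTP, and concurrency. What it shows is that no verb writes at an offset, so changing one byte means replacing the whole object.

```python
class ObjectStore:
    """Toy in-memory store exposing only whole-object verbs (hypothetical API)."""

    def __init__(self):
        self._objects = {}  # key -> bytes

    def put(self, key: str, data: bytes) -> None:
        # bytes() copies the input, so a caller who later mutates a
        # bytearray they passed in cannot alter the stored object.
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)

    def list_keys(self, prefix: str = "") -> list:
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ObjectStore()
store.put("logs/a.txt", b"hello world")

# There is no "write at offset" verb. To change one byte, the client
# reads the whole object, edits a private copy, and PUTs a replacement.
old = store.get("logs/a.txt")
new = old[:6] + b"W" + old[7:]
store.put("logs/a.txt", new)
```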

Why immutability

  • Simple replication and durability semantics. An object is either the old one or the new one; there is no partial-update window to reason about across replicas.
  • Easy versioning. Because overwrites are replacements, supporting "keep old versions" is a natural extension (S3 object versioning).
  • Low write-coordination cost. No need for distributed locks for sub-object ranges.
  • Good substrate for higher-level abstractions — see systems/apache-iceberg building a mutable "table" on top of immutable Parquet objects via snapshotting.
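The "easy versioning" point above can be sketched as follows. Because every overwrite is a whole-object replacement, keeping history only requires appending the new value instead of discarding the old one. The class `VersionedStore` and its integer version IDs are assumptions made for illustration; real services use their own opaque version identifiers.

```python
import itertools


class VersionedStore:
    """Toy store where each PUT adds a version instead of overwriting."""

    def __init__(self):
        self._versions = {}              # key -> list of (version_id, bytes)
        self._ids = itertools.count(1)   # hypothetical integer version IDs

    def put(self, key, data) -> int:
        version_id = next(self._ids)
        self._versions.setdefault(key, []).append((version_id, bytes(data)))
        return version_id

    def get(self, key, version_id=None) -> bytes:
        history = self._versions[key]
        if version_id is None:
            return history[-1][1]        # latest version
        for vid, data in history:
            if vid == version_id:
                return data
        raise KeyError((key, version_id))


vstore = VersionedStore()
v1 = vstore.put("report.csv", b"draft")
v2 = vstore.put("report.csv", b"final")
latest = vstore.get("report.csv")        # whole object, newest version
first = vstore.get("report.csv", v1)     # older version is still intact
```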

What immutability forces upward

Many workloads still need mutability, so it has to be implemented above the storage layer:

  • Row-level updates → concepts/open-table-format (Iceberg / Delta / Hudi) built over immutable Parquet.
  • Schema evolution → metadata layers that describe table state across many objects.
  • Atomic multi-writer coordination → conditional operations / compare-and-swap (CAS) on object metadata (patterns/conditional-write) rather than in-place edits.
  • "Mutable" semantics in client libraries (e.g. embedded DB files on S3) → typically implemented as whole-file replace with conditional-write guards.
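The conditional-write guard mentioned in the last two items can be sketched as a compare-and-swap on a per-object tag. This is an illustration under stated assumptions: `CasStore`, `put_if_match`, and the counter-based tags are hypothetical names that model the general shape of an HTTP `If-Match` style precondition, not any specific service's API.

```python
import itertools


class PreconditionFailed(Exception):
    """Raised when the object changed since the caller last read it."""


class CasStore:
    def __init__(self):
        self._objects = {}               # key -> (tag, bytes)
        self._tags = itertools.count(1)  # hypothetical monotonically increasing tags

    def get(self, key):
        return self._objects[key]        # returns (tag, data)

    def put_if_match(self, key, data, expected_tag):
        # expected_tag=None means "create only if the key does not exist".
        current = self._objects.get(key)
        current_tag = current[0] if current else None
        if current_tag != expected_tag:
            raise PreconditionFailed(key)
        tag = str(next(self._tags))
        self._objects[key] = (tag, bytes(data))
        return tag


cas = CasStore()
t1 = cas.put_if_match("app.db", b"v1", None)

# Two writers both read tag t1. The first whole-file replace wins.
t2 = cas.put_if_match("app.db", b"v2 from writer A", t1)

# The second writer's guard fails; it must re-read and retry.
try:
    cas.put_if_match("app.db", b"v2 from writer B", t1)
    writer_b_lost = False
except PreconditionFailed:
    writer_b_lost = True
```

The losing writer never corrupts the object: it either replaces the exact version it read or it replaces nothing.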

The tension the S3-at-19 post calls out

"Objects are simple and immutable, but tables are neither."

That single sentence is the whole argument for S3 Tables: once customers need table semantics over immutable objects, someone has to own the mutable-on-top-of-immutable glue (metadata, compaction, GC). Either the customer owns it via Iceberg client code, or the platform owns it via systems/s3-tables.

The file-semantics escape hatch (2026)

systems/s3-files introduces a second way out: instead of building mutability over immutable objects in client libraries, expose a filesystem presentation of the same S3 data via an NFS mount backed by EFS. File-layer mutations happen with full filesystem semantics (in-place writes, rename, append, mmap); the concepts/stage-and-commit mechanism batches and translates those mutations back to whole-object PUTs on the S3 side. Warfield's characterisation of the two worlds, side by side:
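The general idea of staging file-style mutations and committing them as a single whole-object PUT can be sketched as below. This is not how S3 Files is implemented; `StagedFile`, its methods, and the plain `dict` standing in for the object store are all hypothetical and exist only to show the direction of translation.

```python
class StagedFile:
    """Mutable file view whose edits reach the object store only on commit."""

    def __init__(self, store: dict, key: str):
        self._store = store
        self._key = key
        self._buf = bytearray(store.get(key, b""))  # private staging copy
        self._dirty = False

    def write_at(self, offset: int, data: bytes) -> None:
        end = offset + len(data)
        if end > len(self._buf):
            # Writing past the end zero-fills the gap, as a file would.
            self._buf.extend(b"\0" * (end - len(self._buf)))
        self._buf[offset:end] = data
        self._dirty = True

    def append(self, data: bytes) -> None:
        self._buf.extend(data)
        self._dirty = True

    def commit(self) -> bool:
        if not self._dirty:
            return False
        # Many small edits become one whole-object replacement.
        self._store[self._key] = bytes(self._buf)
        self._dirty = False
        return True


objects = {"log.txt": b"aaaa"}
f = StagedFile(objects, "log.txt")
f.write_at(1, b"XY")                  # in-place edit, staged only
f.append(b"!")                        # append, staged only
before_commit = objects["log.txt"]    # object readers still see the old value
f.commit()
after_commit = objects["log.txt"]
```

Object-side readers only ever observe the old object or the new one, which is the invariant the text says must be preserved.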

"Files are an operating system construct… Application APIs for files are built to support the idea that I can update a record in a database in place, or append data to a log, and that you can concurrently access that file and see my change almost instantaneously, to an arbitrary sub-region of the file."

"Now if we flip over to object world, the idea of writing to the middle of an object while someone else is accessing it is more or less sacrilege. The immutability of objects is an assumption that is cooked into APIs and applications."

The 2026 design lesson: immutability as an object-storage invariant is load-bearing (at-least-once notifications, cross-region replication (CRR), log processors, and image-transcoding pipelines all depend on whole-object-creation semantics) and must be preserved. File semantics are delivered alongside in a distinct presentation layer (see concepts/boundary-as-feature and concepts/file-vs-object-semantics) rather than by weakening the object invariant.

Block-store flavor — Magic Pocket volumes

The same "immutable unit, mutate-by-rewrite" contract shows up one level below the object API in Dropbox's Magic Pocket. The immutable unit there is a volume (a fixed-size container of many blobs), not an object:

  • Blobs are never modified in place; updates / deletes write new data.
  • Volumes are closed once filled, and never reopened. The space a volume occupies cannot be reclaimed without rewriting its remaining live blobs into a new volume and retiring the old one.

This pushes the mutability burden one layer up, to the compaction layer. The two-stage reclamation pipeline (GC marks → compaction frees) is a direct consequence of the immutability invariant holding on volumes. Magic Pocket's multi-strategy compaction (L1 + L2 + L3) over different volume fill-level ranges is the block-store analogue of what Iceberg's managed-compaction does over immutable Parquet files on S3, and of what S3-Files' stage-and-commit does over file-level edits. Different data models, same root property.

(Source: sources/2026-04-02-dropbox-magic-pocket-storage-efficiency-compaction)
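The mark-then-compact shape described above can be sketched in a few lines. This is a single-strategy toy, not Magic Pocket's L1/L2/L3 design: the `Volume` class, the `compact` function, and the 0.5 fill threshold are assumptions, and the `live` set stands in for the output of the GC mark stage. Blobs are assumed to fit within one volume.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    capacity: int
    blobs: dict = field(default_factory=dict)  # blob_id -> bytes

    def used(self) -> int:
        return sum(len(d) for d in self.blobs.values())

    def live_bytes(self, live: set) -> int:
        return sum(len(d) for b, d in self.blobs.items() if b in live)


def compact(volumes, live, threshold=0.5):
    """Rewrite live blobs from sparse volumes into fresh ones; retire the rest."""
    kept, survivors = [], []
    for vol in volumes:
        if vol.live_bytes(live) / vol.capacity < threshold:
            # Sparse volume: salvage its live blobs, then drop the volume.
            survivors += [(b, d) for b, d in vol.blobs.items() if b in live]
        else:
            kept.append(vol)  # left untouched; closed volumes are never edited
    fresh = None
    for blob_id, data in survivors:
        if fresh is None or fresh.used() + len(data) > fresh.capacity:
            fresh = Volume(capacity=volumes[0].capacity)
            kept.append(fresh)
        fresh.blobs[blob_id] = data
    return kept


vol_a = Volume(10, {"a1": b"xxxx", "a2": b"yyyy"})
vol_b = Volume(10, {"b1": b"zzzz", "b2": b"wwww"})
vol_c = Volume(10, {"c1": b"12345678"})
live = {"a1", "b1", "c1"}            # "a2" and "b2" were marked dead by GC

result = compact([vol_a, vol_b, vol_c], live)
```

In this run the two sparse volumes are replaced by one fresh volume holding `a1` and `b1`, while the well-filled volume is carried over unchanged. Space is freed only by rewriting, never by editing a closed volume.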

