
Lazy Hydration

Lazy hydration is an initialization pattern: when a data view is first exposed to an application, populate the view's metadata eagerly (so operations work immediately) but defer fetching the data bytes themselves until they are actually accessed. The view appears fully operational from the outset even though its backing store hasn't been fully materialised.
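A minimal sketch of the pattern, assuming hypothetical `fetch_metadata` and `fetch_bytes` stand-ins for the backing store (neither is a real S3 Files API):

```python
def fetch_metadata(key):
    # assumption: a cheap per-object metadata lookup (size, timestamps, ...)
    return {"key": key, "size": len(key) * 1000}

def fetch_bytes(key):
    # assumption: the expensive data transfer we want to defer
    return b"x" * len(key)

class LazyFile:
    def __init__(self, key):
        self.meta = fetch_metadata(key)   # eager: the view works immediately
        self._data = None                 # deferred: no bytes fetched yet

    def read(self):
        if self._data is None:            # hydrate on first access only
            self._data = fetch_bytes(self.meta["key"])
        return self._data

f = LazyFile("reports/q1.csv")
print(f.meta["size"])   # metadata available without any data transfer
print(len(f.read()))    # bytes are fetched here, on first read
```

The key property is that constructing the view costs one metadata lookup, while the data transfer happens only if and when `read` is called.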

The 2026 S3 Files post is the clearest AWS-published instance of this pattern applied to a filesystem-over-object-store layer.

(Source: sources/2026-04-07-allthingsdistributed-s3-files-and-the-changing-face-of-s3)

S3 Files' specific mechanism

On first directory access to a mounted bucket or prefix:

  1. S3 Files imports metadata from S3 and populates a synchronised file-optimised namespace (in EFS).
  2. For files < 128 KB, data is pulled alongside metadata (cheap enough to co-hydrate).
  3. For files ≥ 128 KB, only metadata is imported. The actual bytes are fetched from S3 when the file is read.

The scan runs as a background operation so the customer can "mount and immediately work with objects in S3 as files." Without this, a multi-million-object bucket would take minutes to hours to enumerate before any work could start.
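The import pass above can be sketched as follows. `hydrate_namespace` and `fetch_bytes_stub` are hypothetical names, but the 128 KB cutoff and the metadata-always / data-sometimes split match the steps described:

```python
SMALL_FILE_LIMIT = 128 * 1024  # bytes; the threshold from the post

def fetch_bytes_stub(key, size):
    # assumption: stand-in for actually pulling object data from S3
    return b"\0" * size

def hydrate_namespace(objects):
    """objects: iterable of (key, size) pairs from a bucket listing."""
    namespace = {}
    for key, size in objects:
        entry = {"size": size, "data": None}   # metadata is always imported
        if size < SMALL_FILE_LIMIT:
            # small files: co-hydrate data alongside metadata
            entry["data"] = fetch_bytes_stub(key, size)
        namespace[key] = entry                 # large files stay metadata-only
    return namespace

ns = hydrate_namespace([("small.txt", 4_096), ("big.parquet", 512 * 1024)])
print(ns["small.txt"]["data"] is not None)   # True: pulled alongside metadata
print(ns["big.parquet"]["data"] is None)     # True: deferred until first read
```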

Warfield's framing of why this matters:

"This 'start working immediately' part is a good example of a simple experience that is actually pretty sophisticated under the covers — being able to mount and immediately work with objects in S3 as files is an obvious and natural expectation for the feature, and it would be pretty frustrating to have to wait minutes or hours for the file view of metadata to be populated."

Why the < 128 KB threshold

  • Small files — metadata and data are comparable in cost; pulling both up front avoids a round-trip penalty on first read for what are likely hot files.
  • Large files — pulling bytes eagerly would dominate the hydration cost and the customer almost certainly doesn't need all of them right away.

The threshold is a tuning parameter that balances "minimise first-read latency" against "don't eagerly materialise data the customer doesn't need."

Complementary mechanism: lazy eviction

Paired with lazy hydration, S3 Files applies lazy eviction:

"File data that hasn't been accessed in 30 days is evicted from the filesystem view but not deleted from S3, so storage costs stay proportional to your active working set."

The two together make the filesystem view a hot-set cache over an authoritative S3 object store — hydrated on demand, aged out on disuse, always reconstructible because S3 is the source of truth.
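The eviction half can be sketched in the same style. The 30-day age is from the quoted policy; the data structures and `evict_cold` are assumptions for illustration. Note what eviction drops and what it keeps:

```python
EVICTION_AGE = 30 * 24 * 3600  # 30 days, in seconds

def evict_cold(entries, now):
    """Drop data bytes (not metadata) for entries idle longer than EVICTION_AGE."""
    for entry in entries.values():
        if entry["data"] is not None and now - entry["last_access"] > EVICTION_AGE:
            entry["data"] = None   # bytes leave the filesystem view...
            # ...but metadata stays, and nothing is deleted from S3,
            # so a later read simply re-hydrates from the source of truth.

cache = {
    "hot.log":  {"data": b"abc", "last_access": 10_000_000},
    "cold.log": {"data": b"xyz", "last_access": 0},
}
evict_cold(cache, now=10_000_000)
print(cache["hot.log"]["data"])    # b'abc' — still resident
print(cache["cold.log"]["data"])   # None — evicted, reconstructible from S3
```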

Trade-offs

  • First-read latency (for large files) is paid at access time, not at mount time. Applications that stream through many large files will see the hydration cost spread across their run.
  • Cache coherence — changes to S3 made by other tools must propagate back to the hydrated filesystem view; S3 Files' sync mechanism handles this (roughly every 60s, plus on-read for individual files).
  • Metadata scan cost — even metadata-only import has an upper bound at which it becomes too expensive; S3 Files warns at mounts covering more than 50M objects, partly because of this.
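The on-read half of the coherence mechanism can be sketched as a revalidate-before-serve check. `head_object`, `get_object`, and the ETag-keyed cache are hypothetical stand-ins, not the actual S3 Files sync protocol:

```python
store = {"a.txt": ("v2", b"new contents")}   # authoritative (ETag, bytes)

def head_object(key):
    # assumption: cheap metadata round-trip to the authoritative store
    return store[key][0]

def get_object(key):
    # assumption: full data fetch from the authoritative store
    return store[key][1]

def read_coherent(cache, key):
    etag = head_object(key)                  # revalidate before serving
    entry = cache.get(key)
    if entry is None or entry["etag"] != etag:
        # stale or missing: re-hydrate from the source of truth
        entry = {"etag": etag, "data": get_object(key)}
        cache[key] = entry
    return entry["data"]

cache = {"a.txt": {"etag": "v1", "data": b"old contents"}}
print(read_coherent(cache, "a.txt"))   # b'new contents' — stale copy refreshed
```

The periodic (~60 s) sweep would do the same comparison in bulk; the per-read check bounds staleness for any individual file to one metadata round-trip.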
Seen in

  • Page cache in Linux — the same idea at the kernel level: file pages are hydrated into memory as they're accessed, evicted under memory pressure.
  • Snowflake micro-partition caching — a warehouse node caches the micro-partitions it accesses and evicts cold ones.
  • Browser fetch-on-scroll — the same pattern on the UI tier: render skeleton metadata, fetch the payload when it scrolls into view.
