Ephemeral local disk vs EBS¶
Definition¶
A classic AWS-era storage tradeoff for stateful services: run on network-attached block storage (EBS) — durable, movable between instances, slower, subject to mount/unmount transitions — or on ephemeral local SSD on the instance — fast, lost with the instance, never movable. This concept names the tradeoff explicitly because "use local SSD" is only safe after an orthogonal mechanism is in place to make the local data reconstructable.
The two axes¶
EBS (network-attached block)¶
- Durable — survives instance stop/start; volume can be re-attached to a new instance if the old one dies.
- Movable — primary-restart-to-new-host workflows can re-attach the volume instead of redownloading state, keeping startup under a minute on well-sized indices (see the re-attach sketch after this list).
- Slower — network-bound, IOPS budgets and burst buckets cap throughput.
- Unreliable at the mount boundary — Yelp's pre-1.0 Nrtsearch reports: "EBS movement was not as smooth as expected. At times, the EBS volume would not be correctly dismounted from the old node, and then the new node would take some time to mount it." (sources/2025-05-08-yelp-nrtsearch-100-incremental-backups-lucene-10)
- Subject to correlated regional/AZ EBS failure — well documented by Fly.io and others as a failure class whose blast radius is far larger than a single instance.
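Both the "movable" upside and the mount-boundary failure mode live in the same small workflow. A minimal sketch of that re-attach path, assuming boto3 and hypothetical volume/instance IDs (not Yelp's actual tooling):

```python
# Sketch of the EBS "move the volume" primary-restart path (boto3).
# IDs and the device name are hypothetical. Real workflows also need a
# clean filesystem unmount on the old host, which is exactly the
# transition Yelp found flaky.
import boto3

ec2 = boto3.client("ec2")

def move_volume(volume_id: str, old_instance: str, new_instance: str) -> None:
    # Detach from the old primary. This is the step that can hang if the
    # old node never cleanly dismounts the filesystem.
    ec2.detach_volume(VolumeId=volume_id, InstanceId=old_instance)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Re-attach to the replacement primary instead of redownloading state.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=new_instance,
                      Device="/dev/sdf")
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
```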
Ephemeral local SSD¶
- Fast — on-box, no network hop, high sustained IOPS and bandwidth.
- Lost with the instance — any instance termination wipes the disk.
- Unmovable — there's no volume to re-attach.
- Not the source of truth — any data committed before the current instance started must already live somewhere else to survive.
The missing pre-condition¶
"Should I use local SSD or EBS?" is the wrong question in isolation. The correct question is: does something else guarantee the data is reconstructable? If yes, local SSD wins on every axis except bootstrap speed. If no, EBS is the only sensible choice because the disk is the durability guarantee.
The mechanism that re-enables local SSD is typically one of:
- Object-storage durability — every commit writes to S3 (or equivalent), so the local copy is a cache, not the authoritative store. Canonical incremental-S3-backup instance; see the commit-time sketch after this list.
- Replication-level durability — data is replicated synchronously (or semi-sync) to peer nodes before the write is ack'd, so any one instance's loss doesn't lose data.
- Upstream-log replayability — an upstream WAL (Kafka, Kinesis) can be replayed to reconstruct state, at the cost of replay time.
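Of the three, the first is the one that matters for the canonical instance below. A minimal sketch of commit-time incremental backup to S3, assuming boto3, immutable segment-style files, and a hypothetical bucket (not Nrtsearch's actual implementation):

```python
# Sketch of object-storage durability: on each commit, upload only the
# files S3 does not already hold, so local disk is a cache and S3 is
# the source of truth. Bucket and prefix names are hypothetical.
import os
import boto3

s3 = boto3.client("s3")
BUCKET = "search-index-backups"  # hypothetical

def backup_on_commit(index_dir: str, prefix: str) -> None:
    # One page of listing is enough for a sketch; a real system keeps a
    # manifest of uploaded files instead of re-listing the bucket.
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=prefix)
    durable = {os.path.basename(obj["Key"]) for obj in resp.get("Contents", [])}

    # Segment-style files are immutable once written, so "new since the
    # last commit" is just a set difference; that is the incremental part.
    for name in os.listdir(index_dir):
        if name not in durable:
            s3.upload_file(os.path.join(index_dir, name), BUCKET,
                           f"{prefix}/{name}")
```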
Canonical instance: Yelp Nrtsearch 1.0.0¶
Pre-1.0 Yelp Nrtsearch ran the primary on EBS because EBS was the source of truth: "If the EBS was lost, corrupted, or took too long to resize, we would have to reindex all data." (sources/2025-05-08-yelp-nrtsearch-100-incremental-backups-lucene-10)
With 1.0's incremental backup on commit, S3 becomes the source of truth for committed segments. The primary can move to ephemeral local SSD with no durability loss — a primary restart re-downloads from S3 in parallel (parallel S3 download) with a 5× speedup vs. the old serialised download path. The three pre-1.0 EBS drawbacks (source-of-truth fragility, mount transitions, backup-frequency pressure from replica-catchup) all go away.
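A minimal sketch of that parallel bootstrap path, assuming boto3 and a thread pool; names are hypothetical, and Nrtsearch's actual downloader differs:

```python
# Sketch of parallel S3 restore: segment files are independent, so a
# fresh primary on ephemeral local SSD can fetch them concurrently
# rather than one at a time (the old serialised path).
import os
from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")

def restore_index(bucket: str, prefix: str, index_dir: str,
                  workers: int = 8) -> None:
    os.makedirs(index_dir, exist_ok=True)
    pages = s3.get_paginator("list_objects_v2").paginate(Bucket=bucket,
                                                         Prefix=prefix)
    keys = [obj["Key"] for page in pages for obj in page.get("Contents", [])]

    # boto3 clients are thread-safe, so one client serves all workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(s3.download_file, bucket, key,
                               os.path.join(index_dir, os.path.basename(key)))
                   for key in keys]
        for f in futures:
            f.result()  # surface any download failure
```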
Related concept pairs¶
- systems/aws-ebs / systems/aws-s3 — the AWS services at each durability tier.
- concepts/correlated-ebs-failure — a specific EBS failure mode that motivates not depending on it as the only durability substrate.
- concepts/warm-pool-instances — a pattern that accepts local-SSD loss by keeping warm replacements ready.
Seen in¶
- sources/2025-05-08-yelp-nrtsearch-100-incremental-backups-lucene-10 — canonical wiki instance. Yelp moves Nrtsearch primaries from EBS to ephemeral local SSD, enabled by incremental-backup-on-commit to S3. Three pre-1.0 EBS drawbacks enumerated (source-of-truth fragility, mount transition issues, backup-frequency pressure) all resolved by the architectural change.