Skip to content

SYSTEM Cited by 1 source

ShardStore

Definition

ShardStore is the bottom-most layer of Amazon S3's storage stack — the per-disk software that manages data on each individual hard drive. A few years before Warfield's FAST '23 keynote, S3 rewrote this layer from scratch. Two choices define it architecturally:

  1. Implementation language: Rust. Chosen for type safety and to get structured language support for catching bugs early. The team went further and wrote libraries that extend Rust's type safety to apply to on-disk structures.
  2. Executable specification in the same repo. Alongside the production ShardStore, the team checks in a Rust model of ShardStore's logic, stripped of hardware and on-disk-layout complexity. It is "about 1% of the size of the real system," functions as a compact executable specification, and is exercised against the real code via property-based testing on every commit.

"It wound up being about 1% of the size of the real system, but allowed us to perform testing at a level that would have been completely impractical to do against a hard drive with 120 available IOPS."

— Andy Warfield, 2025

Why it exists

A single hard drive inside S3 delivers about 120 IOPS (see concepts/hard-drive-physics). A bug at the per-disk layer can cause silent corruption — the most expensive failure mode in a storage service that sells 11 9s of durability. Testing against real drives at 120 IOPS cannot cover enough state space to be confident. An executable specification, by contrast, runs at CPU speed and can be exhaustively exercised with property-based tests.

Guardrail philosophy

ShardStore is described by the post as an instance of the S3 team's preference for coarse-grained guardrails over per-bug mitigations. Rather than track each possible on-disk bug and write a test for it, the team invested in a structural property — "production behavior matches specification behavior" — and built tooling so normal engineers without PhDs in formal methods can maintain that property.

See patterns/durability-review and concepts/lightweight-formal-verification for the broader context.

Published companion

The team published at SOSP: "Using Lightweight Formal Methods to Validate a Key-Value Storage Node in Amazon S3" (paper).

What ShardStore is not

  • Not the whole of S3. It is one layer among "hundreds of microservices." See systems/aws-s3.
  • Not a data format or replication engine — those live above it. ShardStore just owns the disk.
  • Not tied to any particular redundancy scheme (see concepts/erasure-coding).

Seen in

Last updated · 200 distilled / 1,178 read