
Metabucket (S3 Bucket Metadata Store)

Metabucket is S3's internal bucket-metadata system — the store that holds the record for each bucket (name, owner, policy, region, config). It is separate from the much larger object-metadata namespace that tracks individual objects.

(Source: sources/2025-03-14-allthingsdistributed-s3-simplicity-is-table-stakes)

Why it's a separate system

S3's object namespace is designed for hundreds of trillions of records. Bucket count per account was historically 100, so the bucket-metadata workload looked nothing like the object workload: small, low-write-rate, human-scale. A separate system let the two be tuned independently.

The 100 → 1M-per-account rewrite (2024)

Warfield notes Metabucket "has already been rewritten for scale, even with the 100 bucket per account limit, more than once in the past." Raising the cap to 1M buckets/account (Nov 2024) required another rewrite plus ecosystem work:

  • Metabucket itself had to scale from "human" (100-ish per account) to "programmatic" (up to 1M per account) — orders of magnitude more records to store, more index pressure, more update traffic.
  • New paged ListBuckets API — the old full-listing semantics weren't usable at millions of buckets.
  • Soft cap of 10K (opt-in beyond) as a guardrail to protect unprepared clients.
  • Cross-service rendering fixes in "tens of services" whose AWS Console widgets call ListBuckets and then HeadBucket per bucket on render, and could spin for tens of minutes at the new scale.
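The paged listing works like any cursor-based API: request a page, follow the continuation token until it runs out. The sketch below shows that client-side loop against a hypothetical in-memory stand-in; `fake_list_buckets` and its token encoding are illustrative only, though the request/response field names (`MaxBuckets`, `ContinuationToken`, `Buckets`) mirror the paged ListBuckets API shipped in Nov 2024, where the token is an opaque string.

```python
# Stand-in for the paged ListBuckets API so the paging loop can be shown
# without an AWS account. In reality the call would be something like
# s3.list_buckets(MaxBuckets=..., ContinuationToken=...) via boto3.
BUCKETS = [f"bucket-{i:07d}" for i in range(25_000)]

def fake_list_buckets(MaxBuckets=1000, ContinuationToken=None):
    # Token here is just a stringified offset; the real token is opaque.
    start = int(ContinuationToken) if ContinuationToken else 0
    page = BUCKETS[start:start + MaxBuckets]
    resp = {"Buckets": [{"Name": name} for name in page]}
    nxt = start + MaxBuckets
    if nxt < len(BUCKETS):
        # Token is present only when more pages remain.
        resp["ContinuationToken"] = str(nxt)
    return resp

def all_bucket_names():
    """Drain every page. Under the old semantics this was one unpaged call,
    which stops being viable at up to 1M buckets per account."""
    token = None
    while True:
        resp = fake_list_buckets(MaxBuckets=1000, ContinuationToken=token)
        for bucket in resp["Buckets"]:
            yield bucket["Name"]
        token = resp.get("ContinuationToken")
        if token is None:
            break

names = list(all_bucket_names())
print(len(names))  # 25000
```

Note the same loop structure is what every console widget and customer tool that previously assumed "one call, small list" had to adopt — which is part of why the limit raise fanned out into a cross-service effort.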

What it illustrates

Sharp edges inherited from early design choices get expensive to unwind at scale: not because the data problem is hard, but because everything up the stack (consoles, other AWS services, customer tooling) has assumed the cap and optimised around it. The headline is a limit change; the engineering is a cross-org campaign.
