
SYSTEM Cited by 2 sources

S3 Inventory

S3 Inventory is AWS's managed daily or weekly object-listing report for an S3 bucket or a prefix within one. Reports are delivered to a destination bucket as CSV, Parquet, or ORC files, and each row describes one object (key, version, size, storage class, last-modified, …). It is the canonical way to get a point-in-time snapshot of a bucket's contents without issuing millions of LIST requests. The snapshot is eventually consistent, so very recent writes and deletes may be missing from a given report.
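As a rough illustration of consuming a CSV-format report, the sketch below maps header-less inventory rows onto the column names listed in the manifest's `fileSchema` field. The manifest and data contents are made-up samples, the helper name `parse_inventory_csv` is hypothetical, and the use of `unquote_plus` to decode keys is an assumption about how the keys are encoded in the CSV.

```python
import csv
import io
import json
from urllib.parse import unquote_plus

# Hypothetical, trimmed manifest.json content; real manifests carry more fields.
MANIFEST = json.loads("""
{
  "sourceBucket": "example-source-bucket",
  "fileFormat": "CSV",
  "fileSchema": "Bucket, Key, Size, LastModifiedDate, StorageClass",
  "files": [{"key": "inventory/data/part-0000.csv.gz"}]
}
""")

# Hypothetical data-file content after decompression: no header row, quoted fields.
# In practice this text would come from something like gzip.open(path, "rt", newline="").
DATA = (
    '"example-source-bucket","logs/2025/01/a.json","1024",'
    '"2025-01-02T00:00:00.000Z","STANDARD"\n'
    '"example-source-bucket","logs/2025/01/b%20c.json","2048",'
    '"2025-01-03T00:00:00.000Z","GLACIER"\n'
)


def parse_inventory_csv(text, file_schema):
    """Turn header-less inventory CSV text into a list of dicts keyed by schema column."""
    columns = [name.strip() for name in file_schema.split(",")]
    rows = []
    for record in csv.reader(io.StringIO(text)):
        row = dict(zip(columns, record))
        # Assumption: keys are URL-encoded in the CSV format and need decoding.
        row["Key"] = unquote_plus(row["Key"])
        # Size can be empty (for example on delete markers); treat that as 0 here.
        row["Size"] = int(row["Size"]) if row.get("Size") else 0
        rows.append(row)
    return rows


rows = parse_inventory_csv(DATA, MANIFEST["fileSchema"])
```

Reading the column order from `fileSchema` rather than hard-coding it keeps the parser working when optional inventory fields are added or removed in the inventory configuration.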

Role for this wiki

S3 Inventory shows up in three shapes in the corpus:

  1. Input to S3 Batch Operations — the native manifest format for batch jobs on every object in a bucket.
  2. Join-side for access-based retention (patterns/s3-access-based-retention) — join against S3 server access logs (SAL) to compute the set of unused prefixes over a rolling window. The inventory enumerates "what exists"; SAL enumerates "what was accessed"; the set difference (what exists minus what was accessed) is "what's unused and safe to delete."
  3. Cost-estimation source for Default Access Retention (patterns/iam-policy-gated-cold-tier-access) — Yelp builds a dashboard "on S3 Inventory" that combines the amount of data in scope with current Intelligent-Tiering storage classes to project the dollar cost of a planned cross-window read. The cost estimate then drives the tiered-approval gate on the Terraform PR amending the bucket IAM policy.
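The access-based retention join in item 2 amounts to an anti-join between two sets of prefixes. A minimal sketch, assuming both sides have already been reduced to plain object keys and that a "prefix" means the first `depth` directory components of a key; the function names and the `depth` parameter are hypothetical and not taken from the Yelp posts:

```python
def prefix_of(key, depth=2):
    """Return the first `depth` directory components of a key, without a trailing slash."""
    directories = key.split("/")[:-1]  # drop the object name itself
    return "/".join(directories[:depth])


def unused_prefixes(inventory_keys, accessed_keys, depth=2):
    """Prefixes that exist in the inventory but never appear in the access logs."""
    existing = {prefix_of(key, depth) for key in inventory_keys}
    accessed = {prefix_of(key, depth) for key in accessed_keys}
    return existing - accessed  # set difference: exists minus accessed


# Made-up sample data standing in for one inventory report and one window of SAL.
inventory_keys = ["logs/2024/a.json", "logs/2025/b.json", "tmp/scratch/c.bin"]
accessed_keys = ["logs/2025/b.json"]

candidates = unused_prefixes(inventory_keys, accessed_keys)
```

At real scale this would more likely run as a SQL anti-join over the inventory and log tables, but the set semantics are the same.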

Seen in

  • sources/2026-05-21-yelp-how-partition-access-visualizations-reduced-our-data-lake-s3-cost-by-33 — S3 Inventory used as both (a) optional join-side for Yelp's partition-access-visualisation aggregate (linking "what was accessed" to "what storage class are those objects on") and (b) substrate of the cost-estimation dashboard backing Default Access Retention's tiered-approval Terraform-PR gate. Reinforces Inventory's role as the canonical "what exists + what does it cost to keep / read" source for storage-class-aware cost decisioning.
  • sources/2025-09-26-yelp-s3-server-access-logs-at-scale — Yelp's weekly access-based table joins S3 Inventory with a week of SAL to compute unused prefixes. Prefix extraction is explicit about handling trailing slashes — "removing trailing slash because we wanted to avoid confusion where a prefix '/foo' would determine whether a key '/foo/' was accessed or not." Inventory is also the source that translates accessed-prefixes back to full S3 object names for the Batch Operations manifest.
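The trailing-slash handling and the prefix-to-object translation described above can be sketched as follows. This is an illustration under assumptions, not Yelp's implementation: the helper names are hypothetical, the manifest uses the two-column `bucket,key` CSV layout accepted by S3 Batch Operations, and leaving `/` unescaped when URL-encoding keys is an assumption.

```python
import csv
import io
from urllib.parse import quote


def normalize_prefix(prefix):
    """Strip trailing slashes so 'foo' and 'foo/' refer to the same prefix."""
    return prefix.rstrip("/")


def keys_under(prefixes, inventory_keys):
    """Expand prefixes back to the full object keys listed in the inventory."""
    normalized = {normalize_prefix(p) for p in prefixes}
    selected = []
    for key in inventory_keys:
        # Match on "prefix/" so that 'foo' does not also capture 'foobar/...'.
        if any(key.startswith(p + "/") for p in normalized):
            selected.append(key)
    return selected


def batch_manifest(bucket, keys):
    """Render a CSV manifest with one 'bucket,key' row per object."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for key in keys:
        # Assumption: keys are URL-encoded in the manifest, with '/' left as-is.
        writer.writerow([bucket, quote(key, safe="/")])
    return out.getvalue()


# Made-up sample data.
inventory_keys = ["foo/a.txt", "foobar/b.txt", "foo/sub/c d.txt"]
selected = keys_under(["foo/"], inventory_keys)
manifest = batch_manifest("example-bucket", selected)
```

Normalizing every prefix once and then matching on `prefix + "/"` avoids both failure modes: a trailing slash causing a mismatch, and a short prefix accidentally matching a longer sibling.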