SYSTEM Cited by 2 sources
S3 Inventory
S3 Inventory is AWS's managed object-listing report, generated
daily or weekly for an S3 bucket or for a prefix filter within one.
Reports are delivered to a destination bucket as CSV, Parquet, or
ORC; each row describes one object (key, version, size, storage
class, last-modified, …). It is the canonical way to get a near
point-in-time listing of a bucket without issuing millions of LIST
requests. The listing is eventually consistent, so very recent
writes and deletes may be missing from a given report.
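As a rough illustration of the "one row per object" shape, the sketch below parses a headerless CSV report into dictionaries. The column order is an assumption: in a real delivery it comes from the `fileSchema` field of the `manifest.json` that accompanies each report. The bucket name, keys, and sizes are made-up sample data, and `parse_inventory_csv` is a hypothetical helper, not an AWS API.

```python
import csv
import io

# Assumed column order. In a real report this string is read from the
# "fileSchema" field of the accompanying manifest.json.
FILE_SCHEMA = "Bucket, Key, Size, LastModifiedDate, StorageClass"

# Made-up sample rows in the headerless, quoted CSV layout.
SAMPLE_CSV = (
    '"example-bucket","logs/2024/01/a.gz","1024",'
    '"2024-01-02T03:04:05.000Z","STANDARD"\n'
    '"example-bucket","logs/2024/01/b.gz","2048",'
    '"2024-01-03T03:04:05.000Z","INTELLIGENT_TIERING"\n'
)


def parse_inventory_csv(text, file_schema):
    """Turn headerless inventory CSV text into a list of dicts (hypothetical helper)."""
    columns = [name.strip() for name in file_schema.split(",")]
    rows = []
    for record in csv.reader(io.StringIO(text)):
        row = dict(zip(columns, record))
        # Size can be empty for some rows (for example delete markers).
        row["Size"] = int(row["Size"]) if row.get("Size") else 0
        rows.append(row)
    return rows


rows = parse_inventory_csv(SAMPLE_CSV, FILE_SCHEMA)
total_bytes = sum(row["Size"] for row in rows)
```

AWS documents that keys in CSV-format reports are URL-encoded, so a real consumer would decode `Key` before comparing it with keys from other sources; that step is left out here to keep the sketch small.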
Role for this wiki
S3 Inventory shows up in three shapes in the corpus:
- Input to S3 Batch Operations — the native manifest format for batch jobs on every object in a bucket.
- Join-side for access-based retention (patterns/s3-access-based-retention) — join against S3 server access logs (SAL) to compute the set of unused prefixes over a rolling window. The inventory enumerates "what exists"; SAL enumerates "what was accessed"; the set difference (what exists minus what was accessed) is "what's unused and safe to delete."
- Cost-estimation source for Default Access Retention (patterns/iam-policy-gated-cold-tier-access) — Yelp builds a dashboard "on S3 Inventory" that combines the amount of data in scope with its current Intelligent-Tiering storage classes to project the dollar cost of a planned cross-window read. The cost estimate then drives the tiered-approval gate on the Terraform PR that amends the bucket IAM policy.
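The cost-estimation role in the last bullet can be sketched as a small aggregation. The source does not describe the dashboard's actual cost model, so the one here (bytes per access tier multiplied by a caller-supplied per-GiB rate) is an assumption, and the rates are placeholders rather than AWS prices. `IntelligentTieringAccessTier` is the optional Inventory field reporting each object's tier; the function names and sample rows are hypothetical.

```python
from collections import defaultdict

GIB = 1024 ** 3

# HYPOTHETICAL per-GiB charges (USD) for reading data in each tier.
# Placeholders for illustration only, not AWS prices.
HYPOTHETICAL_USD_PER_GIB = {
    "FREQUENT": 0.0,
    "INFREQUENT": 0.01,
    "ARCHIVE_INSTANT_ACCESS": 0.03,
}

# Made-up inventory rows, already parsed into dicts.
SAMPLE_ROWS = [
    {"Key": "warehouse/events/dt=2024-01-01/p0", "Size": 4 * GIB,
     "IntelligentTieringAccessTier": "FREQUENT"},
    {"Key": "warehouse/events/dt=2023-06-01/p0", "Size": 2 * GIB,
     "IntelligentTieringAccessTier": "INFREQUENT"},
    {"Key": "warehouse/events/dt=2022-01-01/p0", "Size": 1 * GIB,
     "IntelligentTieringAccessTier": "ARCHIVE_INSTANT_ACCESS"},
    {"Key": "warehouse/other/dt=2022-01-01/p0", "Size": 8 * GIB,
     "IntelligentTieringAccessTier": "ARCHIVE_INSTANT_ACCESS"},
]


def bytes_by_tier(rows, prefix):
    """Sum object sizes under `prefix`, grouped by Intelligent-Tiering access tier."""
    totals = defaultdict(int)
    for row in rows:
        if row["Key"].startswith(prefix):
            totals[row["IntelligentTieringAccessTier"]] += row["Size"]
    return dict(totals)


def projected_cost(totals, usd_per_gib):
    """Multiply per-tier bytes by the supplied rate.

    A tier missing from `usd_per_gib` raises KeyError, so an unpriced
    tier is never silently counted as free.
    """
    return sum(size / GIB * usd_per_gib[tier] for tier, size in totals.items())


totals = bytes_by_tier(SAMPLE_ROWS, "warehouse/events/")
estimate = projected_cost(totals, HYPOTHETICAL_USD_PER_GIB)
```

In a gate like the one described above, `estimate` would be compared against approval thresholds; those thresholds are not given in the source and are not modelled here.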
Seen in
- sources/2026-05-21-yelp-how-partition-access-visualizations-reduced-our-data-lake-s3-cost-by-33 — S3 Inventory is used as both (a) an optional join-side for Yelp's partition-access-visualisation aggregate (linking "what was accessed" to "what storage class are those objects on") and (b) the substrate of the cost-estimation dashboard backing Default Access Retention's tiered-approval Terraform-PR gate. Reinforces Inventory's role as the canonical "what exists + what does it cost to keep / read" source for storage-class-aware cost decisions.
- sources/2025-09-26-yelp-s3-server-access-logs-at-scale — Yelp's weekly access-based table joins S3 Inventory with a week of SAL to compute unused prefixes. Prefix extraction is explicit about handling trailing slashes — "removing trailing slash because we wanted to avoid confusion where a prefix '/foo' would determine whether a key '/foo/' was accessed or not." Inventory is also the source that translates accessed-prefixes back to full S3 object names for the Batch Operations manifest.
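The weekly join described in the last entry can be sketched in a few lines. This is a minimal illustration, not Yelp's implementation: the fixed prefix depth, the function names, and the sample keys are assumptions. Trailing slashes are stripped before prefix extraction so that `logs/2024` and `logs/2024/` map to the same prefix. The manifest rows use the `bucket,key` CSV layout that S3 Batch Operations accepts; AWS documents that manifest keys must be URL-encoded, and leaving `/` unescaped here is an assumption.

```python
from urllib.parse import quote


def key_prefix(key, depth=2):
    """First `depth` path components of a key, with any trailing slash removed."""
    parts = key.rstrip("/").split("/")
    return "/".join(parts[:depth])


def unused_prefixes(inventory_keys, accessed_keys, depth=2):
    """Set difference: prefixes that exist in the inventory but were never accessed."""
    existing = {key_prefix(k, depth) for k in inventory_keys}
    accessed = {key_prefix(k, depth) for k in accessed_keys}
    return existing - accessed


def manifest_rows(bucket, inventory_keys, prefixes, depth=2):
    """Expand prefixes back to full object keys as `bucket,key` CSV rows."""
    return [
        f"{bucket},{quote(key, safe='/')}"
        for key in sorted(inventory_keys)
        if key_prefix(key, depth) in prefixes
    ]


# Made-up sample data: keys from the inventory and keys seen in access logs.
inventory_keys = [
    "logs/2024/01/a.gz",
    "logs/2024/02/b.gz",
    "logs/2023/12/c.gz",
    "tmp/scratch/d.bin",
]
accessed_keys = ["logs/2024/", "logs/2024/01/a.gz"]

unused = unused_prefixes(inventory_keys, accessed_keys)
manifest = manifest_rows("example-bucket", inventory_keys, unused)
```

At real scale this join runs in a query engine rather than in memory; the sketch only shows the set logic and the trailing-slash normalisation.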
Related
- systems/aws-s3 — parent service.
- systems/aws-s3-intelligent-tiering — the storage class whose tiering state Inventory exposes per-object, enabling cost projection for cross-tier reads.
- systems/s3-batch-operations — most common downstream consumer of Inventory reports.
- systems/amazon-athena — the query engine that joins Inventory with SAL.
- systems/yelp-partition-access-visualization — canonical consumer of Inventory ⋈ access-data joins for storage-class-aware analysis.
- patterns/s3-access-based-retention — canonical wiki pattern depending on Inventory ⋈ SAL.
- patterns/iam-policy-gated-cold-tier-access — canonical wiki pattern using Inventory as the cost-estimation source.
- concepts/default-access-retention — Yelp's named retention primitive whose cost-acknowledgement gate is Inventory-backed.