SYSTEM Cited by 2 sources

S3 Inventory¶

S3 Inventory is AWS's managed object-listing report, generated daily or weekly for an S3 bucket or for a prefix filter within one. Reports are delivered to a destination bucket as CSV, Parquet, or ORC, and each row describes one object (key, version, size, storage class, last-modified, …). It is the canonical way to get a scheduled snapshot listing of a bucket without issuing millions of LIST requests. The snapshot is eventually consistent, so very recent writes or deletes may be missing from a given report.
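As an illustration of the row shape, here is a minimal sketch of reading a CSV-format report. It assumes the column order is taken from the `fileSchema` string in the report's manifest and that keys are percent-encoded in the CSV. The helper name `parse_inventory_csv`, the sample schema, and the sample rows are hypothetical; real reports are gzip-compressed files in the destination bucket and would need to be downloaded and decompressed first.

```python
import csv
import io
from urllib.parse import unquote


def parse_inventory_csv(text, file_schema):
    """Turn headerless inventory CSV text into a list of dicts.

    file_schema is the comma-separated column list, assumed to come
    from the report manifest (the CSV itself has no header row).
    """
    columns = [name.strip() for name in file_schema.split(",")]
    rows = []
    for record in csv.reader(io.StringIO(text)):
        row = dict(zip(columns, record))
        # Assumption: keys are percent-encoded, so decode before use.
        row["Key"] = unquote(row["Key"])
        # Size can be empty (for example on delete markers); treat as 0.
        row["Size"] = int(row["Size"]) if row.get("Size") else 0
        rows.append(row)
    return rows


# Hypothetical sample data, not taken from a real report.
SAMPLE_SCHEMA = "Bucket, Key, Size, LastModifiedDate, StorageClass"
SAMPLE_CSV = (
    '"example-bucket","data/dt%3D2025-01-01/part-0.parquet","1024",'
    '"2025-01-02T00:00:00.000Z","STANDARD"\n'
    '"example-bucket","data/dt%3D2025-01-02/part-0.parquet","2048",'
    '"2025-01-03T00:00:00.000Z","GLACIER"\n'
)

rows = parse_inventory_csv(SAMPLE_CSV, SAMPLE_SCHEMA)
for row in rows:
    print(row["Key"], row["Size"], row["StorageClass"])
```

For Parquet or ORC reports the same fields arrive as typed columns, so the decoding and casting steps would not apply.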

Role for this wiki¶

S3 Inventory shows up in three shapes in the corpus:

Input to S3 Batch Operations — the native manifest format for batch jobs on every object in a bucket.
Join-side for access-based retention (patterns/s3-access-based-retention) — join against S3 server access logs (SAL) to compute the set of unused prefixes over a rolling window. The inventory enumerates "what exists"; SAL enumerates "what was accessed"; the set difference (what exists minus what was accessed) is "what's unused and safe to delete."
Cost-estimation source for Default Access Retention (patterns/iam-policy-gated-cold-tier-access) — Yelp builds a dashboard "on S3 Inventory" that combines the amount of data in scope with current Intelligent Tiering storage classes to project the dollar cost of a planned cross-window read. The cost estimate then drives the tiered-approval gate on the Terraform PR amending the bucket IAM policy.
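The second and third shapes reduce to a set difference and a grouped sum. The sketch below is illustrative only and is not Yelp's implementation: the function names are hypothetical, the rows are assumed to be dicts carrying `Size` and `StorageClass` fields, and the per-GiB rates are made-up placeholders supplied by the caller, not AWS prices.

```python
from collections import defaultdict


def unused_prefixes(inventory_prefixes, accessed_prefixes):
    """Prefixes that exist in the inventory but never appear in access logs."""
    return set(inventory_prefixes) - set(accessed_prefixes)


def bytes_by_storage_class(rows):
    """Sum object sizes per storage class from inventory rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["StorageClass"]] += row["Size"]
    return dict(totals)


def projected_read_cost(totals, rate_per_gib):
    """Multiply bytes per class by a caller-supplied rate per GiB.

    rate_per_gib is a hypothetical mapping; classes missing from it
    are treated as free to read.
    """
    gib = 2 ** 30
    return sum(
        size / gib * rate_per_gib.get(storage_class, 0.0)
        for storage_class, size in totals.items()
    )


# Hypothetical inputs for illustration.
exists = {"data/dt=2025-01-01", "data/dt=2025-01-02", "data/dt=2025-01-03"}
accessed = {"data/dt=2025-01-03"}
unused = unused_prefixes(exists, accessed)

sample_rows = [
    {"Key": "data/dt=2025-01-01/part-0.parquet", "Size": 2 ** 30, "StorageClass": "STANDARD"},
    {"Key": "data/dt=2025-01-01/part-1.parquet", "Size": 2 ** 30, "StorageClass": "STANDARD"},
    {"Key": "data/dt=2025-01-02/part-0.parquet", "Size": 2 ** 30, "StorageClass": "GLACIER"},
]
totals = bytes_by_storage_class(sample_rows)
placeholder_rates = {"GLACIER": 0.5}  # made-up number, not an AWS price
cost = projected_read_cost(totals, placeholder_rates)
```

Keeping the rates as an input rather than a constant reflects that the estimate depends on pricing that changes independently of the inventory data.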

Seen in¶

sources/2026-05-21-yelp-how-partition-access-visualizations-reduced-our-data-lake-s3-cost-by-33 — S3 Inventory is used both as (a) an optional join-side for Yelp's partition-access-visualization aggregate (linking "what was accessed" to "what storage class are those objects on") and (b) the substrate of the cost-estimation dashboard backing Default Access Retention's tiered-approval Terraform-PR gate. Reinforces Inventory's role as the canonical "what exists + what does it cost to keep / read" source for storage-class-aware cost decisions.
sources/2025-09-26-yelp-s3-server-access-logs-at-scale — Yelp's weekly access-based table joins S3 Inventory with a week of SAL to compute unused prefixes. Prefix extraction is explicit about handling trailing slashes — "removing trailing slash because we wanted to avoid confusion where a prefix '/foo' would determine whether a key '/foo/' was accessed or not." Inventory is also the source that translates accessed-prefixes back to full S3 object names for the Batch Operations manifest.
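The trailing-slash handling and the prefix-to-object expansion described above can be sketched as follows. This is a hedged illustration, not the code from the Yelp post: `prefix_of`, `keys_under`, and the `depth` parameter are hypothetical names, and keys are assumed to have no leading slash.

```python
def prefix_of(key, depth):
    """Return the first `depth` path components of a key, without a trailing slash.

    Stripping the trailing slash means the key "foo/" and any key under
    "foo/" both map to the same prefix "foo", so the two are never
    compared as if they were different prefixes.
    """
    return "/".join(key.split("/")[:depth]).rstrip("/")


def keys_under(prefixes, inventory_keys, depth):
    """Expand a set of prefixes back to the full object keys they cover."""
    wanted = set(prefixes)
    return [key for key in inventory_keys if prefix_of(key, depth) in wanted]


# Hypothetical keys for illustration.
inventory_keys = [
    "foo/bar/part-0.parquet",
    "foo/bar/part-1.parquet",
    "foo/baz/part-0.parquet",
    "foo/",
]

selected = keys_under({"foo/bar"}, inventory_keys, depth=2)
```

The resulting list of full keys is the kind of input a Batch Operations manifest needs, since a manifest names individual objects rather than prefixes.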

Related¶

systems/aws-s3 — parent service.
systems/aws-s3-intelligent-tiering — the storage class whose tiering state Inventory exposes per-object, enabling cost projection for cross-tier reads.
systems/s3-batch-operations — most common downstream consumer of Inventory reports.
systems/amazon-athena — the query engine that joins Inventory with SAL.
systems/yelp-partition-access-visualization — canonical consumer of Inventory ⋈ access-data joins for storage-class-aware analysis.
patterns/s3-access-based-retention — canonical wiki pattern depending on Inventory ⋈ SAL.
patterns/iam-policy-gated-cold-tier-access — canonical wiki pattern using Inventory as the cost-estimation source.
concepts/default-access-retention — Yelp's named retention primitive whose cost-acknowledgement gate is Inventory-backed.
