S3 Inventory¶
S3 Inventory is AWS's managed object-listing report for an S3
bucket, or for a prefix-filtered subset of one, generated on a
daily or weekly schedule. Reports are delivered to a destination
bucket as CSV, Parquet, or ORC; each row describes one object
(key, version, size, storage class, last-modified, …).
It is the canonical way to get a snapshot listing of a bucket
without issuing millions of LIST requests. The snapshot is
eventually consistent, so very recent writes or deletes may not
yet be reflected in a given report.
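To make the report shape concrete, here is a minimal sketch of reading a CSV-format report. It relies on two details of the CSV format: the data files carry no header row (column names come from the `fileSchema` string in the report's `manifest.json`), and object keys are URL-encoded. The function name `parse_inventory_csv`, the sample bucket, and the chosen column set are illustrative assumptions; the actual columns depend on which optional fields the inventory configuration enables.

```python
import csv
import io
import json
from urllib.parse import unquote


def parse_inventory_csv(manifest_json: str, csv_text: str) -> list[dict]:
    """Turn one CSV inventory data file into a list of per-object dicts.

    Hypothetical helper: column names are taken from the manifest's
    "fileSchema" string because the CSV itself has no header row.
    """
    manifest = json.loads(manifest_json)
    columns = [name.strip() for name in manifest["fileSchema"].split(",")]

    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        row = dict(zip(columns, record))
        # Keys are URL-encoded in CSV reports; decode before using them.
        # (Assumption: plain percent-decoding; verify against your data.)
        row["Key"] = unquote(row["Key"])
        if row.get("Size"):
            row["Size"] = int(row["Size"])
        rows.append(row)
    return rows


# Made-up sample data, not taken from a real report.
sample_manifest = json.dumps(
    {"fileSchema": "Bucket, Key, Size, LastModifiedDate, StorageClass"}
)
sample_csv = (
    '"example-bucket","logs/dt%3D2025-01-01/a.json","1024",'
    '"2025-01-01T00:00:00.000Z","STANDARD"\n'
)

for obj in parse_inventory_csv(sample_manifest, sample_csv):
    print(obj["Key"], obj["Size"], obj["StorageClass"])
```

In practice the data files are gzip-compressed and listed under `files` in the manifest, so a real reader would download and decompress each one first; Parquet and ORC reports are usually easier to query directly from a SQL engine.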
Role for this wiki¶
S3 Inventory shows up in two shapes in the corpus:
- Input to S3 Batch Operations — the native manifest format for batch jobs on every object in a bucket.
- Join-side for access-based retention (patterns/s3-access-based-retention) — join against S3 server access logs (SAL) to compute the set of unused prefixes over a rolling window. The inventory enumerates "what exists"; SAL enumerates "what was accessed"; the set difference (what exists minus what was accessed) is "what's unused and safe to delete."
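The join in the second role can be sketched in plain Python as a set difference over prefixes. This is an in-memory illustration only, not the query any source in the corpus actually runs; the helper names (`prefix_of`, `unused_prefixes`, `keys_under`), the fixed prefix depth, and the sample keys are all assumptions made for the example.

```python
def prefix_of(key: str, depth: int = 2) -> str:
    """Return the first `depth` directory segments of a key.

    Hypothetical prefix rule: drop the object name, keep up to `depth`
    leading segments. Real pipelines choose their own granularity.
    """
    directories = key.split("/")[:-1]
    return "/".join(directories[:depth])


def unused_prefixes(inventory_keys, accessed_keys, depth: int = 2) -> set[str]:
    """Prefixes that exist in the inventory but saw no access in the window."""
    existing = {prefix_of(k, depth) for k in inventory_keys}
    accessed = {prefix_of(k, depth) for k in accessed_keys}
    return existing - accessed  # "what exists" minus "what was accessed"


def keys_under(inventory_keys, prefixes, depth: int = 2) -> list[str]:
    """Expand prefixes back to full object keys using the inventory."""
    return [k for k in inventory_keys if prefix_of(k, depth) in prefixes]


# Made-up sample data.
inventory = [
    "team-a/raw/1.json",
    "team-a/raw/2.json",
    "team-b/tmp/x.bin",
    "team-b/tmp/y.bin",
]
accessed_in_window = ["team-a/raw/1.json"]

stale = unused_prefixes(inventory, accessed_in_window)
print(sorted(stale))
print(keys_under(inventory, stale))
```

Note that access is judged per prefix, not per object: `team-a/raw/2.json` was never read, but its prefix was, so it is kept. The expanded key list is what would feed a Batch Operations manifest.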
Seen in¶
- sources/2025-09-26-yelp-s3-server-access-logs-at-scale — Yelp's weekly access-based table joins S3 Inventory with a week of SAL to compute unused prefixes. Prefix extraction is explicit about handling trailing slashes — "removing trailing slash because we wanted to avoid confusion where a prefix '/foo' would determine whether a key '/foo/' was accessed or not." Inventory is also the source that translates accessed-prefixes back to full S3 object names for the Batch Operations manifest.
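The trailing-slash concern in the quote above can be illustrated with a small sketch. This is not Yelp's code; `normalize_prefix` and `was_accessed` are hypothetical names, and the only technique shown is stripping trailing slashes on both sides before comparing prefixes.

```python
def normalize_prefix(prefix: str) -> str:
    """Drop trailing slashes so '/foo' and '/foo/' name the same prefix."""
    return prefix.rstrip("/")


def was_accessed(prefix: str, accessed_prefixes) -> bool:
    """Compare prefixes only after normalizing both sides."""
    normalized = {normalize_prefix(p) for p in accessed_prefixes}
    return normalize_prefix(prefix) in normalized


# Without normalization, '/foo/' would not match an access recorded as '/foo'.
print("/foo/" in {"/foo"})
print(was_accessed("/foo/", ["/foo"]))
```

Normalizing on both the inventory side and the access-log side keeps the join from silently treating an accessed prefix as unused.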
Related¶
- systems/aws-s3 — parent service.
- systems/s3-batch-operations — most common downstream consumer of Inventory reports.
- systems/amazon-athena — the query engine that joins Inventory with SAL.
- patterns/s3-access-based-retention — canonical wiki pattern depending on Inventory ⋈ SAL.