CONCEPT Cited by 1 source
Cold-storage minimum-duration tax¶
Definition¶
The cold-storage minimum-duration tax is the structural failure mode of cold-tier object-storage classes (e.g. S3 Glacier family, Azure Archive, GCS Archive) when access patterns are uncertain: the combination of minimum storage durations (early-deletion fees) and per-access retrieval fees can negate or invert the savings when access turns out to be more frequent than expected.
The tax is structural: it does not depend on operator error. The mechanism (early deletion billed as if the object had been stored for the minimum period, plus per-GB retrieval fees) is a property of the storage class itself.
The Yelp framing (2026-05-21)¶
"This is in contrast to cold storage classes (e.g., S3 Glacier) that impose minimum storage durations and retrieval fees that can negate savings if you access data more than you expected to."
The contrast is structural: S3 Intelligent Tiering is bet-symmetric — cost moves with actual access in either direction without penalty. Glacier is bet-asymmetric — savings require correctly forecasting access; mis-forecasting eats the savings and charges retrieval fees on top.
(Source: sources/2026-05-21-yelp-how-partition-access-visualizations-reduced-our-data-lake-s3-cost-by-33)
Two distinct mechanisms¶
1. Minimum storage duration¶
Cold tiers charge as if the object had been stored for the minimum period even if you delete it earlier. AWS S3 Glacier classes:
- Glacier Instant Retrieval / Flexible Retrieval — 90-day minimum.
- Glacier Deep Archive — 180-day minimum.
If you tier an object to Glacier and discover within 30 days that it needs to come back, you are still billed for the full 90 days of Glacier storage: the 30 days actually stored plus a pro-rated early-deletion charge for the remaining 60. For ambiguous-access data this risk is built into the decision: you bet the data is cold and pay for being wrong.
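The minimum-duration rule can be sketched in a few lines of Python. This is a minimal illustration rather than an AWS billing API: the function names are hypothetical, and the minimums are the ones listed above, keyed by the S3 storage-class identifiers.

```python
# Minimum storage durations in days, as listed above.
MIN_DURATION_DAYS = {
    "GLACIER_IR": 90,     # Glacier Instant Retrieval
    "GLACIER": 90,        # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 180,  # Glacier Deep Archive
}


def billed_days(storage_class: str, days_stored: int) -> int:
    """Days of storage billed: the actual days, floored at the class minimum."""
    return max(days_stored, MIN_DURATION_DAYS[storage_class])


def early_deletion_days(storage_class: str, days_stored: int) -> int:
    """Days billed but not used when an object leaves the class early."""
    return billed_days(storage_class, days_stored) - days_stored


if __name__ == "__main__":
    # An object pulled back from Glacier Flexible Retrieval after 30 days.
    print(billed_days("GLACIER", 30), early_deletion_days("GLACIER", 30))
```

Once an object has outlived the minimum, `early_deletion_days` returns zero, so the charge only applies when the bet that the data was cold turns out wrong early.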
2. Retrieval fees¶
Each read from Glacier triggers a per-GB retrieval fee (and for Glacier Flexible Retrieval / Deep Archive, an asynchronous restore process with restore-class-specific time-to-availability).
For data that might be read at PB scale (e.g. analytics tables where a SQL author writes an unconstrained SELECT *), the retrieval fee can dwarf any storage savings.
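A rough break-even calculation shows how quickly retrieval fees can overtake storage savings. The sketch below is illustrative only: the function names are hypothetical, and the prices are round placeholder values chosen for easy arithmetic, not figures from any provider's price list.

```python
def breakeven_read_fraction(warm_price: float, cold_price: float,
                            retrieval_fee: float) -> float:
    """Fraction of a dataset that can be read per month before retrieval
    fees cancel that month's storage savings (all prices per GB)."""
    return (warm_price - cold_price) / retrieval_fee


def net_monthly_saving(size_gb: float, gb_read: float, warm_price: float,
                       cold_price: float, retrieval_fee: float) -> float:
    """Storage saving for one month minus retrieval fees paid in that month."""
    return size_gb * (warm_price - cold_price) - gb_read * retrieval_fee


# Placeholder prices in USD per GB (assumptions, not real list prices).
WARM_PRICE = 0.02      # per GB-month in the warm tier
COLD_PRICE = 0.005     # per GB-month in the cold tier
RETRIEVAL_FEE = 0.03   # per GB read from the cold tier

if __name__ == "__main__":
    one_pb_in_gb = 1_000_000
    print(breakeven_read_fraction(WARM_PRICE, COLD_PRICE, RETRIEVAL_FEE))
    # One full scan of a 1 PB table within the month:
    print(net_monthly_saving(one_pb_in_gb, one_pb_in_gb,
                             WARM_PRICE, COLD_PRICE, RETRIEVAL_FEE))
```

Under these placeholder prices, reading about half of the dataset in a month cancels that month's savings, and a single full scan leaves the month net negative.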
Why uncertain access is the killer¶
The tax is not punitive in the abstract — it is the price of the deeper-tier savings, which exist only because AWS amortises the storage equipment over data that is rarely retrieved. The killer is uncertain access:
- Access more than expected → retrieval fees + possible early-deletion charges erase savings.
- Access less than expected → the cold tier was the right bet, but the data owner could have got most of the savings (~81% at 90 days) from Intelligent Tiering without taking the bet at all.
Yelp's argument: in the absence of confident access-pattern data, Intelligent Tiering dominates — it captures most of the savings on the no-access-was-correct leg without exposing you to the access-was-incorrect leg.
The IT comparison (Yelp 2026-05-21)¶
"Savings from S3 Intelligent Tiering can be significant: objects not accessed for 30 days decrease in cost by 40%; objects not accessed for 90 days decrease in cost by 81%. The latter approaches the cost of S3 Glacier!"
The 81% number is the load-bearing argument: at 90 days no-access, IT extracts most of what Glacier offers, without the minimum-duration tax and without retrieval fees. The remaining gap (the last few percent of savings) is not worth the asymmetric risk for data with unpredictable access patterns.
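The discount schedule in the quote can be turned into a small model of relative storage cost. This is a sketch under stated assumptions: the 40% and 81% figures are taken from the Yelp quote above, the function names are hypothetical, and the model ignores Intelligent Tiering's per-object monitoring charge and any request costs.

```python
# (idle-day threshold, discount) pairs from the quoted figures,
# ordered from the deepest discount to the shallowest.
IT_DISCOUNTS = [(90, 0.81), (30, 0.40)]


def it_relative_cost(days_since_last_access: int) -> float:
    """Storage cost relative to the frequent-access price (1.0 = no discount)."""
    for threshold, discount in IT_DISCOUNTS:
        if days_since_last_access >= threshold:
            return 1.0 - discount
    return 1.0


def average_relative_cost(idle_days: int) -> float:
    """Average relative cost over the first `idle_days` days of an object
    that is never read after being written."""
    return sum(it_relative_cost(d) for d in range(idle_days)) / idle_days


if __name__ == "__main__":
    # An object left untouched for a full year.
    print(round(average_relative_cost(365), 2))
```

Because the discount depends only on days since last access, a read resets the object to full price but triggers no retrieval fee or minimum-duration charge in this model, which is the symmetry the comparison relies on.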
When the tax is acceptable¶
The tax is acceptable when:
- Access is provably rare — confirmed retention requirements, compliance archives, regulatory holds.
- Access volume is predictable — known restore cadence and volume (e.g. annual audit pulls).
- Savings differential is large enough that the retrieval-fee risk is small — exabyte-scale archives where the residual savings gap between IT and Deep Archive is material in absolute terms.
For bulk analytics-table data with uncertain future access, the tax structurally argues for IT-by-default.
The defence: prevent accidental cold-tier reads¶
When cold data sits in an Intelligent Tiering tier that remains directly readable (Infrequent Access or Archive Instant Access), the structural risk shifts from minimum-duration charges to accidental access cost: a SQL author runs an unconstrained SELECT * over a PB-scale table and reads through the cold tier at full read cost.
Yelp's defence is the Default Access Retention pattern: a restrictive S3 bucket IAM policy gates access to partitions beyond the access window, forcing an explicit Terraform PR + cost acknowledgement workflow before the cold-tier read can happen.
Yelp's disclosure on PB-scale full table scans against Archive Instant Access:
"For our largest tables, full table scans could add significant S3 costs by accessing PBs of data from cheap Intelligent Tiers like Archive Instant Access. This is not obvious to users who are writing SQL to inspect data!"
See patterns/iam-policy-gated-cold-tier-access.
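One way such a gate could look is a bucket policy that denies reads on partitions outside the access window unless the caller is an explicitly approved role. The sketch below is hypothetical and is not Yelp's actual policy: the function name, bucket, table prefix, partition patterns, and role ARN are all invented for illustration. It uses only standard IAM policy elements (`Deny`, `s3:GetObject`, the `ArnNotLike` operator, and the `aws:PrincipalArn` condition key).

```python
import json
from typing import Dict, List


def gated_partition_policy(bucket: str, table_prefix: str,
                           cold_partition_patterns: List[str],
                           approved_role_arns: List[str]) -> Dict:
    """Build a bucket-policy document that denies object reads on the given
    partition patterns to everyone except the approved roles."""
    resources = [
        f"arn:aws:s3:::{bucket}/{table_prefix}/{pattern}/*"
        for pattern in cold_partition_patterns
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyColdPartitionReads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": resources,
                # Callers whose ARN matches an approved role are exempt.
                "Condition": {
                    "ArnNotLike": {"aws:PrincipalArn": approved_role_arns}
                },
            }
        ],
    }


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    policy = gated_partition_policy(
        bucket="example-data-lake",
        table_prefix="warehouse/events",
        cold_partition_patterns=["dt=2023-*"],
        approved_role_arns=["arn:aws:iam::111122223333:role/approved-backfill"],
    )
    print(json.dumps(policy, indent=2))
```

In a workflow like the one described above, the list of approved roles or gated partitions would live in version-controlled configuration, so widening access requires a reviewed change rather than an ad hoc query.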
Generalisation beyond AWS¶
The same tax shape exists across cloud providers:
- AWS S3 Glacier — 90-day or 180-day minimums; retrieval fees; asynchronous restore for some tiers.
- Azure Archive — 180-day minimum; retrieval fees; rehydration.
- GCS Archive — 365-day minimum; retrieval fees.
The numbers differ, but the bet-asymmetry is structural across all three. The Yelp framing applies broadly.
Seen in¶
- sources/2026-05-21-yelp-how-partition-access-visualizations-reduced-our-data-lake-s3-cost-by-33 — canonical wiki disclosure; load-bearing argument for IT-by-default in the absence of confident access-pattern data.
Related¶
- systems/aws-s3-glacier — canonical AWS cold-storage class.
- systems/aws-s3-intelligent-tiering — the bet-symmetric alternative.
- systems/aws-s3 — parent service.
- concepts/default-access-retention — Yelp's IAM-gated guard against accidental cold-tier reads on IT.
- concepts/granular-usage-attribution — the observability primitive that lets a team escape uncertain-access by acquiring certain access data.
- patterns/iam-policy-gated-cold-tier-access — the structural defence against accidental access.