PATTERN Cited by 1 source
IAM-policy-gated cold-tier access¶
Problem¶
Data sitting in S3 Intelligent Tiering's deeper tiers (Infrequent Access / Archive Instant Access) is still readable — IT objects don't require a Glacier-style restore — and a SQL query can scan them transparently. Two failure modes follow:
- Tiering-clock reset. "Unexpected or accidental query patterns ... reset the flow of objects through Intelligent Tiering tiers": any read promotes the object back to Frequent Access, undoing 30–90 days of auto-tier-down savings.
- PB-scale cost bomb. "For our largest tables, full table scans could add significant S3 costs by accessing PBs of data from cheap Intelligent Tiers like Archive Instant Access. This is not obvious to users who are writing SQL to inspect data!"
Cold-tier-by-default (move data to Glacier) avoids the access cost bomb but trades it for the minimum-duration tax: storage-class moves carry minimum-duration penalties and retrieval fees that negate savings if access turns out to be more frequent than expected.
For data with uncertain future access — common at petabyte-scale analytics — neither extreme is right.
Pattern¶
Keep the data on Intelligent Tiering, but gate access to data beyond an explicit access window with a restrictive S3 bucket IAM policy. Consumers must raise a Terraform PR to amend the policy, acknowledging the projected cost (estimated from S3 Inventory) before the read is allowed.
Canonical shape (Yelp 2026-05-21 / Default Access Retention)¶
The pattern is the enforcement primitive underneath Default Access Retention — the named retention strategy Yelp introduced for "cases where data owners could not further expand deletion-based retention or cold storage due to uncertain future requirements."
1. Define an explicit access window¶
The data owner declares the window per table or per prefix, informed by granular usage attribution data: typically the partition-access-pattern analysis that identifies the oldest partitions still actively read.
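As a rough illustration of how such a window might be derived, the sketch below picks the smallest partition age that covers a chosen share of observed reads. The event shape, the function name `access_window_days`, and the 99% coverage default are all assumptions made for illustration; the source does not describe how Yelp computes its windows.

```python
from datetime import date, timedelta


def access_window_days(read_events, coverage_pct=99):
    """Smallest partition age (in days) covering coverage_pct percent of reads.

    read_events: iterable of (partition_date, read_date) pairs, one per read.
    This shape is hypothetical, not the source's attribution schema.
    """
    ages = sorted(
        (read_day - partition_day).days for partition_day, read_day in read_events
    )
    if not ages:
        return 0
    # Integer ceiling division avoids floating-point rounding surprises.
    index = max((coverage_pct * len(ages) + 99) // 100 - 1, 0)
    return ages[index]


today = date(2026, 5, 21)
events = (
    [(today - timedelta(days=1), today)] * 98        # routine reads of fresh data
    + [(today - timedelta(days=10), today)]           # occasional backfill
    + [(today - timedelta(days=400), today)]          # one-off deep scan
)
print(access_window_days(events))  # 10
```

With these made-up events, the single read of a 400-day-old partition falls outside the window, so it is the kind of access that would go through the re-grant flow rather than widen the window for everyone.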
2. Apply a restrictive bucket IAM policy outside the window¶
A policy that denies S3 reads for partitions beyond the access window: default-deny, with the access window as the explicit grant list. When a consumer queries a gated partition:
"Their query fails with an Access Denied exception."
That failure is the structural forcing function for the cost-acknowledgement flow.
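One possible shape for such a policy is an explicit `Deny` on `s3:GetObject` for every partition prefix older than the window start. The sketch below only builds the policy document; the bucket name, table prefix, and Hive-style `dt=<date>` layout are hypothetical, and the source does not publish Yelp's actual policy.

```python
import json


def build_gate_policy(bucket, table_prefix, partition_dates, window_start):
    """Build a bucket policy that denies reads on partitions before window_start.

    Dates are ISO "YYYY-MM-DD" strings, which sort chronologically as plain
    strings. The dt=<date> partition layout is an assumed convention.
    """
    restricted = sorted(d for d in partition_dates if d < window_start)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyReadsOutsideAccessWindow",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": [
                    f"arn:aws:s3:::{bucket}/{table_prefix}/dt={d}/*"
                    for d in restricted
                ],
            }
        ],
    }


policy = build_gate_policy(
    bucket="example-data-lake",      # hypothetical bucket name
    table_prefix="events/clicks",    # hypothetical table prefix
    partition_dates=["2024-11-30", "2025-06-01", "2026-01-15"],
    window_start="2025-05-21",
)
print(json.dumps(policy, indent=2))
```

An explicit `Deny` overrides any `Allow` elsewhere, which is what makes the gate hard to bypass by accident. Bucket policies are size-limited, so a real deployment would more likely use coarser wildcards (per month or per year) than one resource per daily partition, and would need to skip the statement entirely when no partition is restricted.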
3. Cost-acknowledged Terraform-PR re-grant flow¶
To regain access:
- Consumer raises a Terraform PR modifying the bucket IAM policy to grant temporary access to the specific restricted partitions.
- Consumer estimates the cost using a dashboard built on S3 Inventory — the dashboard combines "the amount of data in scope and the current S3 Storage Classes" to project the cost.
- "Certain approval levels are required based on the magnitude of the cost" — tiered approval flow over the PR review.
- After approval, the policy is amended; the consumer reads the data; the policy reverts.
(Source: sources/2026-05-21-yelp-how-partition-access-visualizations-reduced-our-data-lake-s3-cost-by-33)
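The estimate-then-approve steps above can be sketched as a small aggregation over inventory rows. Everything concrete here is an assumption: the row shape is only loosely modelled on the size and Intelligent-Tiering access-tier fields an S3 Inventory report can include, the per-GiB rates are placeholders rather than AWS prices, and the approval ladder is invented because the source does not disclose its thresholds.

```python
from collections import defaultdict

GIB = 1024 ** 3

# Placeholder cost per GiB read from each tier. NOT AWS prices.
COST_PER_GIB = {
    "FREQUENT": 0.0,
    "INFREQUENT": 0.01,
    "ARCHIVE_INSTANT_ACCESS": 0.02,
}

# Hypothetical approval ladder: (upper bound in USD, approval level).
APPROVAL_LEVELS = [
    (100, "level 1"),
    (10_000, "level 2"),
    (float("inf"), "level 3"),
]


def project_cost(inventory_rows, prefix):
    """Sum bytes per tier under `prefix` and price them with COST_PER_GIB."""
    bytes_by_tier = defaultdict(int)
    for row in inventory_rows:
        if row["key"].startswith(prefix):
            bytes_by_tier[row["tier"]] += row["size"]
    return sum(
        COST_PER_GIB[tier] * (size / GIB) for tier, size in bytes_by_tier.items()
    )


def required_approval(cost_usd):
    """Return the first approval level whose bound covers the projected cost."""
    for limit, level in APPROVAL_LEVELS:
        if cost_usd <= limit:
            return level


rows = [
    {"key": "events/clicks/dt=2024-11-30/a.parquet", "size": 500 * GIB,
     "tier": "ARCHIVE_INSTANT_ACCESS"},
    {"key": "events/clicks/dt=2024-11-30/b.parquet", "size": 500 * GIB,
     "tier": "INFREQUENT"},
    {"key": "events/views/dt=2024-11-30/a.parquet", "size": 900 * GIB,
     "tier": "ARCHIVE_INSTANT_ACCESS"},
]
cost = project_cost(rows, "events/clicks/dt=2024-11-30/")
print(round(cost, 2), required_approval(cost))
```

Grouping by tier before pricing mirrors the source's description of combining "the amount of data in scope and the current S3 Storage Classes"; the result could then be pasted into the PR description so reviewers see the figure they are approving.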
Two named structural benefits¶
1. Tiering progression is guaranteed to complete¶
"Storage cost is guaranteed to decrease after the initial 30 day period of Intelligent Tiering."
Because no accidental read can reach the gated partitions, IT's auto-tier-down trajectory (40% at 30 days, 81% at 90 days) runs to completion uninterrupted. The IAM gate is also reversible (access can be granted for an explicit window) and free of the minimum-duration tax that Glacier carries.
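To make the effect of a single stray read concrete, the sketch below compares cumulative storage cost for an untouched object against one that is read once on day 90. It assumes the 40% and 81% figures are storage-cost reductions relative to the Frequent Access rate, which the text does not state explicitly, and it expresses cost in relative units rather than dollars.

```python
def relative_storage_rate(idle_days):
    """Daily storage rate relative to Frequent Access (1.0).

    Assumes 40% and 81% are reductions versus Frequent Access, reached
    after 30 and 90 consecutive days without access.
    """
    if idle_days < 30:
        return 1.0
    if idle_days < 90:
        return 0.60
    return 0.19


def cumulative_cost(days, read_days=()):
    """Sum the daily relative rate, resetting the idle clock on each read."""
    reads = set(read_days)
    total, idle = 0.0, 0
    for day in range(days):
        if day in reads:
            idle = 0  # any read sends the object back to Frequent Access
        total += relative_storage_rate(idle)
        idle += 1
    return total


untouched = cumulative_cost(180)
read_once = cumulative_cost(180, read_days=[90])
print(round(untouched, 1), round(read_once, 1))
```

Under these assumptions, the untouched object costs about 83 relative day-units over 180 days and the object read once costs about 132, because the read restarts the 30-day and 90-day clocks just as the deepest discount was about to apply.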
2. Explicit cost acknowledgement before cold-tier scans¶
"Data consumers acknowledge associated costs of reading data from cold Intelligent Tiers, ensuring that it is justified by the business value of the analysis."
The Terraform PR + cost dashboard converts an unwitting cost bomb (SELECT * over a PB-scale table on Archive Instant Access) into a deliberate consumption decision. The Access Denied exception is the forcing function that interrupts the consumer before the cost is incurred.
Comparison to alternatives¶
| Strategy | Data preserved | Storage cost | Access cost on rare access | Operator effort |
|---|---|---|---|---|
| Deletion-based retention | No | None | N/A | Low |
| Cold-tier by default (systems/aws-s3-glacier) | Yes | Lowest | High (retrieval fee + restore latency) | Low |
| IAM-policy-gated cold-tier access | Yes | Low (IT deepest tier) | Tiering reset, gated by one Terraform PR + cost ack | Medium (PR review) |
The pattern's distinctive trade: operator review effort in exchange for a storage-cost trajectory that holds up whether or not the data is read again, without the concepts/cold-storage-minimum-duration-tax.
Forces that argue for the pattern¶
- Uncertain future requirements — neither deletion nor Glacier is the right answer.
- Petabyte-scale cold partitions — the cost bomb is large enough that operator review is justified.
- Existing IT adoption — the pattern presumes data already on Intelligent Tiering; no benefit if data is on standard S3.
- Mature IAM + Terraform discipline — the pattern requires a working Terraform PR review process and IAM-policy management competence.
Forces that argue against¶
- Hot data with clear access patterns — no need to gate; let IT do its work.
- Deletion is acceptable — simpler, cheaper.
- Real cold-storage requirements (compliance archives) — Glacier is the right answer.
- Operator review backlog cost — for tables with high-frequency cross-window queries, the PR review overhead may exceed the cost it prevents.
Caveats (not in source)¶
- False-positive cost — legitimate queries blocked by Access Denied when the consumer is unfamiliar with the workflow.
- Approval-level thresholds opaque — "certain approval levels" — concrete dollar thresholds, approver identities, and delegation rules not disclosed.
- PR-review SLA opaque — typical review cycle time not disclosed; for time-sensitive analyses the gate may add latency.
- No disclosure of false-deny rate — how often do legitimate queries hit the gate; what fraction get approved.
Seen in¶
- sources/2026-05-21-yelp-how-partition-access-visualizations-reduced-our-data-lake-s3-cost-by-33 — canonical Yelp disclosure of Default Access Retention with Terraform PR + S3 Inventory cost dashboard.
Related¶
- concepts/default-access-retention — the canonical concept this pattern enforces.
- concepts/cold-storage-minimum-duration-tax — the structural problem the pattern's IT-not-Glacier choice avoids.
- concepts/granular-usage-attribution — what enables the team to set the access window confidently.
- systems/aws-s3-intelligent-tiering — the underlying storage class.
- systems/aws-s3-glacier — the alternative the pattern argues against.
- systems/aws-iam — the gate.
- systems/s3-inventory — cost-estimation source.
- patterns/s3-access-based-retention — sibling pattern (the deletion variant of the same observability substrate).
- patterns/access-pattern-visualization-for-data-stewardship — the upstream observability that informs the access-window decision.