CONCEPT Cited by 1 source
Default Access Retention¶
Definition¶
Default Access Retention (DAR) is Yelp's named middle-ground retention primitive between deletion-based retention (data is gone) and cold-tier-by-default (data remains freely readable but more expensive to read). DAR keeps the data, makes it cheaper to keep, and forces consumption decisions through a human-gated process with up-front cost disclosure.
The shape (Yelp 2026-05-21)¶
"We introduced a middle ground for cases where data owners could not further expand deletion-based retention or cold storage due to uncertain future requirements: define an expected access window and implement access-based retention. Data beyond the Default Access Retention period remains in S3 but is gated behind a restrictive S3 bucket IAM policy that requires an explicit process to gain access. The data consumer raises a Terraform PR to request temporary access to restricted partitions and estimates the associated cost using a dashboard built on S3 Inventory. Certain approval levels are required based on the magnitude of the cost."
(Source: sources/2026-05-21-yelp-how-partition-access-visualizations-reduced-our-data-lake-s3-cost-by-33)
Three components¶
1. The access window¶
Per-table or per-prefix configuration: "the expected access window." Data inside the window is freely readable on Intelligent Tiering (or whichever IT-equivalent storage class the data sits on). Data beyond the window is technically present but procedurally inaccessible.
The access window is set from granular usage attribution data, typically via concepts/partition-access-pattern analysis that identifies the oldest partitions still being actively read.
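As a rough illustration of how an access window could be derived from usage data, the sketch below takes read events as (partition date, read date) pairs and returns the largest partition age that still sees repeated reads. The function names, the event shape, and the `min_reads` noise threshold are assumptions made for this example; the source does not describe Yelp's actual derivation.

```python
from collections import Counter
from datetime import date


def derive_access_window_days(read_events, min_reads=2):
    """Return the largest partition age (in days) read at least `min_reads` times.

    read_events: iterable of (partition_day, read_day) date pairs (assumed shape).
    min_reads:   hypothetical threshold that filters out one-off reads.
    """
    ages = Counter(
        (read_day - partition_day).days for partition_day, read_day in read_events
    )
    active_ages = [age for age, count in ages.items() if age >= 0 and count >= min_reads]
    return max(active_ages, default=0)


def is_inside_window(partition_day, today, window_days):
    """True if the partition is young enough to be freely readable."""
    return (today - partition_day).days <= window_days


if __name__ == "__main__":
    events = [
        (date(2026, 1, 1), date(2026, 1, 11)),
        (date(2026, 1, 1), date(2026, 1, 11)),
        (date(2025, 1, 1), date(2026, 1, 1)),  # a single stray read of old data
    ]
    window = derive_access_window_days(events)
    print(window, is_inside_window(date(2025, 12, 1), date(2026, 1, 11), window))
```

Raising `min_reads` makes the window tighter by treating isolated reads of old partitions as noise instead of as active consumption.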
2. The IAM gate¶
Data outside the access window is locked behind a restrictive S3 bucket IAM policy. "Their query fails with an Access Denied exception."
This is structurally the patterns/iam-policy-gated-cold-tier-access pattern: the IAM layer itself enforces the access window, rather than application code or team convention.
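The source does not publish the policy itself. As a minimal sketch of what such a gate might look like, the following builds a bucket policy document with a single Deny statement on `s3:GetObject` for partition prefixes beyond the window. The bucket name, prefixes, statement ID, and the optional exemption for named roles are all hypothetical.

```python
import json


def build_restricted_policy(bucket, restricted_prefixes, exempt_role_arns=()):
    """Build a bucket policy dict that denies reads of the given prefixes.

    All names passed in are illustrative; this is not Yelp's actual policy.
    """
    statement = {
        "Sid": "DenyReadsBeyondAccessWindow",  # hypothetical statement ID
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": [
            f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*"
            for prefix in restricted_prefixes
        ],
    }
    if exempt_role_arns:
        # Assumption: principals with approved temporary access are exempted
        # from the Deny by ARN.
        statement["Condition"] = {
            "ArnNotLike": {"aws:PrincipalArn": list(exempt_role_arns)}
        }
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    policy = build_restricted_policy(
        "example-data-lake", ["events/dt=2024-01-01/", "events/dt=2024-01-02/"]
    )
    print(json.dumps(policy, indent=2))
```

With a statement of this shape in place, a read against a restricted prefix fails with Access Denied, which matches the behaviour the source describes.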
3. The Terraform-PR + cost-acknowledgement workflow¶
To regain access to data outside the window:
- The consumer raises a Terraform PR modifying the bucket IAM policy to grant temporary access to the specific restricted partitions.
- The consumer estimates the cost of the planned access using a dashboard built on S3 Inventory — the dashboard combines "the amount of data in scope and the current S3 Storage Classes" to project the cost.
- "Certain approval levels are required based on the magnitude of the cost" — the PR review process implements a tiered approval flow (thresholds and approver identities not disclosed).
- After approval, the IAM policy is amended; the consumer reads the data; the policy reverts.
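The cost-estimation and tiered-approval steps above can be sketched as follows, under several stated assumptions. Inventory rows are assumed to carry an object size and an Intelligent-Tiering access tier. The per-GiB rates are illustrative placeholders, not current AWS prices. The cost model assumes a read moves an object back to the Frequent Access tier, after which it takes roughly one month to reach Infrequent Access and two more to reach Archive Instant Access. The approval thresholds and approver names are invented, since the source does not disclose them.

```python
GIB = 1024 ** 3

# Placeholder USD per GiB-month rates (assumption; check current pricing).
ASSUMED_RATES = {
    "FREQUENT": 0.021,
    "INFREQUENT": 0.0125,
    "ARCHIVE_INSTANT_ACCESS": 0.004,
}

# Hypothetical approval ladder: (exclusive cost ceiling in USD, approver).
HYPOTHETICAL_APPROVALS = [
    (100.0, "peer review"),
    (1000.0, "team lead"),
    (float("inf"), "director"),
]


def extra_cost_per_gib(tier, rates=ASSUMED_RATES):
    """Extra storage cost per GiB caused by re-warming an object in `tier`."""
    frequent, infrequent = rates["FREQUENT"], rates["INFREQUENT"]
    current = rates.get(tier, frequent)
    if tier == "INFREQUENT":
        return (frequent - current) * 1  # one month back in Frequent Access
    if tier == "ARCHIVE_INSTANT_ACCESS":
        # One month in Frequent Access, then two months in Infrequent Access.
        return (frequent - current) * 1 + (infrequent - current) * 2
    return 0.0  # already in Frequent Access: reading changes nothing


def estimate_access_cost(inventory_rows, rates=ASSUMED_RATES):
    """Sum the projected extra cost over rows of {'size_bytes', 'tier'}."""
    return sum(
        row["size_bytes"] / GIB * extra_cost_per_gib(row["tier"], rates)
        for row in inventory_rows
    )


def required_approval(cost_usd, levels=HYPOTHETICAL_APPROVALS):
    """Map an estimated cost to the first approval level whose ceiling exceeds it."""
    for ceiling, approver in levels:
        if cost_usd < ceiling:
            return approver
    return levels[-1][1]
```

A consumer would run the estimate over the inventory rows for the partitions named in the PR and attach the result, so reviewers at the matching approval level can see the projected cost before granting access.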
Two named benefits¶
Benefit 1: Tiering progression is guaranteed to complete¶
"Unexpected or accidental query patterns do not reset the flow of objects through Intelligent Tiering tiers. Storage cost is guaranteed to decrease after the initial 30 day period of Intelligent Tiering."
Because IT moves an object back to the Frequent Access (most expensive) tier whenever it is read, an accidental read of cold partitions resets their tiering clock. The IAM gate prevents accidental access, so the IT cost-reduction trajectory (40% at 30 days, 81% at 90 days) is realised without interruption.
This is the structural argument for keeping data on IT and gating access via IAM rather than tiering data to Glacier directly: IAM is reversible (you can grant access for an explicit window) and free; Glacier moves carry the minimum-duration tax.
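To make the clock-reset effect concrete, here is a simplified model of Intelligent Tiering progression: an object sits in Frequent Access until 30 consecutive days pass without a read, then in Infrequent Access until 90 days, then in Archive Instant Access, and any read sends it back to Frequent Access. The function names are made up for this sketch, and the model ignores the optional asynchronous archive tiers and per-object size rules.

```python
def tier_for(days_since_last_access):
    """Tier of an object given consecutive days without access (simplified)."""
    if days_since_last_access < 30:
        return "FREQUENT"
    if days_since_last_access < 90:
        return "INFREQUENT"
    return "ARCHIVE_INSTANT_ACCESS"


def simulate(total_days, read_days=()):
    """Return the object's tier on each day from 0 to total_days inclusive.

    read_days: days on which the object is read; each read resets the clock.
    """
    reads = set(read_days)
    last_access = 0  # the object is written (and so "accessed") on day 0
    tiers = []
    for day in range(total_days + 1):
        if day in reads:
            last_access = day
        tiers.append(tier_for(day - last_access))
    return tiers


if __name__ == "__main__":
    print(simulate(120)[-1])                   # undisturbed object
    print(simulate(120, read_days=[100])[-1])  # one stray read on day 100
```

In this model a single read on day 100 leaves the object in Frequent Access on day 120, where an undisturbed object would already be in Archive Instant Access. That is the regression the IAM gate is meant to prevent.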
Benefit 2: Cost acknowledgement before cold-tier scans¶
"Data consumers acknowledge associated costs of reading data from cold Intelligent Tiers, ensuring that it is justified by the business value of the analysis."
The Terraform-PR workflow's load-bearing function is forcing the consumer to think about cost before issuing the read. The motivating example from the post:
"For our largest tables, full table scans could add significant S3 costs by accessing PBs of data from cheap Intelligent Tiers like Archive Instant Access. This is not obvious to users who are writing SQL to inspect data!"
A naïve SELECT * over one of the largest tables can add significant S3 cost if the scan reaches petabytes of cold-tier data. DAR's IAM gate is the forcing function that converts an unwitting cost bomb into a deliberate consumption decision.
DAR vs deletion vs cold-tier-by-default¶
| Strategy | Data preserved | Storage cost | Access friction |
|---|---|---|---|
| Deletion-based retention | No | None | N/A (data gone) |
| Cold-tier by default (Glacier) | Yes | Lowest | Restore latency + retrieval fees |
| Default Access Retention | Yes | Low (IT-deepest-tier) | IAM denial + PR + cost ack |
DAR's distinction: data is preserved (as in cold-tier) and storage cost is low (as in cold-tier), but access requires explicit human authorisation rather than just paying the retrieval fee.
Substrate dependencies¶
- systems/aws-s3-intelligent-tiering — the storage class underneath. DAR keeps data on IT, not on Glacier; the cost reduction comes from IT's Archive Instant Access tier landing the 81% savings without retrieval fees.
- systems/aws-iam — the gate. Restrictive bucket policies enforce the access window.
- systems/s3-inventory — the cost-estimation source. The dashboard joins inventory rows (object size, storage class) to project the cost of a planned access.
- Terraform — the policy-amendment substrate. PR + review enforces the cost acknowledgement step.
When DAR is the right answer¶
- Data owners cannot expand deletion-based retention — uncertain future requirements (e.g. compliance might require it; analysts occasionally back-test against it; one team uses it irregularly).
- Cold-tier-by-default would carry the minimum-duration tax — same uncertainty makes Glacier a bad bet.
- The team has confidence in the IT-tier-progression argument — the 81% IT savings at 90 days is enough cost reduction to make the IAM-gated approach worthwhile.
Caveats (not in source)¶
- False-positive cost — legitimate queries blocked by an Access Denied exception when the consumer is not familiar with the DAR workflow. Operational cost not disclosed.
- PR review backlog — the cost acknowledgement gate adds latency to every cross-window query. Yelp does not disclose the SLA or typical review cycle time.
- Approval-level thresholds — "certain approval levels" is the only disclosure. Concrete thresholds, approver identities, and delegation rules are not in the post.
Seen in¶
- sources/2026-05-21-yelp-how-partition-access-visualizations-reduced-our-data-lake-s3-cost-by-33 — canonical Yelp disclosure.
Related¶
- systems/aws-s3-intelligent-tiering — the underlying storage class.
- systems/aws-iam — the gate.
- systems/s3-inventory — cost-estimation source.
- systems/aws-s3 — substrate.
- concepts/cold-storage-minimum-duration-tax — the tax DAR avoids by keeping data on IT.
- concepts/granular-usage-attribution — what enables the team to set the access window confidently.
- concepts/partition-access-pattern — what visualises the access-window decision.
- patterns/iam-policy-gated-cold-tier-access — the canonical pattern.
- patterns/s3-access-based-retention — sibling pattern (deletion vs gating).