PATTERN

Object tagging for lifecycle expiration

Problem

You need to delete millions of individual S3 objects on a schedule that varies per-object — for example, source raw-text logs immediately after they have been compacted into Parquet, or unused objects identified by an access-based retention job. Three naive approaches don't scale:

  1. Per-object DELETE API — at millions of objects per day, API issue rate is the bottleneck. Even batched DeleteObjects (up to 1,000 keys per request) runs into account-wide write TPS limits.
  2. Bucket-level lifecycle TTL — too coarse. You can't say "delete this object in 7 days, that object in 90."
  3. Lifecycle-policy-per-prefix — adding a rule for each new prefix means constant configuration churn and eventually hits S3's limit of 1,000 lifecycle rules per bucket.

Yelp's verbatim framing:

"That's the only scalable way to delete per object without needing to modify lifecycle policy each time or issuing delete API calls." (Source: sources/2025-09-26-yelp-s3-server-access-logs-at-scale)

Pattern

Decouple "which objects to delete" from "when to delete them":

  1. Apply a tag (e.g. expire=true) to each object that should be deleted. S3 supports up to 10 tags per object, each a key-value pair.
  2. Configure the bucket's lifecycle policy with a rule that expires objects carrying the tag: Filter: { Tag: { Key: "expire", Value: "true" } } + Expiration: { Days: N }.
  3. Let AWS do the actual deletion asynchronously.

The tagging step is the scalable operation — per-object PutObjectTagging or batched via S3 Batch Operations.
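The lifecycle rule from step 2 can be sketched as the payload you would pass to boto3's put_bucket_lifecycle_configuration; the tag key/value and the 7-day window are illustrative, not Yelp's actual values:

```python
# Lifecycle rule that expires any object tagged expire=true N days
# after creation. This dict is the LifecycleConfiguration argument
# shape accepted by boto3's s3.put_bucket_lifecycle_configuration.
lifecycle_config = {
    "Rules": [
        {
            "ID": "expire-tagged-objects",
            "Status": "Enabled",
            # Only objects carrying this exact tag are expired.
            "Filter": {"Tag": {"Key": "expire", "Value": "true"}},
            "Expiration": {"Days": 7},  # illustrative grace window
        }
    ]
}

# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle_config)
```

The Days value doubles as the recovery window described below: an object stays untouched (and untaggable back to safety) until the expiration day arrives.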

Shape at two scales

Low-volume buckets: direct tagging

For buckets that produce hundreds to a few thousand objects per compaction window, issue PutObjectTagging calls directly from the compaction job. This avoids the fixed $0.25 per-bucket-per-job S3 Batch Operations fee.
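A minimal sketch of the direct-tagging path, assuming boto3; the bucket name and key list are hypothetical. One caveat worth a comment: PutObjectTagging replaces the object's entire tag set, so any pre-existing tags must be re-sent alongside expire=true.

```python
# Tag set applied to each compacted source object. PutObjectTagging
# REPLACES the whole tag set, so include any tags you want to keep.
def expire_tag_set():
    return {"TagSet": [{"Key": "expire", "Value": "true"}]}

# import boto3
# s3 = boto3.client("s3")
# for key in compacted_source_keys:  # hypothetical list from the job
#     s3.put_object_tagging(
#         Bucket="low-volume-bucket",
#         Key=key,
#         Tagging=expire_tag_set(),
#     )
```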

High-volume buckets: S3 Batch Operations

For buckets producing millions of objects per window, build an S3 Batch Operations manifest (CSV of bucket,key) and submit a PutObjectTagging job. Batch Operations parallelises the tagging and reports per-object success/failure to an S3 report.
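The high-volume path can be sketched as a boto3 s3control create_job request with an S3PutObjectTagging operation; all account IDs, ARNs, and the ETag below are placeholders, and the manifest format string is the CSV spec Batch Operations expects:

```python
# S3 Batch Operations job that applies expire=true to every object
# in a CSV manifest of bucket,key rows. Matches the shape of the
# boto3 s3control create_job API; IDs and ARNs are placeholders.
batch_job_request = {
    "AccountId": "111122223333",              # placeholder account
    "ConfirmationRequired": False,
    "Operation": {
        "S3PutObjectTagging": {
            "TagSet": [{"Key": "expire", "Value": "true"}]
        }
    },
    "Manifest": {
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],      # manifest has no header row
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::manifests/expire.csv",
            "ETag": "manifest-etag",          # placeholder
        },
    },
    "Report": {                               # per-object success/failure
        "Bucket": "arn:aws:s3:::batch-reports",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "ReportScope": "FailedTasksOnly",
    },
    "Priority": 10,
    "RoleArn": "arn:aws:iam::111122223333:role/batch-ops",  # placeholder
}

# import boto3
# boto3.client("s3control").create_job(**batch_job_request)
```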

Gotchas from Yelp's 2025-09-26 post:

  • Athena query results include a header row — Batch Ops interprets it as a bucket name, causing job failures. "To work around this, we recreate manifest files in memory without headers."
  • Object keys in manifests must be URL-encoded, equivalent to Python's quote_plus(key, safe="/").
  • Flat $0.25 per bucket per job fee — dominates for low-volume buckets, hence the two-scale dispatch rule above.
  • Batch Operations does not support Delete as an action — which is why the indirect tag-then-expire pattern exists. PutObjectTagging is the load-bearing supported action here.
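The first two gotchas compose into one manifest-building step: drop the Athena header row and URL-encode each key with quote_plus(key, safe="/"). A stdlib-only sketch, where athena_rows stands in for the raw query-result rows:

```python
# Build a Batch Operations manifest in memory: no header row, and
# keys URL-encoded so spaces become '+' while '/' is preserved.
import csv
import io
from urllib.parse import quote_plus

def build_manifest(athena_rows):
    """Return header-free CSV text of bucket,url-encoded-key rows."""
    out = io.StringIO()
    writer = csv.writer(out)
    for bucket, key in athena_rows[1:]:   # skip the Athena header row
        writer.writerow([bucket, quote_plus(key, safe="/")])
    return out.getvalue()

rows = [
    ("bucket", "key"),                    # Athena's header row
    ("logs-bucket", "raw/2025/09/26/part 0001.txt"),
]
manifest = build_manifest(rows)
# space in the key is encoded as '+'; slashes are left intact
```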

Why tag-then-expire beats direct-delete

  • Scalable: lifecycle runs async and at AWS's own pace; your job only has to issue the tag, not the delete.
  • Auditable: tagged objects are visible (pre-expiration) in the inventory; you can revert by removing the tag before the expiration day.
  • Idempotent: re-tagging an already-tagged object is a no-op.
  • Recoverable: if a bug in the compaction job emits bad tags, you have a window (the lifecycle's grace period) to detect and untag before actual deletion.

Composition

Common stacks:

Reverse-direction variant

The opposite pattern — untag to prevent deletion — also applies: tag everything as expirable at ingest and remove the tag from objects worth keeping. Note that S3 lifecycle filters only match objects that carry a tag, not objects that lack one, so a literal "expire unless tagged keep=true" rule is not expressible; the untag-to-keep formulation achieves the same effect. It is the safer default for systems where the failure mode of "forgot to tag" is worse than "deleted prematurely."

Seen in

  • sources/2025-09-26-yelp-s3-server-access-logs-at-scale — canonical wiki instance. Yelp tags compacted source SAL objects for lifecycle expiration; composes with S3 Batch Operations PutObjectTagging for high-volume buckets and direct tagging for low-volume buckets. The $0.25 per-bucket-per-job fee drives the two-scale dispatch rule; Batch Ops's lack of a Delete action is what makes this pattern necessary in the first place.