PATTERN
Raw-to-columnar log compaction¶
Problem¶
A continuous log producer (S3 Server Access Logs, CloudFront access logs, Lambda request logs, ALB access logs, application request logs) writes many small raw-text objects to a destination store over the course of a day. At scale this is unqueryable:
- S3 API rate limits throttle a query engine trying to scan millions of tiny objects.
- Per-request overhead dominates useful payload.
- Transitions to cheaper storage classes (Glacier / Deep Archive) are more expensive than necessary, because the per-object transition fee is the same for a 1-KB object as for a 128-MB object.
- Text format has no column-level predicate pushdown — every query reads every byte.
- No row-group statistics — no pruning of sub-file ranges.
The pain is load-bearing. Yelp (2025-09-26):
"Inevitably, the accumulated volume of logs for these buckets with SimplePrefix became impossible to query in Athena due to S3 API rate limits." (Source: sources/2025-09-26-yelp-s3-server-access-logs-at-scale)
Pattern¶
On a daily (or hourly, weekly) batch, read all raw-text log objects for a window, parse them, and rewrite as a small number of columnar objects (typically Parquet) partitioned for the expected query shape.
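The parse-and-rewrite step can be illustrated with a toy sketch (not Yelp's implementation; the log schema and field names here are hypothetical): raw text rows are pivoted into per-column arrays, the shape a columnar writer such as Parquet consumes.

```python
# A few raw space-delimited access-log lines (simplified, hypothetical schema).
RAW = """\
2025-09-26 bucket-a GET 200 512
2025-09-26 bucket-a PUT 403 128
2025-09-26 bucket-b GET 200 2048
"""

COLUMNS = ("date", "bucket", "method", "status", "bytes")

def to_columnar(raw_text):
    """Parse row-oriented text and pivot it into column arrays keyed by name."""
    rows = [line.split() for line in raw_text.splitlines() if line]
    cols = {name: [] for name in COLUMNS}
    for row in rows:
        for name, value in zip(COLUMNS, row):
            cols[name].append(value)
    # Typed, contiguous columns compress far better than interleaved text
    # and are what enables column-level predicate pushdown downstream.
    cols["status"] = [int(v) for v in cols["status"]]
    cols["bytes"] = [int(v) for v in cols["bytes"]]
    return cols

cols = to_columnar(RAW)
```

A real job would hand each column array to a Parquet writer per partition; this sketch only shows the row-to-column pivot.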
Canonical shape¶
- Raw sink: source logs land at the destination in small text objects, in whatever partition layout the producer supports. For S3 SAL: PartitionedPrefix + [account]/[region]/[bucket]/[date]/ per the concepts/s3-server-access-logs spec.
- Compaction job runs on a batch scheduler (systems/tron at Yelp) once per window:
  a. Enumerate the window's raw objects.
  b. Run an idempotent INSERT into the columnar target table (patterns/idempotent-athena-insertion-via-left-join).
  c. Verify row counts (Yelp: GetQueryRuntimeStatisticsRows on first insertion; distinct-count on re-insertion).
  d. Tag the source raw objects for lifecycle expiration (patterns/object-tagging-for-lifecycle-expiration).
- Query the columnar target, not the raw objects, for all downstream analytics and debugging.
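Step (a), enumerating a window's raw objects, follows mechanically from the PartitionedPrefix layout named above. A minimal sketch, assuming the [account]/[region]/[bucket]/[date]/ layout (the account ID, region, and bucket names below are illustrative):

```python
from datetime import date

def raw_prefixes(account, region, buckets, day):
    """List the PartitionedPrefix paths holding one day's raw SAL objects.

    Layout assumed: [account]/[region]/[bucket]/[date]/ with a
    yyyy/MM/dd date component, per the concepts/s3-server-access-logs spec.
    """
    d = day.strftime("%Y/%m/%d")
    return [f"{account}/{region}/{b}/{d}/" for b in buckets]

prefixes = raw_prefixes("123456789012", "us-east-1",
                        ["bucket-a", "bucket-b"], date(2025, 9, 26))
```

The job would then list each prefix (or rely on partition projection) rather than enumerating individual objects by key.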
Reported outcomes (Yelp on SAL at fleet scale)¶
- 85% storage reduction: Parquet's per-column compression and row-group encoding dominate.
- 99.99% object-count reduction: small text objects collapsed into a small number of per-partition Parquet objects.
Both deltas improve downstream storage-class transitions, query-engine latency, and API-request cost.
Partitioning discipline¶
The columnar target's partition layout should match the common query shape. Yelp's SAL compaction uses (bucket_name, yyyy/MM/dd) partition keys, matching "a day's worth of logs" as the canonical query window. Finer partition granularity would cause over-partitioning; coarser would force too much data scanned per query. See partition projection for the enumeration mechanism.
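For concreteness, a (bucket_name, day) layout maps to Hive-style partition locations like the following. This is a hedged sketch: the table root, key names (bucket_name, dt), and date formatting are assumptions, not Yelp's actual layout.

```python
from datetime import date

def target_partition(table_root, bucket_name, day):
    """Hive-style partition location for (bucket_name, day) keys.

    table_root and the partition key names are illustrative only.
    """
    return f"{table_root}/bucket_name={bucket_name}/dt={day:%Y-%m-%d}/"

loc = target_partition("s3://logs-columnar/sal", "bucket-a", date(2025, 9, 26))
```

With this layout, the canonical query ("one bucket, one day") prunes to exactly one partition.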
Idempotence requirements¶
The compaction job will be retried — cluster evictions (concepts/athena-shared-resource-contention), quota hits, transient S3 errors. Retry-safety is mandatory:
- LEFT JOIN candidate rows against the target table on a unique column and insert only unmatched rows, so a retry adds no duplicates.
- Delete raw objects only after the insertion + count verification succeeds, via tag-based expiration rather than immediate delete (so a bug discovered later can re-run from still-present raw).
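The retry-safe insertion shape can be sketched as SQL generation (the table names, the unique column request_id, and the dt partition column are all hypothetical; the real shape is patterns/idempotent-athena-insertion-via-left-join):

```python
def idempotent_insert_sql(target, staging, unique_col, day):
    """Build an INSERT that LEFT JOINs new rows against the target on a
    unique column and keeps only rows with no match, so a retried run
    inserts nothing it already inserted. Names are illustrative.
    """
    return f"""
INSERT INTO {target}
SELECT s.*
FROM {staging} s
LEFT JOIN {target} t
  ON s.{unique_col} = t.{unique_col}
 AND t.dt = DATE '{day}'
WHERE s.dt = DATE '{day}'
  AND t.{unique_col} IS NULL
""".strip()

sql = idempotent_insert_sql("sal_parquet", "sal_raw", "request_id", "2025-09-26")
```

The anti-join restricts both sides to the window's partition, so a retry re-scans one day rather than the whole target table.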
Trade-offs vs alternatives¶
| Approach | Cost | Query perf | Complexity |
|---|---|---|---|
| Raw text only | storage: highest (no compression); API: per-query cost scales with tiny-object count | very bad at scale | trivial |
| CloudTrail Data Events (AWS alt) | $1/M events (orders of magnitude higher at fleet scale) | good | low |
| Raw-to-columnar compaction (this pattern) | storage: lowest; batch-job CPU; API: one-shot per window | very good | moderate (idempotence, retention, partitioning) |
| Streaming columnar (Firehose → Parquet) | streaming infra cost; good perf | good | high (streaming guarantees + schema evolution) |
Seen in¶
- sources/2025-09-26-yelp-s3-server-access-logs-at-scale — canonical wiki instance. Yelp's SAL compaction: TiBs/day input, Tron-orchestrated Athena INSERTs, 85% storage + 99.99% object-count reduction. Composes with patterns/object-tagging-for-lifecycle-expiration for deleting raw source objects after compaction, and concepts/partition-projection for Glue partition handling.
Related¶
- systems/apache-parquet — typical target format.
- systems/amazon-athena, systems/aws-glue — typical compaction engine + catalog.
- systems/yelp-s3-sal-pipeline — canonical wiki worked example.
- concepts/s3-server-access-logs — canonical source log type.
- concepts/columnar-storage-format — primitive.
- patterns/idempotent-athena-insertion-via-left-join — the retry-safe insertion shape.
- patterns/object-tagging-for-lifecycle-expiration — the raw-cleanup step.