CONCEPT Cited by 1 source
S3 server access logs
S3 Server Access Logs (SAL) are the per-bucket access-log primitive Amazon S3 offers: every request against the bucket and its objects (GETs, PUTs, lifecycle expirations, multipart operations, website operations, etc.) generates a log line delivered to a configurable destination bucket. SAL is the cheapest AWS-native way to do object-level access tracing; the pricier alternative is CloudTrail Data Events at "$1 per million data events".
Definition
"S3 server access logs contain API operations performed on a bucket, as well as its objects. Logging is enabled per S3 bucket by providing a storage destination; another S3 bucket is recommended due to circular logging. Once the resource policy allows putting objects, logs will start arriving at the configured destination." (Source: sources/2025-09-26-yelp-s3-server-access-logs-at-scale)
Line format
Each log line is space-separated with 25+ positional fields. The first seven are AWS-generated and not user-controlled:
file_bucket, remoteip, requester (IAM identity or -),
requestid, operation (e.g. WEBSITE.GET.OBJECT,
REST.GET.OBJECT, BATCH.DELETE.OBJECT, S3.EXPIRE.OBJECT),
key, http_status. The remaining fields include
request_uri, error_code, bytes_sent, object_size,
total_time, turn_around_time, referrer, user_agent,
version_id, host header / TLS info / etc. The regex specified
by AWS for parsing lives in the
AWS docs.
The schema is extensible: "The regex ends with .*$: it
accounts for the possibility of additional columns being added at
any time."
Delivery semantics
SAL is best-effort, "meaning a log may occasionally be missed, arrive late, or have duplicates." Any consumer of SAL has to be designed around best-effort log delivery. Yelp's measurement at fleet scale: fewer than 0.001% of SAL arrives more than 2 days late, with observed instances as late as 9 days. (Source: sources/2025-09-26-yelp-s3-server-access-logs-at-scale)
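One way to cope with duplicates and late arrivals is to deduplicate on read and to re-read recent partitions. The sketch below is illustrative only: it assumes a duplicated delivery repeats the line byte for byte (the source does not say this), the function names `dedupe_lines` and `partitions_to_reprocess` are hypothetical, and the 9-day default simply mirrors the latest arrival Yelp reported.

```python
from datetime import date, timedelta
from typing import Iterable, Iterator, List


def dedupe_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct log line once, preserving first-seen order.

    Assumption: a duplicated delivery repeats the line byte for byte.
    """
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


def partitions_to_reprocess(today: date, lookback_days: int = 9) -> List[date]:
    """Return the event dates worth re-reading to pick up late arrivals."""
    return [today - timedelta(days=n) for n in range(1, lookback_days + 1)]
```

An in-memory set only works for small batches; at fleet scale the same idea would more likely be expressed as a `DISTINCT` or group-by inside the compaction job.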
Target-object key formats
Two shapes, controlled by TargetObjectKeyFormat:
- `SimplePrefix` (historical default): `[TargetPrefix][YYYY]-[MM]-[DD]-[hh]-[mm]-[ss]-[UniqueString]`. Flat namespace. At scale this becomes Athena-unqueryable because of S3 API rate limits on prefix scans.
- `PartitionedPrefix` (recommended for query workloads): `[TargetPrefix][SourceAccountId]/[SourceRegion]/[SourceBucket]/[YYYY]/[MM]/[DD]/...`. Gives Athena a natural partition boundary; Yelp migrated fleet-wide to this format. The delivery option `EventTime` (vs log delivery time, `DeliveryTime` in the S3 API) "gives the benefit of attributing the log to the event time."
AWS added date-based partitioning for SAL in November 2023 — the unblocker that made object-level logging tractable via Athena querying.
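The source describes Yelp setting this format through a Terraform module. As an alternative illustration, the same settings can be expressed through boto3's `put_bucket_logging` call. This is a minimal sketch: the helper names `build_logging_status` and `enable_access_logging` are hypothetical, and the bucket names and prefix are placeholders supplied by the caller.

```python
from typing import Any, Dict


def build_logging_status(target_bucket: str, target_prefix: str) -> Dict[str, Any]:
    """Build the BucketLoggingStatus payload for PartitionedPrefix + EventTime."""
    return {
        "LoggingEnabled": {
            "TargetBucket": target_bucket,
            "TargetPrefix": target_prefix,
            "TargetObjectKeyFormat": {
                "PartitionedPrefix": {"PartitionDateSource": "EventTime"}
            },
        }
    }


def enable_access_logging(source_bucket: str, target_bucket: str, target_prefix: str) -> None:
    """Apply the configuration; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above has no dependencies

    boto3.client("s3").put_bucket_logging(
        Bucket=source_bucket,
        BucketLoggingStatus=build_logging_status(target_bucket, target_prefix),
    )
```

Keeping the payload builder separate from the API call makes the configuration easy to inspect or unit-test without touching AWS.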
Destination-bucket constraints
- Same account as the source bucket (AWS restriction).
- Same region as the source bucket (AWS restriction + cost: "eliminate cross-region data charges").
- Resource policy on the destination must allow the logging service to `PutObject`.
- Using the same bucket as destination causes circular logging; another bucket is strongly recommended.
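The resource-policy and separate-bucket constraints above can be sketched as a small policy builder. It is a sketch under stated assumptions: the function name `destination_bucket_policy` and the `Sid` are hypothetical, the statement follows the shape AWS documents for the `logging.s3.amazonaws.com` service principal, and the same-account and same-region constraints are not checked here.

```python
import json
from typing import Any, Dict


def destination_bucket_policy(
    source_bucket: str, dest_bucket: str, prefix: str, account_id: str
) -> Dict[str, Any]:
    """Build a bucket policy letting the S3 logging service write under prefix."""
    if source_bucket == dest_bucket:
        # Logging a bucket into itself would log the log deliveries too.
        raise ValueError("destination must differ from source (circular logging)")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowS3ServerAccessLogDelivery",  # hypothetical label
                "Effect": "Allow",
                "Principal": {"Service": "logging.s3.amazonaws.com"},
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{dest_bucket}/{prefix}*",
                "Condition": {
                    "ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{source_bucket}"},
                    "StringEquals": {"aws:SourceAccount": account_id},
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder names and account ID, for illustration only.
    policy = destination_bucket_policy("app-data", "app-data-logs", "sal/", "123456789012")
    print(json.dumps(policy, indent=2))
```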
User-controlled field hazards
Three fields are arbitrary user input, written unescaped:
- `request_uri`
- `referrer` (HTTP `Referer` header)
- `user_agent`
These break any naive space-or-quote-delimited regex. See
concepts/user-controlled-log-fields for the general hazard
and patterns/optional-non-capturing-tail-regex for the
common workaround (wrap the user-controlled tail in
(?:<rest>)? so the first seven non-user-controlled fields
still parse).
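The optional-tail workaround can be illustrated with a short Python sketch. This is not the AWS-published regex or Yelp's pattern: the leading fields follow the order AWS documents for a log line (bucket owner, bucket, time, remote IP, requester, request ID, operation, key), the group names and the `parse_line` helper are hypothetical, and the tail is deliberately abbreviated.

```python
import re
from typing import Dict, Optional

# Leading fields: not user-controlled, so they are matched strictly.
LEADING = (
    r'^(?P<bucket_owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] '
    r'(?P<remote_ip>\S+) (?P<requester>\S+) (?P<request_id>\S+) '
    r'(?P<operation>\S+) (?P<key>\S+)'
)
# Tail: starts with the user-controlled request URI and may fail to parse.
TAIL = (
    r' "(?P<request_uri>[^"]*)" (?P<http_status>\S+) (?P<error_code>\S+)'
    r' (?P<bytes_sent>\S+) .*'
)
# Wrapping the tail in (?:...)? lets the leading fields match on their own.
LINE_RE = re.compile(LEADING + r'(?:' + TAIL + r')?')


def parse_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return named fields; tail fields are None when the tail did not parse."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None
```

A line whose request URI contains a stray double quote still yields `operation` and `key`, with the tail fields left as `None`, instead of being dropped entirely.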
Key encoding idiosyncrasy
Most SAL operations double-url-encode key; for some
operations — notably BATCH.DELETE.OBJECT and
S3.EXPIRE.OBJECT — the key is url-encoded only once.
See concepts/url-encoding-idiosyncrasy-s3-keys for the
full discussion and why naive url_decode(url_decode(key))
is unsafe.
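A minimal sketch of operation-aware decoding, assuming the two operations named above are the only single-encoded ones (the text says "notably", so the real set may be larger) and that plain percent-decoding via `urllib.parse.unquote` is the right inverse. The names `SINGLE_ENCODED_OPERATIONS` and `decode_key` are hypothetical.

```python
from urllib.parse import unquote

# Assumption: only the operations named in the text; the real set may be larger.
SINGLE_ENCODED_OPERATIONS = frozenset({"BATCH.DELETE.OBJECT", "S3.EXPIRE.OBJECT"})


def decode_key(operation: str, logged_key: str) -> str:
    """Percent-decode a logged key once or twice depending on the operation."""
    rounds = 1 if operation in SINGLE_ENCODED_OPERATIONS else 2
    key = logged_key
    for _ in range(rounds):
        key = unquote(key)
    return key
```

The hazard shows up with keys that themselves contain a percent sequence. An object literally named `50%25off` is logged as `50%2525off` by a single-encoding operation; decoding it twice produces `50%off`, which is a different key.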
Comparison to CloudTrail Data Events
| Axis | S3 Server Access Logs | CloudTrail Data Events |
|---|---|---|
| Cost | storage of emitted logs (best-effort delivery, compactable) | $1 per million data events |
| Delivery | best-effort; <0.001% arrives more than 2 days late (Yelp measured) | reliable delivery |
| Partitioning | raw text, needs compaction for scale | delivered to S3 / CloudWatch, queryable |
| Granularity | bucket-level enablement | per-trail, fine-grained |
At fleet scale the cost axis dominates — per the 2025-09-26 source, "$1 per million data events — that could be orders of magnitude higher!"
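The arithmetic behind that claim is simple enough to sketch. The rate is the one quoted in the source; the function name and the request volume in the usage note are hypothetical, and no SAL-side figure is computed because the source gives no storage price.

```python
CLOUDTRAIL_USD_PER_MILLION_EVENTS = 1.0  # rate quoted in the source


def cloudtrail_data_event_cost(events: int) -> float:
    """Return the CloudTrail Data Events charge in USD for a number of events."""
    return events / 1_000_000 * CLOUDTRAIL_USD_PER_MILLION_EVENTS
```

For a hypothetical fleet making 10 billion S3 requests a day, `cloudtrail_data_event_cost(10_000_000_000)` comes to 10,000 USD per day.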
Seen in
- sources/2025-09-26-yelp-s3-server-access-logs-at-scale:
  canonical wiki source. Full definition + line format +
  delivery semantics + destination-bucket constraints + hazards.
  Yelp's operational choice: `PartitionedPrefix` + `EventTime` default via Terraform module; daily Parquet compaction pipeline over the destination bucket (patterns/raw-to-columnar-log-compaction).
Related
- systems/aws-s3 — producer + typical destination.
- systems/amazon-athena — the query engine over SAL.
- systems/aws-cloudtrail — the pricier alternative for object-level access tracing.
- concepts/best-effort-log-delivery
- concepts/user-controlled-log-fields
- concepts/url-encoding-idiosyncrasy-s3-keys
- concepts/partition-projection
- patterns/raw-to-columnar-log-compaction