CONCEPT Cited by 1 source

Best-effort log delivery¶

Best-effort log delivery names the reliability contract at the loosest end of the streaming-delivery spectrum: a log record may be missed, arrive late, or be duplicated, with no framework-level retry or ordering guarantees. It is cheaper and simpler than at-least-once or exactly-once delivery, and it is often the only contract available from managed-cloud logging primitives (notably S3 Server Access Logs).

Definition (AWS SAL canonical)¶

"Delivery of access logs is best-effort, meaning a log may occasionally be missed, arrive late, or have duplicates." (Source: sources/2025-09-26-yelp-s3-server-access-logs-at-scale, quoting the AWS docs)

Designing around best-effort¶

Three disciplines make best-effort acceptable for a given use case:

Measure the straggler tail. Yelp's measurement at fleet scale on SAL:
- Fewer than 0.001% of logs arrive more than 2 days late.
- Longest observed straggler: ~9 days.
- Implication: any pipeline with a multi-week retention window absorbs the tail naturally.
Pick a retention window longer than the tail. Yelp: "Our retention periods are much longer than the maximum log delay." If your downstream decision (e.g. deletion) is gated on "has this object been accessed recently?", ensure that the window defining "recently" comfortably exceeds your measured straggler tail.
Aggregate to a coarser grain where the surviving log lines still add up to a signal. Yelp's access-based retention works at prefix granularity, not object granularity: "deletions are based on prefixes—so missing all logs for a given prefix would only occur for truly inactive data." Even if one object's log line is dropped, the rest of the prefix will generate enough log lines to signal access.
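
The three disciplines above can be sketched in a few lines of Python. This is a minimal illustration, not Yelp's implementation: the `LogRecord` shape, the function names, the `safety_factor` margin, and the two-segment prefix depth are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Set


class LogRecord(NamedTuple):
    """Hypothetical, simplified access-log record (not the real SAL schema)."""
    key: str                 # object key the request touched
    event_time: datetime     # when the request happened
    arrival_time: datetime   # when the log line was delivered


def late_fraction(records: List[LogRecord], threshold: timedelta) -> float:
    """Fraction of records delivered more than `threshold` after the event."""
    if not records:
        return 0.0
    late = sum(1 for r in records if r.arrival_time - r.event_time > threshold)
    return late / len(records)


def max_delay(records: Iterable[LogRecord]) -> timedelta:
    """Longest observed delivery delay, i.e. the end of the straggler tail."""
    return max((r.arrival_time - r.event_time for r in records),
               default=timedelta(0))


def window_covers_tail(retention: timedelta, tail: timedelta,
                       safety_factor: float = 2.0) -> bool:
    """True if the retention window exceeds the tail by an assumed margin."""
    return retention >= tail * safety_factor


def active_prefixes(records: Iterable[LogRecord], depth: int = 2) -> Set[str]:
    """Prefixes (first `depth` path segments) with at least one access record."""
    return {"/".join(r.key.split("/")[:depth]) for r in records}
```

With this shape, losing a single record rarely changes the answer at prefix level: if `team-a/data/x.parquet` and `team-a/data/y.parquet` were both accessed, dropping either line still leaves `team-a/data` in `active_prefixes`. The retention check is deliberately conservative; the factor of 2 is an arbitrary placeholder that should be replaced by whatever margin your own measurements justify.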

When best-effort is unacceptable¶

Billing events / financial transactions: missed records are missed revenue. Use at-least-once delivery with idempotency tokens, or a durable changelog (CDC).
Incident / compromise traceability: if the log is the sole source of truth for a security audit, a missed or late record can hide the attack. (Yelp's use of SAL for incident response mitigates this implicitly because pattern recognition does not need every log line: the access fingerprint shows up in many log lines, so losing some does not hide the attack.)
Exact-count analytics: if you need exact counts (not order-of-magnitude), best-effort's duplicate risk leads to over-counting unless you deduplicate by row key, and missed records still lead to under-counting.
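
To make the over-counting risk concrete, here is a small sketch comparing a naive count with a count deduplicated by row key. The `(record_id, payload)` tuple shape and both function names are hypothetical; the sketch assumes each record carries some unique identifier that survives duplication.

```python
from typing import Iterable, Tuple

# Hypothetical record shape: (record_id, payload).
Record = Tuple[str, str]


def count_naive(records: Iterable[Record]) -> int:
    """Counts every delivered line, so duplicates inflate the total."""
    return sum(1 for _ in records)


def count_deduplicated(records: Iterable[Record]) -> int:
    """Counts distinct record IDs, so duplicate deliveries collapse to one."""
    seen = set()
    for record_id, _payload in records:
        seen.add(record_id)
    return len(seen)
```

Deduplication only repairs the duplicate side of the contract. A record that was never delivered is absent from both counts, which is why exact counting generally needs a stronger delivery tier rather than a downstream fix.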

Comparison on the delivery-semantics ladder¶

Tier	Miss	Late	Duplicate	Typical cost
Best-effort	yes	yes	yes	lowest — often free with the source service
At-least-once	no	yes	yes	+idempotency or dedupe cost downstream
Exactly-once	no	yes	no	+transactional outbox / 2PC / Kafka EOS overhead
In-order at-least-once	no	bounded	yes	+single-writer-per-key or total-order broker

Straggler policy as a design axis¶

Given best-effort delivery, the system owner must decide what to do with stragglers that arrive after the downstream job has already processed the window. Yelp's choice:

"We decided that the straggler logs can be ignored to deliver business value in a timely fashion. The straggler logs can be inserted at a later time after tagged objects have expired."

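Options:

Late-arriving-record re-insertion: the pipeline re-opens the window and inserts stragglers. Higher correctness, more orchestration complexity.
Separate late-arrival sink: stragglers land in a distinct object / table that is used for one-off audits.
Drop (Yelp's choice on SAL): the already-processed window ignores stragglers, which is safe when downstream decisions are made at aggregate / prefix granularity. Per the quote above, the stragglers can still be inserted later, after the tagged objects have expired.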
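
The three policies can be pictured as a single routing decision made when a record arrives for a window that has already been processed. The sketch below is an in-memory toy under stated assumptions: `StragglerPolicy`, `WindowStore`, and the string window identifiers are all hypothetical names, and a real pipeline would persist this state rather than hold it in a Python object.

```python
from enum import Enum
from typing import Dict, List, Set


class StragglerPolicy(Enum):
    REINSERT = "reinsert"    # re-open the window and insert the straggler
    LATE_SINK = "late_sink"  # route the straggler to a separate store
    DROP = "drop"            # ignore the straggler for this window


class WindowStore:
    """Toy in-memory store keyed by a window identifier (e.g. a date string)."""

    def __init__(self, policy: StragglerPolicy) -> None:
        self.policy = policy
        self.windows: Dict[str, List[str]] = {}
        self.closed: Set[str] = set()
        self.late_sink: List[str] = []
        self.needs_reprocessing: Set[str] = set()

    def close(self, window: str) -> None:
        """Mark a window as already processed by the downstream job."""
        self.closed.add(window)

    def ingest(self, window: str, record: str) -> str:
        """Store a record and report how it was handled."""
        if window not in self.closed:
            self.windows.setdefault(window, []).append(record)
            return "on_time"
        if self.policy is StragglerPolicy.REINSERT:
            self.windows.setdefault(window, []).append(record)
            self.needs_reprocessing.add(window)  # downstream must re-run
            return "reinserted"
        if self.policy is StragglerPolicy.LATE_SINK:
            self.late_sink.append(record)
            return "late_sink"
        return "dropped"
```

The `needs_reprocessing` set is where the extra orchestration cost of re-insertion shows up: every re-opened window implies another downstream run, whereas the other two policies leave the processed window untouched.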

Seen in¶

sources/2025-09-26-yelp-s3-server-access-logs-at-scale: canonical first-party disclosure at fleet scale. Fewer than 0.001% of logs arrive more than 2 days late, and the longest observed straggler is ~9 days. Yelp's access-based retention explicitly depends on two things: the measured tail of best-effort delivery sitting well inside the retention windows, and deletion granularity being prefix-level.

concepts/s3-server-access-logs — the AWS primitive that ships under best-effort semantics.
concepts/at-least-once-delivery — the next tier up on the reliability ladder.
concepts/exactly-once-delivery — the strongest tier.
concepts/straggler-tolerance