Amazon S3 (Simple Storage Service)¶
Amazon S3 is AWS's foundational object storage service, launched March
14, 2006 as the first public AWS service. By early 2025 it holds
hundreds of trillions of objects across 36 regions and serves as the
primary storage for nearly every AWS service and a large share of the
public internet's data lakes. Originally a 4-verb HTTP API
(PUT/GET/DELETE/LIST) over immutable objects grouped into
buckets, it has evolved into a platform whose defining characteristic —
per its own team — is not the object API but the properties of the
storage: elasticity, durability, availability, security, and
performance. "Making S3 simple" is treated as an ongoing program of
removing distractions so builders "work with their data and not have
to think about anything else."
(Source: sources/2025-03-14-allthingsdistributed-s3-simplicity-is-table-stakes)
Defining properties (Warfield, 2025)¶
- Elastic capacity — no upfront provisioning, no per-bucket capacity limits, no notion of "running out of space". See concepts/elasticity.
- Elastic performance — "any customer should be entitled to use the entire performance capability of S3, as long as it didn't interfere with others." Throughput discipline is exposed via the systems/aws-crt (Common Runtime) library: a GPU instance can drive "hundreds of gigabits per second in and out of S3".
- Strong read-after-write consistency (since 2020) — customers report "deleting code and simplifying their systems" after the guarantee landed. See concepts/strong-consistency.
- Very high durability and availability — taken for granted by design ("we are successful only when these things can be taken for granted").
- Immutable objects as the low-level primitive — see concepts/immutable-object-storage. Higher-level primitives (versioning, object lock, cross-region replication, S3 Tables) are layered on top.
Evolution arcs called out in the 2025 post¶
API surface: 4 verbs, many properties¶
"A lot of people associate the term simple with the API itself — that an HTTP-based storage system for immutable objects with four core verbs… is a pretty simple thing to wrap your head around. But looking at how our API has evolved… I'm not sure this is the aspect of S3 that we'd really use 'simple' to describe." The architectural claim is that the properties of S3 storage, not the verbs, define it.
Consistency: eventual → strong (Dec 2020)¶
Moving to strong read-after-write consistency for overwrites and LIST was not marketed as a performance feature; its value was measured in customer code deleted. Retrofit of this guarantee onto a globally distributed object store is a rare example of a public cloud service trading hard internal engineering for a net-simpler customer API.
Conditional operations (2024)¶
PUT-If-Match / compare-and-swap semantics against object
metadata/version enable atomic multi-writer coordination — see
patterns/conditional-write. Rollout pattern matched consistency:
customers used it to delete external locking / versioning code.
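The compare-and-swap semantics can be sketched against a toy in-memory store. This is not the real S3 API — `MiniStore` and its method names are invented for illustration; on S3 itself this maps to `PutObject` with `If-Match` / `If-None-Match` conditional headers, with a mismatch answered by HTTP 412:

```python
import uuid

class MiniStore:
    """In-memory model of compare-and-swap PUTs (If-Match on ETag)."""
    def __init__(self):
        self._objects = {}  # key -> (etag, body)

    def get(self, key):
        return self._objects.get(key)  # (etag, body) or None

    def put_if_match(self, key, body, expected_etag):
        """Write only if the current ETag matches; None models HTTP 412.

        expected_etag=None means "create only if absent" (If-None-Match: *).
        """
        current = self._objects.get(key)
        current_etag = current[0] if current else None
        if current_etag != expected_etag:
            return None  # precondition failed: caller must re-read and retry
        new_etag = uuid.uuid4().hex
        self._objects[key] = (new_etag, body)
        return new_etag

store = MiniStore()
etag1 = store.put_if_match("lock", b"writer-a", None)    # create wins
stale = store.put_if_match("lock", b"writer-b", None)    # loses the race -> None
etag2 = store.put_if_match("lock", b"writer-a2", etag1)  # CAS with fresh ETag succeeds
```

A writer that gets the 412-equivalent re-GETs, merges, and retries — exactly the external locking and versioning code the post says customers deleted.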
Bucket limits: 100 → up to 1M per account (Nov 2024)¶
Historically buckets were a human construct — created in the console, tracked by an admin — and capped at 100/account. Customers wanted buckets as a programmatic per-dataset / per-tenant resource, for policy and sharing. The rewrite required:
- Scaling Metabucket (S3's bucket-metadata system, distinct from the object-metadata namespace) — already rewritten for scale more than once before this round. See systems/metabucket.
- A new paged ListBuckets API.
- An opt-in soft limit of 10K beyond the old 100, to prevent the very problem a test account with millions of buckets surfaced — AWS console widgets in other services (e.g. Athena) that call ListBuckets plus a HeadBucket per bucket on render could take tens of minutes at high bucket counts.
- Cross-service fixes across "tens of services" on that rendering pattern.
This is the canonical example of limit removal as a cross-team engineering project, not a config change.
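The continuation-token paging pattern the new ListBuckets API follows can be sketched locally. Function names and the token encoding here are invented for illustration — the real API returns an opaque continuation token, not an index:

```python
def make_lister(all_buckets, page_size=1000):
    """Toy continuation-token pager over a sorted bucket list."""
    def list_page(token=None):
        start = int(token) if token else 0
        page = all_buckets[start:start + page_size]
        nxt = str(start + page_size) if start + page_size < len(all_buckets) else None
        return page, nxt
    return list_page

def drain(list_page):
    """Client-side loop: follow continuation tokens until exhausted."""
    names, token = [], None
    while True:
        page, token = list_page(token)
        names.extend(page)
        if token is None:
            return names

buckets = [f"tenant-{i:04d}" for i in range(2500)]
assert drain(make_lister(buckets)) == buckets  # drained in 3 bounded pages
```

The contrast with the console anti-pattern above: an unbounded listing plus one per-bucket metadata call on render is O(buckets) round trips at draw time; paging bounds each call, but the per-bucket fan-out still had to be fixed service by service.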
Performance: throughput then latency¶
- Throughput elasticity — publish S3's request-parallelization and retry strategies; bake them into systems/aws-crt so any language's bindings get the same performance. Individual GPU instances drive "hundreds of gigabits per second"; Anthropic runs at "tens of terabytes per second" at the account level.
- Latency tier — systems/s3-express-one-zone (2023), first SSD storage class, single-AZ by design to minimise latency. Trades multi-AZ resilience for tail latency on hot data.
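The CRT's core throughput move — splitting one logical GET into many parallel ranged requests and reassembling — can be sketched with a local stand-in. `ranged_get` here simulates a GET with a `Range: bytes=start-end` header against an in-memory object; part size and worker count are illustrative, not CRT defaults:

```python
from concurrent.futures import ThreadPoolExecutor

OBJECT = bytes(range(256)) * 4096  # 1 MiB stand-in for an S3 object

def ranged_get(start, end):
    """Stand-in for GET with Range: bytes=start-end (end inclusive)."""
    return OBJECT[start:end + 1]

def parallel_get(size, part_size=256 * 1024, workers=8):
    """Fetch [0, size) as parallel ranged parts, then reassemble in order."""
    ranges = [(off, min(off + part_size, size) - 1)
              for off in range(0, size, part_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: ranged_get(*r), ranges)
    return b"".join(parts)

assert parallel_get(len(OBJECT)) == OBJECT
```

Against the real service, each part is an independent request that can be retried and hedged on its own — which is why encoding this once in a common runtime, rather than per-SDK, is the leverage point.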
S3 Tables (re:Invent 2024) — the object→table move¶
Until 2024, tables on S3 were a customer-managed open table format over Parquet (typically systems/apache-iceberg). Large customers pointed out they were "building their own table primitive over S3 objects" and asked S3 to own the cross-object structure. S3 Tables lifts Iceberg to a first-class S3 resource:
- Own endpoint per table.
- Table is the policy resource (not the constituent objects).
- Managed compaction, GC, and tiering at the Iceberg layout level.
- New APIs for table creation and snapshot commit.
See systems/s3-tables and concepts/open-table-format. The architectural claim behind Tables: "it's these properties of storage that really define S3 much more than the object API itself" — therefore tables can be a first-class S3 construct alongside objects without contradicting what S3 is.
Internal systems referenced¶
- systems/metabucket — bucket metadata store (separate from the object-namespace metadata).
- systems/aws-crt — Common Runtime library exposing S3 best-practice request parallelization / retry to SDKs.
- systems/s3-express-one-zone — SSD, single-AZ low-latency class (2023).
- systems/s3-tables — managed-Iceberg first-class table resource (2024).
- systems/shardstore — rewritten per-disk storage layer (Rust, executable-spec validated). See FAST '23 keynote + SOSP paper.
Physical + operational story (Warfield FAST '23 keynote, 2025)¶
The 2025-02-25 ATD post gives the operational story that pairs with the "simplicity" retrospective above. See sources/2025-02-25-allthingsdistributed-building-and-operating-s3 for the full write-up.
Built out of millions of hard drives¶
- S3 is composed of "hundreds of microservices" and "millions" of hard drives.
- A single HDD delivers about 120 random-access IOPS, and that number has been flat since before S3 launched in 2006. Capacity has grown 7.2M× since the 1956 RAMAC, while seek time has improved only 150×. See concepts/hard-drive-physics.
- Industry HDD roadmap: 200 TB/drive this decade. At that point a drive supports 1 IOPS per 2 TB. S3 will use them anyway.
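The roadmap arithmetic is worth making explicit (numbers from the post; the "1 IOPS per 2 TB" figure is a rounding of the ratio below):

```python
iops_per_drive = 120   # flat random-IOPS budget per HDD since before 2006
capacity_tb = 200      # roadmap drive size for this decade
tb_per_iops = capacity_tb / iops_per_drive
# ~1.7 TB of capacity sits behind every available IOPS; the post rounds to 2 TB
```

The ratio is what makes heat management (below) a first-class placement problem rather than an optimization: the IOPS budget per byte keeps shrinking.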
Heat management¶
- Heat = requests per disk per unit time. Hotspots queue requests and that queueing amplifies through dependent layers (metadata lookups, erasure-coding reconstructs) into concepts/tail-latency-at-scale.
- Two levers: (1) Spread each bucket's objects across different drive sets — any one customer's data is a tiny fraction of any one drive, and a single customer's burst reaches millions of drives (patterns/data-placement-spreading). (2) Use redundancy as a steering tool: replication gives N read sources per logical read; concepts/erasure-coding (Reed-Solomon, k identity + m parity, read any k of k+m) gives both capacity efficiency and steering flexibility. See patterns/redundancy-for-heat.
- Why this works: concepts/aggregate-demand-smoothing. Millions of bursty tenants aggregate into a smooth demand curve no single workload can move — so the placement problem reduces to translating smooth aggregate into smooth per-drive load.
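Reed-Solomon with general k + m is beyond a sketch, but the steering property — read any k of k + m shards — already shows up in the minimal case of k = 2 identity shards plus m = 1 XOR parity shard. Shard layout and function names here are illustrative:

```python
def encode(a: bytes, b: bytes):
    """k=2 identity shards plus m=1 XOR parity shard (equal-length inputs)."""
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]

def decode(shards):
    """Reconstruct (a, b) from any 2 of the 3 shards (None = skipped drive)."""
    a, b, p = shards
    if a is None:
        a = bytes(x ^ y for x, y in zip(b, p))
    if b is None:
        b = bytes(x ^ y for x, y in zip(a, p))
    return a, b

a, b = b"hot-data", b"morehot!"
shards = encode(a, b)
# Steer the read away from a hot drive: skip shard 0 entirely.
assert decode([None, shards[1], shards[2]]) == (a, b)
```

Because any 2 of 3 shards suffice, a read can simply avoid whichever drive is hottest right now — redundancy doubles as a load-steering lever, at a capacity overhead of m/k instead of full replication.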
Durability as a human + organizational mechanism¶
- Durability reviews (patterns/durability-review): every durability-affecting change carries a threat-model artifact (concepts/threat-modeling) — summary, list of threats, how the change is resilient. Explicit preference for coarse-grained guardrails over per-risk mitigations.
- systems/shardstore as a canonical guardrail: S3's rewritten per-disk storage layer, in Rust, with an executable specification (~1% the code size) checked into the same repo and tested against the real implementation on every commit via property-based testing. Frames concepts/lightweight-formal-verification as an industrialized technique — normal engineers can maintain the spec without formal-methods PhDs. Published at SOSP.
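ShardStore's guardrail is Rust with a checked-in executable spec, but the shape of the technique survives a stdlib-only Python sketch: a trivially-correct dict spec, a log-structured implementation standing in for the real storage layer, and randomized operation sequences asserting observational equivalence. All names below are invented for illustration:

```python
import random

class SpecStore:
    """Executable spec: the simplest thing that could be correct."""
    def __init__(self):
        self.d = {}
    def put(self, k, v):
        self.d[k] = v
    def get(self, k):
        return self.d.get(k)

class LogStore:
    """Implementation under test: append-only log plus index,
    a toy stand-in for a log-structured per-disk store."""
    def __init__(self):
        self.log, self.index = [], {}
    def put(self, k, v):
        self.index[k] = len(self.log)
        self.log.append((k, v))
    def get(self, k):
        pos = self.index.get(k)
        return self.log[pos][1] if pos is not None else None

def check_equivalence(ops=2000, seed=0):
    """Drive both stores with the same random ops; reads must agree."""
    rng = random.Random(seed)
    spec, impl = SpecStore(), LogStore()
    for _ in range(ops):
        k = rng.choice("abcde")
        if rng.random() < 0.6:
            v = rng.randrange(1000)
            spec.put(k, v); impl.put(k, v)
        else:
            assert spec.get(k) == impl.get(k)
    return True

assert check_equivalence()
```

The spec is a fraction of the implementation's size and needs no formal-methods background to maintain — which is the "industrialized" claim: the check runs on every commit like any other test.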
"AWS ships its org chart" — applied ownership¶
- S3's top-level block diagram (frontend fleet + namespace + storage fleet + background data services) maps 1:1 to organizational groups, and every sub-component recurses into its own teams with their own fleets. Inter-team interactions are literal API contracts.
- This is an instance of concepts/ownership as a scaling primitive: teams go faster when they own their services end-to-end (API, durability, performance, 3-AM pages, post-incident improvements).
- Warfield's personal generalization: senior-engineer leverage comes from articulating problems, not dispensing solutions — "my best ideas are the ones that other people have instead of me."
Storage platform with multiple first-class data primitives (2024-2026)¶
Between re:Invent 2024 and 2026-04 the S3 team added three new first-class data primitives, each a distinct presentation over the same S3 storage properties (elasticity / durability / availability / performance / security) — see patterns/presentation-layer-over-storage. The 2026-04-07 "S3 Files" post is explicit that this now defines S3's architectural trajectory: not an object store that added features, but a storage platform whose API surface is a set of data primitives chosen to fit how applications actually want to work with data.
| Primitive | Launch | Page |
|---|---|---|
| Objects | 2006 | This page — 4-verb API over immutable blobs |
| Tables | re:Invent 2024 | systems/s3-tables — managed Iceberg; table as policy resource |
| Vectors | preview 2025-07-16 | systems/s3-vectors — elastic similarity-search indices; Cosine/Euclidean; 10K indexes/bucket × tens-of-M vectors/index; up-to-90% TCO claim |
| Files | 2026-04-07 | systems/s3-files — NFS mount over S3 data, backed by EFS |
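S3 Vectors' index internals aren't public; what the primitive exposes — similarity queries over a named index — can be conveyed with a brute-force cosine top-k sketch. The index layout and function names are invented here, and the service also supports Euclidean distance:

```python
import math

def cosine_sim(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def query_top_k(index, q, k=2):
    """Brute-force top-k: (vector-id, score), best first."""
    scored = [(vid, cosine_sim(vec, q)) for vid, vec in index.items()]
    return sorted(scored, key=lambda t: -t[1])[:k]

index = {"doc-1": [1.0, 0.0], "doc-2": [0.7, 0.7], "doc-3": [0.0, 1.0]}
top = query_top_k(index, [1.0, 0.1])  # query near the x-axis favors doc-1
```

At tens of millions of vectors per index the service obviously cannot brute-force like this, but the query contract — elastic index in, top-k neighbors out — is the primitive being made first-class.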
Warfield's framing (2026):
"Different ways of working with data aren't a problem to be collapsed. They're a reality to be served."
And the strategic argument for why storage gets broader as AI agents change application lifetimes:
"As the pace of application development accelerates, this property of storage has become more important than ever, because the easier data is to attach to and work with, the more that we can play, build, and explore new ways to benefit from it."
See concepts/agentic-data-access.
S3 Files: the "boundary-as-feature" design breakthrough¶
systems/s3-files is worth calling out on this page because it crystallised a design principle — concepts/boundary-as-feature — that generalises beyond storage. Key design arc:
- Six months of "EFS3" convergence design failed. Trying to fuse file and object into one unified system produced "a battle of unpalatable compromises" — the lowest common denominator, not the best of both worlds.
- The breakthrough was inverting the goal. Stop hiding the file/object boundary; make the boundary the feature. See concepts/file-vs-object-semantics for the enumerated asymmetries (mutation granularity, atomicity, authorization, namespace semantics, namespace performance).
- concepts/stage-and-commit (term borrowed from git) is the translation mechanism: file-side changes accumulate, commit back to S3 as one PUT roughly every 60 seconds. Bidirectional sync. Conflict policy: S3 wins; file-side loser → lost+found + CloudWatch metric.
- concepts/lazy-hydration: first directory access imports metadata as a background scan so mount-and-work is instantaneous even on multi-million-object buckets; file data < 128 KB co-hydrates, larger files hydrate on read; 30-day idle eviction keeps active working set proportional.
- "Read bypass": sequential-read throughput reroutes off NFS to parallel direct-GETs against S3 — 3 GB/s per client, Tbps across many clients.
- Known edges (called out explicitly): directory rename is O(objects) (mount warning above 50M objects); no programmatic explicit-commit API at launch; some S3 keys aren't valid POSIX filenames.
See patterns/explicit-boundary-translation for the generalised pattern and systems/aws-efs for the under-the-covers backing.
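The stage-and-commit cadence can be sketched without any filesystem machinery: file-side writes accumulate in a staging buffer and flush to the object side as one PUT once the commit interval elapses. Class and parameter names are invented; the real bidirectional sync, conflict policy, and lost+found path are not modeled:

```python
import time

class StagedFile:
    """Toy stage-and-commit: buffer file-side writes, commit as one PUT."""
    COMMIT_INTERVAL = 60.0  # seconds, matching the post's ~60 s cadence

    def __init__(self, put_object, now=time.monotonic):
        self.put_object = put_object   # callable(bytes): the object-side PUT
        self.now = now                 # injectable clock, for testing
        self.staged = bytearray()
        self.last_commit = self.now()

    def write(self, data: bytes):
        self.staged += data
        self.maybe_commit()

    def maybe_commit(self):
        if self.staged and self.now() - self.last_commit >= self.COMMIT_INTERVAL:
            self.put_object(bytes(self.staged))  # one PUT per commit window
            self.staged.clear()
            self.last_commit = self.now()

# Simulated clock so the example runs instantly
clock = [0.0]
puts = []
f = StagedFile(puts.append, now=lambda: clock[0])
f.write(b"hello ")
f.write(b"world")   # still staged: < 60 s elapsed
clock[0] = 61.0
f.write(b"!")       # interval elapsed -> single PUT of all staged bytes
assert puts == [b"hello world!"]
```

The point of the cadence is that many small POSIX mutations collapse into one immutable-object PUT — the boundary translation is explicit, batched, and observable, rather than hidden per-syscall.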
Design principles visible¶
- Remove limits, not feature-flag them — strong consistency, conditional writes, bucket-count ceilings: the pattern is to eliminate the sharp edge rather than expose it as a knob.
- Design for the property, not the API — concepts/elasticity and concepts/strong-consistency are the properties developers feel; the API is what changes least.
- patterns/customer-driven-prioritization — features are prioritised from direct conversations with customer-builders; customers probe unreleased REST verbs before launch.
- concepts/simplicity-vs-velocity — explicitly acknowledged tension: every simplification improves against an earlier feature that wasn't simple enough; racing to ship backloads simplification work that's more expensive later.
Seen in¶
- sources/2026-04-07-allthingsdistributed-s3-files-and-the-changing-face-of-s3 — Andy Warfield on the launch of S3 Files (2026-04-07) and the broader reframing of S3 as a multi-primitive storage platform. Introduces systems/s3-files (NFS mount over S3 data, EFS-backed), situates systems/s3-vectors (launched Nov 2025) in the lineage, and positions the three new primitives (Tables / Vectors / Files) alongside objects. Canonical articulation of concepts/boundary-as-feature — the "EFS3" convergence design dead end, and the post-holidays-2024 pivot to concepts/stage-and-commit as a programmable boundary primitive. Names concepts/agentic-data-access as an emerging design concern as agentic coding compresses application lifetimes and makes storage's decoupling role more load-bearing. Enumerates file/object semantic asymmetries (concepts/file-vs-object-semantics) in the most detail of any public AWS source, and frames the design discipline for resolving them (patterns/explicit-boundary-translation). Reported numbers: 2M+ tables in S3 Tables; 300B+ event notifications/day; 25M+ requests/sec to Parquet data alone; S3 Files read-bypass 3 GB/s per client / Tbps across clients; 60s commit cadence; 128 KB lazy-hydration threshold; 30-day file-side eviction; >50M-objects mount warning.
- sources/2025-02-25-allthingsdistributed-building-and-operating-s3 — Andy Warfield's FAST '23 keynote (republished on ATD in 2025-02-25). Complementary to the 19-birthday post: this one is the physical and operational story of S3. Surfaces (1) hard-drive physics — ~120 IOPS/drive, flat since before S3 launched; 26 TB today, 200 TB on the roadmap, so 1 IOPS per 2 TB at that point; see concepts/hard-drive-physics. (2) Heat management — requests per drive as a first-class placement problem; hotspots produce queueing → stragglers → concepts/tail-latency-at-scale; see concepts/heat-management. (3) Aggregate demand smoothing — millions of bursty tenants aggregate into a smooth curve no single one can move; see concepts/aggregate-demand-smoothing. (4) Spread placement + redundancy-for-heat — a bucket's objects on disjoint drive sets, letting one Lambda-parallel burst touch >1M disks; see patterns/data-placement-spreading, patterns/redundancy-for-heat, concepts/erasure-coding. (5) Organizational scale — "AWS ships its org chart," hundreds of microservices, durability reviews as threat-model for durability changes (patterns/durability-review, concepts/threat-modeling), systems/shardstore + concepts/lightweight-formal-verification as a guardrail, and concepts/ownership as the people-scaling lever.
- sources/2025-03-14-allthingsdistributed-s3-simplicity-is-table-stakes — Andy Warfield's 19th-birthday retrospective. Canonical post for the "properties, not API" framing, strong consistency / conditional-writes / bucket-limit / S3 Tables arcs.
- sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years — day-one Lambda PR/FAQ points to S3 as the persistent store for stateless Lambda handlers: "persistent state should be stored in Amazon S3, Amazon DynamoDB, or another Internet-available storage service."
- sources/2024-07-29-aws-amazons-exabyte-scale-migration-from-apache-spark-to-ray-on-ec2 — S3 as the durable substrate of Amazon Retail BDT's exabyte-scale data catalog: 50+ PB of Oracle table data landed on S3 in the 2016-2018 migration (wrapped in an systems/amazon-ion schema); swappable compute engines (Hive, Redshift, Spark, Athena, Flink, Glue, and now Ray) front the same S3 storage. Operational numbers from Q1 2024: 1.5 EiB of Parquet input on S3 compacted in a single quarter, >20 PiB/day input S3 read across >1,600 Ray jobs/day. Concrete "swap compute, keep storage" realisation of concepts/compute-storage-separation at exabyte scale.
- sources/2026-04-06-aws-unlock-efficient-model-deployment-simplified-inference-operator-setup-on-amazon-sagemaker-hyperpod — S3 in two named roles on the 2026-04-06 HyperPod Inference Operator EKS add-on: (1) TLS-certificate store for the operator's cert-manager issuance flow (a dedicated bucket named by the tlsCertificateS3Bucket parameter of the add-on config, reached via a VPC endpoint for in-VPC access); (2) model-weight store — InferenceEndpointConfig bring-your-own-model deployments reference weights on S3, loaded by the Mountpoint for Amazon S3 CSI driver bundled as a default dependency add-on. Instance in the long-running arc of S3-as-default-persistent-substrate for stateless managed-compute services.
- sources/2024-02-15-flyio-globally-distributed-object-storage-with-tigris
— S3 named in two adjacent roles in Tigris's architecture: (1) incumbent being improved on — Fly.io's framing of the single-write-region + CloudFront pattern as "no way to build a sandwich reviewing empire" for globally distributed users; (2) pluggable backend / archival tier — Tigris's QuiCK-style distribution queue propagates bytes out to "3rd party object stores… like S3", meaning Tigris can be configured with S3 as cold-tier origin while the regional NVMe / FoundationDB front handles hot distribution. Plus the S3-compatible API on Tigris's front: "If your framework can talk to S3, it can use Tigris" — the AWS SDK works unchanged via an AWS_ENDPOINT_URL_S3 override. First wiki example of S3's API shape being explicitly re-used as the presentation layer over a different-shaped backend (a concrete case of patterns/presentation-layer-over-storage at the storage-API level).
- sources/2025-05-20-flyio-litestream-revamped — CASAAS consumer entry. Fly.io's 2025-05-20 Litestream redesign cites S3's 2024-11 conditional-write launch as the load-bearing enabler for retiring Litestream's pre-existing "generations" abstraction: "Modern object stores like S3 and Tigris solve this problem for us: they now offer conditional write support. With conditional writes, we can implement a time-based lease." Canonical wiki instance of patterns/conditional-write-lease on S3 — S3's strong-consistency-plus-conditional-writes pair now substitutes for Consul / etcd in a production replication-coordination role, not just catalog-snapshot commits. Extends the customer-code-deletion framing (concepts/simplicity-vs-velocity) into a new substitution: coordination services as the thing S3 can replace for single-writer workloads.
- sources/2025-10-02-flyio-litestream-v050-is-here — CASAAS-shipped + newer-S3-APIs datapoint. Litestream v0.5.0 ships the CASAAS lease on S3 conditional writes that the 2025-05-20 post described; the post also notes "We've upgraded all our clients (S3, Google Storage, & Azure Blob Storage) to their latest versions. We've also moved our code to support newer S3 APIs" — implicit reference to the 2024-11 S3 conditional-writes feature CASAAS depends on (client-SDK currency was a precondition to shipping the revamp). Not a new S3-side disclosure — rather the first production-shipping instance of CASAAS-on-S3 in the wiki, distinct from the 2025-05-20 design-post framing.