All Things Distributed (Werner Vogels)
All Things Distributed is the personal engineering blog of Werner Vogels, CTO of Amazon.com. It is a Tier-1 source on the sysdesign-wiki: canonical commentary on AWS service history, distributed-systems culture (PR/FAQs, tenets, Working Backwards), and the engineering rationale behind foundational AWS primitives. Posts often include primary-source material (internal documents, talks, and first-person retrospectives) that is not available elsewhere.
Scope and why Tier 1
- Direct CTO-level perspective on design decisions at AWS scale.
- Cross-references: every major AWS foundational paper/product (Dynamo, S3, Lambda, DynamoDB, EC2) gets a retrospective here.
- Companion to AWS re:Invent keynotes; "behind the curtain" material on how Amazon makes architectural decisions (narrative docs, tenets, working backwards).
- When a claim about AWS architecture is contested, an allthingsdistributed post is often the definitive source.
- Occasionally publishes guest posts from AWS Distinguished Engineers (e.g. Andy Warfield on S3) with primary-source technical detail not available elsewhere.
Key systems (as referenced from this blog)
- systems/aws-ebs — network block storage for EC2 (2008); HDD→SSD→Nitro→SRD→custom-SSD arc; >140T ops/day today; sub-ms io2 Block Express.
- systems/nitro — AWS offload card + lightweight hypervisor family; VPC, EBS, encryption all moved off Xen dom0.
- systems/srd — data-center transport that replaces TCP for storage; multi-path, out-of-order, offload-friendly; also systems/ena-express for guest TCP.
- systems/aws-nitro-ssd — AWS custom SSDs, EBS-tailored.
- systems/physalia — EBS's config/control plane; "removes the control plane from the IO path."
- systems/xen — EC2's pre-Nitro hypervisor; its defaults capped hosts at 64 outstanding IOs.
- systems/aws-s3 — object storage; 19-year retrospective (2025) on simplicity as an architectural property, feature arcs (consistency, conditional writes, bucket limits, Tables); and FAST '23 keynote (2025-02-25) on physical/operational scale (HDD physics, heat management, ShardStore, durability reviews, ownership).
- systems/shardstore — S3's rewritten per-disk storage layer (Rust + executable-spec lightweight formal verification).
- systems/s3-tables — managed-Iceberg first-class table resource (re:Invent 2024).
- systems/s3-vectors — elastic similarity-search indices as a first-class S3 primitive (re:Invent 2025); S3-object-like cost/performance/durability profile; hundreds → billions of vectors.
- systems/s3-files — NFS mount over any S3 bucket/prefix (2026-04-07); EFS-backed filesystem presentation; stage-and-commit translation to S3 objects; design-breakthrough origin of concepts/boundary-as-feature.
- systems/aws-efs — under-the-covers filesystem backing for S3 Files.
- systems/s3-express-one-zone — SSD, single-AZ latency tier (2023).
- systems/metabucket — S3's bucket-metadata subsystem.
- systems/aws-crt — Common Runtime; S3 client best-practice library.
- systems/apache-iceberg — open table format; the pattern S3 Tables absorbed.
- systems/apache-parquet — columnar on-object file format.
- systems/aws-lambda — serverless compute service; launch PR/FAQ published here at 10 years.
- systems/firecracker — Lambda's micro-VM isolation primitive; density unlock for multi-tenant serverless.
- systems/aurora-dsql — serverless distributed SQL (re:Invent 2024); single-journal-per-commit + Crossbar subscription router; 100% JVM → 100% Rust journey.
- systems/postgresql — DSQL extends Postgres via public extension API rather than forking.
- systems/aws-sagemaker-ai — AWS's unified managed ML platform (2017 launch); umbrella for Studio, spaces, notebooks, managed training, hosting, and HyperPod.
- systems/aws-sagemaker-hyperpod — SageMaker's large-scale distributed-training / inference compute substrate; surface for observability + model-deployment + training-operator changes (2025).
- systems/aws-systems-manager — SSM Session Manager is the substrate under SageMaker AI's StartSession SSH-over-SSM tunnel.
- systems/ussd — 1990s GSM stateful session-based menu protocol; 2G, no data plan, $20 feature phones; Werner's Oct 2025 thesis-post entity.
- systems/mpesa — Safaricom mobile-money platform on AWS; 4K TPS, real-time ML fraud detection, >$100B processed in 2024; introduced in Werner's USSD post as the flagship patterns/feature-phone-frontend instance.
- systems/koko-networks — Sub-Saharan bioethanol cooking-fuel IoT network; 700+ cloud-connected KOKOpoint stations; same USSD / feature-phone customer edge applied to physical-goods retail.
- systems/bedrock-guardrails-automated-reasoning-checks — Bedrock capability that verifies AI outputs against a customer-authored specification; up to 99% provable accuracy; finance / healthcare / government target industries.
- systems/bedrock-agentcore — AWS agent runtime for mechanically enforcing capability envelopes on agentic systems; the enforcement half of patterns/envelope-and-verify.
- systems/kiro — AWS's specification-driven development tool; flagship surface for agentic coding + formal proof combined.
- systems/lean — interactive theorem prover created and led by Leo de Moura (now at Amazon); named by Cook as the most promising AI-reliability development; DeepSeek combines Lean + RL.
- systems/aws-policy-interpreter — decade of automated-reasoning proof over IAM / Cedar semantics; proofs now extend to agent-generated policy changes.
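The systems/xen entry's 64-outstanding-IO cap can be turned into a throughput ceiling with Little's law (L = λW): a host that never has more than L IOs in flight, each taking W seconds on average, completes at most L/W IOs per second. A minimal sketch; the per-IO latencies below are illustrative assumptions, not figures from the EBS post.

```python
def max_iops(outstanding_ios: int, latency_s: float) -> float:
    """Little's law, L = lambda * W, solved for throughput: lambda = L / W."""
    if outstanding_ios <= 0 or latency_s <= 0:
        raise ValueError("outstanding_ios and latency_s must be positive")
    return outstanding_ios / latency_s


XEN_OUTSTANDING_IO_CAP = 64  # the Xen default named in the EBS retrospective

# Illustrative mean per-IO latencies (assumptions, not measured EBS figures).
for latency_ms in (10.0, 2.0, 0.5):
    ceiling = max_iops(XEN_OUTSTANDING_IO_CAP, latency_ms / 1000)
    print(f"{latency_ms:>4} ms per IO -> at most {ceiling:,.0f} IOPS per host")
```

The ceiling moves only with latency, so under this model a fixed in-flight window bounds a host's throughput no matter how many devices sit behind it.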
Key patterns / concepts
- concepts/queueing-theory — the bank-metaphor framing of IO stacks; why spreading a hot tenant across many spindles widens the blast radius.
- concepts/noisy-neighbor, concepts/performance-isolation — the central quality problem in multi-tenant storage; 15 years of EBS design is iterative variance elimination.
- concepts/hardware-offload — Nitro as a queue-reduction + CPU-reclamation + hypervisor-isolation lever, not just a perf lever.
- concepts/incremental-delivery — "series of incremental improvements over time" as EBS's explicit delivery posture.
- patterns/full-stack-instrumentation, patterns/loopback-isolation — measurement-first engineering at the storage-IO layer.
- patterns/hot-swap-retrofit, patterns/nondisruptive-migration — fleet-upgrade-in-flight primitives; taping SSDs into every HDD server in 2013.
- patterns/peer-debugging — Marc Olson's "I had become the bottleneck" scaling-people shift.
- patterns/pr-faq-writing — Amazon's working-backwards narrative doc practice, as articulated on this blog.
- patterns/customer-driven-prioritization — the S3 team's default feature-selection posture; "almost everything we do has been in direct response to requests from S3 customers."
- patterns/conditional-write — CAS on object storage; S3 GA 2024.
- patterns/durability-review — S3's gated threat-model review for durability-affecting changes.
- patterns/executable-specification — same-language spec checked into repo, continuously validated via property-based testing (ShardStore).
- patterns/data-placement-spreading — place a bucket's objects on disjoint drive sets; one customer's data is a tiny fraction of any one drive.
- patterns/redundancy-for-heat — replicas and EC shards as I/O-steering degrees of freedom, not just durability mechanisms.
- concepts/elasticity — capacity + performance elasticity as S3's core property; scale-to-zero as its compute analogue on Lambda.
- concepts/strong-consistency — S3 read-after-write, Dec 2020; framed as a code-deletion feature.
- concepts/immutable-object-storage — S3's base data model.
- concepts/boundary-as-feature — when two abstractions differ on load-bearing semantics, design an explicit inspectable translation surface rather than a hidden convergence layer; origin lesson from S3 Files (2026).
- concepts/stage-and-commit — file-side changes accumulate, batch-commit to object-side roughly every 60s; term borrowed from git; programmable boundary primitive.
- concepts/file-vs-object-semantics — five axes of asymmetry (mutation granularity, atomicity, authorization, namespace semantics, namespace performance); S3 Files' enumerated design constraints.
- concepts/lazy-hydration — metadata-first, on-read data fetch; makes mount-and-work instantaneous on multi-million-object buckets.
- concepts/agentic-data-access — as agentic coding compresses application lifetimes, storage's role as the decoupled-from-applications stable layer grows; friction between agent and data amplifies into reasoning overhead.
- patterns/presentation-layer-over-storage — S3's 2024-2026 multi-primitive direction (objects + tables + vectors + files); one storage tier, many first-class presentations.
- patterns/explicit-boundary-translation — implementation pattern for boundary-as-feature: asymmetric consistency contracts, declared cadence + conflict policy, visible failure on non-translatable data, programmable surface.
- concepts/heat-management — S3's placement problem: minimize hotspots across millions of drives.
- concepts/hard-drive-physics — HDD capacity grows fast, seek-time flat; ~120 IOPS/drive, 200 TB/drive roadmap → 1 IOPS/2 TB.
- concepts/erasure-coding — Reed-Solomon (k, m) over Parquet/S3; dual-purpose as durability and heat-steering primitive.
- concepts/aggregate-demand-smoothing — millions of bursty tenants aggregate smooth; scale as a quality lever.
- concepts/lightweight-formal-verification — ShardStore's executable-spec approach (SOSP'21); "industrialized" verification.
- concepts/threat-modeling — security-origin; generalized to durability reviews in S3.
- concepts/ownership — Amazon's organizational primitive; "AWS ships its org chart" applied.
- concepts/open-table-format — Iceberg/Delta/Hudi as a class of metadata layer over immutable objects.
- concepts/simplicity-vs-velocity — first-class engineering concept in Warfield's 2025 S3 retrospective.
- concepts/serverless-compute, concepts/scale-to-zero, concepts/fine-grained-billing, concepts/stateless-compute, concepts/cold-start, concepts/micro-vm-isolation — the full serverless architectural vocabulary, as Amazon framed it at Lambda launch and refined over 10 years.
- patterns/launch-minimal-runtime — Node-first Lambda launch strategy.
- patterns/pilot-component-language-migration — DSQL's Adjudicator-first Rust pilot; 10× TPS result licensed broader rewrite.
- patterns/postgres-extension-over-fork — DSQL's approach to building on Postgres without forking.
- concepts/tail-latency-at-scale — the Marc Brooker "tail at scale" result; forcing function behind DSQL's JVM → Rust move.
- concepts/memory-safety — Rust-over-C rationale for DSQL's Postgres extensions.
- concepts/grey-failure, concepts/monitoring-paradox, concepts/training-serving-boundary — vocabulary named in the SageMaker AI friction-removal post (2025-08-06); grey failure as partial/intermittent degradation (GPU thermal throttle, NIC packet loss); monitoring paradox as the observability stack causing the failure it exists to catch; train/serve boundary as a historical artefact HyperPod's model-deployment collapses.
- patterns/secure-tunnel-to-managed-compute, patterns/auto-scaling-telemetry-collector, patterns/partial-restart-fault-recovery — the three structural patterns packaged by SageMaker AI / HyperPod's 2025 friction-removal release.
- concepts/appropriate-technology — Werner's Oct 2025 "suitable not shiny" doctrine; customer's constraints as the specification; corollary of concepts/simplicity-vs-velocity read from the customer side; invisibility as highest compliment.
- patterns/feature-phone-frontend — thin USSD edge + sophisticated cloud backend; the M-Pesa / Moniepoint / KOKO Networks shape.
- patterns/post-inference-verification — LLM generate → automated-reasoning check → pass / filter; Bedrock Guardrails' automated reasoning checks is the canonical AWS realization.
- patterns/envelope-and-verify — three-part discipline for high-stakes agentic AI: (1) specify the envelope (often temporal), (2) restrict the agent to it via AgentCore, (3) reason about the composition of envelopes against global invariants.
- concepts/automated-reasoning — mechanical proof of system properties against formal specifications; the decade-of-AWS-proof portfolio (policy interpreter, crypto, networking, virtualization, pan-Amazon data flow, ShardStore).
- concepts/neurosymbolic-ai — neural + symbolic composition as Cook's named path to production AI trust; four composition shapes (RL-over-prover, post-inference filter, in-loop tool cooperation, envelope+composition-reasoning).
- concepts/specification-driven-development — specifications as first-class customer-visible artifacts; Kiro + Bedrock Guardrails checks as the productized surfaces; autoformalization as the remaining UX bottleneck.
- concepts/temporal-logic-specification — LTL / CTL / past-time / future-time / epistemic / causal operators; Cook predicts customers will learn and demand these distinctions from spec-driven tools.
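The concepts/queueing-theory entry frames an IO stack as a bank queue. A minimal sketch using the textbook M/M/1 model (Poisson arrivals, one exponential server), which is an assumption layered on the bank metaphor rather than the model the EBS post itself uses: mean time in system is W = 1/(μ − λ), so latency blows up as utilization ρ = λ/μ approaches 1.

```python
def mm1_mean_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for an M/M/1 queue (assumed model): W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable when arrival_rate >= service_rate")
    return 1.0 / (service_rate - arrival_rate)


SERVICE_RATE = 120.0  # requests/s for one spindle, echoing the ~120 IOPS HDD figure

for utilization in (0.5, 0.8, 0.9, 0.99):
    w = mm1_mean_latency(utilization * SERVICE_RATE, SERVICE_RATE)
    print(f"rho={utilization:.2f}: mean latency {w * 1000:.1f} ms")
```

Going from 90% to 99% utilization multiplies mean latency by ten under this model, which is why one hot tenant on a shared spindle degrades everyone queued behind it (concepts/noisy-neighbor).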
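patterns/conditional-write gives object storage compare-and-swap: a PUT succeeds only if the object is still at the version the writer last read (or, for create-only writes, does not exist yet). S3 exposes the idea as If-Match / If-None-Match conditions on PUT; the sketch below is a minimal in-memory model of those semantics, and `ToyObjectStore`, `PreconditionFailed`, and the version tokens are hypothetical, not the S3 SDK.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


class PreconditionFailed(Exception):
    """The write's precondition did not hold (the HTTP 412 analogue)."""


@dataclass
class ToyObjectStore:
    """Hypothetical in-memory object store whose PUTs can be conditional."""
    objects: Dict[str, Tuple[str, bytes]] = field(default_factory=dict)  # key -> (etag, body)
    next_version: int = 0

    def get(self, key: str) -> Tuple[str, bytes]:
        return self.objects[key]

    def put(self, key: str, body: bytes, if_match: Optional[str] = None,
            if_none_match: bool = False) -> str:
        current = self.objects.get(key)
        if if_none_match and current is not None:  # create-only write
            raise PreconditionFailed(f"{key!r} already exists")
        if if_match is not None and (current is None or current[0] != if_match):
            raise PreconditionFailed(f"{key!r} changed since it was read")
        etag = f"v{self.next_version}"  # a fresh token per successful write
        self.next_version += 1
        self.objects[key] = (etag, body)
        return etag


store = ToyObjectStore()
store.put("counter", b"0", if_none_match=True)
tag_a, _ = store.get("counter")  # writer A reads version v0
tag_b, _ = store.get("counter")  # writer B reads the same version
store.put("counter", b"1", if_match=tag_a)  # A's swap succeeds
try:
    store.put("counter", b"1", if_match=tag_b)  # B's token is now stale
    b_succeeded = True
except PreconditionFailed:
    b_succeeded = False  # B must re-read and retry
```

The token in this sketch is a per-write counter rather than a content hash, so a stale writer is rejected even when it happens to write identical bytes.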
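patterns/executable-specification pairs an implementation with a much smaller reference model in the same language and checks the two against each other on generated operation sequences. A minimal sketch in Python (ShardStore's spec is Rust): the spec is a plain dict, `LogStore` is a hypothetical log-structured implementation, and a seeded `random` driver stands in for a property-based testing library.

```python
import random
from typing import Dict, List, Optional, Tuple


class LogStore:
    """Hypothetical implementation under test: an append-only log plus an index."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, Optional[bytes]]] = []  # value None = tombstone
        self.index: Dict[str, int] = {}  # key -> position of its latest log entry

    def put(self, key: str, value: bytes) -> None:
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def delete(self, key: str) -> None:
        self.log.append((key, None))
        self.index[key] = len(self.log) - 1

    def get(self, key: str) -> Optional[bytes]:
        pos = self.index.get(key)
        return None if pos is None else self.log[pos][1]

    def compact(self) -> None:
        """Rewrite the log keeping only live values (a classic home for bugs)."""
        live = [(k, self.log[p][1]) for k, p in self.index.items()
                if self.log[p][1] is not None]
        self.log = live
        self.index = {k: i for i, (k, _) in enumerate(live)}


def check_against_spec(seed: int, steps: int = 2000) -> None:
    """Drive impl and spec with the same random ops; every key must agree after each op."""
    rng = random.Random(seed)
    impl = LogStore()
    spec: Dict[str, bytes] = {}  # the specification: an ordinary dict
    keys = [f"k{i}" for i in range(8)]
    for _ in range(steps):
        op, key = rng.choice(["put", "delete", "compact"]), rng.choice(keys)
        if op == "put":
            value = str(rng.randrange(10**6)).encode()
            impl.put(key, value)
            spec[key] = value
        elif op == "delete":
            impl.delete(key)
            spec.pop(key, None)
        else:
            impl.compact()
        for k in keys:
            assert impl.get(k) == spec.get(k), (seed, op, k)


for seed in range(20):
    check_against_spec(seed)
```

Because the dict is obviously correct, any assertion failure localizes the bug to the implementation, and the seed in the failure message replays it.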
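The concepts/hard-drive-physics trend is plain division: random-IO capability per drive stays roughly flat while capacity grows, so the IOPS available per stored terabyte falls. A worked sketch using the ~120 IOPS/drive and 200 TB figures quoted above; the 1 TB and 20 TB points are illustrative assumptions.

```python
HDD_RANDOM_IOPS = 120  # roughly flat per-drive figure from the S3 keynote


def iops_per_tb(capacity_tb: float, drive_iops: float = HDD_RANDOM_IOPS) -> float:
    """Random-IO budget per stored terabyte when a drive is filled to capacity."""
    return drive_iops / capacity_tb


for capacity_tb in (1, 20, 200):  # 1 TB and 20 TB are illustrative midpoints
    per_tb = iops_per_tb(capacity_tb)
    print(f"{capacity_tb:>3} TB: {per_tb:6.2f} IOPS/TB (1 IOPS per {1 / per_tb:.2f} TB)")
```

At 200 TB the budget is 0.6 IOPS per TB, about one IOPS per 1.7 TB, consistent with the rough "1 IOPS per 2 TB" figure above; capacity growth alone turns hot data into a placement problem (concepts/heat-management).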
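The concepts/erasure-coding and patterns/redundancy-for-heat entries treat extra shards as freedom for the reader as well as protection against loss: if any k of the stored shards reconstruct an object, a read can skip the busiest drive. A minimal sketch that swaps Reed-Solomon for its simplest case, k data shards plus one XOR parity shard; the drive loads are hypothetical.

```python
from functools import reduce
from typing import Dict, List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """k equal-size data shards (zero-padded) followed by one XOR parity shard."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    shards = [padded[i * size:(i + 1) * size] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def decode(available: Dict[int, bytes], k: int, length: int) -> bytes:
    """Rebuild from any k of the k + 1 shards (indices 0..k-1 data, k parity)."""
    if len(available) < k:
        raise ValueError("need at least k shards")
    shards = dict(available)
    missing = [i for i in range(k) if i not in shards]
    if missing:  # XOR of the k present shards (parity included) is the lost one
        shards[missing[0]] = reduce(xor_bytes, shards.values())
    return b"".join(shards[i] for i in range(k))[:length]


data = b"hello, heat-managed world"
k = 4
shards = encode(data, k)  # shard i is placed on drive i
drive_load = {0: 0.2, 1: 0.95, 2: 0.3, 3: 0.1, 4: 0.4}  # hypothetical utilization
coolest = sorted(drive_load, key=drive_load.get)[:k]  # skips hot drive 1
recovered = decode({i: shards[i] for i in coolest}, k, len(data))
```

A Reed-Solomon (k, m) code generalizes this to any k of k + m shards, so each read can route around up to m hot or failed drives.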
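concepts/aggregate-demand-smoothing can be seen in a few lines of simulation: a single bursty tenant's peak is many times its mean, but the summed demand of many independent tenants stays close to its mean. A minimal sketch; the on/off burst model, the 2% burst probability, and the tenant counts are assumptions chosen for illustration.

```python
import random


def peak_to_mean(num_tenants: int, steps: int = 1000, burst_prob: float = 0.02,
                 burst_iops: float = 100.0, seed: int = 7) -> float:
    """Peak/mean of total demand when each tenant bursts independently each step (assumed model)."""
    rng = random.Random(seed)
    totals = []
    for _ in range(steps):
        bursting = sum(1 for _ in range(num_tenants) if rng.random() < burst_prob)
        totals.append(bursting * burst_iops)
    mean = sum(totals) / steps
    return max(totals) / mean if mean else float("inf")


for n in (1, 100, 2000):
    print(f"{n:>4} tenants: peak/mean demand = {peak_to_mean(n):.2f}")
```

The smoothing depends on independence; correlated bursts would not average out this way.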
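The concepts/tail-latency-at-scale entry rests on a simple probability: if a request must wait on N independent hosts and each is slow with probability p, the request is slow with probability 1 − (1 − p)^N. A minimal sketch of that generic fan-out formula (not DSQL's 40-host simulation itself); p = 1%, i.e. each host's p99, is an illustrative assumption.

```python
def prob_request_hits_tail(p_slow: float, fanout: int) -> float:
    """P(at least one of `fanout` independent hosts is slow) = 1 - (1 - p)^N."""
    return 1.0 - (1.0 - p_slow) ** fanout


P_SLOW = 0.01  # assumed: each host exceeds its own p99 on 1% of requests

for fanout in (1, 10, 40, 100):
    p = prob_request_hits_tail(P_SLOW, fanout)
    print(f"fan-out {fanout:>3}: {p:.1%} of requests wait on at least one p99-slow host")
```

At a fan-out of 40, about a third of requests pay some host's p99, so rare per-host stalls become the common end-to-end case.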
Recent articles
- 2026-04-07 — sources/2026-04-07-allthingsdistributed-s3-files-and-the-changing-face-of-s3 (Andy Warfield guest post, introduced by Werner Vogels. Launch of systems/s3-files — NFS mount over any S3 bucket/prefix, backed by EFS, accessible from EC2 / containers / Lambda. Most of the post is the design story: six months of attempted "EFS3" convergence in 2024 produced a "battle of unpalatable compromises"; after Christmas 2024 the team inverted the goal — the boundary between file and object semantics IS the feature, not a limitation to hide. Origin and canonical articulation of concepts/boundary-as-feature ("we spent months trying to make it disappear, and when we finally accepted it as a first-class element of the system, everything got better"). Architecture: concepts/stage-and-commit translation layer — file-side changes accumulate in EFS, commit back to S3 as one PUT per changed object roughly every 60 seconds; bidirectional sync; conflict policy: S3 wins, filesystem-side loser → lost+found + CloudWatch metric. concepts/lazy-hydration — first access imports S3 metadata as a background scan, files < 128 KB co-hydrate data, larger files hydrate on read; 30-day idle eviction keeps the active working set proportional. Read bypass reroutes high-throughput sequential reads off NFS to parallel direct GETs against S3 — 3 GB/s per client, Tbps across many clients. Enumerates five axes of concepts/file-vs-object-semantics asymmetry (mutation granularity / atomicity / auth / namespace / performance) more exhaustively than any prior AWS source. Multiphase-not-concurrent insight: "very few applications use both file and object interfaces concurrently on the same data at the same instant." Known edges called out: rename is O(objects) (with a warning when mounting more than 50M objects), no programmatic explicit-commit API at launch, some S3 keys aren't valid POSIX filenames. 
Multi-primitive lineage: S3 Files is the third new first-class data primitive added to S3 after systems/s3-tables (re:Invent 2024) and systems/s3-vectors (re:Invent 2025), following the patterns/presentation-layer-over-storage pattern. Named framing of concepts/agentic-data-access — as agentic coding compresses application lifetimes, storage's role as the stable data layer grows. Reported scale: 2M+ tables in S3 Tables today, 300B+ event notifications/day from S3, 25M+ req/s to Parquet data alone. 9 months of customer beta shaped the launch edges. Extends concepts/immutable-object-storage with a file-semantics escape hatch that preserves the object invariant rather than weakening it; concepts/simplicity-vs-velocity restated — "stage and commit gives us a surface that we can continue to evolve".)
- 2026-02-17 — sources/2026-02-17-allthingsdistributed-byron-cook-automated-reasoning-trust-ai (Werner Vogels interviews Byron Cook (Amazon Distinguished Scientist and VP) three and a half years after their first automated-reasoning conversation. Thesis: trust is the production blocker for generative + agentic AI, and concepts/neurosymbolic-ai — mechanical theorem provers composed with LLMs — is the path to delivering it. Two enabling forces since 2022: LLMs are now trained over theorem-prover outputs (Isabelle/HOL-light/systems/lean), which dissolves the user-friction barrier; regulated-industry customers (finance/healthcare/government) now have concrete provability demands testing cannot answer. AWS ships systems/bedrock-guardrails-automated-reasoning-checks (up to 99% provable accuracy on AI outputs vs. a customer-supplied specification — realizes patterns/post-inference-verification), and systems/bedrock-agentcore as the runtime that mechanically enforces agent capability envelopes. Together with systems/kiro (spec authoring) these form Cook's three-part patterns/envelope-and-verify: specify the envelope, AgentCore enforces it, automated reasoning proves invariants over the composition. AWS's moat: a decade of proof over the systems/aws-policy-interpreter, cryptography, networking protocols, and the virtualization layer — and a 2025 pan-Amazon whole-service data-flow analyzer under CISO Amy Herzog reasoning about invariants like "data at rest is encrypted" / "credentials are never logged" — all of which now extends to reasoning about agentic-tool-generated code changes. Cook predicts specification becomes mainstream: customers will discover and demand branching-time vs linear-time, past-time vs future-time, epistemic, and causal operators from spec-driven tools — see concepts/temporal-logic-specification and concepts/specification-driven-development. Autoformalization (natural-language → formal spec) is the UX bottleneck — DARPA expMath is the public research face; Kiro + Guardrails reasoning checks are the product face. Fundamental scaling limit — NP-complete / undecidable — addressed via distributed SAT (mallob) and LLM-guided proof search. Extends concepts/lightweight-formal-verification (S3/ShardStore case) to runtime AI-output verification and organization-wide invariant enforcement; concepts/threat-modeling shape generalizes a third time (security → durability → agent envelopes). Ecosystem: DeepSeek, DeepMind/Google pushing neurosymbolic; new startups Atalanta / Axiom Math / Harmonic.fun / Leibnitz.)
- 2025-10-29 — sources/2025-10-29-allthingsdistributed-what-is-ussd-and-who-cares (Werner Vogels' thesis post on systems/ussd — the early-1990s GSM stateful-session menu protocol — as the production transactional frontend for Sub-Saharan Africa mobile money and IoT retail. Canonical instances: systems/mpesa (4K TPS, real-time ML fraud detection on AWS, >$100B processed in 2024), Moniepoint (5.2B txns / $150B in 2024), systems/koko-networks (700+ cloud-connected bioethanol IoT stations). Names concepts/appropriate-technology — "technology that is suitable, not shiny" — as the design doctrine behind reusing a 1990s telecom protocol at 2025 fintech scale, and frames Sub-Saharan-Africa constraints-driven design as "a blueprint to build more resilient, efficient, cost-aware systems anywhere in the world." Corollary of Warfield's concepts/simplicity-vs-velocity: the invisibility of appropriately engineered systems as the highest compliment. Introduces patterns/feature-phone-frontend as the architectural shape.)
- 2025-08-06 — sources/2025-08-06-allthingsdistributed-removing-friction-sagemaker-ai-development (Werner Vogels surveys four 2025 SageMaker AI capabilities that remove distinct friction points: (1) the StartSession API — productizes SSH-over-SSM tunnels into SageMaker Studio spaces, answering SageMaker's #1 feature request, so local VS Code attaches to managed compute without bastion hosts or hand-rolled tunnels (patterns/secure-tunnel-to-managed-compute); (2) HyperPod observability — auto-scaling collectors replace CPU-bound single-threaded ones (patterns/auto-scaling-telemetry-collector), auto-correlate high-cardinality metrics, and detect grey failures (GPU thermal throttling, NIC packet loss), not just binary ones (concepts/grey-failure); explicitly framed as an answer to the observability paradox where the monitoring stack itself becomes the failure source (concepts/monitoring-paradox); (3) HyperPod model deployment — train + serve on the same GPU cluster, collapsing the historical training/serving infra boundary (concepts/training-serving-boundary); (4) HyperPod training operator for Kubernetes — restarts only affected resources rather than the whole job (patterns/partial-restart-fault-recovery), monitors stalled batches and non-numeric loss, and applies YAML-defined recovery policies.)
- 2025-05-27 — sources/2025-05-27-allthingsdistributed-aurora-dsql-rust-journey (Werner hosts a guest post by Sr. Principal Engineers Niko Matsakis and Marc Bowes on the engineering journey of systems/aurora-dsql: how they scaled writes without 2PC — single-journal-per-commit plus a novel Crossbar subscription router — and why DSQL moved from 100% JVM / Kotlin to 100% Rust, driven by concepts/tail-latency-at-scale math (40-host simulation: ~6K TPS vs. ~1M target, 10s tail vs. 1s) and concepts/memory-safety economics on new extension code. DSQL uses Postgres via its public extension API rather than forking. Retracts the earlier "Kotlin control plane, Rust data plane" split in favor of unified Rust.)
- 2025-03-14 — sources/2025-03-14-allthingsdistributed-s3-simplicity-is-table-stakes (S3 at 19. Andy Warfield reframes "simple" as a property of the experience, not the API: elasticity, strong consistency, conditional writes, bucket-limit rewrite, SSD/low-latency class, and S3 Tables as the object→table-as-first-class-resource move. Canonical statement that the properties of S3 storage, not the object API, define the system.)
- 2025-02-25 — sources/2025-02-25-allthingsdistributed-building-and-operating-s3 (Andy Warfield's FAST '23 keynote, republished on ATD. The physical/operational counterpart to the 2025-03-14 "simplicity" post. HDD physics — ~120 IOPS/drive flat since 2006, 200 TB drives incoming → 1 IOPS per 2 TB. Heat management as placement problem. Aggregate demand smooths over millions of bursty tenants. Spread placement + redundancy-for-heat → single customer bursts onto 1M+ disks. Org: hundreds of microservices, "AWS ships its org chart." Durability reviews as threat-model for durability changes. ShardStore rewritten in Rust with a ~1%-size executable spec checked into the same repo → lightweight formal verification as an industrialized guardrail, SOSP paper. Ownership as a people-scaling lever — "my best ideas are the ones that other people have instead of me.")
- 2024-11-15 — sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years (The internal PR/FAQ that launched AWS Lambda, re-published at 10 years with annotations — what shipped as written, what evolved, what was deferred. Canonical artefact of Amazon's PR/FAQ doc culture.)
- 2024-08-22 — sources/2024-08-22-allthingsdistributed-continuous-reinvention-block-storage-at-aws (Marc Olson, guest post. 13-year insider retrospective on systems/aws-ebs: queueing theory framing; HDD→SSD (2012 Provisioned IOPS, 1k IOPS / 2-3ms); instrumentation turnaround; the systems/xen ring-default that capped hosts at 64 outstanding IOs; first and second systems/nitro offload cards; systems/srd replaces TCP for storage and becomes systems/ena-express for guests; custom systems/aws-nitro-ssd; the 2013 patterns/hot-swap-retrofit where SSDs were taped into every HDD server with zero disruption; patterns/nondisruptive-migration as a compounding primitive; Olson's personal shift from deep-diving-everything to patterns/peer-debugging leadership. Today: >140T ops/day, sub-ms io2 Block Express latency.)
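The S3 Files entry spells out a concrete concepts/stage-and-commit policy: file-side changes accumulate, a periodic commit issues one PUT per changed object, and when the object changed in the meantime S3 wins while the file-side version goes to lost+found and a CloudWatch metric. A minimal single-process sketch of that policy; the class, the integer version counter used to detect conflicts, and the lost+found key naming are hypothetical, and the S3-to-filesystem direction of the bidirectional sync is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StageAndCommit:
    """Hypothetical model of the file-to-object commit policy (one direction only)."""
    s3: Dict[str, Tuple[int, bytes]] = field(default_factory=dict)      # key -> (version, body)
    staged: Dict[str, Tuple[int, bytes]] = field(default_factory=dict)  # key -> (base version, body)
    lost_and_found: Dict[str, bytes] = field(default_factory=dict)
    conflicts: int = 0  # stand-in for the CloudWatch conflict metric
    puts: List[str] = field(default_factory=list)

    def _version(self, key: str) -> int:
        return self.s3.get(key, (0, b""))[0]  # 0 means "object does not exist"

    def put_object(self, key: str, body: bytes) -> None:
        """A direct write through the object API."""
        self.s3[key] = (self._version(key) + 1, body)

    def write_file(self, key: str, body: bytes) -> None:
        """A file-side write; repeated writes before a commit coalesce."""
        base = self.staged[key][0] if key in self.staged else self._version(key)
        self.staged[key] = (base, body)

    def commit(self) -> None:
        """Run periodically (roughly every 60 s per the post): one PUT per changed object."""
        for key, (base, body) in self.staged.items():
            if self._version(key) != base:  # object changed underneath: S3 wins
                self.lost_and_found[f"lost+found/{key}"] = body
                self.conflicts += 1
            else:
                self.put_object(key, body)
                self.puts.append(key)
        self.staged.clear()


sync = StageAndCommit()
sync.put_object("a.txt", b"v1")
sync.write_file("a.txt", b"edit 1")
sync.write_file("a.txt", b"edit 2")  # coalesced with edit 1
sync.write_file("b.txt", b"file-side draft")
sync.put_object("b.txt", b"object-side write")  # lands before the commit
sync.commit()
```

Coalescing is what keeps the cadence cheap: two edits to a.txt cost one PUT, while b.txt shows the S3-wins path.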
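The S3 Files entry's concepts/lazy-hydration policy (metadata imported by a background scan, files under 128 KB hydrated alongside it, larger files hydrated on first read, 30-day idle eviction) fits in a small cache model. A minimal sketch; `LazyMount`, `fetch_object`, and the in-memory bucket are hypothetical stand-ins, not S3 Files internals.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

SMALL_FILE_BYTES = 128 * 1024        # below this, data is hydrated with metadata
IDLE_EVICTION_S = 30 * 24 * 60 * 60  # 30 days without access


@dataclass
class LazyMount:
    fetch_object: Callable[[str], bytes]  # hypothetical GET against the bucket
    sizes: Dict[str, int] = field(default_factory=dict)   # metadata, always kept
    data: Dict[str, bytes] = field(default_factory=dict)  # hydrated file contents
    last_access: Dict[str, float] = field(default_factory=dict)

    def import_metadata(self, listing: Dict[str, int], now: float) -> None:
        """Background scan: record every key's size; co-hydrate only small files."""
        for key, size in listing.items():
            self.sizes[key] = size
            if size < SMALL_FILE_BYTES:
                self.data[key] = self.fetch_object(key)
                self.last_access[key] = now

    def read(self, key: str, now: float) -> bytes:
        if key not in self.data:  # large or evicted file: hydrate on first read
            self.data[key] = self.fetch_object(key)
        self.last_access[key] = now
        return self.data[key]

    def evict_idle(self, now: float) -> None:
        """Drop data (not metadata) untouched for IDLE_EVICTION_S seconds."""
        idle = [k for k, t in self.last_access.items() if now - t >= IDLE_EVICTION_S]
        for key in idle:
            del self.data[key]
            del self.last_access[key]


bucket = {"small.txt": b"x" * 10, "big.bin": b"y" * (1 << 20)}
gets: List[str] = []


def fetch(key: str) -> bytes:
    gets.append(key)  # record each simulated GET
    return bucket[key]


mount = LazyMount(fetch)
mount.import_metadata({k: len(v) for k, v in bucket.items()}, now=0.0)  # GETs small.txt only
mount.read("big.bin", now=5.0)               # first read hydrates big.bin
mount.evict_idle(now=IDLE_EVICTION_S + 1.0)  # small.txt idle for 30 days; big.bin not yet
```

Because only data is evicted, the namespace stays fully visible while stored contents track the recently used working set.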