All Things Distributed (Werner Vogels)

All Things Distributed is the personal engineering blog of Werner Vogels, CTO of Amazon.com. Tier-1 on the sysdesign-wiki: canonical commentary on AWS service history, distributed-systems culture (PR/FAQs, tenets, Working Backwards), and the engineering rationale behind foundational AWS primitives. Content often includes primary-source material — internal documents, talks, and first-person retrospectives — not available elsewhere.

Scope and why Tier 1

  • Direct CTO-level perspective on design decisions at AWS scale.
  • Cross-references: every major AWS foundational paper/product (Dynamo, S3, Lambda, DynamoDB, EC2) gets a retrospective here.
  • Companion to AWS re:Invent keynotes; "behind the curtain" material on how Amazon makes architectural decisions (narrative docs, tenets, working backwards).
  • When a claim about AWS architecture is contested, an allthingsdistributed post is often the definitive source.
  • Occasionally publishes guest posts from AWS Distinguished Engineers (e.g. Andy Warfield on S3) with primary-source technical detail not available elsewhere.

Key systems (as referenced from this blog)

  • systems/aws-ebs — network block storage for EC2 (2008); HDD→SSD→Nitro→SRD→custom-SSD arc; >140T ops/day today; sub-ms io2 Block Express.
  • systems/nitro — AWS offload card + lightweight hypervisor family; VPC, EBS, encryption all moved off Xen dom0.
  • systems/srd — data-center transport that replaces TCP for storage; multi-path, out-of-order, offload-friendly; also systems/ena-express for guest TCP.
  • systems/aws-nitro-ssd — AWS custom SSDs, EBS-tailored.
  • systems/physalia — EBS's config/control plane; "removes the control plane from the IO path."
  • systems/xen — EC2's pre-Nitro hypervisor; its defaults capped hosts at 64 outstanding IOs.
  • systems/aws-s3 — object storage; 19-year retrospective (2025) on simplicity as an architectural property and on feature arcs (consistency, conditional writes, bucket limits, Tables); FAST '23 keynote (republished 2025-02-25) on physical/operational scale (HDD physics, heat management, ShardStore, durability reviews, ownership).
  • systems/shardstore — S3's rewritten per-disk storage layer (Rust + executable-spec lightweight formal verification).
  • systems/s3-tables — managed-Iceberg first-class table resource (re:Invent 2024).
  • systems/s3-vectors — elastic similarity-search indices as a first-class S3 primitive (re:Invent 2025); S3-object-like cost/performance/durability profile; hundreds → billions of vectors.
  • systems/s3-files — NFS mount over any S3 bucket/prefix (2026-04-07); EFS-backed filesystem presentation; stage-and-commit translation to S3 objects; design-breakthrough origin of concepts/boundary-as-feature.
  • systems/aws-efs — under-the-covers filesystem backing for S3 Files.
  • systems/s3-express-one-zone — SSD, single-AZ latency tier (2023).
  • systems/metabucket — S3's bucket-metadata subsystem.
  • systems/aws-crt — Common Runtime; S3 client best-practice library.
  • systems/apache-iceberg — open table format; the pattern S3 Tables absorbed.
  • systems/apache-parquet — columnar on-object file format.
  • systems/aws-lambda — serverless compute service; launch PR/FAQ published here at 10 years.
  • systems/firecracker — Lambda's micro-VM isolation primitive; density unlock for multi-tenant serverless.
  • systems/aurora-dsql — serverless distributed SQL (re:Invent 2024); single-journal-per-commit + Crossbar subscription router; 100% JVM → 100% Rust journey.
  • systems/postgresql — DSQL extends Postgres via public extension API rather than forking.
  • systems/aws-sagemaker-ai — AWS's unified managed ML platform (2017 launch); umbrella for Studio, spaces, notebooks, managed training, hosting, and HyperPod.
  • systems/aws-sagemaker-hyperpod — SageMaker's large-scale distributed-training / inference compute substrate; surface for observability + model-deployment + training-operator changes (2025).
  • systems/aws-systems-manager — SSM Session Manager is the substrate under SageMaker AI's StartSession SSH-over-SSM tunnel.
  • systems/ussd — 1990s GSM stateful session-based menu protocol; 2G, no data plan, $20 feature phones; Werner's Oct 2025 thesis-post entity.
  • systems/mpesa — Safaricom mobile-money platform on AWS; 4K TPS, real-time ML fraud detection, >$100B processed in 2024; introduced in Werner's USSD post as the flagship patterns/feature-phone-frontend instance.
  • systems/koko-networks — Sub-Saharan bioethanol cooking-fuel IoT network; 700+ cloud-connected KOKOpoint stations; same USSD / feature-phone customer edge applied to physical-goods retail.
  • systems/bedrock-guardrails-automated-reasoning-checks — Bedrock capability that verifies AI outputs against a customer-authored specification; up to 99% provable accuracy; finance / healthcare / government target industries.
  • systems/bedrock-agentcore — AWS agent runtime for mechanically enforcing capability envelopes on agentic systems; the enforcement half of patterns/envelope-and-verify.
  • systems/kiro — AWS's specification-driven development tool; flagship surface for agentic coding + formal proof combined.
  • systems/lean — interactive theorem prover founded and led by Leo de Moura (at Amazon); named by Cook as the most promising AI-reliability development; DeepSeek combines Lean + RL.
  • systems/aws-policy-interpreter — decade of automated-reasoning proof over IAM / Cedar semantics; proofs now extend to agent-generated policy changes.
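
The 64-outstanding-IO cap noted for systems/xen above can be read as a throughput ceiling via Little's law (concurrency = throughput × latency). The sketch below is illustrative only: the function name and the sample latencies are assumptions made for this example, and only the 64-IO cap comes from the entry above.

```python
def max_iops(outstanding_ios: int, latency_seconds: float) -> float:
    """Upper bound on IOPS from Little's law: L = lambda * W, so lambda = L / W."""
    if latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return outstanding_ios / latency_seconds


XEN_RING_CAP = 64  # per-host outstanding-IO cap cited in the list above

# Hypothetical per-IO latencies, chosen only to show how the bound moves.
SAMPLE_LATENCIES = [("2.5 ms", 0.0025), ("1 ms", 0.001), ("0.5 ms", 0.0005)]

for label, latency in SAMPLE_LATENCIES:
    bound = max_iops(XEN_RING_CAP, latency)
    print(f"{label}: at most {bound:,.0f} IOPS per host")
```

With the queue depth fixed, the ceiling depends only on per-IO latency, so a host cannot exploit extra parallelism in the devices behind it until the cap itself is lifted.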

Key patterns / concepts

Recent articles

  • 2026-04-07 — sources/2026-04-07-allthingsdistributed-s3-files-and-the-changing-face-of-s3 (Andy Warfield guest post, introduced by Werner Vogels. Launch of systems/s3-files — NFS mount over any S3 bucket/prefix, backed by EFS, accessible from EC2 / containers / Lambda. Most of the post is the design story: six months of attempted "EFS3" convergence in 2024 produced a "battle of unpalatable compromises"; post-Christmas-2024 the team inverted the goal — the boundary between file and object semantics IS the feature, not a limitation to hide. Origin and canonical articulation of concepts/boundary-as-feature ("we spent months trying to make it disappear, and when we finally accepted it as a first-class element of the system, everything got better"). Architecture: concepts/stage-and-commit translation layer — file-side changes accumulate in EFS, commit back to S3 as one PUT per changed object roughly every 60 seconds; bidirectional sync; conflict policy: S3 wins, filesystem-side loser → lost+found + CloudWatch metric. concepts/lazy-hydration — first access imports S3 metadata as background scan, files < 128 KB co-hydrate data, larger files hydrate on read; 30-day idle eviction keeps the filesystem footprint proportional to the active working set. Read bypass reroutes high-throughput sequential reads off NFS to parallel direct-GETs against S3 — 3 GB/s per client, Tbps across many clients. Enumerates five axes of concepts/file-vs-object-semantics asymmetry (mutation granularity / atomicity / auth / namespace / performance) more exhaustively than any prior AWS source. Multiphase-not-concurrent insight: "very few applications use both file and object interfaces concurrently on the same data at the same instant." Known edges called out: rename is O(objects) (warning on mounts of > 50M objects), no programmatic explicit-commit API at launch, some S3 keys aren't valid POSIX filenames.
Multi-primitive lineage: S3 Files is the third new first-class data primitive added to S3 after systems/s3-tables (re:Invent 2024) and systems/s3-vectors (re:Invent 2025), following the patterns/presentation-layer-over-storage pattern. Named framing of concepts/agentic-data-access — as agentic coding compresses application lifetimes, storage's role as the stable data layer grows. Reported scale: 2M+ tables in S3 Tables today, 300B+ event notifications/day from S3, 25M+ req/s to Parquet data alone. 9 months of customer beta shaped the launch edges. Extends concepts/immutable-object-storage with a file-semantics escape hatch that preserves the object invariant rather than weakening it; concepts/simplicity-vs-velocity restated — "stage and commit gives us a surface that we can continue to evolve".)

  • 2026-02-17 — sources/2026-02-17-allthingsdistributed-byron-cook-automated-reasoning-trust-ai (Werner Vogels interviews Byron Cook (Amazon Distinguished Scientist + VP) three and a half years after their first automated-reasoning conversation. Thesis: trust is the production blocker for generative + agentic AI, and concepts/neurosymbolic-ai — mechanical theorem provers composed with LLMs — is the path to delivering it. Two enabling forces since 2022: LLMs are now trained over theorem-prover outputs (Isabelle, HOL Light, systems/lean) which dissolves the user-friction barrier; regulated-industry customers (finance/healthcare/government) now have concrete provability demands testing cannot answer. AWS ships systems/bedrock-guardrails-automated-reasoning-checks (up to 99% provable accuracy on AI outputs vs. a customer-supplied specification — realizes patterns/post-inference-verification), and systems/bedrock-agentcore as the runtime that mechanically enforces agent capability envelopes. Together with systems/kiro (spec authoring) these form Cook's three-part patterns/envelope-and-verify: specify the envelope, AgentCore enforces it, automated reasoning proves invariants over the composition. AWS's moat: a decade of proof over the systems/aws-policy-interpreter, cryptography, networking protocols, virtualization layer — and a 2025 pan-Amazon whole-service data-flow analyzer under CISO Amy Herzog reasoning about invariants like "data at rest is encrypted" / "credentials are never logged" — all of which now extends to reasoning about agentic-tool-generated code changes. Cook predicts specification becomes mainstream: customers will discover and demand branching-time vs linear-time, past-time vs future-time, epistemic, and causal operators from spec-driven tools — see concepts/temporal-logic-specification and concepts/specification-driven-development. Autoformalization (natural-language → formal spec) is the UX bottleneck — DARPA expMath is the public research face; Kiro + Guardrails reasoning checks are the product face.
Fundamental scaling limit — NP-complete / undecidable — addressed via distributed SAT (mallob) and LLM-guided proof search. Extends concepts/lightweight-formal-verification (S3/ShardStore case) to runtime AI-output verification and organization-wide invariant enforcement; concepts/threat-modeling shape generalizes a third time (security → durability → agent envelopes). Ecosystem: DeepSeek, DeepMind/Google pushing neurosymbolic; new startups Atalanta / Axiom Math / Harmonic.fun / Leibnitz.)

  • 2025-10-29 — sources/2025-10-29-allthingsdistributed-what-is-ussd-and-who-cares (Werner Vogels' thesis post on systems/ussd — the early-1990s GSM stateful-session menu protocol — as the production transactional frontend for Sub-Saharan Africa mobile money and IoT retail. Canonical instances: systems/mpesa (4K TPS, real-time ML fraud detection on AWS, >$100B processed in 2024), Moniepoint (5.2B txns / $150B 2024), systems/koko-networks (700+ cloud-connected bioethanol IoT stations). Names concepts/appropriate-technology — "technology that is suitable, not shiny" — as the design doctrine behind reusing a 1990s telecom protocol at 2025 fintech scale, and frames Sub-Saharan-Africa constraints-driven design as "a blueprint to build more resilient, efficient, cost-aware systems anywhere in the world." Corollary of Warfield's concepts/simplicity-vs-velocity: invisibility of well-appropriate engineering as the highest compliment. Introduces patterns/feature-phone-frontend as the architectural shape.)

  • 2025-08-06 — sources/2025-08-06-allthingsdistributed-removing-friction-sagemaker-ai-development (Werner Vogels surveys four 2025 SageMaker AI capabilities that remove distinct friction points: StartSession API — productizes SSH-over-SSM tunnels into SageMaker Studio spaces, answering SageMaker's #1 feature request, so local VS Code attaches to managed compute without bastion hosts or hand-rolled tunnels (patterns/secure-tunnel-to-managed-compute); HyperPod observability — auto-scaling collectors replace CPU-bound single-threaded ones (patterns/auto-scaling-telemetry-collector), auto-correlate high-cardinality metrics, detect grey failures — GPU thermal throttling, NIC packet loss — not just binary ones (concepts/grey-failure); explicitly framed as an answer to the observability paradox where the monitoring stack itself becomes the failure source (concepts/monitoring-paradox); HyperPod model deployment — train + serve on the same GPU cluster, collapsing the historical training/serving infra boundary (concepts/training-serving-boundary); HyperPod training operator for Kubernetes — restart only affected resources not the whole job (patterns/partial-restart-fault-recovery); monitors stalled batches + non-numeric loss; YAML-defined recovery policies.)

  • 2025-05-27 — sources/2025-05-27-allthingsdistributed-aurora-dsql-rust-journey (Werner hosts a guest post by Sr. Principal Engineers Niko Matsakis and Marc Bowes on the engineering journey of systems/aurora-dsql: how they scaled writes without 2PC — single-journal-per-commit plus a novel Crossbar subscription router — and why DSQL moved from 100% JVM / Kotlin to 100% Rust, driven by concepts/tail-latency-at-scale math (40-host simulation: ~6K TPS vs. ~1M target, 10s tail vs. 1s) and concepts/memory-safety economics on new extension code. DSQL uses Postgres via its public extension API rather than forking. Retracts the earlier "Kotlin control plane, Rust data plane" split in favor of unified Rust.)
  • 2025-03-14 — sources/2025-03-14-allthingsdistributed-s3-simplicity-is-table-stakes (S3 at 19. Andy Warfield reframes "simple" as a property of the experience, not the API: elasticity, strong consistency, conditional writes, bucket-limit rewrite, SSD/low-latency class, and S3 Tables as the object→table-as-first-class-resource move. Canonical statement that the properties of S3 storage, not the object API, define the system.)
  • 2025-02-25 — sources/2025-02-25-allthingsdistributed-building-and-operating-s3 (Andy Warfield's FAST '23 keynote, republished on ATD. The physical/operational counterpart to the 2025-03-14 "simplicity" post. HDD physics — ~120 IOPS/drive flat since 2006, 200 TB drives incoming → 1 IOPS per 2 TB. Heat management as placement problem. Aggregate demand smooths over millions of bursty tenants. Spread placement + redundancy-for-heat → single customer bursts onto 1M+ disks. Org: hundreds of microservices, "AWS ships its org chart." Durability reviews as threat-model for durability changes. ShardStore rewritten in Rust with a ~1%-size executable spec checked into the same repo → lightweight formal verification as an industrialized guardrail, SOSP paper. Ownership as a people-scaling lever — "my best ideas are the ones that other people have instead of me.")
  • 2024-11-15 — sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years (The internal PR/FAQ that launched AWS Lambda, re-published at 10 years with annotations — what shipped as written, what evolved, what was deferred. Canonical artefact of Amazon's PR/FAQ doc culture.)
  • 2024-08-22 — sources/2024-08-22-allthingsdistributed-continuous-reinvention-block-storage-at-aws (Marc Olson, guest post. 13-year insider retrospective on systems/aws-ebs: queueing theory framing; HDD→SSD (2012 Provisioned IOPS, 1k IOPS / 2-3ms); instrumentation turnaround; the systems/xen ring-default that capped hosts at 64 outstanding IOs; first and second systems/nitro offload cards; systems/srd replaces TCP for storage and becomes systems/ena-express for guests; custom systems/aws-nitro-ssd; the 2013 patterns/hot-swap-retrofit where SSDs were taped into every HDD server with zero disruption; patterns/nondisruptive-migration as a compounding primitive; Olson's personal shift from deep-diving-everything to patterns/peer-debugging leadership. Today: >140T ops/day, sub-ms io2 Block Express latency.)
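
The lazy-hydration rules summarized in the 2026-04-07 S3 Files entry above (metadata imported first, files under 128 KB hydrated alongside it, larger files hydrated on read, idle data evicted after 30 days) can be sketched as a small policy model. This is a hypothetical illustration, not the S3 Files implementation: every name below is invented for the example, 128 KB is assumed to mean 128 × 1024 bytes, and eviction is assumed to drop only the cached data while keeping metadata.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

SMALL_FILE_BYTES = 128 * 1024        # assumption: "128 KB" read as 128 * 1024 bytes
IDLE_EVICTION = timedelta(days=30)   # idle window quoted in the entry above


@dataclass
class FileEntry:
    """Hypothetical filesystem-side record for one S3 object."""
    key: str
    size_bytes: int
    last_access: datetime
    data_present: bool = False       # metadata is always present; data may not be


def on_metadata_import(entry: FileEntry) -> None:
    """Background metadata scan: small files pull their data at the same time."""
    if entry.size_bytes < SMALL_FILE_BYTES:
        entry.data_present = True


def on_read(entry: FileEntry, now: datetime) -> None:
    """First read of a larger file hydrates it; every read refreshes the idle clock."""
    entry.data_present = True
    entry.last_access = now


def evict_if_idle(entry: FileEntry, now: datetime) -> bool:
    """Drop cached data (assumed: not metadata) once the idle window has elapsed."""
    if entry.data_present and now - entry.last_access >= IDLE_EVICTION:
        entry.data_present = False
        return True
    return False
```

Modeling the policy as three independent hooks makes the working-set property easy to see: data is resident only for entries that are small or were read within the idle window.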
Last updated · 200 distilled / 1,178 read