All Things Distributed: AWS Lambda turns 10 — a rare look at the PR/FAQ that started it
Summary
Werner Vogels publishes the (lightly edited) internal PR/FAQ that launched AWS Lambda in 2014, with 2024-era annotations showing which ideas shipped as originally written, which evolved (1ms billing, 10GB memory, container packaging up to 10GB image, multi-runtime support), and which were intentionally deferred at launch. Doubles as (a) a primary-source snapshot of the serverless compute model as Amazon framed it to itself and (b) a case study of Amazon's PR/FAQ narrative-doc practice driving a foundational service. Strongest architectural signal: the stated tenets, the "single-tenant EC2 at launch → Firecracker micro-VMs later" isolation evolution, and the design commitment to scale-to-zero + per-invocation billing from day one.
Key takeaways
- Lambda was framed as a primitive that collapses a pattern Amazon kept observing. Customers had "entire EC2 fleets sitting idle, waiting to run simple functions like writing to a database or processing a file." The PR/FAQ treats Lambda as the productization of that observed workaround — a service that turns "any code into a secure, reliable and highly available cloud service with a web accessible end point within seconds," so developers "focus on application logic, not on undifferentiated heavy lifting of provisioning servers, autoscaling, patching software, and logging." This is the canonical articulation of concepts/serverless-compute.
- Six explicit tenets codify the design. Security without complexity · Simple and easy ("NoOps") · Scales up and down (to zero) (architected for "one application invocation per month and 1,000 per second" on the same code) · Cost effective at any scale ("fine-grained pay-for-use; developers will not pay for idle time") · AWS integration · Reliable (public latency/availability targets, internal higher bar). Tenets as a forcing function for decision-making mid-evolution — "you'll find them in almost every doc." See patterns/pr-faq-writing.
- Billing granularity as a customer-alignment lever, tightened over time. PR/FAQ proposed 250ms execution billing; launch shipped at 100ms; today it's 1ms, no minimums. Same story for memory: doc said up to 1 GB of virtual memory per function; today 10 GB. Core design rule — "customers cannot overprovision or underutilize by design: customers utilize 100% of the computing power they're paying for when they run an application" — see concepts/fine-grained-billing.
- Scale-to-zero was a day-one architectural commitment, not an emergent property. "Infrequent or periodic jobs are cost effective, sharing capacity with other users and only charging for actual execution time… 'Lambda is cost effective at one request or thousands per second, all with the same customer code.'" And: "no warm-up or cool-down periods or charges." See concepts/scale-to-zero.
- Isolation evolved from single-tenant EC2 → Firecracker micro-VMs. "Until Firecracker, we used single tenant EC2 instances. No two customers shared an instance. And this was expensive, but we knew that long-term that it was a problem we could solve… These days, we can securely pack thousands of micro VMs onto a single bare metal instance." Frames systems/firecracker as the cost/density unlock that paid off the day-one "security is not negotiable" trade — see concepts/micro-vm-isolation.
- Deliberate language-launch minimalism. "We made the hard decision to only launch with support for Node, which was popular at the time (and the language of choice for an internal team that relied on the service). This allowed us to validate our programming model, observe how customers were using Lambda, and use those learnings as we added support for additional runtimes." Runtime rollout: Node (Nov 2014) → Java (Jun 2015) → Python (Oct 2015) → .NET (Dec 2016) → Go (Jan 2018) → Ruby (Nov 2018) → Custom Runtimes (Nov 2018). Pattern: patterns/launch-minimal-runtime.
- Cold starts were flagged from day one and have been a continuous investment. Original FAQ: "Applications in steady use have typical latencies in the range of 20-50ms… Latency will be higher the first time an application is deployed and when an application has not been used recently." 2022: SnapStart (built on Firecracker snapshotting) cut Java cold starts by up to 90%. See concepts/cold-start.
- Stateless by contract. "Applications hosted by Lambda are stateless; persistent state should be stored in Amazon S3, Amazon DynamoDB, or another Internet-available storage service… Local file system access is intended as a temporary scratch space and is deleted between invocations." This is what makes the placement/scale-to-zero story possible. See concepts/stateless-compute.
- Placement engine as the compaction strategy that makes the pricing model work. "Each new request is placed with respect to minimizing the number of instances dedicated to that account (subject to maintaining latency, throughput, and availability goals). Spiky workloads, heterogeneous workloads, and short-lived jobs such as cron or batch applications all use capacity efficiently." Internally measured via per-host paging rates, CPU utilization, network bandwidth — sustained high signals trigger more EC2 capacity or a fleet-mix shift.
- Annotated deltas between the doc and reality. Things in the doc that didn't ship as described or shipped much later: ZIP-only packaging (container images up to 10 GB added 2020); S3 `SetUpdateHandler` and DynamoDB `SetTableHandler` APIs (became S3 event notifications and DynamoDB Streams triggers; "became much simpler than was originally laid out"); "scheduled batch jobs" via SWF crontab (became EventBridge Scheduler + Lambda integrations); 4-hour max wall-clock duration mentioned as a tradeoff (today 15 minutes for Lambda specifically, with longer jobs sent to Step Functions / Batch / Fargate); Simple Deployment Service for Git-based deploys never materialized as named; Lambda Layers (2018) solved the "native library + reusable deps" problem that the doc punted on. The document itself is explicit that "we are not omniscient, and the goal of writing is not perfection."
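The fine-grained-billing takeaway above implies a simple cost model: with 1ms granularity and no minimum, cost is just GB-seconds consumed times a rate, plus a per-request fee. A minimal sketch in Python; the rates here are illustrative placeholders, not actual AWS pricing:

```python
def lambda_cost(duration_ms: int, memory_mb: int, invocations: int,
                gb_second_rate: float = 1.6667e-5,   # placeholder, not real pricing
                per_request_rate: float = 2e-7) -> float:
    """Estimate cost under per-millisecond billing.

    The customer pays only for execution time actually used, scaled
    by configured memory; zero invocations means zero cost, which is
    the scale-to-zero / "will not pay for idle time" property.
    """
    gb = memory_mb / 1024
    gb_seconds = invocations * (duration_ms / 1000) * gb
    return gb_seconds * gb_second_rate + invocations * per_request_rate
```

Note how the "cannot overprovision or underutilize by design" claim falls out: duration and memory are the only levers, and both are billed exactly as consumed.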
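The cold-start takeaway above rests on an execution-model detail: module scope runs once per execution environment, while the handler runs per invocation, so heavy initialization is amortized across warm calls. A minimal sketch, with the "expensive init" simulated rather than a real SDK client:

```python
import time

INIT_COUNT = 0  # instrumentation to show init runs once per environment

def _expensive_init():
    """Simulates cold-start work: loading the runtime, importing
    dependencies, opening connections. Runs once per container."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"connected_at": time.time()}

# Module scope: executed at cold start, then reused across warm invocations.
CLIENT = _expensive_init()

def handler(event, context=None):
    # Per-invocation work only; the heavy init above is already paid for.
    return {"client_age_s": time.time() - CLIENT["connected_at"],
            "echo": event}
```

Calling `handler` repeatedly in the same process leaves `INIT_COUNT` at 1, which is why steady-use latency (20-50ms in the original FAQ) differs from first-invocation latency, and why SnapStart attacks the init phase specifically.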
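The stateless-by-contract takeaway can be made concrete: a handler that keeps no state of its own and persists everything to an external store can be run on any instance, which is what frees the placement engine to pack and recycle capacity. A toy sketch; the `store` dict stands in for an external service like DynamoDB or S3, not a real SDK call:

```python
def handler(event, store):
    """Stateless counter handler.

    Nothing survives in process memory between invocations by contract;
    the increment is read from and written back to the external store,
    so any execution environment can serve any request.
    """
    key = event["id"]
    store[key] = store.get(key, 0) + 1
    return {"id": key, "count": store[key]}
```

Local disk, by the quoted contract, would only be scratch space deleted between invocations; anything the next invocation needs must go through `store`.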
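The placement-engine takeaway describes an online packing problem: assign each incoming request to an instance so as to minimize the number of instances dedicated to the account, subject to capacity limits. A toy first-fit sketch; the capacity units and the first-fit heuristic are assumptions for illustration, not Lambda's actual algorithm:

```python
def place(requests, capacity=4):
    """First-fit placement: put each request (sized in capacity units)
    on the first existing instance with room, adding a new instance
    only when every current one is full. This keeps the per-account
    instance count low, which is what makes fine-grained billing
    economical for spiky and short-lived workloads."""
    instances = []   # remaining free capacity per instance
    assignment = []  # instance index chosen for each request
    for units in requests:
        for i, free in enumerate(instances):
            if free >= units:
                instances[i] -= units
                assignment.append(i)
                break
        else:
            instances.append(capacity - units)
            assignment.append(len(instances) - 1)
    return assignment, len(instances)
```

The real system additionally respects latency, throughput, and availability goals and watches per-host paging, CPU, and network signals to decide when to add EC2 capacity, per the bullet above.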
Architectural diagrams / numbers
- Launch (Nov 2014): Node-only, ZIP upload, 100ms billing granularity, 1 GB virtual memory max, 30s synchronous HTTP request limit, ~4h batch/timed-job wall-clock ceiling, 99.99% availability target.
- Today (2024): 1ms billing, 10 GB memory, 10 GB container image support, SnapStart, multi-runtime + custom runtimes, Firecracker-based multi-tenant micro-VM density.
- Claim: "thousands of micro VMs onto a single bare metal instance" (see systems/firecracker / Amazon Science post linked in the annotation).
Caveats
- This is a republished internal doc + retrospective annotations, not a technical deep-dive on Lambda's current control plane. Architecture statements reflect 2014 planning, annotated with 2024 outcomes — no internal block diagram of how the invoke plane, placement engine, or Firecracker fleet are actually wired today.
- The customer testimonial blocks in the PR are templated ("CTO of XXX") — they are aspirational, not actual customers.
- Prices are `$XXX`/`$YYY`/`WWW GB` placeholders in the original; the final launch economics differ.
- Useful primarily as a culture / process artifact (patterns/pr-faq-writing) and a day-one tenets snapshot. For current Lambda internals, pair with Marc Brooker's USENIX ATC '23 talk on on-demand container loading and the published Firecracker paper.
Links
- Raw: raw/allthingsdistributed/2024-11-15-aws-lambda-prfaq-after-10-years-bccf6145.md
- Original: https://www.allthingsdistributed.com/2024/11/aws-lambda-turns-10-a-rare-look-at-the-doc-that-started-it.html
- Companion: PDF of unannotated PR/FAQ at /files/aws-lambda-prfaq-2014.pdf on allthingsdistributed.com
- Companion: Marc Brooker, "On-demand container loading in AWS Lambda," USENIX ATC '23 (linked in the post)
- Companion: "How AWS's Firecracker virtual machines work," amazon.science (linked in the post)