High Scalability — Stuff The Internet Says On Scalability For July 11th, 2022¶
Summary¶
Todd Hoff's weekly curated link roundup on highscalability.com for the week ending 2022-07-11 — the first High Scalability article ingested into this wiki and a representative example of the format. The post is not a single-topic engineering deep-dive; instead it's a field report of the interesting distributed-systems writing, operational numbers, and running arguments that circulated on Twitter, HN, and engineering blogs over the prior ~8 weeks of 2022. The dense structure — Number Stuff, Quotable Stuff, Useful Stuff, Soft Stuff, Pub Stuff — is Hoff's signature: one-liners with citations rather than synthesized essays.
For the wiki, this post's value is that it co-locates a large number of distinct production-scale data points from 2022 that would otherwise require 20+ separate ingests, and it captures the running debates of that moment — monolith vs. microservices, serverless vs. containers, cloud vs. on-prem / bare metal, GraphQL vs. REST — in the voices of practitioners who were actually paying the bills. The "Useful Stuff" section provides one-paragraph architectural distillations of several long-form posts (Stack Overflow, Pinterest memcached, Meta cloud gaming, Uber dynamic subsetting, AlloyDB columnar engine) sufficient to extract first-class wiki entries.
Key takeaways¶
- Stack Overflow in mid-2022 ran 1.3B page views/month on 9 on-prem servers with a single app pool, at ~6,000 req/s, 20ms average render time on the question page, and <10% CPU utilization — a 14-year-old .NET monolith with no microservices and no cloud. The database side: SQL Servers with 1.5 TB of RAM each, enough that one-third of the entire DB fits in memory. Roberta Arcoverde (Hanselminutes): "we could, in theory, be running on a single web server. We wouldn't want to do that, but theoretically it would be possible." Alex Watt: "giving SQL more RAM is better than caching page fragments with Redis." They removed their fragment cache three to four years ago and observed no measurable latency impact. (Source: systems/stack-overflow-architecture, concepts/monolith-vs-microservices-pendulum).
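A quick back-of-envelope check on those numbers (the per-second figure below is derived here, not stated in the post):

```python
# Sanity check on the Stack Overflow bullet above (derived figures, not from the post):
page_views_per_month = 1.3e9
seconds_per_month = 30 * 86_400          # ~2.59M seconds in a 30-day month
avg_pageviews_per_s = page_views_per_month / seconds_per_month
print(round(avg_pageviews_per_s))        # ~502 average page views/s — so the
# ~6,000 req/s figure includes API calls, assets, and peak-over-average
# headroom, not just question-page renders.
```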
- Pinterest's memcached fleet spans >5,000 EC2 instances, serves up to ~180M requests per second and ~220 GB/s of network throughput over a ~460 TB active dataset partitioned into ~70 clusters. Two canonical efficiency wins: (a) running memcached under the SCHED_FIFO real-time scheduling policy dropped P99 client-side latency 10-40% and eliminated spurious P99/P999 spikes; (b) TCP Fast Open saved one RTT on connection establishment. Additional scaling lever: extstore extends memcached's backing store from DRAM into NVMe flash, which increases per-instance storage capacity by "several orders of magnitude" and proportionally reduces cluster cost footprint. (Source: systems/pinterest-memcached-fleet).
- Amazon Games' New World MMO runs at a 30 Hz simulation frequency (6x typical MMOs), with 2,500 players per seamless world, backed by 7,000+ AI entities and hundreds of thousands of objects per Amazon EC2 set; DynamoDB handles ~800,000 writes every 30 seconds to persist game state. The architecture moves player state between "hub instance" simulation nodes spread across AWS Regions as players traverse the world — effectively sharding the world by geography rather than by server pool. At its opening-weekend peak the game hit >900,000 concurrent users on Steam, the fifth-highest in Steam's history. (Source: systems/new-world-amazon-games).
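The SCHED_FIFO change is an OS-level scheduler switch rather than an application change. A minimal sketch of attempting it from Python (the priority value and fallback behavior are assumptions for illustration; the post does not show Pinterest's production setup):

```python
import os

def run_with_fifo(priority: int = 1) -> bool:
    """Try to move the current process onto the SCHED_FIFO real-time
    scheduling policy, the lever Pinterest applied to memcached. Needs a
    Linux kernel and CAP_SYS_NICE (or root), so it degrades gracefully:
    returns True only if the switch actually took effect."""
    if not hasattr(os, "sched_setscheduler"):
        return False  # non-Linux platforms lack this call
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:   # PermissionError without CAP_SYS_NICE
        return False
    return os.sched_getscheduler(0) == os.SCHED_FIFO

print("running under SCHED_FIFO:", run_with_fifo())
```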
- Meta's cloud gaming infrastructure uses edge-deployed NVIDIA Ampere GPUs managed by Twine (Meta's cluster manager), with Windows- and Android-game containers rendered on the GPU, frames captured and encoded via GPU hardware encoders, packetized into WebRTC/SRTP (UDP) and streamed to player clients with a jitter buffer. Edge deployment in metro-area PoPs close to population centers — not the full streaming pipeline on GPU — is the biggest latency win. (Source: sources/2022-07-11-highscalability-stuff-the-internet-says-on-scalability-for-july-11th-2022 summarizing Meta's engineering post.)
- Meta "Transparent Memory Offloading" (TMO) saves 20-32% of DRAM per server across millions of data-center servers by kernel/hypervisor-level detection of cold memory pages and migrating them to NVMe-SSD-backed swap — a cheaper and lower-power storage tier. LinkedIn reports a parallel approach via RDMA-over-Converged-Ethernet (RoCE) for datacenter block storage, exposing remote DRAM/SSD at near-local latencies. The combined pattern: DRAM is not the only latency tier worth treating as memory; tiered storage with page-level transparent migration recovers a sizable fraction of the most expensive capacity. (Source: concepts/transparent-memory-offloading).
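The core TMO idea — detect cold pages and demote them to a cheaper tier — can be sketched as a toy page-ager. All names and the coldness threshold here are illustrative, not Meta's kernel implementation:

```python
from dataclasses import dataclass

@dataclass
class Page:
    page_id: int
    last_access: float   # seconds since an arbitrary epoch
    tier: str = "dram"

def offload_cold_pages(pages, now, cold_after_s=300.0):
    """Move pages idle longer than cold_after_s from DRAM to the cheaper,
    lower-power NVMe-backed swap tier; return how many were moved."""
    moved = 0
    for p in pages:
        if p.tier == "dram" and now - p.last_access > cold_after_s:
            p.tier = "nvme_swap"
            moved += 1
    return moved

pages = [Page(0, last_access=0.0), Page(1, last_access=900.0), Page(2, last_access=950.0)]
moved = offload_cold_pages(pages, now=1000.0)
print(moved, [p.tier for p in pages])  # only page 0 has gone cold
```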
- Uber's dynamic subsetting for service-mesh load balancing auto-tunes the aperture (the subset size an on-host proxy balances across) based on the ratio of the per-service load the host contributes to the destination's total. Before the change, 8 manually-tuned large services reported a 15-30% P99 CPU utilization reduction; in the 12-18 months since the fleetwide rollout, service owners have reported zero subsetting-related issues. (Source: patterns/dynamic-subsetting-load-balancer).
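The idea behind load-proportional apertures can be sketched in a few lines. The exact sizing formula and the minimum-aperture floor below are assumptions for illustration, not Uber's implementation:

```python
import math

def aperture_size(host_rps: float, service_total_rps: float,
                  backend_count: int, min_aperture: int = 4) -> int:
    """Size a host's backend subset in proportion to the share of the
    destination service's total load this host generates, with a floor so
    low-traffic hosts still spread across a few backends."""
    if service_total_rps <= 0:
        return min_aperture
    share = host_rps / service_total_rps
    return max(min_aperture, math.ceil(share * backend_count))

# A host sending 5% of a service's traffic toward 400 backends:
print(aperture_size(host_rps=50, service_total_rps=1000, backend_count=400))  # 20
# A near-idle host falls back to the minimum aperture:
print(aperture_size(host_rps=0.5, service_total_rps=1000, backend_count=400))  # 4
```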
- AWS Graviton3 (c7g EC2) targets compute density over clock speed: 2.6 GHz (only 100 MHz above Graviton2), 5nm process, three chips per node. The trade-off: low per-core power on a modern process, packed densely, priced per-vCPU below comparable Intel/AMD offerings — plus reported wins on real workloads: Formula 1 CFD +40% vs C6gn, Honeycomb.io telemetry ingestion +35% throughput with -30% latency vs C6g. (Source: systems/graviton3).
- Serverless means per-request billing, not event-driven or stateless. Simon Willison's definition crystallizes the debate: "Serverless means per-request billing is my new favorite definition of that term. I'd previously been using 'scale to zero' but that always requires extra explanation." The operational consequence: Aurora Serverless v2's inability to auto-pause makes Jeremy Daly argue it "fundamentally misses the mark of true serverless(ness)" — the thing you actually pay for is not having to pay for idle. Datadog's State of Serverless 2022 shows continued growth regardless of whether the definitional fight is settled. (Source: concepts/serverless-billing-definition, concepts/per-request-billing).
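The definition has a one-line mathematical form: cost is a pure function of request count, with a true zero at zero traffic. The rate below uses Brian LeRoux's rough "$1 per 6M invocations" figure quoted later in the post:

```python
def serverless_monthly_cost(invocations: int,
                            dollars_per_million: float = 1 / 6) -> float:
    """Per-request billing: cost scales linearly with invocations and is
    exactly zero at zero traffic — the property 'scale to zero' gestures at."""
    return invocations / 1_000_000 * dollars_per_million

print(serverless_monthly_cost(0))                    # 0.0 — idle costs nothing
print(round(serverless_monthly_cost(6_000_000), 2))  # 1.0 — LeRoux's $1 per 6M
```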
- The monolith-vs-microservices pendulum has swung in both directions within one company. Airbnb went monolith (2008-2017, ~$2.6B revenue) → microservices (2017-2020, split from monorepo, ~$5B revenue, cross-cutting feature complexity, Thrift → GraphQL unified data access layer) → "micro + macroservices" (2020-present, unified APIs + central data aggregator + service-block facade APIs). Parallel datapoint: Shopify still runs a Ruby monolith at scale (deconstructing "to maximize developer productivity" within the monolith, not decomposing into services). Wave ($1.7B company, 70 engineers) runs "a Python monolith on top of Postgres." Steven Lemon: "rather than separate our monolith into separate services, we started to break our solution into separate projects within the existing monolith." The pattern across sources: right-sizing services is a continuous problem, not a destination; monoliths can be properly service-based on the inside without being distributed on the outside. (Source: concepts/monolith-vs-microservices-pendulum).
- Bare-metal cloud alternatives (Equinix Metal, OVH, co-lo) are being adopted by growth-stage startups on cost grounds. Martin Casado reported three $20M+ ARR startups using Equinix bare-metal with K8s; Manish Jain (Dgraph): "2 month rental of an AWS server = outright purchase of an equivalent server." Alex Saroyan: 40 racks across 5 locations saved a customer 90% on egress traffic alone vs. public cloud. The trade-off is slower procurement and no on-demand scaling. James Hamilton's companion number: "CPU still represents about 32 percent of the IT equipment power budget, memory only burning 14 percent, peripheral costs around 20 percent, motherboard around 10 percent, disk drives 5 percent."
- Cascading failures follow a positive feedback loop: one node's overload spreads load to the remaining nodes, increasing their probability of failure, which spreads more load — the vicious circle. Mitigation toolkit from the HDM Stuttgart post: add resources; avoid health-check-caused deaths; restart thread-blocked servers; drop traffic significantly and ramp back; enter degraded mode by dropping traffic classes; eliminate batch/bad traffic; move from orchestration to choreography (pub/sub). (Source: concepts/cascading-failure).
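The positive feedback loop can be made concrete with a toy simulation (capacities and loads are invented numbers, chosen to show how little headroom separates "healthy" from "total collapse"):

```python
def simulate_cascade(node_capacity: float, total_load: float, nodes: int) -> int:
    """Redistribute a fixed total load evenly across surviving nodes: when
    the per-node share exceeds capacity, one more node fails and the
    survivors absorb its share — the vicious circle. Returns survivors."""
    alive = nodes
    while alive > 0 and total_load / alive > node_capacity:
        alive -= 1
    return alive

# 10 nodes, each handling up to 100 units, carrying 950 total: stable.
print(simulate_cascade(100, 950, 10))  # 10
# Lose a single node and the remaining 9 cascade all the way to zero.
print(simulate_cascade(100, 950, 9))   # 0
```

This is why "add resources" and "drop traffic significantly and ramp back" are the first tools in the list: both push the per-node share back under capacity before the loop starts.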
- Stack Overflow's DDoS playbook emphasizes preventing the attack from reaching the expensive SQL queries: require authentication on every API call (for attribution), minimize response data per call, rate-limit all APIs, filter malicious traffic before it hits the application, block weird URLs, auto-populate IP blocklists, and tar-pit botnet sources to slow volume attacks. A DDoS on the Q&A monolith is effectively a DDoS on SQL Server; filtering upstream is what keeps the database alive.
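Rate-limiting every API before requests reach SQL can be sketched with a standard token bucket (the parameters are illustrative and this is not Stack Overflow's implementation):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: sustained rate plus a bounded burst.
    Requests beyond the budget are rejected cheaply, before they can
    trigger an expensive database query."""
    def __init__(self, rate_per_s: float, burst: int):
        self.rate, self.capacity = rate_per_s, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_s=10, burst=5)
results = [bucket.allow() for _ in range(8)]
print(results)  # the first 5 pass on the burst allowance; the rest are throttled
```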
- Long-term archival of high-fidelity media saves money by re-encoding to a lower-fidelity format before the long-tail retention tier. AWS's Amazon-Connect call-recording example: downsample and re-encode the WAV recording before transitioning to cold storage, retaining intelligibility for compliance lookups but shedding 5-10x on storage cost. (Source: patterns/downsample-recode-long-term-archive).
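The savings arithmetic is plain PCM math. The sample rates and bit depths below are illustrative choices (not from the post) that land inside the quoted 5-10x range:

```python
def wav_gb_per_hour(sample_rate_hz: int, bits: int, channels: int) -> float:
    """Uncompressed PCM storage per hour of audio, in GB."""
    bytes_per_s = sample_rate_hz * bits // 8 * channels
    return bytes_per_s * 3600 / 1e9

# Hot tier: 48 kHz / 16-bit stereo call recording.
# Cold tier: 16 kHz / 16-bit mono re-encode — still intelligible speech.
hot = wav_gb_per_hour(48_000, 16, 2)
cold = wav_gb_per_hour(16_000, 16, 1)
print(round(hot / cold, 1))  # 6.0 — a 6x reduction before the long-tail tier
```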
- Single-server architectures remain viable for workloads most builders think require distribution. Hacker News (@HNStatus): "For the record, HN is a single process on a single server. I can't snap my fingers and magically make it redundant." Ben Schwarz (Calibre App): "for a long while @Calibreapp handled tens of millions of API requests per month on a $7 heroku dyno." Chris Munns (AWS): "99% of apps never break 1000 rps. Even with that you can likely serve millions of active users in a month." Pattern: the dominant failure mode of modern distributed systems is premature distribution, not under-provisioning.
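The back-of-envelope behind Chris Munns' claim is worth making explicit (the per-user request count is an assumption; only the 1,000 rps figure is from the quote):

```python
def monthly_requests(rps: float) -> int:
    """Total requests a sustained rate implies over a 30-day month."""
    return int(rps * 86_400 * 30)

def supported_monthly_users(rps: float, requests_per_user_per_month: int) -> int:
    """How many monthly active users a sustained request rate can serve,
    given an assumed per-user request budget."""
    return monthly_requests(rps) // requests_per_user_per_month

print(monthly_requests(1_000))              # 2,592,000,000 requests/month
print(supported_monthly_users(1_000, 500))  # ~5.2M users at 500 req/user/month
```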
Operational numbers captured¶
From the Number Stuff section and the body commentary:
- Stack Overflow: 1.3B page views / month; 6,000 req/s on 9 servers; 20ms render P50; SQL Server 1.5 TB RAM per host; 80% anonymous traffic; <10% web-tier utilization.
- Pinterest memcached: 5,000+ EC2 instances; ~180M req/s; ~220 GB/s network; ~460 TB active dataset; ~70 clusters.
- Amazon S3: 200 trillion objects stored (~29,000 per person on Earth); >100M req/s averaged; 250,000× growth in 16 years.
- Netflix real-time data: 20 trillion events/day (2021).
- Riot Games: 20+ Kafka shards; 500,000 events/sec peak; 8 TB/day generated; on-prem Kafka buffering, AWS on the backend.
- Ably: <65ms P99 round-trip latency from any of 205 PoPs that each receive ≥1% of their traffic.
- Uber dynamic subsetting: 15-30% P99 CPU reduction on 8 manually-tuned large services; zero complaints in the 12-18 months since the fleetwide rollout.
- Meta TMO: 20-32% DRAM savings per server, applied across "millions of servers."
- AWS Lambda economics (Brian LeRoux): $1 per 6M invocations. The point: stateless on-demand is a utilization feature ("provisioned server capacity is not 100% utilization").
- Bitcoin network: 100 TWh of electricity in 2021 — more than Finland's annual energy budget.
- EFS read latency: 600µs (vs. 1956 IBM 350 disk read latency of 600ms — 1000× in ~66 years).
- Database benchmark (write/read req/s): Redis 112K/99K; KeyDB 289K/283K; Dragonfly 408K/392K; Skytable 620K/676K.
- Concurrent connections: 1,000,000 concurrent TCP connections on a single tuned host is achievable.
- AWS Lambda runaway cost story (huksley on HN): recursive self-calling Lambda with 30s timeout ran for 24 hours, generated 70M GB-seconds, $1,484 bill before being caught.
Caveats¶
- Tier 1 source but roundup format — individual claims carry the credibility of their original source, not the aggregator. Quotes from Twitter/HN are uneven in rigor.
- Dated 2022-07-11 — some numbers and running debates are now stale (e.g. Datadog's 2022 serverless-state report; Azure vs AWS market-share comparisons; Graviton3 was new).
- Many one-liners are opinion / personality-driven — Brian LeRoux on Lambda, Werner Vogels on serverless, Jeremy Daly on MongoDB Atlas serverless, etc. — cite the specific quote + source rather than generalizing.
- The Airbnb microservices narrative is secondhand (Hoff paraphrasing a Medium post) and contains the classic "you weren't doing it right" self-fulfilling-prophecy framing; treat the before/after numbers as directional.
- Stack Overflow's 1.3B page views figure is from Roberta Arcoverde's Hanselminutes interview; older public numbers (2014) were in a similar ballpark, so the post reinforces that a 14-year monolith has kept pace without re-architecting.
- The GraphQL criticism threads (Rick Houlihan on single-table design, jmhodges on "GraphQL is a trap") are running debates, not settled architectural consensus — cite only the specific quote when needed.
Source¶
- Original: https://highscalability.com/stuff-the-internet-says-on-scalability-for-july-11th-2022/
- Raw markdown: raw/highscalability/2022-07-11-stuff-the-internet-says-on-scalability-for-july-11th-2022-6040a9d4.md
Related¶
- concepts/monolith-vs-microservices-pendulum
- concepts/serverless-billing-definition
- concepts/per-request-billing
- concepts/transparent-memory-offloading
- concepts/cascading-failure
- patterns/dynamic-subsetting-load-balancer
- patterns/downsample-recode-long-term-archive
- systems/stack-overflow-architecture
- systems/pinterest-memcached-fleet
- systems/new-world-amazon-games
- systems/graviton3
- systems/dragonflydb
- systems/aws-lambda
- systems/aws-s3
- systems/clickhouse
- systems/redis
- systems/kafka
- systems/litestream
- companies/highscalability