Stack Overflow architecture¶
Overview¶
Stack Overflow as of mid-2022: a 14-year-old .NET monolith running on-prem in a single data center, serving 1.3 billion page views per month across 200 Q&A sites at ~6,000 requests/second, on 9 web servers behind HAProxy at under 10% steady-state CPU utilization. "In theory we could be running on a single web server" (Roberta Arcoverde of Stack Overflow engineering, Hanselminutes interview). Source: sources/2022-07-11-highscalability-stuff-the-internet-says-on-scalability-for-july-11th-2022.
Scale (mid-2022)¶
| Metric | Value |
|---|---|
| Page views / month | 1.3B |
| Requests / second | 6,000 |
| Web servers | 9 (single app pool) |
| Engineering org | ~50 engineers on the Q&A platform |
| Average render time (question page) | 20 ms |
| SQL Server RAM per host | 1.5 TB |
| Fraction of DB in RAM | ~1/3 |
| Anonymous traffic share | 80% |
| CPU utilization steady-state | <10% |
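A few derived figures fall out of the table; the following is my back-of-envelope arithmetic on the published numbers, not additional published data:

```python
# Back-of-envelope checks on the table above. Inputs come straight from the
# table; the derived figures (average rps, per-server load, implied DB size)
# are computed here, not quoted from the source.

page_views_per_month = 1.3e9
peak_rps = 6_000
web_servers = 9
render_ms = 20
sql_ram_tb = 1.5
ram_fraction_of_db = 1 / 3

avg_rps = page_views_per_month / (30 * 24 * 3600)          # ~500 req/s average
per_server_rps = peak_rps / web_servers                    # ~667 req/s per server at peak
concurrent_per_server = per_server_rps * render_ms / 1000  # ~13 requests in flight
implied_db_tb = sql_ram_tb / ram_fraction_of_db            # ~4.5 TB of data

print(f"avg {avg_rps:.0f} rps, {per_server_rps:.0f} rps/server at peak, "
      f"~{concurrent_per_server:.0f} in flight/server, DB ~{implied_db_tb:.1f} TB")
```

With a 20 ms render time, each server carries only ~13 requests in flight even at peak, which is why the <10% CPU figure is plausible.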
Architectural choices¶
- Monolith. Single .NET codebase, single deploy unit.
- Multi-tenant. All 200 sites share the same app pool.
- On-prem, not cloud. Hoff/Arcoverde: "we did this regular exercise where we would try to understand how much it would cost to run Stack Overflow in the cloud. And it was just never worth it."
- Low-latency-first design. Optimized for fast request → few queries → fast response so the server can pick up the next one.
- Low-allocations-first. Avoid object churn that would force long GC stalls on the 9-web-server footprint.
- Single network hops, 10 GbE internal network — latency-sensitive infrastructure decisions.
- SQL Server with 1.5 TB RAM per host — a third of the DB is in memory; no Redis fragment cache.
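The multi-tenant bullet can be pictured as host-header dispatch inside one process. This is an illustrative sketch of the idea, not Stack Overflow's actual routing code; the site list and handler are invented:

```python
# Illustrative only: one process serving every Q&A site, dispatching on the
# Host header — the "all 200 sites share the same app pool" idea.
# Site names and handler logic are invented for the sketch.

SITES = {
    "stackoverflow.com": "Stack Overflow",
    "superuser.com": "Super User",
    "serverfault.com": "Server Fault",
}

def handle_request(host: str, path: str) -> str:
    """Route a request to the shared app code, parameterized by site."""
    site = SITES.get(host)
    if site is None:
        return "404: unknown site"
    # A real handler would run the same shared code against this site's data.
    return f"{site}: rendering {path}"

print(handle_request("superuser.com", "/questions/1"))
```

The payoff of this shape is operational: one codebase, one deploy unit, one app pool, regardless of how many sites are added.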
Fragment-cache removal¶
3–4 years before the 2022 post, the Stack Overflow team removed all page-fragment caching:
"We removed all caching, and that was like three or four years ago. We stopped caching that page. We stopped caching the content, and little did we know, it didn't really make any measurable effect on performance. We were still able to handle requests and send responses super fast because of how the architecture was built at the time."
Alex Watt's takeaway (via sources/2022-07-11-highscalability-stuff-the-internet-says-on-scalability-for-july-11th-2022): "giving SQL more RAM is better than caching page fragments with Redis." A canonical data point for the "push memory down to the database" school of latency optimization.
Rolling deploys¶
"We have rolling builds. So we have those nine web servers. They are under an HAProxy front, and every time that we need to deploy a new version of the site, we take a server out of rotation, update it there, put it back in rotation." A full production deploy takes ~4 minutes (mostly build time); a revert takes minutes. No microservices, no k8s, no feature-flag gymnastics.
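The deploy loop Arcoverde describes is a drain/update/re-enable cycle against HAProxy. The `disable server` / `enable server` commands are real HAProxy runtime-API commands; the server names, backend name, deploy stub, and dry-run printing below are illustrative assumptions:

```python
# Sketch of a rolling deploy over web servers behind HAProxy: drain one
# server, update it, put it back, repeat. "disable server"/"enable server"
# are real HAProxy admin-socket commands; everything else here (names,
# deploy step, dry-run mode) is assumed for illustration.

def haproxy_cmd(command: str, dry_run: bool = True) -> None:
    """Send a command to the HAProxy admin socket (printed in dry-run mode)."""
    if dry_run:
        print(f"haproxy> {command}")
        return
    # Real mode would write to the UNIX admin socket, e.g.:
    # import socket
    # with socket.socket(socket.AF_UNIX) as s:
    #     s.connect("/var/run/haproxy.sock")
    #     s.sendall(command.encode() + b"\n")

def deploy_to(server: str) -> None:
    """Placeholder for copying the new build to one server."""
    print(f"deploy> {server}")

def rolling_deploy(servers, backend: str = "web", dry_run: bool = True) -> None:
    for server in servers:
        haproxy_cmd(f"disable server {backend}/{server}", dry_run)  # drain
        deploy_to(server)                                           # update
        haproxy_cmd(f"enable server {backend}/{server}", dry_run)   # restore

rolling_deploy([f"web{i}" for i in range(1, 10)])  # the nine web servers
```

Because only one server is out of rotation at a time and steady-state CPU is under 10%, the remaining eight absorb its traffic with no visible impact.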
Design thesis¶
"We always start from asking the question: what problem are you trying to solve? And the problems that these tools and bandwagons try to solve are not problems that we were facing ourselves." (Arcoverde.)
Splitting teams and parallelizing deploys — the usual microservices selling points — were never bottlenecks for Stack Overflow. The monolith kept up with 14 years of growth without re-architecting.
Related¶
- concepts/monolith-vs-microservices-pendulum — canonical monolith-wins-at-scale datapoint.
- systems/mysql — the parallel "SQL-first" DB discussion in the wider database corpus.
- sources/2022-07-11-highscalability-stuff-the-internet-says-on-scalability-for-july-11th-2022.
- companies/highscalability.