Snapchat-on-AWS architecture¶
Snap's Snapchat backend runs almost entirely on AWS, at a scale disclosed at AWS re:Invent 2022 and summarized in the High Scalability December 2022 roundup.
Scale (disclosed)¶
- 300M+ daily active users
- 5B+ snaps/day
- 10M QPS
- 400 TB stored in DynamoDB, with nightly scans running at ~2 billion rows/minute (friend suggestions + ephemeral-data deletion)
- 900+ EKS clusters × 1000+ instances per cluster
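The disclosed numbers can be sanity-checked with back-of-envelope arithmetic. The derived figures below (average sends/sec, snaps per user) are illustrative, not from the talk:

```python
# Back-of-envelope checks on the disclosed scale numbers.
# Derived figures are illustrative, not disclosed by Snap.

DAU = 300_000_000               # daily active users (disclosed)
SNAPS_PER_DAY = 5_000_000_000   # snaps/day (disclosed)
SECONDS_PER_DAY = 86_400

avg_snaps_per_sec = SNAPS_PER_DAY / SECONDS_PER_DAY   # ~58k sends/sec on average
snaps_per_user_per_day = SNAPS_PER_DAY / DAU          # ~17 snaps/user/day

# The disclosed 10M QPS is ~170x the average send rate, implying most
# traffic is reads, fan-out, presence, and sync rather than the send path.
qps_to_send_ratio = 10_000_000 / avg_snaps_per_sec

print(f"{avg_snaps_per_sec:,.0f} snaps/sec avg")
print(f"{snaps_per_user_per_day:.1f} snaps/user/day")
print(f"{qps_to_send_ratio:.0f}x QPS-to-send ratio")
```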
Send path¶
client (iOS/Android)
├──> GW (Gateway service, on EKS)
│    └──> MEDIA service
│         └──> CloudFront + S3
│              (persist media close to recipient)
└──> MCS (Core Orchestration Service)
     ├──> Friend Graph service (permission check)
     └──> SnapDB (metadata)
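The send-path ordering above can be sketched in a few lines. Everything here is a stand-in: the service names, data shapes, and call signatures are assumptions, since the talk describes the flow but not the interfaces:

```python
from dataclasses import dataclass

@dataclass
class Snap:
    sender: str
    recipient: str
    media_id: str   # the media bytes themselves go to S3/CloudFront separately

# Hypothetical in-memory stand-ins for the real services.
FRIEND_GRAPH = {("alice", "bob")}    # permission edges (Friend Graph service)
SNAP_DB: dict[str, list[Snap]] = {}  # message metadata (SnapDB)

def send_snap(snap: Snap) -> bool:
    # 1. Friend Graph service: permission check before any write.
    if (snap.sender, snap.recipient) not in FRIEND_GRAPH:
        return False
    # 2. SnapDB: persist only the metadata; media was already uploaded
    #    via the media service and cached close to the recipient.
    SNAP_DB.setdefault(snap.recipient, []).append(snap)
    return True

assert send_snap(Snap("alice", "bob", "m-123"))       # permitted edge
assert not send_snap(Snap("mallory", "bob", "m-9"))   # no friend edge
```

The point of the ordering is that the permission check gates the metadata write, so an unauthorized send never touches SnapDB.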
SnapDB is Snap's in-house database layer, built with DynamoDB as its backing store. It adds:
- transactions,
- TTL handling,
- an efficient ephemeral-data + state-synchronization model on top of DynamoDB's native primitives.
The cost-control dimension is explicit in the talk: SnapDB's abstractions over DynamoDB are "what helps control costs" at 400 TB + 2B rows/min-scan load.
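SnapDB's internals are not public, but the "ephemeral data on native primitives" idea maps naturally onto two real DynamoDB features: `TransactWriteItems` for transactions and the native TTL mechanism, which lazily deletes items once an epoch-seconds attribute expires (lazy deletion is one plausible reason a nightly deletion scan is still needed). A sketch of building such a request, with the table schema and key layout invented for illustration:

```python
import time

def ephemeral_put(table: str, message_id: str, recipient: str,
                  ttl_seconds: int) -> dict:
    """Build a DynamoDB TransactWriteItems request for one ephemeral message.

    Sketch only: SnapDB's real schema is not public. The pk/sk layout and
    attribute names here are assumptions; the TTL attribute holds an
    epoch-seconds timestamp, which DynamoDB's native TTL deletes lazily.
    """
    expires_at = int(time.time()) + ttl_seconds
    return {
        "TransactItems": [{
            "Put": {
                "TableName": table,
                "Item": {
                    "pk": {"S": f"USER#{recipient}"},
                    "sk": {"S": f"MSG#{message_id}"},
                    "expires_at": {"N": str(expires_at)},  # TTL attribute
                },
                # Idempotency guard: fail if this message already exists.
                "ConditionExpression": "attribute_not_exists(sk)",
            }
        }]
    }

req = ephemeral_put("snap-messages", "m-123", "bob", ttl_seconds=86_400)
```

In a real client this dict would be passed to boto3's `transact_write_items`; building it as plain data keeps the sketch self-contained.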
Receive path (latency-sensitive)¶
sender's MCS write
──> MCS looks up recipient's persistent connection in ElastiCache
──> forward message via connection-owning server
──> client retrieves media by media-ID from CloudFront
The migration to this design reportedly reduced P50 latency by 24% versus the predecessor path.
Cost-optimization levers¶
- Auto-scaling (EKS-level) keeps compute aligned with the send/receive request rate.
- Instance-type optimization — explicit migration to Graviton ARM-based EC2 for the dominant services, with CPU pricing below comparable x86 SKUs.
- SnapDB abstraction over DynamoDB — lets hot-path lookups (e.g. the recipient's persistent connection) be served from ElastiCache instead of per-request DynamoDB GetItem calls.
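The Graviton lever is straightforward fleet arithmetic: Graviton instances (e.g. m6g) list roughly 20% below comparable x86 (m5) on-demand rates. A toy comparison with illustrative us-east-1 prices and an invented fleet size (not Snap's actual rates or footprint):

```python
# Toy fleet-cost comparison. Prices are illustrative on-demand rates
# (region- and contract-dependent); the fleet size is invented.

X86_PRICE = 0.192       # $/hr, m5.xlarge (illustrative)
GRAVITON_PRICE = 0.154  # $/hr, m6g.xlarge (illustrative)
INSTANCES = 100_000     # hypothetical fleet, not a disclosed number
HOURS_PER_YEAR = 8_760

x86_cost = X86_PRICE * INSTANCES * HOURS_PER_YEAR
grv_cost = GRAVITON_PRICE * INSTANCES * HOURS_PER_YEAR
savings = x86_cost - grv_cost   # ~20% off compute at any fleet size

print(f"saving ${savings / 1e6:.0f}M/yr ({savings / x86_cost:.0%})")
```

At this toy scale the delta is tens of millions per year, which is why instance-type choice appears alongside the architectural levers.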
Why it shows up on this wiki¶
Canonical example of the DynamoDB-as-scale-out-OLTP pattern at hyperscale, and of EKS + DynamoDB + CloudFront + ElastiCache as a complete architecture for "millions-of-QPS ephemeral messaging". Also a counterpoint to the Twitter / Roblox bare-metal-is-cheaper narrative circulating in the same period (sources/2022-12-02-highscalability-stuff-the-internet-says-on-scalability-for-december-2nd-2022): Snap publicly argued that the cloud-native architecture is the cost-control strategy at their scale, citing Graviton optimization as the cost win and the ElastiCache hot-connection lookup as the latency win.