Azure Cosmos DB¶
Azure Cosmos DB is Microsoft's globally distributed, multi-model NoSQL database. Its internal architecture was publicly described in the 2022 InfoQ piece "Azure Cosmos DB: Low Latency and High Availability at Planet Scale", surfaced in the High Scalability Dec-2022 roundup.
Scale disclosure¶
A single customer runs 100M requests/sec against petabytes of storage distributed across 41 regions.
Internal architecture¶
Explicitly documented in the post:
- NUMA-aware process model. "We never span our process across NUMA nodes. When a process is across the NUMA node, memory access latency can increase if the cache misses. Not crossing the NUMA node gives our process a more predictable performance." See concepts/numa-awareness.
- Storage engine:
- Lock-free B+ tree for indexing: concurrent access without lock contention on the index.
- Log-structured storage: appends are cheap; reads go through the B+ tree.
- Local disks (SSDs): explicitly not remote-attached storage, because remote attachment adds network round-trip latency to the hot path.
- Batching to reduce both network and disk I/O.
- Custom allocators tuned to the request patterns (not a general-purpose malloc).
- Custom async scheduler using coroutines for the request-handling path.
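The write/read split in the storage engine above (append-only log for writes, a tree index for reads) can be sketched in miniature. This is a hypothetical illustration, not Cosmos DB code: a plain Python dict stands in for the lock-free B+ tree, and a bytearray stands in for the on-disk log.

```python
import json

class LogStructuredStore:
    """Toy log-structured store: writes append to a log, reads go
    through an in-memory index (a dict stands in for the lock-free
    B+ tree described in the disclosure)."""

    def __init__(self):
        self.log = bytearray()   # stand-in for the append-only log file
        self.index = {}          # key -> (offset, length) into the log

    def put(self, key, value):
        record = json.dumps({"k": key, "v": value}).encode()
        offset = len(self.log)
        self.log += record                       # append-only: writes are cheap
        self.index[key] = (offset, len(record))  # old record becomes garbage

    def get(self, key):
        offset, length = self.index[key]         # index lookup, then one read
        return json.loads(self.log[offset:offset + length])["v"]

store = LogStructuredStore()
store.put("doc1", {"city": "Redmond"})
store.put("doc1", {"city": "Dublin"})   # an update appends; index points at the newest copy
print(store.get("doc1"))                # {'city': 'Dublin'}
```

Note the design consequence the article highlights: updates never rewrite in place, so write amplification stays low, while the index keeps reads from scanning the log.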
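Batching and the coroutine-based scheduler combine naturally: a writer coroutine can drain pending requests from a queue and flush them in groups, paying one network/disk round trip per batch instead of per write. A minimal sketch using Python's `asyncio` (the `batch_writer`/`flush` names and the batch parameters are illustrative assumptions, not the Cosmos DB scheduler):

```python
import asyncio

async def batch_writer(queue, flush, max_batch=8, max_wait=0.01):
    """Drain a queue of pending writes and flush them in groups,
    amortizing per-write I/O cost (toy sketch, not Cosmos DB code)."""
    while True:
        batch = [await queue.get()]
        # Collect more items until the batch fills or the wait expires.
        deadline = asyncio.get_running_loop().time() + max_wait
        while len(batch) < max_batch:
            timeout = deadline - asyncio.get_running_loop().time()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        await flush(batch)              # one I/O call for the whole batch

async def main():
    flushed = []
    queue = asyncio.Queue()
    async def flush(batch):
        flushed.append(list(batch))     # record each flushed group
    writer = asyncio.create_task(batch_writer(queue, flush))
    for i in range(20):
        await queue.put(i)
    await asyncio.sleep(0.1)            # let the writer drain the queue
    writer.cancel()
    return flushed

batches = asyncio.run(main())
print(len(batches), "flushes for 20 writes")  # far fewer flushes than writes
```

The coroutine keeps the request-handling path free of thread-per-request overhead, which is the same motivation the post gives for the custom async scheduler.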
Design parallels¶
The engine-level choices mirror Honeycomb's storage-layer design (also disclosed in the same 2022-12-02 roundup): lock-free B+ tree, log-structured on local SSD, batching, custom allocators, coroutines. Similar shape appears in FoundationDB and other hyperscaler-class OLTP engines.
The NUMA-awareness lesson is the generalizable one: hyperscaler engines budget memory latency at the CPU-socket level, not just at the page-cache level.
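On Linux, the process-level version of this idea can be approximated with CPU affinity: pin the process to cores that sit on one socket so the scheduler never moves it across NUMA nodes. A minimal sketch, with the caveat that `pin_to_node` is a hypothetical helper and the core set `{0}` is a placeholder; a real deployment would derive the set from the machine's NUMA topology (e.g. via `numactl`/libnuma):

```python
import os

def pin_to_node(cpus):
    """Restrict the current process to the given CPUs (assumed to sit
    on a single NUMA node) so its memory accesses stay socket-local.
    os.sched_setaffinity is Linux-only, hence the guard."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpus)    # 0 = the calling process
        return os.sched_getaffinity(0)   # confirm the new CPU set
    return None                          # not supported on this OS

# Placeholder assumption: CPU 0 lives on NUMA node 0.
allowed = pin_to_node({0})
print(allowed)
```

This only constrains scheduling; first-touch memory policy then keeps allocations on the local node, which is the predictability the Cosmos DB team describes.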
Why it shows up on this wiki¶
Azure Cosmos DB is DynamoDB's closest direct competitor in the hyperscale managed-NoSQL category. The 2022 disclosure gave the first concrete engine-level architecture details for Cosmos, enabling fair comparison with other hyperscale OLTP systems.