SYSTEM Cited by 1 source

Azure Cosmos DB

Azure Cosmos DB is Microsoft's globally distributed, multi-model NoSQL database. Its internal architecture was publicly described in the 2022 InfoQ piece "Azure Cosmos DB: Low Latency and High Availability at Planet Scale", which was surfaced in the High Scalability Dec-2022 roundup.

Scale disclosure

A single customer runs 100M requests/sec over petabytes of storage, globally distributed across 41 regions.

Internal architecture

Explicitly documented in the post:

  • NUMA-aware process model. "We never span our process across NUMA nodes. When a process is across the NUMA node, memory access latency can increase if the cache misses. Not crossing the NUMA node gives our process a more predictable performance." See concepts/numa-awareness.
  • Storage engine:
      • Lock-free B+ tree for indexing: concurrent access without lock contention on the index.
      • Log-structured storage: appends are cheap, and reads go through the B+ tree.
      • Local disk (SSDs): explicitly not remote-attached storage, because remote-attached storage adds a network round trip to the hot path.
  • Batching to reduce both network and disk I/O.
  • Custom allocators tuned to the request patterns (not a general-purpose malloc).
  • Custom async scheduler using coroutines for the request-handling path.
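
To make the log-structured and batching items above concrete, here is a minimal single-threaded sketch in Python. It is an illustration under stated assumptions, not Cosmos DB's engine: the class name LogStructuredStore and all of its methods are hypothetical, a sorted in-memory key list stands in for the lock-free B+ tree, and nothing here is lock-free or crash-safe. It shows only the shape of the idea: writes are buffered and appended in one I/O, old values are never overwritten, and reads are resolved through the index.

```python
import bisect
import os


class LogStructuredStore:
    """Append-only log plus a sorted in-memory index (a stand-in for a B+ tree)."""

    def __init__(self, path, batch_size=4):
        self._file = open(path, "a+b")   # appends always land at the end of the log
        self._keys = []                  # sorted keys, mimicking ordered index leaves
        self._offsets = {}               # key -> (offset, length) of the latest value
        self._pending = []               # buffered (key, value) writes awaiting a flush
        self._batch_size = batch_size

    def put(self, key, value):
        """Buffer a write; flush once the batch is full."""
        self._pending.append((key, value))
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        """Append every buffered value with a single write, then update the index."""
        if not self._pending:
            return
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        chunks = []
        for key, value in self._pending:
            if key not in self._offsets:
                bisect.insort(self._keys, key)
            self._offsets[key] = (offset, len(value))  # newer entry shadows older one
            chunks.append(value)
            offset += len(value)
        self._file.write(b"".join(chunks))  # one disk write for the whole batch
        self._file.flush()
        self._pending.clear()

    def get(self, key):
        """Return the latest value for key, or None if it was never written."""
        for pending_key, pending_value in reversed(self._pending):
            if pending_key == key:
                return pending_value
        location = self._offsets.get(key)
        if location is None:
            return None
        offset, length = location
        self._file.seek(offset)
        return self._file.read(length)

    def scan(self, lo, hi):
        """Return the keys in [lo, hi), in order, as an ordered index would."""
        self.flush()
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return self._keys[start:end]

    def close(self):
        self.flush()
        self._file.close()
```

In this sketch an update to an existing key appends a new record and repoints the index, so the log only ever grows. A real engine would add compaction, checksums, and recovery, none of which are modeled here.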

Design parallels

The engine-level choices mirror Honeycomb's storage-layer design (also disclosed in the same 2022-12-02 roundup): lock-free B+ tree, log-structured storage on local SSD, batching, custom allocators, coroutines. A similar shape appears in FoundationDB and other hyperscaler-class OLTP engines.

The NUMA-awareness lesson is the generalizable one: hyperscaler engines budget memory latency at the CPU-socket level, not just the page-cache level.
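
As a rough illustration of keeping a process inside one NUMA node, the Python sketch below pins the current process to the CPUs of a single node on Linux. This is an assumption-laden example, not the mechanism Cosmos DB uses: the function names parse_cpulist and pin_to_numa_node are hypothetical, the sysfs path is the conventional Linux location for per-node CPU lists, and the code returns None on platforms without os.sched_setaffinity. CPU affinity by itself does not bind memory; the sketch relies on the kernel typically allocating memory on the node of the CPU that first touches it.

```python
import os


def parse_cpulist(text):
    """Parse Linux cpulist syntax such as '0-3,8-11' into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def pin_to_numa_node(node, sysfs_root="/sys/devices/system/node"):
    """Restrict the current process to one NUMA node's CPUs (Linux only).

    Returns the CPU ids that were applied, or None when the platform or the
    sysfs entry is unavailable.
    """
    path = os.path.join(sysfs_root, f"node{node}", "cpulist")
    if not hasattr(os, "sched_setaffinity") or not os.path.exists(path):
        return None
    with open(path) as f:
        cpus = parse_cpulist(f.read())
    if not cpus:
        return None
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return cpus
```

A service built this way would start one process per node and call pin_to_numa_node(n) before allocating its working set, so that threads and the memory they touch stay on the same socket.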

Why it shows up on this wiki

Azure Cosmos DB is DynamoDB's closest direct competitor in the hyperscale managed-NoSQL category. The 2022 disclosure gave the first concrete engine-level architecture details and scale numbers for Cosmos DB, enabling a fair comparison with other hyperscale OLTP systems.

Seen in

Last updated · 319 distilled / 1,201 read