CONCEPT

Compute–storage separation

Compute–storage separation is the architectural property where a system's persistence layer and its query/compute layer are decoupled and scale independently. Storage is a durable, shared substrate (object store, shared log, distributed KV); compute is a pool of workers that can be added/removed without touching the data.

Why it matters

  • Scale the expensive resource independently. Aggregation workloads are compute-heavy but bounded by data size; you can burst compute up for the job without over-provisioning storage, and vice versa.
  • Multiple compute tiers against one source of truth. Serving, analytics, backfill, and disaster-recovery workers can all read the same storage without replica sprawl.
  • Elastic pricing. Compute goes to zero when idle; storage bill stays flat. Aligns with a pay-per-use model (see elasticity).
  • Horizontal aggregation. An OLAP engine with separated compute can parallelise an aggregation across many workers that all read the same storage; each worker holds only its partition and partial results, so most of the calculation happens in memory.
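The horizontal-aggregation point can be sketched as a scatter-gather: each worker reads one partition from the shared store, computes a small partial result in memory, and a coordinator merges the partials. This is a minimal illustration, not any engine's actual API; the `STORAGE` dict stands in for a shared object store, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "shared storage": partitions addressable by key and
# readable by any worker (an object store in a real system).
STORAGE = {f"part-{i}": list(range(i * 1000, (i + 1) * 1000)) for i in range(8)}

def partial_aggregate(key):
    # A worker reads one partition from shared storage and returns a
    # tiny partial result (sum, count) -- the only state it keeps.
    rows = STORAGE[key]
    return sum(rows), len(rows)

def parallel_mean(keys, workers=4):
    # Fan the partials out across an elastic compute pool, then merge.
    # Adding workers speeds the scan without touching the data at rest.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_aggregate, keys))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

print(parallel_mean(list(STORAGE)))
```

Because the merge step only sees (sum, count) pairs, the coordinator's memory stays constant no matter how many workers you add, which is what lets the compute tier scale out against one copy of the data.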

Contrast with shared-nothing OLTP

Classic MySQL / Postgres setups bundle compute and storage on the same box. To scale, you grow the box (Canva doubled RDS instance size every 8–10 months until the model broke) or shard the data, which complicates the application. OLAP warehouses like Snowflake split these layers: storage lives in the cloud object layer, compute is a "virtual warehouse" cluster you size independently. Canva explicitly calls this out as the reason Snowflake can aggregate billions of records in a few minutes, "several orders of magnitude faster" than the MySQL round-trip approach. (Source: sources/2024-04-29-canva-scaling-to-count-billions)

Seen elsewhere

  • S3 + analytics engines — S3 is the durable store; engines like Spark, Trino, Athena, and Snowflake run compute over the same object data. (systems/aws-s3, systems/apache-iceberg)
  • Aurora DSQL — separates the journal (durability) from the adjudicator/crossbar/storage/execution tiers, each scaling on its own axis. (systems/aurora-dsql)
  • Lakebase — Neon-descended serverless Postgres; externalises page and WAL storage into systems/pageserver-safekeeper, leaving the Postgres compute VMs ephemeral and scale-to-zero. A cleaner example of the OLTP-shape separation than DSQL (which keeps Postgres's page/WAL layer and rewrites replication/concurrency). (systems/lakebase)
  • Lambda — stateless compute with state pushed to managed stores is the same separation at the request-level. (concepts/stateless-compute)

Caveats

  • Query latency ≠ OLTP latency. Compute is elastic but not free to spin up; cold queries can be seconds. OLAP stores aren't a serving tier — see patterns/warehouse-unload-bridge.
  • Network between layers matters. Separation implies moving data or metadata between compute and storage; engine designs work hard on caching, local-disk staging, and predicate push-down to keep that in check.
  • Consistency story gets more complex. The storage layer must give the compute layer enough consistency guarantees for correctness; S3 moving to strong read-after-write consistency in 2020 is one enabling step for this (see concepts/strong-consistency).
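The network caveat above is easiest to see with predicate push-down: when the filter runs next to the data, only matching rows cross the compute-storage boundary. A toy sketch, with a hypothetical `scan` API and an assumed flat ~64 bytes per row on the wire:

```python
# Hypothetical partition of 1,000 rows; every 10th row matches the filter.
PARTITION = [{"id": i, "country": "AU" if i % 10 == 0 else "US"}
             for i in range(1000)]

def scan(partition, predicate=None):
    # Simulated storage-side scan. With a predicate, filtering happens
    # at the storage layer; without one, everything is transferred.
    rows = [r for r in partition if predicate(r)] if predicate else list(partition)
    return rows, len(rows) * 64  # assumed ~64 bytes per row over the network

# Without push-down: move the whole partition, filter on the compute side.
rows, bytes_moved = scan(PARTITION)
au_rows = [r for r in rows if r["country"] == "AU"]

# With push-down: filter at storage, move only the 100 matching rows.
au_rows_pd, bytes_moved_pd = scan(PARTITION, lambda r: r["country"] == "AU")

assert au_rows == au_rows_pd
print(bytes_moved, bytes_moved_pd)  # 64000 vs 6400: a 10x reduction in transfer
```

Caching and local-disk staging attack the same cost from the other direction: instead of shrinking each transfer, they avoid repeating it.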
