# Snowflake
Snowflake is a cloud OLAP data warehouse. Its defining property for system-design purposes is compute–storage separation: table data lives in cloud object storage, and query compute runs in ephemeral per-tenant clusters ("virtual warehouses") that can be sized and scaled independently. That separation is what lets a workload aggregate billions of rows of source data in minutes where an equivalent OLTP query plan would take hours.
## Why it shows up in architectures
- End-to-end recompute pipelines. With elastic compute, redoing the whole month's aggregation on each run is cheap enough to replace live-maintained incremental counters. Canva's Creators-payment pipeline uses Snowflake for exactly this: billions of usage events per month, aggregated in a few minutes per run. (Source: sources/2024-04-29-canva-scaling-to-count-billions; patterns/end-to-end-recompute)
- ELT substrate. Snowflake is a common target for dbt model DAGs: transformations expressed as SQL, intermediate stages as views, with the warehouse's compute doing the heavy lifting. See systems/dbt, concepts/elt-vs-etl.
- Incumbent primary warehouse. Canva chose Snowflake for most regions because it was already their primary data warehouse with "reliable infrastructure support": an architectural choice driven by existing data-platform operability.
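The end-to-end recompute trade above can be sketched in a few lines of Python (illustrative only; names like `raw_events` and `recompute_month` are hypothetical, not Canva's schema):

```python
from collections import Counter

# Hypothetical raw usage events for one month: (creator_id, count) pairs.
raw_events = [
    ("creator_a", 1), ("creator_b", 1), ("creator_a", 1),
    ("creator_a", 1), ("creator_c", 1), ("creator_b", 1),
]

def recompute_month(events):
    """End-to-end recompute: rebuild every aggregate from raw events.
    Idempotent -- rerunning after a bug fix or late-arriving data yields
    corrected totals, with no drifting incremental counters to repair."""
    totals = Counter()
    for creator_id, n in events:
        totals[creator_id] += n
    return dict(totals)

# Each pipeline run overwrites the previous result wholesale.
usage_counts = recompute_month(raw_events)
print(usage_counts)  # {'creator_a': 3, 'creator_b': 2, 'creator_c': 1}
```

The point of the pattern is the shape, not the arithmetic: because elastic warehouse compute makes the full pass cheap, the pipeline never has to reconcile a live counter against history.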
## Usage notes from sources
- Aggregation runtime at Canva scale: billions of records in a few minutes, "several orders of magnitude faster" than the MySQL round-trip approach it replaced.
- Used with an upstream replication pipeline that lands source data in Snowflake; the E and L are handled by Canva's data platform, and Snowflake does the T.
- Unload is a first-class piece of the story: results must be exported back into OLTP-friendly stores via S3 + SQS for serving; Snowflake is not a low-latency serving engine. See patterns/warehouse-unload-bridge.
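A minimal sketch of the unload bridge, with a local directory and an in-process queue standing in for S3 and SQS (all names here are hypothetical, not Canva's actual interfaces):

```python
import json, os, tempfile
from collections import deque

# A staging directory and a deque stand in for S3 and SQS in this sketch.
stage_dir = tempfile.mkdtemp()
queue = deque()

def unload(results, batch_id):
    """Producer side: export warehouse query results as a
    newline-delimited JSON file, then announce the file on the queue."""
    path = os.path.join(stage_dir, f"batch-{batch_id}.jsonl")
    with open(path, "w") as f:
        for row in results:
            f.write(json.dumps(row) + "\n")
    queue.append(path)

def serve_loader(oltp_table):
    """Consumer side: drain the queue and upsert each exported row into
    the OLTP-friendly serving store (a plain dict here)."""
    while queue:
        path = queue.popleft()
        with open(path) as f:
            for line in f:
                row = json.loads(line)
                oltp_table[row["creator_id"]] = row["usage"]

serving = {}
unload([{"creator_id": "a", "usage": 3}, {"creator_id": "b", "usage": 2}], 1)
serve_loader(serving)
print(serving)  # {'a': 3, 'b': 2}
```

The decoupling matters: the warehouse finishes its batch and walks away, while the serving side ingests at its own pace and answers reads at OLTP latency.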
## Caveats
- Not a serving tier. Query latency is seconds at best; bolt a warehouse-unload bridge in front of anything user-facing.
- The dbt codebase runs on its own release cadence as a separate deployable; schema evolution in upstream source tables couples to model releases. (See systems/dbt.)
- Observability tooling for Snowflake + ELT differs from that of the services that feed and consume it; the integration cost is real.
## Seen in
- sources/2024-04-29-canva-scaling-to-count-billions: Canva Creators-payment pipeline. Snowflake + dbt do dedup and aggregation in SQL over raw events sourced from DynamoDB and extracted to typed columns; an outer-join overwrite enables end-to-end recompute.
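One reading of the outer-join-overwrite step, sketched in Python (the function and table names are hypothetical): the fresh recompute is full-outer-joined against the previous output so that keys absent from the new run are explicitly reset rather than left stale.

```python
def outer_join_overwrite(existing, recomputed, default=0):
    """Full outer join of the old serving table with the fresh
    recompute. Recomputed values win; keys missing from the new run
    are reset to the default instead of lingering with stale values."""
    merged = {}
    for key in existing.keys() | recomputed.keys():
        merged[key] = recomputed.get(key, default)
    return merged

old = {"a": 5, "b": 2, "stale": 9}   # previous run's output
new = {"a": 6, "c": 1}               # this run's full recompute
result = outer_join_overwrite(old, new)
# Every key from either side survives; rows dropped by the new run are zeroed.
assert result == {"a": 6, "b": 0, "stale": 0, "c": 1}
```

This is what makes the overwrite safe as a publish step: the output table is fully determined by the latest recompute, not by the history of prior runs.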