Skip to content

CONCEPT Cited by 1 source

OLTP vs OLAP

OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) are the two big workload archetypes databases are tuned for. OLTP = many small, low-latency, mostly-single-row reads and writes with strong transactional semantics; row-oriented storage; operational systems sit on it (MySQL, Postgres, DynamoDB). OLAP = few large scans and aggregations across many rows over many columns; columnar storage; warehouse workloads sit on it (Snowflake, BigQuery, Redshift).

Why picking the wrong archetype hurts

When a pipeline's real shape is aggregate a large dataset end-to-end but it's implemented on an OLTP database, you get:

  • O(N) round-trips. A per-record loop doing ≥1 read + ≥1 write is O(N) DB queries; batching to size C gives O(N/C) which is still O(N). (Source: sources/2024-04-29-canva-scaling-to-count-billions)
  • Vertical-scale wall. OLTP engines like MySQL RDS don't auto-partition; capacity growth is instance doubling until blast radius and maintenance pain become unacceptable.
  • Incident recovery edits rows. Fixing a systematic miscount means SSHing in and surgically rewriting intermediary tables — slow, error-prone, high-cross-review cost.

Canva's Creators-payment counting pipeline hit exactly this: billions of events per month on MySQL RDS, instance-size doubling every 8–10 months, and days-long incident recoveries. Moving the calculation into an OLAP warehouse (Snowflake) with ELT (DBT) removed the round-trip bottleneck and collapsed intermediary state. (Source: sources/2024-04-29-canva-scaling-to-count-billions)

When OLTP is still right

  • The workload is actually transactional: point lookups, per-row writes, strong transactional semantics, millisecond-class read latencies.
  • Scale is below the point where vertical growth hurts — Canva explicitly defends MySQL as correct for the first 2 years of their program.
  • Low-latency serving of already-computed aggregates (which is why Canva unloads Snowflake output back into RDS for reads; see patterns/warehouse-unload-bridge).

When OLAP is right

  • Workload is scan + aggregate over large datasets.
  • End-to-end recompute is acceptable (vs. maintaining live incremental state): OLAP's compute scaling makes rerunning a month-scale query in minutes feasible. See patterns/end-to-end-recompute.
  • You can tolerate higher per-query latency (seconds-to-minutes, not milliseconds) — OLAP is not a serving tier.

The hand-off problem

OLAP warehouses are not built for high-QPS low-latency reads, so aggregated results have to be unloaded into an OLTP-friendly store for serving. Canva does this with a scheduled worker → S3 → SQS → rate-limited RDS ingester; this bridge is a first-class piece of the architecture, not a footnote. See patterns/warehouse-unload-bridge.

Seen in

Last updated · 200 distilled / 1,178 read