CONCEPT Cited by 1 source
OLTP vs OLAP¶
OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) are the two big workload archetypes databases are tuned for. OLTP = many small, low-latency, mostly-single-row reads and writes with strong transactional semantics; row-oriented storage; operational systems sit on it (MySQL, Postgres, DynamoDB). OLAP = few large scans and aggregations across many rows over many columns; columnar storage; warehouse workloads sit on it (Snowflake, BigQuery, Redshift).
Why picking the wrong archetype hurts¶
When a pipeline's real shape is aggregate a large dataset end-to-end but it's implemented on an OLTP database, you get:
- O(N) round-trips. A per-record loop doing ≥1 read + ≥1 write is O(N) DB queries; batching to size C gives O(N/C) which is still O(N). (Source: sources/2024-04-29-canva-scaling-to-count-billions)
- Vertical-scale wall. OLTP engines like MySQL RDS don't auto-partition; capacity growth is instance doubling until blast radius and maintenance pain become unacceptable.
- Incident recovery edits rows. Fixing a systematic miscount means SSHing in and surgically rewriting intermediary tables — slow, error-prone, high-cross-review cost.
Canva's Creators-payment counting pipeline hit exactly this: billions of events per month on MySQL RDS, instance-size doubling every 8–10 months, and days-long incident recoveries. Moving the calculation into an OLAP warehouse (Snowflake) with ELT (DBT) removed the round-trip bottleneck and collapsed intermediary state. (Source: sources/2024-04-29-canva-scaling-to-count-billions)
When OLTP is still right¶
- The workload is actually transactional: point lookups, per-row writes, strong transactional semantics, millisecond-class read latencies.
- Scale is below the point where vertical growth hurts — Canva explicitly defends MySQL as correct for the first 2 years of their program.
- Low-latency serving of already-computed aggregates (which is why Canva unloads Snowflake output back into RDS for reads; see patterns/warehouse-unload-bridge).
When OLAP is right¶
- Workload is scan + aggregate over large datasets.
- End-to-end recompute is acceptable (vs. maintaining live incremental state): OLAP's compute scaling makes rerunning a month-scale query in minutes feasible. See patterns/end-to-end-recompute.
- You can tolerate higher per-query latency (seconds-to-minutes, not milliseconds) — OLAP is not a serving tier.
The hand-off problem¶
OLAP warehouses are not built for high-QPS low-latency reads, so aggregated results have to be unloaded into an OLTP-friendly store for serving. Canva does this with a scheduled worker → S3 → SQS → rate-limited RDS ingester; this bridge is a first-class piece of the architecture, not a footnote. See patterns/warehouse-unload-bridge.
Seen in¶
- sources/2024-04-29-canva-scaling-to-count-billions — Canva
usage-counting MySQL → OLAP (Snowflake + DBT) migration; billions/month
aggregated in minutes; >50% data reduction; pipeline latency
1 day → <1 hour.