Skip to content

CONCEPT Cited by 2 sources

OLTP vs OLAP

OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) are the two big workload archetypes databases are tuned for. OLTP = many small, low-latency, mostly-single-row reads and writes with strong transactional semantics; row-oriented storage; operational systems sit on it (MySQL, Postgres, DynamoDB). OLAP = few large scans and aggregations across many rows over many columns; columnar storage; warehouse workloads sit on it (Snowflake, BigQuery, Redshift).

Why picking the wrong archetype hurts

When a pipeline's real shape is aggregate a large dataset end-to-end but it's implemented on an OLTP database, you get:

  • O(N) round-trips. A per-record loop doing ≥1 read + ≥1 write is O(N) DB queries; batching to size C gives O(N/C) which is still O(N). (Source: sources/2024-04-29-canva-scaling-to-count-billions)
  • Vertical-scale wall. OLTP engines like MySQL RDS don't auto-partition; capacity growth is instance doubling until blast radius and maintenance pain become unacceptable.
  • Incident recovery edits rows. Fixing a systematic miscount means SSHing in and surgically rewriting intermediary tables — slow, error-prone, high-cross-review cost.

Canva's Creators-payment counting pipeline hit exactly this: billions of events per month on MySQL RDS, instance-size doubling every 8–10 months, and days-long incident recoveries. Moving the calculation into an OLAP warehouse (Snowflake) with ELT (DBT) removed the round-trip bottleneck and collapsed intermediary state. (Source: sources/2024-04-29-canva-scaling-to-count-billions)

The OLTP/OLAP axis applied to graph workloads

The same archetype split applies to graph databases — and is in fact the load-bearing decision that determines whether a graph store can hit OLTP-class throughput targets. The [[sources/2026-05-29-netflix-high-throughput-graph-abstraction-at-netflix-part-i|2026-05-29 Netflix Graph Abstraction Part-I post]] frames it explicitly:

"OLAP: These use cases typically involve open-ended and algorithmic exploration of large graph datasets. They often utilize industry-standard models and languages such as RDF with SPARQL, Property Graphs with Gremlin or openCypher, and even SQL. The primary focus in these situations is in-depth analysis, rather than achieving high throughput and low latency.

OLTP: These use cases require extremely high throughput — up to millions of operations per second — while delivering traversal results within milliseconds. Achieving such a level of performance often requires making trade-offs, which can include accepting eventual consistency or restricting query complexity. For example, the service can demand a specified starting point for traversals and enforce a maximum traversal depth."

The OLTP-graph trade-off chart, verbatim from the post:

OLTP-graph trade-off What it buys
Mandatory traversal start node Bounded fanout from a known anchor
Maximum traversal depth Bounded query latency
Eventual consistency Multi-region async replication; no quorum tax
Restricted query language Schema-aware planner; no Gremlin runtime cost

OLAP-graph workloads (RDF/SPARQL exploration, full-graph algorithmic analysis) by contrast require a fundamentally different substrate — typically a separate analytical-graph engine, not the OLTP-graph store. systems/netflix-graph-abstraction is the wiki's first canonical OLTP-graph instance; the analytical-graph axis is not yet covered.

When OLTP is still right

  • The workload is actually transactional: point lookups, per-row writes, strong transactional semantics, millisecond-class read latencies.
  • Scale is below the point where vertical growth hurts — Canva explicitly defends MySQL as correct for the first 2 years of their program.
  • Low-latency serving of already-computed aggregates (which is why Canva unloads Snowflake output back into RDS for reads; see patterns/warehouse-unload-bridge).

When OLAP is right

  • Workload is scan + aggregate over large datasets.
  • End-to-end recompute is acceptable (vs. maintaining live incremental state): OLAP's compute scaling makes rerunning a month-scale query in minutes feasible. See patterns/end-to-end-recompute.
  • You can tolerate higher per-query latency (seconds-to-minutes, not milliseconds) — OLAP is not a serving tier.

The hand-off problem

OLAP warehouses are not built for high-QPS low-latency reads, so aggregated results have to be unloaded into an OLTP-friendly store for serving. Canva does this with a scheduled worker → S3 → SQS → rate-limited RDS ingester; this bridge is a first-class piece of the architecture, not a footnote. See patterns/warehouse-unload-bridge.

Seen in

  • canonical wiki introduction of HTAP as the third workload archetype that tries (and mostly fails) to unify OLTP and OLAP in a single system. Savannah Longoria (PlanetScale, 2023-12-01) enumerates the three HTAP architecture classes (shared-everything, shared-nothing, hybrid)
  • four implementation styles (in-memory, columnar, separation-of- compute-and-storage, hybrid on-disk-plus-in-memory) + eight structural challenges (mixed-workload-complexity, performance- trade-offs, data-model mismatch, scalability, resource contention, maintenance-complexity, limited-analytical- processing, architecture-evolution). Canonicalises PlanetScale's explicit non-HTAP position statement verbatim: "PlanetScale does not claim to be an HTAP database, nor are we an OLAP database built for pure analytical workloads. Instead, PlanetScale offers the only managed Vitess solution and we are optimized for OLTP workloads." The recommended alternative is physical resource isolation + ETL offload via Airbyte / Fivetran / Stitch to a specialised warehouse. Composes with the Support-Notes post's per-session set workload='olap'; deliberate-friction in-envelope escape hatch and the 's CDC-as-OLTP-to-OLAP-bridge framing as the three-part canonical wiki statement of PlanetScale's OLTP-only architectural position.

  • canonical wiki statement that OLTP vs OLAP is a per-session vttablet runtime toggle on PlanetScale, not a separate engine or cluster. PlanetScale Support's Mike Stojan (2023-01-11) documents set workload='olap'; as the only supported per-session lift for PlanetScale's 20 s transaction / 900 s query hard timeouts, explicitly framed as a deliberate-friction escape hatch: "The workload cannot be changed globally, and it will reset to OLTP after you have closed the session." Canonical PlanetScale recommendation when a query genuinely needs OLAP: do not use the mode toggle; instead "support data integration engines such as Airbyte and Stitch, with which you can offload these processes to other platforms that are more specialized in this field." Reinforces the architectural framing from that the OLTP / OLAP split is the justification for CDC-fed downstream analytics rather than tuning the OLTP engine to handle both. The per-session OLAP mode is canonicalised here as an in-envelope last-resort, explicitly not recommended as a workload architecture.

  • sources/2024-04-29-canva-scaling-to-count-billions — Canva usage-counting MySQL → OLAP (Snowflake + DBT) migration; billions/month aggregated in minutes; >50% data reduction; pipeline latency

    1 day → <1 hour.

  • — Matt Lord frames the OLTP / OLAP split as the architectural motivation for CDC pipelines around sharded MySQL: "Vitess and MySQL are ideally suited for use as an Online Transaction Processing (OLTP) system … They are not optimized for Online Analytical Processing (OLAP) workloads and other use cases and needs that you will encounter as your product, company, and data needs grow." Canonical wiki framing of the split as "keep the OLTP system the source of truth, use CDC to maintain in-sync copies in systems tuned for analytics / reporting / integration" — the workload-archetype justification for CDC exists specifically because no single engine serves both workload types well. Extends this concept from Canva's MySQL-to-Snowflake instance to Vitess-as-OLTP-source feeding any downstream OLAP target via VStream.

Last updated · 542 distilled / 1,571 read