
PLANETSCALE 2023-12-01


PlanetScale — What is HTAP?

Summary

Savannah Longoria's 2023-12-01 pedagogy post defines HTAP (Hybrid Transactional / Analytical Processing) as the database marketing category that promises to serve both OLTP and OLAP workloads from a single system. It then enumerates three HTAP architecture classes (shared-everything, shared-nothing, hybrid) and four HTAP implementation styles (in-memory, columnar, separation of compute and storage, hybrid on-disk-plus-in-memory), and catalogues eight structural challenges that hinder HTAP at scale. PlanetScale's canonical position statement, verbatim: "PlanetScale does not claim to be an HTAP database, nor are we an OLAP database built for pure analytical workloads. Instead, PlanetScale offers the only managed Vitess solution and we are optimized for OLTP workloads." The load-bearing architectural recommendation is "Physical resource isolation is an effective way to guarantee the performance of transactional queries": do not run transactional and analytical workloads on the same physical resources, and for OLAP use ETL/ELT to offload data to a specialised warehouse via integration engines such as Airbyte, Fivetran, or Stitch.

Key takeaways

  • OLTP vs OLAP is a workload-shape dichotomy, not a product dichotomy. OLTP = high-volume small transactional operations (individual record insert/delete/update + point queries that benefit from indexes); OLAP = complex queries analysing large amounts of data (batch updates + table scans). The two workload shapes "have different characteristics" that HTAP tries (and mostly fails) to unify. (Source: this post)

  • HTAP architecture classes are three. (1) shared-everything — all data stored in a single shared storage system; simplest to implement + consistency-preserving, but scale-limited. (2) shared-nothing — each node stores its own data; horizontally scalable, but harder to implement + manage. (3) hybrid — transactional data in shared-everything + analytical data in shared-nothing; hard to ensure consistency under concurrent transactional/analytical ops; scale-limited. (Source: this post)

  • HTAP implementation styles are four. (a) In-memory HTAP: each node stores data in memory; horizontally scalable, but consistency is hard and memory cost makes it expensive. (b) Columnar HTAP: columnar storage for analytical queries; good analytical performance, but slower for transactional queries because a columnar layout is hostile to whole-row reads. (c) Separation of storage and compute: decouples storage from processing. (d) Hybrid on-disk-plus-in-memory: transactional data on disk, analytical data in memory. (Source: this post)

  • Eight structural HTAP challenges. Verbatim taxonomy: (1) mixed workload complexity: the database juggles high-speed transactions + resource-intensive analytics, an "inherent conflict in requirements"; (2) performance trade-offs: optimizing one workload penalises the other via shared resources; (3) data model mismatch: OLTP updates individual records and maintains integrity + consistency, while OLAP does complex aggregations + scans, so fitting both into the same data model leads to "suboptimal design compromises"; (4) scalability challenges: scaling both components horizontally at the same time is hard; (5) resource contention: CPU, memory, and I/O bandwidth contention leads to "resource bottlenecks, unpredictable performance fluctuations, and overall system instability"; (6) maintenance + administration complexity: more intricate ops + tuning burden; (7) limitation in analytical processing: specialised warehouses "can employ more sophisticated optimization techniques for complex analytical operations, offering superior performance and richer insights"; (8) evolution of data processing architectures: modern distributed / microservices / serverless architectures, "potentially making it challenging to fit a hybrid database into the larger application ecosystem". (Source: this post)

  • PlanetScale's explicit non-HTAP position. Verbatim: "PlanetScale does not claim to be an HTAP database, nor are we an OLAP database built for pure analytical workloads. Instead, PlanetScale offers the only managed Vitess solution and we are optimized for OLTP workloads." This is the canonical wiki framing of PlanetScale's workload-archetype positioning. It reinforces the building-data-pipelines post's framing ("Vitess and MySQL are ideally suited for use as an Online Transaction Processing (OLTP) system") and the Support notes' framing of the per-session set workload='olap' as a deliberate-friction escape hatch. (Source: this post)

  • Physical resource isolation is the canonical fix. "Physical resource isolation is an effective way to guarantee the performance of transactional queries. Analytical queries often consume high levels of resources such as CPU, memory, and I/O bandwidth. If these queries run together with transactional queries, the latter can be seriously delayed." This names the mechanism underlying the workload-segregated-clusters pattern: don't share physical resources across workload archetypes. (Source: this post)

  • Recommended OLAP offload mechanism: ETL/ELT to a specialised warehouse via integration engines. "For large ETL workloads, we support and recommend data integration engines such as Airbyte, Fivetran, and Stitch, with which you can offload these processes to other platforms that are more specialized in OLAP workloads." Canonical PlanetScale recommendation for OLAP: do not use HTAP, do not tune OLTP for OLAP, and offload via a CDC-style integration engine to a dedicated warehouse. (Source: this post)

  • Workload-specific separation when workloads are truly distinct. "If you have a complex application with distinct transactional and analytical workloads that can be separated, then it may be more appropriate to use separate databases for each workload. This approach allows each database to be optimized for its specific workload and can provide better performance and scalability." Frames separate-databases-per-workload as the canonical architectural choice when workloads are separable. (Source: this post)
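The OLTP vs OLAP takeaway contrasts index-friendly point queries with scan-heavy aggregations. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for a real OLTP engine (the orders table and idx_orders_customer index are hypothetical), shows the two workload shapes producing different query plans:

```python
import sqlite3

# In-memory SQLite stands in for an OLTP store; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

def plan(sql: str) -> str:
    """Return SQLite's query-plan detail strings (4th column) joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# OLTP shape: a point query on an indexed column is answered by an index search.
oltp_plan = plan("SELECT amount FROM orders WHERE customer_id = 42")
# OLAP shape: an aggregate over every row has to scan the whole table (or index).
olap_plan = plan("SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id")

print(oltp_plan)
print(olap_plan)
```

SQLite's plan wording varies slightly across versions (older releases print "SEARCH TABLE" / "SCAN TABLE"), so it is safer to look for the SEARCH and SCAN keywords than to match exact strings.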
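The hybrid architecture class is described as hard to keep consistent when transactional and analytical operations run concurrently. The toy model below is an assumption for illustration, not how any named HTAP product works: writes land in a transactional row map immediately, while a separate analytical copy only catches up when a hypothetical sync() step replays queued changes, so the two sides can disagree in between.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class HybridStore:
    """Toy hybrid HTAP store (hypothetical): a transactional copy takes writes,
    and an analytical copy is refreshed only when sync() runs."""
    tx_rows: Dict[int, float] = field(default_factory=dict)
    analytic_rows: Dict[int, float] = field(default_factory=dict)
    pending: List[Tuple[int, float]] = field(default_factory=list)

    def write(self, key: int, value: float) -> None:
        self.tx_rows[key] = value          # visible to transactional reads at once
        self.pending.append((key, value))  # queued for the analytical side

    def sync(self) -> None:
        # Replay queued changes into the analytical copy, then clear the queue.
        for key, value in self.pending:
            self.analytic_rows[key] = value
        self.pending.clear()

    def tx_total(self) -> float:
        return sum(self.tx_rows.values(), 0.0)

    def analytic_total(self) -> float:
        return sum(self.analytic_rows.values(), 0.0)

store = HybridStore()
store.write(1, 10.0)
store.write(2, 5.0)
print(store.tx_total(), store.analytic_total())  # 15.0 0.0: analytical copy lags
store.sync()
print(store.tx_total(), store.analytic_total())  # 15.0 15.0: consistent after sync
```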
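The columnar implementation style trades transactional speed for analytical speed. A minimal pure-Python sketch (the data and helper names are hypothetical) shows why: a row layout returns a whole record in one lookup, while a column layout must gather one value from every column for the same record but can aggregate a single attribute without touching the others.

```python
# Row layout: each record is stored together, keyed by primary key.
rows = {
    1: {"id": 1, "customer_id": 7, "amount": 12.5},
    2: {"id": 2, "customer_id": 9, "amount": 3.0},
    3: {"id": 3, "customer_id": 7, "amount": 8.0},
}

# Column layout: each attribute is stored contiguously; position i belongs to record i.
columns = {
    "id": [1, 2, 3],
    "customer_id": [7, 9, 7],
    "amount": [12.5, 3.0, 8.0],
}
position = {order_id: i for i, order_id in enumerate(columns["id"])}

def row_point_read(order_id):
    """OLTP-style read on rows: one lookup returns the whole record."""
    return rows[order_id]

def column_point_read(order_id):
    """Same read on columns: one value must be fetched from every column."""
    i = position[order_id]
    return {name: values[i] for name, values in columns.items()}

def row_sum(attr):
    """Aggregate on rows: every full record is visited to read one attribute."""
    return sum(record[attr] for record in rows.values())

def column_sum(attr):
    """OLAP-style aggregate on columns: only the one column is read."""
    return sum(columns[attr])
```

Real columnar engines also gain from compression and vectorised execution over contiguous memory, which this sketch does not model.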
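The physical-resource-isolation takeaway says analytical queries running on shared resources can seriously delay transactional ones. A deliberately simplified single-worker FIFO queue model (an assumption: real databases schedule work concurrently rather than strictly first-in, first-out) makes the head-of-line effect visible. The workload numbers are hypothetical, not figures from the post.

```python
from typing import List, Tuple

Job = Tuple[str, float, float]  # (kind, arrival_ms, service_ms)

def fifo_latencies(jobs: List[Job]) -> List[Tuple[str, float]]:
    """Run jobs on one worker in arrival order; return (kind, completion - arrival)."""
    clock = 0.0
    out = []
    for kind, arrival, service in sorted(jobs, key=lambda j: j[1]):
        start = max(clock, arrival)  # wait if the worker is still busy
        clock = start + service
        out.append((kind, clock - arrival))
    return out

def worst_oltp_latency(jobs: List[Job]) -> float:
    return max(lat for kind, lat in fifo_latencies(jobs) if kind == "oltp")

# Hypothetical workload: one 100 ms analytical scan arrives just before ten 1 ms transactions.
olap = [("olap", 0.0, 100.0)]
oltp = [("oltp", 1.0 + i, 1.0) for i in range(10)]

shared = worst_oltp_latency(olap + oltp)  # both workloads share one worker
isolated = worst_oltp_latency(oltp)       # transactions get their own worker
print(shared, isolated)
```

In the shared case every transaction waits behind the scan (about 100 ms each), while on an isolated worker each finishes in its own 1 ms service time.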
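For the ETL/ELT offload recommendation, the sketch below shows a simplified cursor-based incremental sync between two SQLite databases standing in for the OLTP source and the warehouse. It is an assumption for illustration only, not how Airbyte, Fivetran, or Stitch are implemented; log-based CDC for MySQL reads the binary log instead of polling a timestamp column. The table, column, and function names are hypothetical.

```python
import sqlite3

source = sqlite3.connect(":memory:")     # stands in for the OLTP database
warehouse = sqlite3.connect(":memory:")  # stands in for the OLAP warehouse
for db in (source, warehouse):
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)")

def sync_increment(cursor_value: int) -> int:
    """Copy rows changed since cursor_value into the warehouse; return the new cursor."""
    changed = source.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (cursor_value,),
    ).fetchall()
    # INSERT OR REPLACE acts as an upsert keyed on the primary key.
    warehouse.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", changed)
    warehouse.commit()
    return max((row[2] for row in changed), default=cursor_value)

source.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 10.0, 100), (2, 20.0, 101)])
source.commit()
cursor = sync_increment(0)       # first run copies both rows
source.execute("UPDATE orders SET amount = 25.0, updated_at = 102 WHERE id = 2")
source.commit()
cursor = sync_increment(cursor)  # second run copies only order 2
# Analytical queries now run against the warehouse, not the OLTP source.
total = warehouse.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

A polling cursor like this misses hard deletes and any row committed late with an updated_at at or below the saved cursor, which is one reason production pipelines favour log-based capture.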
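When workloads are separable, the takeaway favours separate databases per workload. A hypothetical routing layer could make that separation explicit by having callers declare their workload rather than inferring it from SQL text. In this sketch two in-memory SQLite databases stand in for the transactional database and the analytical store, and the class and enum names are invented for illustration.

```python
import sqlite3
from enum import Enum

class Workload(Enum):
    OLTP = "oltp"
    OLAP = "olap"

class WorkloadRouter:
    """Send each query to the database dedicated to its declared workload (hypothetical)."""

    def __init__(self) -> None:
        self.targets = {
            Workload.OLTP: sqlite3.connect(":memory:"),  # transactional database
            Workload.OLAP: sqlite3.connect(":memory:"),  # separate analytical store
        }

    def execute(self, workload: Workload, sql: str, params=()):
        return self.targets[workload].execute(sql, params)

router = WorkloadRouter()
router.execute(Workload.OLTP, "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
router.execute(Workload.OLAP, "CREATE TABLE daily_revenue (day TEXT, total REAL)")
router.execute(Workload.OLTP, "INSERT INTO orders VALUES (1, 9.5)")
router.execute(Workload.OLAP, "INSERT INTO daily_revenue VALUES ('2024-01-15', 9.5)")
```

Declaring the workload at the call site keeps routing deterministic and lets each target be sized and tuned for its own workload; guessing from query text is fragile.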

Systems / concepts / patterns extracted

Operational numbers

None disclosed — this is a pedagogy post, not a benchmark or retrospective. No latency, throughput, cost, or resource-saturation numbers. No production incidents cited.

Caveats

  • Marketing-voice pedagogy. The opening framing ("Through extensive marketing efforts, HTAP has been positioned as a promising new computing paradigm…") is adversarial toward HTAP vendors. The eight-challenges list is the strongest part of the post, but it cites no specific HTAP system by name (TiDB, SingleStore/MemSQL, Snowflake Unistore, SAP HANA, etc.) and none of their published architecture papers.

  • No specific HTAP product engagement. The four implementation styles are named abstractly, but no commercial implementation is called out for any of them; the reader has to map styles to products independently.

  • No quantitative trade-off disclosure. Challenges (5) resource contention and (2) performance trade-offs are named but not numerically quantified — e.g. no p99 degradation numbers for mixed workloads, no CPU / memory / IOPS saturation thresholds that trigger contention, no measurement-based comparison to physically-isolated alternatives.

  • Author altitude. Savannah Longoria is PlanetScale's marketing-team author; not a canonical-database-internals byline like Ben Dicken or Shlomi Noach. The post is pedagogy-canonical on the OLTP/OLAP/HTAP axis + the PlanetScale-is-not-HTAP positional statement, but is not an engineering-voice deep-dive.

  • Date of fetch vs publication. Post was originally published 2023-12-01 per byline; re-fetched 2026-04-21 by the wiki's scrape pipeline. No detected edits between publication and fetch.

  • "Separation of storage and compute" is named as one HTAP implementation style but the post doesn't name specific systems that use this approach in an HTAP context (Snowflake, Aurora, CockroachDB, TiDB all use compute-storage separation but with different consistency and workload-routing semantics).

  • Airbyte, Fivetran, Stitch name-dropped equivalently. The post treats the three as interchangeable integration-engine options without engaging their differences (open-source vs proprietary, CDC capture mechanisms, latency characteristics, schema-drift handling).

Source
