Skip to content

PATTERN Cited by 1 source

Workload-specific datastore migration

Shape

Rather than swap a general-purpose substrate for another general- purpose substrate in a single big-bang migration, decompose the workloads riding on it by access pattern and rehome each axis to a purpose-built store:

  • OLAP / analytics → columnar analytical engines (Druid, StarRocks, ClickHouse).
  • Time-series → time-series-specific datastores (Goku, Prometheus, M3, VictoriaMetrics).
  • Key-value → KV-optimised stores (KVStore on RocksDB + Rocksplicator, DynamoDB, Cassandra).
  • Transactional with rich query → NewSQL (TiDB, Spanner, CockroachDB).

What remains after the axis migrations is a much smaller residual — "general NoSQL with rich query + ACID + secondary index" — which a single NewSQL-style store can absorb.

Why it works

  • Each axis gets a better engine. General-purpose NoSQL stores are usually adequate for each workload but optimal for none. Druid for OLAP routinely outperforms HBase scans by 10×+ because it uses columnar storage and pre-aggregated rollups; Goku outperforms OpenTSDB because it's designed for time-series access patterns natively.
  • Migration risk is bounded per axis. Moving time-series off HBase doesn't affect the graph service; moving KV workloads doesn't affect OLAP dashboards. Each axis can be migrated, verified, and cut over independently, with its own rollback path.
  • Makes the big migration tractable. By the time the formal deprecation starts, the surface area has already shrunk. Pinterest had already migrated OLAP + time-series + KV before the end-of-2021 decision to fully deprecate HBase; the remaining workloads fit cleanly into the TiDB evaluation.
  • Reduces the complexity tax from bolt-on services. Many of the bolt-on services (Ixia, Omid/Sparrow) exist only because the substrate lacks OLAP / secondary-index / transaction support. A purpose-built store in each axis removes the need for the bolt-on.

Tradeoffs

  • Operates multiple stores simultaneously. For the duration of the migration — often years — the org runs more substrates, not fewer. Each has its own on-call, upgrade cadence, client library, backup policy.
  • Decomposition quality depends on workload separability. Works well when workloads are cleanly separable by access pattern; less well when a single service blends OLTP + OLAP + KV.
  • Doesn't help with the substrate-level 5-reason case. Workload migration alone doesn't retire the original substrate; it just shrinks the residual. The final retirement still needs the [[patterns/nosql-to-newsql-deprecation|5-reason deprecation framework]].

Seen in

  • sources/2024-05-14-pinterest-hbase-deprecation-at-pinterest — canonical wiki instance. Before Pinterest's formal end-of-2021 decision to deprecate HBase, the company had already migrated three workload axes: OLAP to Druid + StarRocks (HBase "performed worse than state-of-the-art solutions for OLAP workloads"), time-series to Goku (HBase "was not able to keep up with the ever increasing time series data volume"), and KV to KVStore on RocksDB + Rocksplicator (HBase "was not as performant or infra efficient" for KV). The residual — "general NoSQL with scalability + queries + ACID" — drove the TiDB selection.
Last updated · 550 distilled / 1,221 read