
Apache HBase¶

Apache HBase is an open-source, distributed, scalable wide-column NoSQL database modelled after Google Bigtable and built on top of HDFS (originally part of the Hadoop ecosystem). It was one of the dominant NoSQL choices of the 2010s and the substrate underneath many large-scale consumer internet backends, including Pinterest's.

Definition¶

HBase stores data in tables whose cells are addressed by row key, column family, column qualifier, and timestamp. It provides:

Strong consistency for reads + writes to a single row.
Automatic sharding via region-splitting.
WAL-based durability and replication (intra- and inter-cluster — see concepts/wal-replication).
JVM-based RegionServers managed by an HMaster; ZooKeeper for coordination.
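
The cell-addressing model described above can be sketched as a toy in-memory structure. This is only an illustration of the logical layout, not the HBase client API: the class `ToyTable`, its method names, and the sample row keys are hypothetical, and regions, the WAL, and persistence are left out entirely.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of HBase-style cell addressing (hypothetical, in-memory only)."""

    def __init__(self, families):
        # Column families are declared up front; qualifiers are free-form.
        self.families = set(families)
        # row key -> (family, qualifier) -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, qualifier, value, ts):
        """Write one versioned cell."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row][(family, qualifier)][ts] = value

    def get(self, row, family, qualifier, max_ts=None):
        """Return the newest value with timestamp <= max_ts, or None."""
        versions = self._rows.get(row, {}).get((family, qualifier), {})
        candidates = [ts for ts in versions if max_ts is None or ts <= max_ts]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row, stop_row):
        """Yield row keys in sorted order within [start_row, stop_row)."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row


table = ToyTable(families=["d"])
table.put(b"user#42", "d", "name", b"ada", ts=1)
table.put(b"user#42", "d", "name", b"ada l.", ts=2)  # newer version, same cell
table.put(b"user#07", "d", "name", b"bob", ts=1)

latest = table.get(b"user#42", "d", "name")
as_of_ts1 = table.get(b"user#42", "d", "name", max_ts=1)
user_rows = list(table.scan(b"user#", b"user$"))
```

HBase keeps rows sorted by row key, so a range scan over a key prefix touches a contiguous slice of the table. The toy re-sorts on every scan for simplicity, but the ordering it exposes is the property that makes row-key design central to HBase schemas.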

Deployment shape at Pinterest¶

Pinterest ran one of the largest HBase deployments in the world:

~50 clusters
~9,000 AWS EC2 instances
> 6 PB of data (logical)
6 replicas per record (primary + standby, each three-way replicated — see patterns/primary-standby-wal-replication)
Substrate for a sprawling ecosystem: Zen (graph), UMS (wide-column), OpenTSDB (monitoring), Pinalytics (metrics), Omid/Sparrow (transactional), Ixia (indexed), etc.

(Source: sources/2024-05-14-pinterest-hbase-deprecation-at-pinterest)

Why Pinterest deprecated it (end of 2021)¶

Five reasons — the canonical NoSQL-to-NewSQL deprecation framework:

High maintenance cost: Pinterest's HBase was ~5 years behind upstream, and the previous upgrade (0.94 → 1.2) took "almost two years". A canonical instance of compounding tech debt.
Missing functionality: no native distributed transactions, no global secondary index, limited query capabilities, and weaker-than-needed consistency knobs.
System complexity: advanced features were bolted on above HBase (Ixia for secondary indexing, Sparrow on Omid for distributed transactions). Each bolt-on carried its own maintenance burden.
High infra cost: the 6-replica primary-standby shape used roughly 2× the storage of alternatives that achieve similar availability with 3 replicas (see concepts/replica-cost-tradeoff).
Waning community: upstream activity and industry adoption were declining, and the talent pool was shrinking.
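
The replica arithmetic behind the infra-cost reason can be made explicit. This is a minimal sketch, assuming storage scales linearly with replica count and ignoring compression and metadata overhead; the function and variable names are illustrative.

```python
def total_replicas(clusters: int, replication_factor: int) -> int:
    """Copies of each record across all clusters."""
    return clusters * replication_factor


# Primary + standby clusters, each three-way replicated.
hbase_replicas = total_replicas(clusters=2, replication_factor=3)

# An alternative that reaches similar availability with 3 replicas in total.
alternative_replicas = total_replicas(clusters=1, replication_factor=3)

# Relative raw storage under the linear-scaling assumption.
storage_ratio = hbase_replicas / alternative_replicas

print(hbase_replicas, alternative_replicas, storage_ratio)  # prints: 6 3 2.0
```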

Why it rose in the 2010s¶

Bigtable-inspired data model was a natural fit for web-scale sparse-row workloads that SQL struggled with.
Open-source, free, and part of the Hadoop ecosystem that engineering teams were already invested in.
Strong single-row consistency was enough for many use cases.

Why it waned¶

Workload fragmentation: OLAP wanted columnar engines, time-series wanted TSDBs, KV wanted RocksDB-based stores, transactional wanted NewSQL. HBase wasn't optimal for any of them.
ACID expectations grew; distributed transactions became table stakes.
Operational complexity (JVM, HDFS, ZooKeeper, HMaster, RegionServer) is high relative to the value delivered, compared with modern NewSQL systems.

Seen in¶

sources/2024-05-14-pinterest-hbase-deprecation-at-pinterest — canonical wiki write-up. Pinterest's retrospective on the deprecation decision and the workload-axis migration path to TiDB + KVStore + Goku + Druid / StarRocks.

systems/tidb — the NewSQL successor Pinterest selected.
systems/pinterest-zen / systems/pinterest-ixia / systems/pinterest-ums / systems/opentsdb — HBase-backed services at Pinterest.
systems/apache-phoenix-omid — the distributed-transaction bolt-on layered on HBase.
systems/pinterest-goku / systems/pinterest-kvstore — the workload-axis successors.
concepts/wal-replication — HBase's replication primitive.
concepts/primary-standby-failover — the DR shape HBase was deployed with.
concepts/replica-cost-tradeoff — the infra-cost axis that motivated deprecation.
patterns/primary-standby-wal-replication — the canonical deployment shape.
patterns/nosql-to-newsql-deprecation — the org-level move Pinterest executed.
patterns/workload-specific-datastore-migration — the decomposition that made the final deprecation tractable.
companies/pinterest — operated one of the largest HBase deployments globally before the 2021 deprecation decision.