

Apache HBase

Apache HBase is an open-source, distributed, scalable wide-column NoSQL database modelled after Google Bigtable and built on top of HDFS (originally part of the Hadoop ecosystem). It was one of the dominant NoSQL choices of the 2010s and the substrate underneath many large-scale consumer internet backends, including Pinterest's.

Definition

HBase stores data in tables using a row-key + column-family + column-qualifier + timestamp model. It provides:

  • Strong consistency for reads + writes to a single row.
  • Automatic sharding via region-splitting.
  • WAL-based durability and replication (intra- and inter-cluster — see concepts/wal-replication).
  • JVM-based RegionServers managed by an HMaster; ZooKeeper for coordination.
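The data model and range-based sharding described above can be sketched in a few lines of Python. This is a toy in-memory illustration, not the HBase client API: `ToyTable`, `put`, `get`, and `region_for` are hypothetical names, and the split points are made up. It assumes row keys are byte strings ordered lexicographically and that each cell keeps multiple timestamped versions.

```python
from bisect import bisect_right
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for one wide-column table (illustrative only)."""

    def __init__(self):
        # row key -> "family:qualifier" -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, qualifier, value, ts):
        """Store one version of a cell under an explicit timestamp."""
        self._rows[row][f"{family}:{qualifier}"][ts] = value

    def get(self, row, family, qualifier, max_versions=1):
        """Return up to `max_versions` (timestamp, value) pairs, newest first."""
        versions = self._rows.get(row, {}).get(f"{family}:{qualifier}", {})
        newest_first = sorted(versions.items(), reverse=True)
        return newest_first[:max_versions]


def region_for(row, split_points):
    """Return the index of the key range (region) that contains `row`.

    `split_points` is a sorted list of boundary row keys. Region 0 holds
    keys below the first boundary; region i holds keys in
    [split_points[i - 1], split_points[i]).
    """
    return bisect_right(split_points, row)
```

For example, after two `put` calls on the same cell with timestamps 1 and 2, `get` returns only the version written at timestamp 2 unless `max_versions` is raised. With hypothetical split points `[b"g", b"p"]`, the key `b"apple"` routes to region 0 and `b"zebra"` to region 2; splitting a region amounts to inserting one more boundary into that list.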

Deployment shape at Pinterest

Pinterest ran one of the largest HBase deployments in the world.

(Source: sources/2024-05-14-pinterest-hbase-deprecation-at-pinterest)

Why Pinterest deprecated it (end of 2021)

Five reasons — the canonical NoSQL-to-NewSQL deprecation framework:

  1. High maintenance cost: Pinterest's HBase was ~5 years behind upstream, and the previous upgrade (0.94 → 1.2) took "almost two years". A canonical instance of compounding tech debt.
  2. Missing functionality: no native distributed transactions, no global secondary indexes, limited query capabilities, and weaker-than-needed consistency knobs.
  3. System complexity: advanced features were bolted on above HBase (Ixia for secondary indexing, Sparrow on Omid for distributed transactions), and each bolt-on was its own maintenance burden.
  4. High infra cost: the 6-replica primary-standby shape used ~2× the storage of alternatives that achieve similar availability with 3 replicas (see concepts/replica-cost-tradeoff).
  5. Waning community: upstream activity and industry adoption were declining, and the talent pool was shrinking.
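The storage arithmetic behind reason 4 can be made concrete with a small sketch. The function names and the 100 TB logical dataset size are hypothetical, chosen only for illustration; the sketch also ignores compression, compaction overhead, and any other factor beyond the replica count itself.

```python
def raw_storage_tb(logical_tb, replicas):
    """Raw storage needed to hold `logical_tb` of data at a given replica count."""
    return logical_tb * replicas


def storage_ratio(replicas_a, replicas_b):
    """How many times more raw storage shape A needs than shape B."""
    return replicas_a / replicas_b


# Hypothetical 100 TB logical dataset.
six_replica = raw_storage_tb(100, replicas=6)    # 6-replica primary-standby shape
three_replica = raw_storage_tb(100, replicas=3)  # 3-replica alternative
ratio = storage_ratio(6, 3)
```

Under these assumptions the 6-replica shape needs 600 TB of raw storage against 300 TB for the 3-replica alternative, which is the ~2× gap cited above.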

Why it rose in the 2010s

  • Bigtable-inspired data model was a natural fit for web-scale sparse-row workloads that SQL struggled with.
  • Open-source, free, and part of the Hadoop ecosystem that engineering teams were already invested in.
  • Strong single-row consistency was enough for many use cases.

Why it waned

  • Workload fragmentation: OLAP wanted columnar engines, time-series wanted TSDBs, KV wanted RocksDB-based stores, transactional wanted NewSQL. HBase wasn't optimal for any of them.
  • ACID expectations grew; distributed transactions became table stakes.
  • Operational complexity (JVM, HDFS, ZooKeeper, HMaster, RegionServer) is large for the value delivered relative to modern NewSQL.

