Distributed transactions¶
Distributed transactions are atomic operations that span multiple database rows, shards, or nodes — committing all parts or none, across machine boundaries. They are among the most-requested missing features in NoSQL stores and one of the load-bearing reasons cited in storage-engine deprecations.
Definition¶
A distributed transaction provides ACID semantics — atomicity, consistency, isolation, durability — across multiple participants that do not share memory or a single write-ahead log. Typical implementations:
- Two-phase commit (2PC) — a coordinator orchestrates a prepare phase followed by a commit phase; participants make their writes durable and vote in prepare, then commit when the coordinator broadcasts its decision.
- Percolator / Omid — snapshot-isolation-based; uses a central timestamp oracle and per-row primary/secondary lock cells in the data layer. Apache Omid implements this on HBase.
- Calvin / deterministic — pre-order all transactions globally and execute deterministically on each replica.
- TrueTime / Spanner — bounded clock uncertainty enables external consistency without a central oracle.
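The 2PC control flow above can be sketched in a few lines. This is a minimal illustration, not a production protocol: the `Participant` interface, the in-memory coordinator log, and all names are hypothetical, and real implementations must also handle timeouts, retries, and recovery from the log.

```python
class Participant:
    """One shard or node. prepare() must make the writes durable before voting yes."""

    def __init__(self, name):
        self.name = name
        self.state = "idle"

    def prepare(self, txn_id):
        # Persist the transaction's writes and the yes-vote locally,
        # promising to commit if the coordinator later says commit.
        self.state = "prepared"
        return True  # vote yes

    def commit(self, txn_id):
        self.state = "committed"

    def abort(self, txn_id):
        self.state = "aborted"


def two_phase_commit(coordinator_log, participants, txn_id):
    # Phase 1 (prepare): any "no" vote or failure aborts everywhere.
    votes = [p.prepare(txn_id) for p in participants]
    if not all(votes):
        for p in participants:
            p.abort(txn_id)
        return False
    # The commit decision must be durably logged *before* phase 2;
    # otherwise a coordinator crash here strands prepared participants.
    coordinator_log.append(("commit", txn_id))
    # Phase 2 (commit): every participant applies the decision.
    for p in participants:
        p.commit(txn_id)
    return True
```

Note the ordering: the decision is logged before any participant commits, which is what makes the outcome recoverable after a coordinator restart.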
Why missing distributed transactions is load-bearing¶
NoSQL stores that provide only per-row or per-partition atomicity force the application layer to implement multi-row consistency by hand. The canonical failure mode:
"The lack of distributed transactions in HBase led to a number of bugs and incidents of Zen, our in-house graph service, because partially failed updates could leave a graph in an inconsistent state. Debugging such problems was usually difficult and time- consuming, causing frustration for service owners and their customers." (Source: sources/2024-05-14-pinterest-hbase-deprecation-at-pinterest)
Graph updates — "add edge A → B" — typically touch both vertex rows and an edge row. Without distributed transactions, a crash or partial failure mid-update can leave the graph in an inconsistent state (edge exists but target vertex doesn't, or vice versa). Application code can try to detect and repair, but that becomes a parallel consistency-maintenance codebase — the "consistency-by-application" anti-pattern.
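The failure mode is easy to reproduce against any store that is atomic only per row. The sketch below uses a hypothetical dict-backed store (not HBase's API) where each `put` is atomic on its own; a crash between puts leaves exactly the dangling-edge state described above.

```python
# Hypothetical per-row-atomic store: each put() is atomic for one row,
# but there is no multi-row transaction spanning several puts.
store = {}


def put(row_key, value):
    store[row_key] = value  # atomic for this row only


def add_edge(src, dst, crash_after_edge=False):
    # "add edge A -> B" touches an edge row plus both vertex rows.
    put(f"edge:{src}->{dst}", {"src": src, "dst": dst})
    if crash_after_edge:
        # Simulated partial failure between the rows of one logical update.
        raise RuntimeError("process died mid-update")
    put(f"vertex:{src}", {"out_edges_updated": True})
    put(f"vertex:{dst}", {"in_edges_updated": True})
```

After a simulated crash, the edge row exists but neither vertex row was updated — the inconsistency the application layer would then have to detect and repair by hand.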
Why it's hard¶
- Network partitions. Participants can fail, disappear, or be slow; the coordinator has to handle every combination.
- Clock skew. Snapshot-isolation schemes need either a central timestamp oracle (a throughput bottleneck) or bounded clock uncertainty (which requires infrastructure-level clock hardware, as in Spanner's TrueTime).
- Performance. Cross-shard coordination has a cost — typical 2PC adds at least one extra network round-trip to every transaction; Percolator-style schemes add lock contention on hot keys.
- Failure recovery. Coordinator failure mid-commit is the classic 2PC pain point.
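The coordinator-failure pain point above has a precise shape: if the coordinator crashes after collecting yes-votes but before durably recording and broadcasting its decision, every participant is stuck holding locks in the prepared state — it cannot commit unilaterally (the decision might have been abort) nor abort (it might have been commit). A minimal simulation of that blocking window, with hypothetical names:

```python
class BlockedParticipant:
    """Participant state in 2PC; 'prepared' means locks are held."""

    def __init__(self):
        self.state = "idle"

    def prepare(self):
        self.state = "prepared"  # writes logged, locks held from here on
        return "yes"


def coordinator_crashes_after_votes(participants):
    # Collect yes-votes, then fail before logging or sending a decision.
    for p in participants:
        p.prepare()
    raise RuntimeError("coordinator lost before broadcasting commit/abort")
```

Prepared participants must block until the coordinator recovers or a termination protocol resolves the outcome, which is why coordinator high-availability (or coordinator-free designs like Calvin) matters so much in practice.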
Canonical bolt-on: Omid on HBase¶
Pinterest's Sparrow service implemented distributed transactions over HBase by layering Apache Phoenix + Omid on top. This is the canonical example of the complexity-tax axis of patterns/nosql-to-newsql-deprecation — the substrate lacks the feature, so the org builds a bolt-on service to provide it, and that bolt-on is its own maintenance burden.
Seen in¶
- sources/2024-05-14-pinterest-hbase-deprecation-at-pinterest — canonical wiki example of the real-world cost of missing distributed transactions: graph-service inconsistency incidents, difficult debugging, bolt-on Sparrow-on-Omid architecture to paper over the gap. Named as axis 2 of the five-reason deprecation framework.
Related¶
- concepts/acid-properties — the consistency guarantees distributed transactions try to extend across machine boundaries.
- concepts/atomic-distributed-transaction — the atomicity dimension specifically.
- concepts/database-transaction — the single-node baseline.
- systems/apache-phoenix-omid — canonical distributed-transaction bolt-on for HBase.
- systems/tidb — NewSQL store that provides distributed transactions natively.
- systems/pinterest-zen — graph service that suffered from the lack of distributed transactions in HBase.
- patterns/nosql-to-newsql-deprecation — the deprecation move motivated in part by missing distributed transactions.