CONCEPT

Distributed transactions

Distributed transactions are atomic operations that span multiple database rows, shards, or nodes — committing all parts or none across machine boundaries. They are among the most-requested missing features in NoSQL stores and one of the load-bearing reasons cited in storage-engine deprecations.

Definition

A distributed transaction provides ACID semantics — atomicity, consistency, isolation, durability — across multiple participants that do not share memory or a single write-ahead log. Typical implementations:

  • Two-phase commit (2PC) — a coordinator orchestrates a prepare phase followed by a commit phase; participants promise durability in prepare, then commit on the coordinator's go-ahead.
  • Percolator / Omid — snapshot-isolation-based; uses a central timestamp oracle and per-row primary/secondary lock cells in the data layer. Apache Omid implements this on HBase.
  • Calvin / deterministic — pre-order all transactions globally and execute deterministically on each replica.
  • TrueTime / Spanner — bounded clock uncertainty enables external consistency without a central oracle.
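Of the schemes above, 2PC is the simplest to sketch. The following is a minimal illustration against a hypothetical `Participant` interface (not any real library's API); a production coordinator would also need a durable decision log and timeouts:

```python
from typing import Protocol

class Participant(Protocol):
    """Hypothetical participant interface for illustration."""
    def prepare(self, txn_id: str) -> bool: ...  # persist changes, vote yes/no
    def commit(self, txn_id: str) -> None: ...
    def abort(self, txn_id: str) -> None: ...

def two_phase_commit(txn_id: str, participants: list["Participant"]) -> bool:
    # Phase 1 (prepare): every participant must vote yes and promise
    # durability before anyone is allowed to commit.
    prepared = []
    for p in participants:
        if p.prepare(txn_id):
            prepared.append(p)
        else:
            # A single "no" vote aborts everyone who already prepared.
            for q in prepared:
                q.abort(txn_id)
            return False
    # Phase 2 (commit): the decision is now final; tell everyone.
    for p in participants:
        p.commit(txn_id)
    return True
```

The prepare phase is the extra round-trip 2PC imposes on every transaction, and the gap between the two phases is where coordinator failure hurts: prepared participants cannot unilaterally decide.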

Why missing distributed transactions is load-bearing

NoSQL stores that provide only per-row or per-partition atomicity force the application layer to implement multi-row consistency by hand. The canonical failure mode:

"The lack of distributed transactions in HBase led to a number of bugs and incidents of Zen, our in-house graph service, because partially failed updates could leave a graph in an inconsistent state. Debugging such problems was usually difficult and time-consuming, causing frustration for service owners and their customers." (Source: sources/2024-05-14-pinterest-hbase-deprecation-at-pinterest)

Graph updates — "add edge A → B" — typically touch both vertex rows and an edge row. Without distributed transactions, a crash or partial failure mid-update can leave the graph in an inconsistent state (edge exists but target vertex doesn't, or vice versa). Application code can try to detect and repair, but that becomes a parallel consistency-maintenance codebase — the "consistency-by-application" anti-pattern.
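The failure window can be made concrete with a toy dict-backed store (hypothetical; it stands in for per-row-atomic writes in a store like HBase):

```python
# Toy per-row-atomic store: each assignment below is atomic on its own,
# but nothing spans the three writes.
vertices: dict[str, dict] = {}
edges: dict[tuple[str, str], dict] = {}

def add_edge(src: str, dst: str, crash_after_edge: bool = False) -> None:
    """Add edge src -> dst as three independent per-row writes."""
    vertices.setdefault(src, {})            # write 1: source vertex row
    edges[(src, dst)] = {"created": True}   # write 2: edge row
    if crash_after_edge:                    # simulated mid-update crash
        raise RuntimeError("node died")
    vertices.setdefault(dst, {})            # write 3: target vertex row

def dangling_edges() -> list[tuple[str, str]]:
    """The repair scan that application code ends up writing by hand."""
    return [(s, d) for (s, d) in edges if d not in vertices]
```

A crash between writes 2 and 3 leaves an edge whose target vertex does not exist; `dangling_edges` is exactly the kind of detect-and-repair code that grows into the parallel consistency-maintenance codebase described above.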

Why it's hard

  • Network partitions. Participants can fail, disappear, or be slow; the coordinator has to handle every combination.
  • Clock skew. Snapshot-isolation schemes need either a central oracle (bottleneck) or bounded clock uncertainty (infra-level).
  • Performance. Cross-shard coordination is expensive — 2PC adds at least one extra round-trip (the prepare phase) to every transaction; Percolator-style schemes add lock contention under hot keys.
  • Failure recovery. Coordinator failure mid-commit is the classic 2PC pain point.
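The hot-key contention and orphaned-lock problems show up even in a toy Percolator-style write path (hypothetical in-memory store; a real implementation also needs a timestamp oracle and roll-forward/roll-back of locks left by crashed writers):

```python
# Each row carries its own lock cell; one row per transaction is the
# primary, and secondaries point at it so recovery has a single place
# to look when a writer crashes mid-transaction.
rows: dict[str, dict] = {}  # key -> {"lock": primary_key | None, "data": ...}

def percolator_write(writes: dict[str, str], start_ts: int) -> bool:
    keys = sorted(writes)
    primary = keys[0]
    # Prewrite: take the lock cell on every row. Any row already locked
    # by another transaction is a conflict — hot keys fail here often.
    for k in keys:
        row = rows.setdefault(k, {"lock": None, "data": None})
        if row["lock"] is not None:
            return False
        row["lock"] = primary  # secondaries record who the primary is
    # Commit: write data and clear the locks, primary first — clearing
    # the primary's lock is the atomic commit point.
    for k in keys:
        rows[k]["data"] = (writes[k], start_ts)
        rows[k]["lock"] = None
    return True
```

This is a deliberately simplified sketch: it omits commit timestamps, conflict checks against newer writes, and cleanup of orphaned locks, which is where most of the real complexity lives.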

Canonical bolt-on: Omid on HBase

Pinterest's Sparrow service implemented distributed transactions over HBase by layering Apache Phoenix + Omid on top. This is the canonical example of the complexity-tax axis of patterns/nosql-to-newsql-deprecation — the substrate lacks the feature, so the org builds a bolt-on service to provide it, and that bolt-on is its own maintenance burden.

Seen in

  • sources/2024-05-14-pinterest-hbase-deprecation-at-pinterest — canonical wiki example of the real-world cost of missing distributed transactions: graph-service inconsistency incidents, difficult debugging, bolt-on Sparrow-on-Omid architecture to paper over the gap. Named as axis 2 of the five-reason deprecation framework.