DATABRICKS 2026-06-30

From monolith to Lakebase to LTAP: rethinking the database from storage up

Summary

Databricks CTO Matei Zaharia traces the evolution from monolithic Postgres (WAL and data files on one machine) to the Lakebase disaggregated architecture (WAL externalized to SafeKeeper nodes via Paxos, data files externalized to a PageServer backed by object storage). He then introduces LTAP (Lake Transactional/Analytical Processing), a storage-layer unification in which the PageServer transcodes Postgres row data into columnar Parquet files with Delta/Iceberg table metadata. Postgres and the Lakehouse analytical engines then both read from a single governed copy of the data, with no CDC pipeline in between.

Key Takeaways

  1. Monolithic DB root problem: In traditional Postgres, durability depends on a single machine's disk flush. Co-locating the WAL and data files on one node is the root cause of scaling pain, HA complexity, and analytics contention.

  2. SafeKeeper externalizes WAL durability: A commit becomes durable through Paxos-based quorum replication to SafeKeeper nodes rather than through a local disk flush. This eliminates data loss from misconfiguration or single-node failure and adds no latency compared with synchronous replication.

  3. PageServer externalizes data files: Data files are materialized into cloud object storage. The PageServer applies WAL deltas to reconstruct pages on demand, using multi-layered caching (buffer pool → local disk → PageServer → object store) so that, for most operations, read latency is indistinguishable from that of a monolith.

  4. Stateless compute unlocks serverless: Because Postgres compute holds no durable state, instances can scale to zero, scale up instantly, and fail over without promoting a physical clone. Storage capacity is effectively unlimited via object storage.

  5. Instant database branching: Like Git branches for code, a database branch or clone is a metadata-only operation rather than a physical copy, enabling sub-second clones of large production databases for experiments, migrations, or point-in-time recovery (PITR).

  6. 5× write throughput, 2× lower read latency: Claimed improvements over monolithic Postgres, attributed to the SafeKeeper + PageServer architecture eliminating synchronous-replication overhead and local-disk contention.

  7. LTAP: storage-layer unification, not engine unification: Rather than building a single engine good at both OLTP and analytics (HTAP approach), LTAP keeps Postgres for transactions and Lakehouse engines for analytics but unifies them at the storage layer — one durable copy in open columnar formats (Delta/Iceberg over Parquet).

  8. Row-to-columnar transcoding in PageServer: As the PageServer materializes pages to object storage, it uses spare CPU to transcode row data into Parquet's columnar layout. Postgres types are preserved losslessly, with an overflow field for exotic types. MVCC row versions are retained for Postgres but invisible to Iceberg/Delta readers.

  9. Freshness without impact: An analytical query asks Postgres for the current LSN (a cheap metadata operation), reads bulk data from object storage, then fetches only the not-yet-materialized delta from the PageServer. Postgres performs no analytical I/O; its only role is to return a single number.

  10. Every table automatically available for analytics: Unlike CDC/mirroring approaches that require per-table opt-in and have replication lag, LTAP makes all tables queryable analytically by construction — no pipeline to build or monitor.

  11. HTAP critique: HTAP systems historically fail due to incomplete feature sets, lack of ecosystem, and no performance isolation. LTAP sidesteps all three by using mature existing engines (Postgres + Spark/Lakehouse) and isolating workloads at the compute level while sharing storage.
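Takeaway 2's quorum commit can be illustrated with the commit-point rule that quorum-replicated logs commonly use. This is a minimal sketch, not Paxos itself (no ballots, leader election, or recovery) and not Lakebase code: the function names and the per-node "flush LSN" list are assumptions for illustration. A WAL position counts as durable once a strict majority of SafeKeepers have flushed it, so losing any one node out of three loses no acknowledged commit.

```python
from typing import Sequence


def quorum_commit_lsn(flush_lsns: Sequence[int]) -> int:
    """Highest WAL position (LSN) that a strict majority of SafeKeepers have flushed.

    Hypothetical helper: flush_lsns[i] is the highest LSN node i has acknowledged
    as durably written to its own storage.
    """
    if not flush_lsns:
        raise ValueError("need at least one SafeKeeper")
    quorum = len(flush_lsns) // 2 + 1  # strict majority, e.g. 2 of 3
    # Sorted descending, the quorum-th value is flushed on at least `quorum` nodes.
    return sorted(flush_lsns, reverse=True)[quorum - 1]


def is_durable(commit_record_lsn: int, flush_lsns: Sequence[int]) -> bool:
    """A commit may be acknowledged to the client once a majority holds its WAL record."""
    return commit_record_lsn <= quorum_commit_lsn(flush_lsns)
```

With three nodes, one node that is down (flush LSN stuck at 0) still leaves a majority, so `quorum_commit_lsn([150, 0, 140])` is 140 and commits up to that point remain safe.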
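Takeaway 3's on-demand page reconstruction can be sketched as: take the newest full page image at or before the requested LSN, then replay later WAL deltas up to that LSN. The sketch assumes, purely for illustration, that each delta is a simple `(offset, bytes)` patch; real Postgres WAL records are typed redo records. `PageHistory` and its fields are hypothetical names, not a PageServer API.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class PageHistory:
    """Hypothetical per-page history held by a PageServer-like store.

    images: LSN -> full page image (e.g. one already materialized to object storage)
    deltas: LSN -> (offset, bytes) patch derived from a WAL record (a simplification)
    """
    images: dict[int, bytes] = field(default_factory=dict)
    deltas: dict[int, tuple[int, bytes]] = field(default_factory=dict)

    def get_page(self, lsn: int) -> bytes:
        """Rebuild the page as of `lsn`: newest image <= lsn, then replay newer deltas."""
        image_lsns = sorted(self.images)
        i = bisect_right(image_lsns, lsn)
        if i == 0:
            raise KeyError(f"no page image at or before LSN {lsn}")
        base_lsn = image_lsns[i - 1]
        page = bytearray(self.images[base_lsn])
        # Apply WAL deltas in LSN order for the window (base_lsn, lsn].
        for delta_lsn in sorted(self.deltas):
            if base_lsn < delta_lsn <= lsn:
                offset, patch = self.deltas[delta_lsn]
                page[offset:offset + len(patch)] = patch
        return bytes(page)
```

Writing a fresh image periodically bounds how many deltas a read must replay, which is one reason materialization keeps reconstruction cheap.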
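The multi-layered caching in Takeaway 3 (buffer pool → local disk → PageServer → object store) can be sketched as a read-through lookup: try each tier fastest-first and, on a hit at a slower level, copy the page into every faster tier. The `Tier` class and `read_page` function are hypothetical, dicts stand in for real storage, and eviction is deliberately omitted.

```python
from __future__ import annotations

from typing import Callable


class Tier:
    """One cache level; a dict stands in for its storage (no eviction modeled)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[int, bytes] = {}


def read_page(page_id: int, tiers: list[Tier],
              origin: Callable[[int], bytes]) -> tuple[bytes, str]:
    """Look up a page fastest tier first, falling back to the origin (object store).

    On a hit at a slower level, copy the page into every faster tier so the next
    read is served closer to compute. Returns (page, name of the level that served it).
    """
    for level, tier in enumerate(tiers):
        if page_id in tier.data:
            page, served_by = tier.data[page_id], tier.name
            break
    else:
        level, page, served_by = len(tiers), origin(page_id), "object store"
    for faster in tiers[:level]:
        faster.data[page_id] = page
    return page, served_by
```

For example, with tiers named "buffer pool", "local disk", and "PageServer", the first read of a page comes from the object store and the second from the buffer pool.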
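Takeaway 5's metadata-only branching can be modeled as copy-on-write over LSN-versioned history: a branch stores only a parent pointer, a fork LSN, and its own later writes, and reads below the fork point fall through to the parent. This is a hypothetical key-value simplification (`Branch` and `create_branch` are illustrative names, not a Lakebase API). It shows why creating a branch costs the same regardless of database size, and why a branch at an older LSN doubles as point-in-time recovery.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Branch:
    """Hypothetical branch: a parent pointer, a fork LSN, and only its own writes."""
    name: str
    parent: Optional[Branch] = None
    fork_lsn: int = 0
    # key -> [(lsn, value), ...], appended in increasing LSN order
    writes: dict[str, list[tuple[int, str]]] = field(default_factory=dict)

    def write(self, key: str, lsn: int, value: str) -> None:
        self.writes.setdefault(key, []).append((lsn, value))

    def read(self, key: str, lsn: int) -> Optional[str]:
        """Newest value at or before `lsn`, falling through to the parent below the fork."""
        for write_lsn, value in reversed(self.writes.get(key, [])):
            if write_lsn <= lsn:
                return value
        if self.parent is not None:
            # History up to fork_lsn is shared with the parent, never copied.
            return self.parent.read(key, min(lsn, self.fork_lsn))
        return None


def create_branch(parent: Branch, name: str, at_lsn: int) -> Branch:
    """Metadata-only: constant cost no matter how much data the parent holds."""
    return Branch(name=name, parent=parent, fork_lsn=at_lsn)
```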
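Takeaway 8's transcoding step can be sketched as a pivot from row versions to column arrays. Everything here is an assumption for illustration: the `HeapTuple` shape, the `NATIVE_TYPES` set, and serializing other types with `str` into an `_overflow` column (a real lossless encoding would rely on Postgres's own type I/O). The visibility rule is also simplified: it treats every transaction as committed, so a version is current exactly when its `xmax` is unset.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical set of Postgres types assumed to map directly to a Parquet column.
NATIVE_TYPES = {"int4", "int8", "float8", "bool", "text"}


@dataclass
class HeapTuple:
    """One MVCC row version (illustrative shape, not Postgres's on-disk layout)."""
    xmin: int              # transaction that created this version
    xmax: Optional[int]    # transaction that deleted/superseded it; None if current
    values: dict[str, Any]


def transcode(rows: list[HeapTuple], schema: dict[str, str]) -> dict[str, list[Any]]:
    """Pivot row versions into column arrays, keeping every MVCC version for Postgres."""
    native = [c for c, t in schema.items() if t in NATIVE_TYPES]
    exotic = [c for c, t in schema.items() if t not in NATIVE_TYPES]
    cols: dict[str, list[Any]] = {c: [] for c in ["_xmin", "_xmax", *native, "_overflow"]}
    for row in rows:
        cols["_xmin"].append(row.xmin)
        cols["_xmax"].append(row.xmax)
        for c in native:
            cols[c].append(row.values.get(c))
        # Non-native values go into one overflow map per row (None if there are none).
        overflow = {c: str(row.values[c]) for c in exotic if row.values.get(c) is not None}
        cols["_overflow"].append(overflow or None)
    return cols


def lake_view(cols: dict[str, list[Any]]) -> dict[str, list[Any]]:
    """What an Iceberg/Delta reader would see: current versions only, no MVCC columns."""
    live = [i for i, xmax in enumerate(cols["_xmax"]) if xmax is None]
    return {
        name: [values[i] for i in live]
        for name, values in cols.items()
        if name not in ("_xmin", "_xmax")
    }
```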
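The freshness protocol in Takeaway 9 reduces to three steps: fetch the current LSN from Postgres, read the bulk snapshot from object storage, then overlay only the changes the PageServer holds between the materialized LSN and that target. The three `Fake*` classes are hypothetical stand-ins with a key/value row model, not real Lakebase interfaces; the point is that Postgres is contacted exactly once, for a single integer.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakePostgres:
    current_lsn: int

    def get_current_lsn(self) -> int:
        # The only request the analytical query makes to Postgres.
        return self.current_lsn


@dataclass
class FakeObjectStore:
    materialized_lsn: int                                  # snapshot covers LSNs <= this
    rows: dict[int, str] = field(default_factory=dict)     # primary key -> value


@dataclass
class FakePageServer:
    # (lsn, key, value) changes; value None marks a delete
    changes: list[tuple[int, int, Optional[str]]] = field(default_factory=list)

    def changes_between(self, lo: int, hi: int) -> list[tuple[int, int, Optional[str]]]:
        """Changes with lo < lsn <= hi, in LSN order."""
        return sorted((c for c in self.changes if lo < c[0] <= hi), key=lambda c: c[0])


def fresh_scan(pg: FakePostgres, store: FakeObjectStore,
               pageserver: FakePageServer) -> dict[int, str]:
    target = pg.get_current_lsn()          # 1. cheap metadata call
    result = dict(store.rows)              # 2. bulk read, no Postgres I/O
    # 3. overlay only the not-yet-materialized delta from the PageServer
    for _, key, value in pageserver.changes_between(store.materialized_lsn, target):
        if value is None:
            result.pop(key, None)
        else:
            result[key] = value
    return result
```

Changes at or below the materialized LSN are skipped because the snapshot already reflects them, and changes above the target LSN are skipped so the result is consistent as of that single point.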

Architectural Diagram (conceptual)

┌──────────────────────────────┐
│ Postgres Compute (stateless) │ ← OLTP queries
└──────────────┬───────────────┘
               │ WAL stream
               ▼
┌──────────────────┐    ┌────────────────────────────┐
│    SafeKeeper    │───▶│         PageServer         │
│  (Paxos quorum)  │    │ (materializes → obj store) │
└──────────────────┘    └─────────────┬──────────────┘
                                      │ columnar transcoding
                                      ▼
                        ┌────────────────────────────┐
                        │  Object Storage (S3/ADLS)  │
                        │  Delta / Iceberg / Parquet │
                        └─────────────┬──────────────┘
                                      ▼
                        ┌────────────────────────────┐
                        │  Lakehouse Analytics       │ ← OLAP queries
                        │  (Spark, Photon, etc.)     │
                        └────────────────────────────┘

Operational Numbers

  • 5× higher write throughput vs monolithic Postgres (from SafeKeeper + PageServer architecture)
  • 2× lower read latency (from PageServer + local cache tiering)
  • 10× columnar compression ratio vs. row format (Parquet advantage)
  • Sub-second database branch/clone (metadata-only operation)
  • LSN-consistent analytical reads with near-zero staleness

Caveats

  • LTAP has been announced but is "rolling out in coming months"; it was not yet GA at the time of publication.
  • Performance claims (5×, 2×) reference a separate benchmark blog post; results may vary by workload.
  • The row-to-columnar transcoding approach is novel, and its production maturity at scale is unproven.
  • Very small tables are not converted to columnar form (an optimization trade-off).
