
SYSTEM Cited by 1 source

Lakehouse Sync

Lakehouse Sync is Databricks' managed CDC pipeline that continuously replicates writes from Lakebase Postgres into Unity Catalog Delta tables. It is the Postgres → Delta half of Lakebase's bidirectional governed-data path; the other half is Synced Tables (Delta → Postgres).

Verbatim from the 2026-05-20 marketing-campaigns post:

"Any data written to Lakebase can then be synchronized to the Lakehouse for analytics via Lakehouse Sync — a native, continuous CDC-based pipeline from Lakebase Postgres to Unity Catalog Delta tables that makes operational data available for richer analytics and AI."

Architectural shape

Property    Value
Direction   Lakebase Postgres → Unity Catalog Delta
Mechanism   Continuous CDC (change data capture)
Latency     Continuous (not batch)
Management  Native / managed (no customer pipeline code)
Governance  Lands in Unity Catalog managed Delta tables
Use case    Operational data → analytics / AI

The post does not disclose the internal mechanics: it does not say whether Lakehouse Sync uses Postgres logical decoding, WAL streaming, Pageserver-side reads, or some Lakebase-specific extraction path. The "native, continuous CDC-based" framing is consistent with logical decoding on the Postgres side feeding a streaming ingest into Delta on the Lakehouse side, but this is not confirmed.
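Because the mechanism is undisclosed, the following is only a generic sketch of what any CDC-based replication has to do on the target side: apply an ordered stream of row changes keyed by primary key, and tolerate replays of events it has already seen. The `Change` record, its `seq` field (standing in for something like a log position), and `apply_changes` are hypothetical names chosen for illustration; none of them is a Lakebase or Databricks API.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class Change:
    """One row-level change event (hypothetical shape, for illustration only)."""
    seq: int                               # increasing position in the change stream
    op: str                                # "insert", "update", or "delete"
    key: int                               # primary key of the affected row
    row: Optional[Dict[str, Any]] = None   # new row image; None for deletes


def apply_changes(target: Dict[int, Dict[str, Any]],
                  changes: Iterable[Change],
                  last_applied: int = 0) -> int:
    """Apply changes in seq order, skipping any at or below the checkpoint.

    Returns the highest seq applied so the caller can persist it as a checkpoint.
    """
    for change in sorted(changes, key=lambda c: c.seq):
        if change.seq <= last_applied:
            continue  # replayed event: already reflected in the target
        if change.op == "delete":
            target.pop(change.key, None)
        elif change.op in ("insert", "update"):
            target[change.key] = dict(change.row or {})
        else:
            raise ValueError(f"unknown op: {change.op!r}")
        last_applied = change.seq
    return last_applied


stream = [
    Change(1, "insert", 101, {"email": "a@example.com", "category": "shoes"}),
    Change(2, "insert", 102, {"email": "b@example.com", "category": "bags"}),
    Change(3, "update", 101, {"email": "a@example.com", "category": "boots"}),
    Change(4, "delete", 102),
]

replica: Dict[int, Dict[str, Any]] = {}
checkpoint = apply_changes(replica, stream)
# Replaying the same stream from the checkpoint leaves the replica unchanged.
checkpoint_after_replay = apply_changes(replica, stream, last_applied=checkpoint)
```

Tracking a checkpoint is one common way to make replays harmless; whether Lakehouse Sync does anything similar is not stated in the post.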

Canonical use case: app-tier operational data → analytics

The 2026-05-20 post's example is marketing-campaign signup notifications written into Lakebase by application code:

"customers might sign up to receive notifications about product restocks or new arrivals in a specific category or brand. Applications can use Lakebase as a standard Postgres database to store this notification data, making it available to Engagement Cloud for campaign targeting. Any data written to Lakebase can then be synchronized to the Lakehouse for analytics via Lakehouse Sync."

The dataflow:

  1. Application writes signup event to Lakebase Postgres (low-latency OLTP write).
  2. SAP Engagement Cloud reads that signup row for campaign targeting (low-latency OLTP read on the same Lakebase table, with no sync delay because the data lives in Postgres natively).
  3. Lakehouse Sync continuously captures changes to the signup rows and replicates them into a Unity Catalog managed Delta table.
  4. Analytics workloads (segment computation, AI training, BI) read from the Delta table at lakehouse scale.

The architectural payoff: a single write into Lakebase serves both operational reads (low-latency, point lookup) and analytical reads (large-scan, columnar, lakehouse-priced) without the application needing to dual-write or the data team needing to maintain a Postgres → Delta CDC pipeline by hand.
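The four steps can be walked through with a toy model. This is a sketch under loud assumptions: in-memory SQLite stands in for Lakebase Postgres, an insert trigger writing to a `signup_changes` table stands in for the CDC feed, and a plain Python list stands in for the Delta table. The table names and the `sync_once` helper are hypothetical, and nothing here reflects how Lakehouse Sync is actually implemented.

```python
import sqlite3

# In-memory SQLite stands in for the operational Postgres database (assumption).
oltp = sqlite3.connect(":memory:")
oltp.executescript("""
    CREATE TABLE signups (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        category TEXT NOT NULL
    );
    -- Trigger-based change log: a simple stand-in for a real CDC feed.
    CREATE TABLE signup_changes (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        id INTEGER, email TEXT, category TEXT
    );
    CREATE TRIGGER capture_signup AFTER INSERT ON signups
    BEGIN
        INSERT INTO signup_changes (id, email, category)
        VALUES (NEW.id, NEW.email, NEW.category);
    END;
""")


def sync_once(conn, target, last_seq):
    """Copy change-log entries newer than last_seq into target; return the new checkpoint."""
    rows = conn.execute(
        "SELECT seq, id, email, category FROM signup_changes "
        "WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, row_id, email, category in rows:
        target.append({"id": row_id, "email": email, "category": category})
        last_seq = seq
    return last_seq


# Step 1: the application performs a single write per signup.
signups = [("a@example.com", "shoes"), ("b@example.com", "bags"), ("c@example.com", "shoes")]
oltp.executemany("INSERT INTO signups (email, category) VALUES (?, ?)", signups)
oltp.commit()

# Step 2: operational point lookup against the same table, with no sync involved.
lookup = oltp.execute(
    "SELECT category FROM signups WHERE email = ?", ("b@example.com",)
).fetchone()

# Step 3: replication moves the captured changes into the analytical copy.
analytics_rows = []   # stand-in for the replicated analytical table
checkpoint = sync_once(oltp, analytics_rows, 0)

# Step 4: an analytical scan runs against the copy, not the OLTP store.
per_category = {}
for row in analytics_rows:
    per_category[row["category"]] = per_category.get(row["category"], 0) + 1
```

The application code in step 1 contains no reference to the analytical copy, which is the point of the pattern: replication is a platform concern rather than an application concern.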

Bidirectional companion shape

Lakebase + Lakehouse Sync + Synced Tables form a closed loop:

Lakehouse (Unity Catalog Delta)
    │  ─────[Synced Tables]─────▶  Lakebase (Postgres OLTP)
    │  ◀─────[Lakehouse Sync]─────  
    └── Analytics, AI, BI
                                    └── Application writes & reads

Direction         Mechanism       Modes                              Use case
Delta → Postgres  Synced Tables   snapshot / triggered / continuous  Customer segments, AI features, lookup tables
Postgres → Delta  Lakehouse Sync  continuous (CDC)                   App-tier operational data → analytics & AI

Both pipelines are managed by Databricks; both are governed by Unity Catalog; both eliminate the hand-written-sync-pipeline operational tax.
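The two directions and their modes, as listed in the table above, can be written down as data so that a request for an unsupported combination fails early. This is a minimal sketch: `Mode`, `Pipeline`, and `pick_pipeline` are hypothetical names, the mode sets are copied from this page rather than from product documentation, and no real Databricks API is called.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SNAPSHOT = "snapshot"
    TRIGGERED = "triggered"
    CONTINUOUS = "continuous"


@dataclass(frozen=True)
class Pipeline:
    name: str
    source: str
    target: str
    modes: frozenset


# Mode sets as listed on this page (assumption: the list is complete).
SYNCED_TABLES = Pipeline(
    "Synced Tables", "delta", "postgres",
    frozenset({Mode.SNAPSHOT, Mode.TRIGGERED, Mode.CONTINUOUS}),
)
LAKEHOUSE_SYNC = Pipeline(
    "Lakehouse Sync", "postgres", "delta",
    frozenset({Mode.CONTINUOUS}),
)


def pick_pipeline(source: str, target: str, mode: Mode) -> Pipeline:
    """Return the pipeline covering source -> target, or raise if the mode is not listed."""
    for pipeline in (SYNCED_TABLES, LAKEHOUSE_SYNC):
        if (pipeline.source, pipeline.target) == (source, target):
            if mode not in pipeline.modes:
                raise ValueError(f"{pipeline.name} does not list {mode.value} mode")
            return pipeline
    raise ValueError(f"no managed pipeline for {source} -> {target}")


chosen = pick_pipeline("postgres", "delta", Mode.CONTINUOUS)
```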

Architectural significance

Lakehouse Sync is the load-bearing piece that makes the HTAP (hybrid transactional/analytical processing) split manageable. Without it, app-tier operational data either:

  • Stays trapped in OLTP and is never available for analytics (the most common shape, and the reason analytics teams build parallel ETL pipelines), or
  • Is dual-written by the application to both stores (which adds complexity and consistency risk), or
  • Is replicated by hand-maintained CDC pipelines (which the data team owns and operates).

By making CDC native and continuous, Lakehouse Sync makes "OLTP-write data is automatically available for analytics" the default behaviour rather than the exception. This shifts the operational burden from "build and maintain a pipeline per table" to "configure once and let the platform manage it."
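The consistency risk of dual writes, compared with a single write plus replication from a change log, can be seen in a toy model. Everything below is hypothetical and in-memory: `ToyStore`, `StoreDown`, and the helper functions illustrate the general failure mode only, and they do not describe Lakebase internals. In a real database the change log is typically written atomically with the row, which this sketch does not attempt to model.

```python
class StoreDown(Exception):
    """Raised by the toy store to simulate an outage."""


class ToyStore:
    def __init__(self):
        self.rows = []
        self.available = True

    def write(self, row):
        if not self.available:
            raise StoreDown("store unavailable")
        self.rows.append(row)


def dual_write(oltp, analytics, row):
    """Application writes to both stores in turn."""
    oltp.write(row)
    analytics.write(row)   # if this raises, the OLTP write has already happened


def single_write_with_log(oltp, change_log, row):
    """Application writes once; the change is recorded for later replication."""
    oltp.write(row)
    change_log.append(row)


def replicate(change_log, analytics, start):
    """Copy log entries from position start onward; return the new position."""
    pos = start
    for row in change_log[start:]:
        try:
            analytics.write(row)
        except StoreDown:
            break          # stop here; the caller retries later from pos
        pos += 1
    return pos


# Dual write during an outage: the stores diverge and the app must handle the error.
oltp_a, analytics_a = ToyStore(), ToyStore()
analytics_a.available = False
try:
    dual_write(oltp_a, analytics_a, {"id": 1})
except StoreDown:
    pass
diverged = len(oltp_a.rows) != len(analytics_a.rows)

# Single write plus log: the same outage only delays replication.
oltp_b, analytics_b, log = ToyStore(), ToyStore(), []
analytics_b.available = False
single_write_with_log(oltp_b, log, {"id": 1})
position = replicate(log, analytics_b, 0)         # makes no progress while down
analytics_b.available = True
position = replicate(log, analytics_b, position)  # catches up after recovery
caught_up = analytics_b.rows == oltp_b.rows
```

In the dual-write case the application has to notice and repair the divergence itself; in the log-based case the retry logic lives in the replication layer, which is the burden the managed pipeline takes on.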

Constraints (disclosed)

  • Direction is Postgres → Delta only. Bidirectional sync is achieved by combining Lakehouse Sync with Synced Tables in opposite directions. There is no single bidirectional-merge mode.
  • No conflict-resolution semantics are disclosed. The post describes only a flow from Postgres, as the source of truth, into Delta, so it does not address how concurrent writes to the same data on both sides would be reconciled.
