Lakehouse Sync
Lakehouse Sync is Databricks' managed CDC pipeline that continuously replicates writes from Lakebase Postgres into Unity Catalog Delta tables. It is the Postgres → Delta half of Lakebase's bidirectional governed-data path; the other half is Synced Tables (Delta → Postgres).
Verbatim from the 2026-05-20 marketing-campaigns post:
"Any data written to Lakebase can then be synchronized to the Lakehouse for analytics via Lakehouse Sync — a native, continuous CDC-based pipeline from Lakebase Postgres to Unity Catalog Delta tables that makes operational data available for richer analytics and AI."
Architectural shape
| Property | Value |
|---|---|
| Direction | Lakebase Postgres → Unity Catalog Delta |
| Mechanism | Continuous CDC (concepts/change-data-capture) |
| Latency | Continuous (not batch) |
| Management | Native / managed (no customer pipeline code) |
| Governance | Lands in Unity Catalog managed Delta tables |
| Use case | Operational data → analytics / AI |
The post does not disclose the internal mechanics: whether Lakehouse Sync uses Postgres logical decoding, WAL streaming, Pageserver-side reads, or some Lakebase-specific extraction path. The "native, continuous CDC-based" framing is consistent with logical decoding on the Postgres side feeding a streaming ingest into Delta on the Lakehouse side, but this is not confirmed.
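Since the mechanics are undisclosed, the following is only a generic sketch of what "applying a CDC stream" means, not a description of Lakehouse Sync itself. It assumes each change carries an ordering number (called `lsn` here by analogy with Postgres log sequence numbers) and a primary key. `ChangeEvent` and `apply_changes` are hypothetical names introduced for illustration, and the target table is modelled as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record; not a Lakehouse Sync or Postgres type."""
    lsn: int                               # ordering position in the change stream
    op: str                                # "insert", "update", or "delete"
    key: int                               # primary key of the affected row
    row: Optional[dict[str, Any]] = None   # new row image; None for deletes


def apply_changes(
    target: dict[int, dict[str, Any]],
    events: list[ChangeEvent],
    last_applied_lsn: int = 0,
) -> int:
    """Apply events in lsn order and return the new high-water mark.

    Events at or below last_applied_lsn are skipped, so replaying the
    same batch leaves the target unchanged.
    """
    for ev in sorted(events, key=lambda e: e.lsn):
        if ev.lsn <= last_applied_lsn:
            continue  # already applied on an earlier pass
        if ev.op == "delete":
            target.pop(ev.key, None)
        elif ev.op in ("insert", "update"):
            target[ev.key] = dict(ev.row or {})
        else:
            raise ValueError(f"unknown op: {ev.op}")
        last_applied_lsn = ev.lsn
    return last_applied_lsn


# Illustrative data only.
target: dict[int, dict[str, Any]] = {}
events = [
    ChangeEvent(1, "insert", 1, {"email": "a@example.com"}),
    ChangeEvent(2, "update", 1, {"email": "b@example.com"}),
    ChangeEvent(3, "insert", 2, {"email": "c@example.com"}),
    ChangeEvent(4, "delete", 2),
]
mark = apply_changes(target, events)
```

Tracking a high-water mark is one common way to make replay safe after a restart; whether Lakehouse Sync does anything similar is not stated in the post.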
Canonical use case: app-tier operational data → analytics
The 2026-05-20 post's example: marketing-campaign signup notifications written into Lakebase by application code:
"customers might sign up to receive notifications about product restocks or new arrivals in a specific category or brand. Applications can use Lakebase as a standard Postgres database to store this notification data, making it available to Engagement Cloud for campaign targeting. Any data written to Lakebase can then be synchronized to the Lakehouse for analytics via Lakehouse Sync."
The dataflow:
- Application writes signup event to Lakebase Postgres (low-latency OLTP write).
- SAP Engagement Cloud reads that signup row for campaign targeting (low-latency OLTP read on the same Lakebase table — no sync delay since the data lives in Postgres natively).
- Lakehouse Sync continuously captures the signup events via CDC and lands them in a UC-managed Delta table.
- Analytics workloads (segment computation, AI training, BI) read from the Delta table at lakehouse scale.
The architectural payoff: a single write into Lakebase serves both operational reads (low-latency, point lookup) and analytical reads (large-scan, columnar, lakehouse-priced) without the application needing to dual-write or the data team needing to maintain a Postgres → Delta CDC pipeline by hand.
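The four steps above can be imitated locally as a rough sketch. Everything here is a stand-in: an in-memory SQLite database plays the role of Lakebase Postgres, a trigger-populated `change_log` table plays the role of change capture (real CDC is typically log-based rather than trigger-based), and a Python list plays the role of the Delta table. The table names, `make_operational_store`, and `sync_to_analytics` are hypothetical.

```python
import sqlite3
from collections import Counter


def make_operational_store() -> sqlite3.Connection:
    """Create the stand-in OLTP store with a trigger that records inserts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE signups (
            id INTEGER PRIMARY KEY,
            customer TEXT NOT NULL,
            category TEXT NOT NULL
        );
        CREATE TABLE change_log (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            id INTEGER, customer TEXT, category TEXT
        );
        CREATE TRIGGER capture_signup AFTER INSERT ON signups
        BEGIN
            INSERT INTO change_log (id, customer, category)
            VALUES (NEW.id, NEW.customer, NEW.category);
        END;
        """
    )
    return conn


def sync_to_analytics(conn: sqlite3.Connection, analytics_rows: list, last_seq: int) -> int:
    """Copy change_log entries newer than last_seq; return the new position."""
    rows = conn.execute(
        "SELECT seq, id, customer, category FROM change_log "
        "WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, row_id, customer, category in rows:
        analytics_rows.append({"id": row_id, "customer": customer, "category": category})
        last_seq = seq
    return last_seq


conn = make_operational_store()

# Step 1: the application writes signup events once, to the operational store only.
for customer, category in [("alice", "sneakers"), ("bob", "boots"), ("carol", "sneakers")]:
    conn.execute("INSERT INTO signups (customer, category) VALUES (?, ?)", (customer, category))

# Step 2: an operational point lookup reads the same table directly.
targeted = conn.execute("SELECT customer FROM signups WHERE id = ?", (1,)).fetchone()

# Step 3: the sync step drains captured changes into the analytical copy.
analytics_rows: list = []
last_seq = sync_to_analytics(conn, analytics_rows, 0)

# Step 4: an analytical scan aggregates over the replicated rows.
by_category = Counter(r["category"] for r in analytics_rows)
```

The application code in step 1 never touches the analytical copy, which is the point the paragraph above makes about avoiding dual writes.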
Bidirectional companion shape
Lakebase + Lakehouse Sync + Synced Tables form a closed loop:
Lakehouse (Unity Catalog Delta)                      Lakebase (Postgres OLTP)
  │   ─────[Synced Tables]─────▶                        │
  │   ◀─────[Lakehouse Sync]─────                       │
  └── Analytics, AI, BI                                 └── Application writes & reads
| Direction | Mechanism | Modes | Use case |
|---|---|---|---|
| Delta → Postgres | Synced Tables | snapshot / triggered / continuous | Customer segments, AI features, lookup tables |
| Postgres → Delta | Lakehouse Sync | continuous (CDC) | App-tier operational data → analytics & AI |
Both pipelines are managed by Databricks; both are governed by Unity Catalog; both eliminate the hand-written-sync-pipeline operational tax.
Architectural significance
Lakehouse Sync is the load-bearing piece that makes the concepts/htap split manageable. Without it, app-tier operational data ends up in one of three states:
- Stays trapped in OLTP and is never available for analytics (the most common shape — and the reason analytics teams build parallel ETL pipelines), or
- Is dual-written by the application to both stores (which adds complexity and consistency risk), or
- Is replicated by hand-maintained CDC pipelines (which the data team owns and operates).
By making CDC native and continuous, Lakehouse Sync makes "OLTP-write data is automatically available for analytics" the default behaviour rather than the exception. This shifts the operational burden from "build and maintain a pipeline per table" to "configure once and let the platform manage it."
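The consistency risk of dual writes, and why replaying from a change log avoids it, can be shown with a small sketch. This is a toy model under stated assumptions, not Lakehouse Sync's behaviour: `FlakySink`, `dual_write`, `log_write`, and `replicate` are hypothetical names, the stores are Python lists, and the log append is assumed to be atomic with the primary write (as a database's own write-ahead log would be).

```python
class FlakySink:
    """Hypothetical analytics store that rejects writes while unavailable."""

    def __init__(self) -> None:
        self.rows: list = []
        self.available = True

    def write(self, row: dict) -> None:
        if not self.available:
            raise ConnectionError("analytics sink unavailable")
        self.rows.append(row)


def dual_write(primary: list, sink: FlakySink, row: dict) -> None:
    """Application writes to both stores; a sink failure leaves them diverged."""
    primary.append(row)
    sink.write(row)  # may raise after the primary write already succeeded


def log_write(primary: list, log: list, row: dict) -> None:
    """Application writes once; the change is also recorded in an ordered log."""
    primary.append(row)
    log.append(row)  # assumed atomic with the primary write in this model


def replicate(log: list, sink: FlakySink, offset: int) -> int:
    """Ship log entries from offset onward; stop at the first failure."""
    while offset < len(log):
        try:
            sink.write(log[offset])
        except ConnectionError:
            break  # the next call resumes from the same offset
        offset += 1
    return offset


row = {"customer": "alice", "category": "sneakers"}

# Dual write during a sink outage: the row is lost on the analytics side.
primary_a: list = []
sink_a = FlakySink()
sink_a.available = False
try:
    dual_write(primary_a, sink_a, row)
except ConnectionError:
    pass

# Log-based replication during the same outage: the row is delivered later.
primary_b: list = []
log_b: list = []
sink_b = FlakySink()
sink_b.available = False
log_write(primary_b, log_b, row)
offset = replicate(log_b, sink_b, 0)       # makes no progress while the sink is down
sink_b.available = True
offset = replicate(log_b, sink_b, offset)  # catches up once the sink recovers
```

In the dual-write case nothing records that the sink missed a row, so repair needs a separate reconciliation job; in the log-based case the offset is the record of what remains to be shipped.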
Constraints (disclosed)
- Direction is Postgres → Delta only. Bidirectional sync is achieved by combining Lakehouse Sync with Synced Tables in opposite directions. There is no single bidirectional-merge mode.
- No conflict resolution semantics disclosed. Since the post describes only Postgres-as-source-of-truth → Delta flow, conflict resolution between concurrent writes on both sides is not in scope.
Seen in
- Marketing-campaign signup data at Deichmann / SAP Engagement Cloud (2026-05-20): the Lakehouse Sync use case from the marketing-campaigns post. App-tier signup events flow Lakebase → Delta for campaign-effectiveness analytics. (Source: sources/2026-05-20-databricks-marketing-campaigns-with-lakebase)
Related
- systems/lakebase — source side of the pipeline.
- systems/unity-catalog — sink-side governance and catalog.
- systems/delta-lake — sink-side storage format.
- systems/lakebase-synced-tables — bidirectional companion (the Delta → Postgres direction).
- concepts/change-data-capture — generalisation; Lakehouse Sync is a managed-CDC instance.
- concepts/htap — the architectural shape Lakehouse Sync enables.