
SYSTEM Cited by 1 source

Lakebase Synced Tables

Lakebase Synced Tables are managed copies of systems/unity-catalog Delta tables materialised inside systems/lakebase for OLTP-style, low-latency point lookup access. They are the Delta → Postgres half of Lakebase's bidirectional governed-data path; the other half is Lakehouse Sync (Postgres → Delta).

Verbatim from the 2026-05-20 marketing-campaigns post:

"Databricks Synced Tables create a managed copy of our Unity Catalog data in Lakebase, making it available to applications that need OLTP-style, low-latency queries."

The synced table appears to the application as a normal Postgres table. The synchronisation pipeline is managed — the customer doesn't write or maintain sync code. Configuration is "just a few clicks" in the Lakebase UI.

Sync modes

Three modes are exposed. Mode selection is driven by the fraction of the upstream Delta table that changes per sync cycle, not by cadence:

| Mode | Cadence | Semantics | When to use |
|---|---|---|---|
| Snapshot | On-demand or scheduled | Replaces the entire Lakebase table from a Delta snapshot | When >10% of upstream data changes per cycle |
| Triggered | On-demand | Incremental upsert | When <10% of upstream data changes per cycle |
| Continuous | Streaming | Continuous incremental upsert | When latency is critical and changes are small/frequent |
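The difference between the "replace" and "incremental upsert" semantics in the table can be illustrated with a toy model. This is a minimal sketch, not Lakebase's implementation: plain dictionaries stand in for the upstream Delta table and the synced Postgres table, and the function names and sample data are hypothetical. The sketch does not model deletes, so it says nothing about how the real incremental modes handle rows removed upstream.

```python
def apply_snapshot(target, upstream):
    """Snapshot semantics: discard the old copy and rebuild it from upstream."""
    target.clear()
    target.update(upstream)
    return len(upstream)  # rows written equals the full table size


def apply_upsert(target, changes):
    """Incremental semantics: insert or overwrite only the changed rows."""
    for key, row in changes.items():
        target[key] = row
    return len(changes)  # rows written equals the size of the change set


# Hypothetical upstream versions: key 2 changes, key 4 is added, key 3 disappears.
upstream_v1 = {1: "bronze", 2: "silver", 3: "gold"}
upstream_v2 = {1: "bronze", 2: "gold", 4: "silver"}

# Snapshot: the target ends up identical to the new upstream version.
snap = dict(upstream_v1)
snapshot_rows = apply_snapshot(snap, upstream_v2)

# Upsert: only new or modified rows are written (deletes are ignored in this toy).
incr = dict(upstream_v1)
changes = {k: v for k, v in upstream_v2.items() if upstream_v1.get(k) != v}
upsert_rows = apply_upsert(incr, changes)
```

The snapshot path always writes the whole table, while the upsert path writes only the change set; which one is cheaper in practice depends on how large that change set is relative to the table.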

The 10% / 10× rule of thumb

The load-bearing operational disclosure from the 2026-05-20 post:

"When more than 10% of the data is updated, we recommend snapshot mode, which delivers 10x better performance than triggered mode."

This is canonicalised as the patterns/snapshot-sync-mode-for-batch-rebuild pattern. The counterintuitive part: the snapshot replaces the entire table on every cycle, but for high-delta workloads it's still 10× faster than incremental upsert because:

  • The incremental path pays a per-row diff/merge cost that scales linearly with delta size.
  • The snapshot path is a bulk copy, which Lakebase's storage-compute-separation backend can stream efficiently from Pageserver without per-row conflict resolution.

When >10% of rows change, the bulk snapshot wins. When <10% change, the total per-row merge cost stays below the cost of a full copy, and triggered mode wins.

The post does not quantify the continuous-mode tradeoff or the exact crossover behaviour around the 10% threshold.
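The rule of thumb can be written down as a small decision helper. This is a sketch under stated assumptions: `choose_sync_mode`, `SyncMode`, and the `latency_critical` flag are hypothetical names rather than part of any Databricks API. Treating exactly 10% as "triggered" and letting a high change fraction take precedence over the latency flag are both assumptions, since the post does not specify either.

```python
from enum import Enum


class SyncMode(Enum):
    SNAPSHOT = "snapshot"
    TRIGGERED = "triggered"
    CONTINUOUS = "continuous"


def choose_sync_mode(changed_rows, total_rows, latency_critical=False):
    """Pick a sync mode from the share of rows that change per sync cycle."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    if not 0 <= changed_rows <= total_rows:
        raise ValueError("changed_rows must be between 0 and total_rows")

    # "More than 10%" from the post; integer arithmetic avoids float rounding.
    if changed_rows * 10 > total_rows:
        return SyncMode.SNAPSHOT
    # Assumption: small, frequent changes with tight latency needs go continuous.
    if latency_critical:
        return SyncMode.CONTINUOUS
    return SyncMode.TRIGGERED


nightly_rebuild = choose_sync_mode(300_000, 1_000_000)
small_delta = choose_sync_mode(5_000, 1_000_000)
small_delta_low_latency = choose_sync_mode(5_000, 1_000_000, latency_critical=True)
```

A nightly job that rewrites 30% of a segment table lands on snapshot mode, while a 0.5% delta lands on triggered or continuous depending on the latency requirement.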

Canonical workload: marketing-campaign customer segments

The 2026-05-20 post pitches the canonical use case explicitly:

"Customer segments are recomputed nightly in batch, replacing a significant portion of the dataset. When more than 10% of the data is updated, we recommend snapshot mode."

The shape:

  1. Analytical pipeline in the Lakehouse computes customer segments nightly (large-scan, complex SQL, lakehouse-native work).
  2. Synced Table in snapshot mode materialises the segment table into Lakebase Postgres.
  3. Marketing platform (e.g. SAP Engagement Cloud) queries Lakebase as a normal Postgres database — point lookups by campaign trigger.
  4. Compute scales from 0 to 16 CU during campaign bursts and back to 0 during quiet periods.

This is a clean separation of analytical (segment computation) and operational (segment lookup) concerns where the Synced Table is the boundary artifact. Without it, the customer would either:

  • Run point lookups on the Lakehouse (slow, expensive, not optimised for high-concurrency point reads), or
  • Build and maintain their own Lakehouse → OLTP sync pipelines per segment table (operational burden the post explicitly cites as the problem this avoids).
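From the application's side, step 3 of the shape above is an ordinary parameterised point lookup. The sketch below uses Python's built-in `sqlite3` as a stand-in so that it runs anywhere; it is not a Lakebase client. The table name `customer_segments`, its columns, and the sample rows are hypothetical. Against Postgres, a driver such as psycopg would use `%s` placeholders instead of `?`.

```python
import sqlite3


def lookup_segment(conn, customer_id):
    """Point lookup by primary key; returns the segment name or None."""
    row = conn.execute(
        "SELECT segment FROM customer_segments WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0] if row else None


# In-memory stand-in for the synced table (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_segments ("
    "customer_id INTEGER PRIMARY KEY, segment TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customer_segments VALUES (?, ?)",
    [(1, "high_value"), (2, "churn_risk")],
)
conn.commit()

known = lookup_segment(conn, 1)
missing = lookup_segment(conn, 99)
```

The application only reads; in the real system the rows would arrive through the managed sync, not through inserts issued by the application.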

Architectural relationship to Lakebase

Synced Tables are read-only from the application's perspective. The Lakebase compute can read them via standard Postgres queries (with indexes, query plans, the works) but writes go through the Synced Tables sync layer, not through direct table writes. This is consistent with the concepts/htap separation of concerns: analytical workloads own the upstream Delta table; operational workloads read from the synced copy.

Bidirectional companion: Lakehouse Sync handles the other direction — operational data written into Lakebase Postgres (e.g. notification signups from applications) is continuously synchronised back to Unity Catalog Delta tables for analytics.

| Direction | Mechanism | Use case |
|---|---|---|
| Delta → Postgres | Synced Tables (3 modes) | Customer segments, lookup tables, AI features |
| Postgres → Delta | Lakehouse Sync (continuous CDC) | App-tier operational data (signups, events, state) |

Both pipelines are managed; both are governed by Unity Catalog; both eliminate the hand-written-sync-pipeline operational tax.

Constraints (disclosed)

  • OLTP-shape only. "Databricks Lakebase is optimized for high-concurrency point lookups and short OLTP queries, not for large scans or classic OLAP." Synced Tables don't change this — large scans against synced tables should still happen on the upstream Delta table, not on the Lakebase copy.
  • Snapshot mode is not real-time. It is a bulk refresh, so the synced copy can be as stale as the snapshot interval (e.g. up to a day for a nightly snapshot).

Seen in
