Databricks — Unlock seamless and cost-effective marketing campaigns with Lakebase
Summary
A Databricks Blog post (2026-05-20, Tier 3) that frames the
bursty marketing-campaign workload as the canonical fit for
Lakebase's serverless OLTP economics, walking through how
Deichmann (a European footwear retailer) integrated Lakebase as
the OLTP backend for SAP Engagement Cloud, their omnichannel
marketing platform. The post is part tutorial and part
architecture pitch, but it surfaces three new wiki
canonicalisations: Lakebase Synced Tables, with three sync modes
(snapshot / triggered / continuous) and a quantified >10% delta
rule of thumb for snapshot mode; Lakehouse Sync, a CDC-based
Postgres → Delta pipeline that makes operational data available
for analytics and AI; and Local File Cache (LFC), with PREFETCH
and FILECACHE as Lakebase-specific Postgres-engine metrics for
diagnosing query performance under storage-compute separation.
Key takeaways
- Marketing campaigns are the canonical bursty OLTP workload. Customer segments are recomputed nightly, then read in short, intense bursts when marketing tools fire campaigns. Verbatim: "customer segments used for personalized campaigns are often stored in an OLTP database from which marketing tools read them. When marketing campaigns are launched, there is a spike in database requests, but otherwise, database utilization is low." A traditional OLTP database sized for the peak burst pays for that capacity through the long idle stretches between campaigns (an instance of concepts/bursty-query-pattern).
- Storage-compute separation breaks the "more attributes → more compute" coupling. Verbatim: "By separating storage from compute, data can be stored cheaply in object stores without scaling compute linearly. It means the number and diversity of customer attributes can increase significantly without requiring additional compute resources." This reframes concepts/compute-storage-separation from a cost-optimisation argument into a modelling-freedom argument: marketing teams can grow segment dimensionality without an OLTP capacity-planning step.
- Lakebase Autoscaling: scale-to-0 + medium-cap (16 CU / ~32 GB RAM). The post discloses a concrete production sizing: "for compute, we scale to 0 for the extended lows, eliminating compute costs for these periods, and set a medium capacity of 16 CU (~32 GB RAM) as the maximum for the spikes." The architectural justification: "Even if the chosen memory range is relatively large, Lakebase autoscaling speed and reactivity eliminate the risk of resource underutilization, which lowers TCO." The sub-second scale-down disclosed earlier ("scales down when idle in less than a second") is what makes generous max-cap sizing safe: the underutilisation risk is bounded by autoscaling reactivity, not by the gap between min and max. Canonicalises an instance of concepts/scale-to-zero applied to bursty OLTP.
- Three sync modes, with a quantified rule of thumb for choosing snapshot. Synced Tables support three modes: snapshot, triggered, and continuous. Verbatim: "When more than 10% of the data is updated, we recommend snapshot mode, which delivers 10x better performance than triggered mode." The decision variable is the proportion of the upstream Delta table that changes per sync cycle, not the cadence: a nightly batch recompute that replaces most rows falls on the snapshot side even though it runs only once a day. Canonicalised as the patterns/snapshot-sync-mode-for-batch-rebuild pattern.
- Lakehouse Sync = native Postgres → Delta CDC pipeline. For data written into Lakebase by applications (e.g. product-restock notification signups), the post introduces Lakehouse Sync, verbatim: "a native, continuous CDC-based pipeline from Lakebase Postgres to Unity Catalog Delta tables that makes operational data available for richer analytics and AI." Together with Synced Tables (Delta → Postgres), this completes a bidirectional governed-data path between the analytical and operational tiers, with both directions managed by the lakehouse instead of by hand-maintained ETL pipelines (an instance of concepts/change-data-capture applied to operational-to-analytical data flow).
- Local File Cache (LFC) is Lakebase's compute-side cache layer. The post discloses two Lakebase-specific Postgres query-statistics metrics: "PREFETCH and FILECACHE are specific to Lakebase and show, respectively, how many prefetch requests were issued/hit/wasted and what were the hits/misses against the Local File Cache (LFC)." Canonicalised as concepts/lakebase-local-file-cache: the compute-VM-local cache of Pageserver pages that softens the storage-compute-separation latency penalty for hot working sets.
- OAuth hourly token rotation is incompatible with non-Databricks-aware partner systems. Lakebase supports both OAuth roles (Databricks identities) and native Postgres password roles. SAP Engagement Cloud forces the latter because "Engagement Cloud can't handle the hourly token rotation happening for OAuth roles." Canonicalised as patterns/native-postgres-roles-for-non-databricks-aware-partners, the architectural escape hatch for integrating non-Databricks-native consumers. The post pairs it with an explicit recommendation, "we recommend rotating passwords by generating new ones on a regular schedule", which shifts the security tradeoff onto operational discipline.
- Postgres tuning is Postgres tuning. The post explicitly reuses the standard Postgres optimisation playbook: indexing on filter columns, `pg_stat_statements` for slow-query identification, `work_mem` tuning (suggesting bumping to 256 MB on larger compute), and `autovacuum_vacuum_scale_factor` tuning for high-churn tables. Lakebase exposes its own SQL console UI for these knobs, but the underlying mechanics are unchanged. This confirms that the storage-compute-separation architecture preserves the Postgres operational surface area for tuning.
- Snapshot mode runs as a managed pipeline. Verbatim: "a managed pipeline is created, and the data is synchronized." The customer doesn't write or maintain sync code; clicks-not-pipelines is the load-bearing simplicity claim. The architectural payoff: "Making new customer segments available to Engagement Cloud now takes just a few clicks, accelerating time to market and reducing operational burden."
- Lakebase OLTP shape (workload constraint). The post repeats the constraint that defines what Lakebase is for: "Databricks Lakebase is optimized for high-concurrency point lookups and short OLTP queries, not for large scans or classic OLAP." This is the architectural fence that makes the Synced Tables + Lakehouse Sync split correct: analytics queries stay on the lakehouse, operational queries stay on Lakebase, and the bidirectional sync layer is what makes the boundary manageable.
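The two sync directions described in the takeaways differ in semantics: a snapshot replaces the target wholesale, while a CDC pipeline replays individual changes in order. The following is a minimal in-memory sketch of that difference, not Lakebase's implementation. The event shape `(op, key, row)` and the sample data are hypothetical; the post does not disclose the actual change format.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, object]
Table = Dict[int, Row]
# Hypothetical change event: (operation, primary key, new row or None).
ChangeEvent = Tuple[str, int, Optional[Row]]


def apply_snapshot(source: Table) -> Table:
    """Snapshot semantics: the target becomes a full copy of the source."""
    return {key: dict(row) for key, row in source.items()}


def apply_changes(target: Table, events: Iterable[ChangeEvent]) -> Table:
    """CDC semantics: replay inserts, updates, and deletes in order."""
    result = {key: dict(row) for key, row in target.items()}
    for op, key, row in events:
        if op in ("insert", "update"):
            if row is None:
                raise ValueError(f"{op} event for key {key} has no row")
            result[key] = dict(row)
        elif op == "delete":
            result.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


# Hypothetical restock-notification signups written by an application.
signups = {1: {"email": "a@example.com", "sku": "boot-42"}}
events = [
    ("insert", 2, {"email": "b@example.com", "sku": "sneaker-38"}),
    ("update", 1, {"email": "a@example.com", "sku": "boot-43"}),
    ("delete", 2, None),
]
replica = apply_changes(signups, events)
```

Replaying the three events leaves the replica with only key 1, now pointing at the updated SKU, while the original `signups` table is untouched.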
Architectural elements named
Sync modes (Lakebase Synced Tables)
| Mode | Cadence | Semantics | When to use |
|---|---|---|---|
| Snapshot | On-demand or scheduled | Replaces the entire Lakebase table from a Delta snapshot | When >10% of data changes per cycle (10× perf vs triggered) |
| Triggered | On-demand | Incremental upsert | When <10% of data changes per cycle |
| Continuous | Streaming | Continuous incremental upsert | When latency is critical and changes are small/frequent |
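The table's decision rule can be sketched as a small helper. This is an illustrative reading of the post's >10% threshold, not a Databricks API: the function name, its parameters, and the choice to check the snapshot threshold before the latency flag are assumptions.

```python
def choose_sync_mode(changed_rows: int, total_rows: int,
                     latency_critical: bool = False) -> str:
    """Pick a sync mode from the share of rows changed per sync cycle.

    Assumption: the >10% snapshot rule takes priority over latency needs.
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    if changed_rows < 0:
        raise ValueError("changed_rows must not be negative")
    # Integer comparison avoids float rounding at exactly 10%.
    if changed_rows * 10 > total_rows:
        return "snapshot"
    if latency_critical:
        return "continuous"
    return "triggered"


# A nightly rebuild that rewrites 90% of a 1M-row segment table.
nightly_mode = choose_sync_mode(900_000, 1_000_000)
```

A nightly rebuild that rewrites most rows lands on `"snapshot"`, while a cycle that touches exactly 10% or less falls through to the incremental modes.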
Lakebase-specific query metrics
- `PREFETCH`: prefetch requests issued / hit / wasted. Disclosed only in this post.
- `FILECACHE`: hits/misses against the Local File Cache (LFC). Disclosed only in this post.
- Standard Postgres: `pg_stat_statements`, `pg_stat_user_tables` (for `autovacuum` bloat tracking); both available.
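One plausible way to read these counters is as ratios. The sketch below assumes the raw counts have already been copied out of the query statistics by hand; the post does not document an output format, so the `CacheCounters` type, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheCounters:
    """Hypothetical container for counts read from the query statistics."""
    filecache_hits: int
    filecache_misses: int
    prefetch_issued: int
    prefetch_wasted: int


def lfc_hit_ratio(c: CacheCounters) -> float:
    """Share of page reads served from the Local File Cache."""
    total = c.filecache_hits + c.filecache_misses
    return c.filecache_hits / total if total else 0.0


def prefetch_waste_ratio(c: CacheCounters) -> float:
    """Share of issued prefetch requests that were never used."""
    return c.prefetch_wasted / c.prefetch_issued if c.prefetch_issued else 0.0


# Illustrative numbers only, not taken from the post.
counters = CacheCounters(filecache_hits=950, filecache_misses=50,
                         prefetch_issued=200, prefetch_wasted=20)
hit_ratio = lfc_hit_ratio(counters)
waste_ratio = prefetch_waste_ratio(counters)
```

A low hit ratio on a hot query would suggest the working set does not fit in the cache, although the post gives no thresholds for either metric.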
Postgres tuning surface (suggested)
- Indexes: standard Postgres B-tree indexes on filter columns. Created via standard `CREATE INDEX` in the Lakebase SQL console.
- `work_mem`: bump to 256 MB on larger compute (verbatim recommendation).
- `autovacuum_vacuum_scale_factor`: tune lower for high-churn tables; monitor bloat with `pg_stat_user_tables`.
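The knobs above map onto ordinary Postgres statements. As a hedged sketch, the helper below only builds the SQL text so it can be pasted into a SQL console; it assumes Lakebase accepts these statements unchanged, as the post implies. The table `customer_segments`, the column `segment_id`, and the 0.05 scale factor are hypothetical (stock Postgres defaults the scale factor to 0.2).

```python
from typing import List


def tuning_statements(table: str, filter_column: str,
                      work_mem_mb: int = 256,
                      vacuum_scale_factor: float = 0.05) -> List[str]:
    """Build standard Postgres tuning statements for one table."""
    for name in (table, filter_column):
        # Reject anything that is not a plain identifier before interpolating.
        if not name.isidentifier():
            raise ValueError(f"not a plain identifier: {name!r}")
    return [
        f"CREATE INDEX IF NOT EXISTS idx_{table}_{filter_column} "
        f"ON {table} ({filter_column});",
        f"SET work_mem = '{work_mem_mb}MB';",  # session-level setting
        f"ALTER TABLE {table} "
        f"SET (autovacuum_vacuum_scale_factor = {vacuum_scale_factor});",
    ]


# Slowest statements by mean time (mean_exec_time exists in Postgres 13+).
SLOW_QUERIES_SQL = (
    "SELECT query, calls, mean_exec_time FROM pg_stat_statements "
    "ORDER BY mean_exec_time DESC LIMIT 10;"
)

# Tables with the most dead tuples, a rough proxy for bloat.
BLOAT_SQL = (
    "SELECT relname, n_live_tup, n_dead_tup, last_autovacuum "
    "FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;"
)

statements = tuning_statements("customer_segments", "segment_id")
```

Keeping the statements as plain strings makes it clear that nothing here is Lakebase-specific; the same text would run against any Postgres deployment with `pg_stat_statements` enabled.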
Authentication options
- OAuth roles: for Databricks identities. Hourly token rotation is mandatory.
- Native Postgres password roles: for partner systems that can't handle OAuth rotation. Passwords are high-entropy and generated in the UI; the post recommends rotating them "on a regular schedule".
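Because native password roles have no automatic rotation, the schedule has to be enforced by the operator. The sketch below shows one way to track it, under stated assumptions: the 30-day maximum age is hypothetical (the post does not define "regular"), and `generate_password` is only a stand-in for the UI-generated password the post describes.

```python
import secrets
from datetime import datetime, timedelta, timezone


def generate_password(num_bytes: int = 32) -> str:
    """Stand-in for a UI-generated high-entropy password."""
    return secrets.token_urlsafe(num_bytes)


def rotation_due(last_rotated: datetime, now: datetime,
                 max_age: timedelta = timedelta(days=30)) -> bool:
    """True once the password has reached the (hypothetical) maximum age."""
    return now - last_rotated >= max_age


last_rotated = datetime(2026, 1, 1, tzinfo=timezone.utc)
checked_at = datetime(2026, 2, 15, tzinfo=timezone.utc)
needs_rotation = rotation_due(last_rotated, checked_at)
```

A check like this could run from a scheduler and alert the team that owns the partner integration; wiring it into a secret manager is outside what the post covers.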
TLS chain
Lakebase uses Let's Encrypt for TLS certificates. Partner systems requiring a CA certificate (like SAP Engagement Cloud) must configure ISRG Root X1 as the trust anchor.
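For a client built on libpq, pinning the trust anchor comes down to two standard connection parameters, `sslmode=verify-full` and `sslrootcert`. The sketch below assembles such a connection string; the host, database, role, and certificate path are hypothetical, and how SAP Engagement Cloud itself is configured is not shown in this wiki page.

```python
def build_dsn(host: str, dbname: str, user: str,
              root_cert_path: str, port: int = 5432) -> str:
    """Build a libpq keyword/value connection string with full TLS checks.

    Values containing spaces would need quoting; this sketch assumes none.
    """
    params = {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "sslmode": "verify-full",       # verify the chain and the hostname
        "sslrootcert": root_cert_path,  # PEM file holding ISRG Root X1
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


# Hypothetical endpoint and paths, for illustration only.
dsn = build_dsn("db.example.com", "marketing", "engagement_cloud",
                "/etc/ssl/certs/isrgrootx1.pem")
```

The password is deliberately left out of the string; libpq clients can read it from the `PGPASSWORD` environment variable or a password file instead.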
Customer disclosed
- Deichmann: European footwear retailer; the customer story behind this technical post; uses Lakebase + SAP Engagement Cloud for omnichannel marketing.
Operational numbers
- Compute sizing: 0 → 16 CU (~32 GB RAM) for a marketing workload.
- Scale-down speed: "in less than a second" (carry-over framing from earlier Lakebase posts).
- Snapshot mode performance multiplier: 10× vs triggered mode when >10% of data changes per cycle.
- work_mem suggestion: 256 MB on larger compute.
- OAuth token rotation: hourly (the constraint that forces native-password roles for non-Databricks-aware partners).
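To make the sizing concrete, the arithmetic below compares compute consumption for an always-on instance sized for the peak against a scale-to-zero profile. Only the 0 and 16 CU bounds come from the post. The four burst hours per day are a hypothetical load profile, and the comparison ignores pricing, storage, and cold-start effects, none of which the post quantifies.

```python
from typing import Sequence


def cu_hours(hourly_cu: Sequence[float]) -> float:
    """Total compute-unit hours for a profile with one sample per hour."""
    return float(sum(hourly_cu))


PEAK_CU = 16  # maximum capacity disclosed in the post

# Sized-for-peak instance: 16 CU around the clock.
always_on = [PEAK_CU] * 24

# Hypothetical bursty day: idle (0 CU) for 20 hours, peak for 4 hours.
scale_to_zero = [0] * 20 + [PEAK_CU] * 4

always_on_total = cu_hours(always_on)          # 384 CU-hours
scale_to_zero_total = cu_hours(scale_to_zero)  # 64 CU-hours
saving = 1 - scale_to_zero_total / always_on_total
```

Under this assumed profile the autoscaled instance consumes one sixth of the CU-hours of the always-on one; the real saving depends on how long the bursts last and on billing details the post does not disclose.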
Caveats
- Tier-3 product post. The body is ~70% integration tutorial (SAP Engagement Cloud setup steps) and ~30% architecture. Inclusion is justified by the genuinely new architectural disclosures (sync modes with a quantified tradeoff, LFC + PREFETCH + FILECACHE, the Lakehouse Sync pipeline), but readers should know most of the body is click-through documentation, not architecture writing.
- No production scale numbers from Deichmann. The customer is named but no transaction volumes, concurrency numbers, segment sizes, or campaign cadences are disclosed. The 10× snapshot-vs-triggered claim is stated without methodology.
- No internals on LFC. PREFETCH and FILECACHE are named as metrics but the cache structure (eviction policy, sizing, pre-warming behaviour, coherence semantics relative to Pageserver) is not disclosed. The wiki page captures only what was disclosed.
- Lakehouse Sync mechanics opaque. The pipeline is described as "native, continuous CDC-based" but the actual change-capture mechanism (logical replication? WAL streaming? Pageserver-side reads?) is not disclosed. Postgres logical decoding is a plausible guess given the Postgres-compatibility constraint, but the post does not confirm it.
- Sync mode tradeoffs are quantified in one direction only. The 10× claim is for snapshot vs triggered when >10% of the data changes; the post does not quantify the inverse case (when <10% changes, how much faster is triggered than snapshot? How much faster is continuous than triggered? What's the operational latency floor of continuous mode?). The decision rule is given as a single threshold rather than a continuous tradeoff curve.
- Native password role security tradeoff downplayed. The recommendation to "rotate passwords by generating new ones on a regular schedule" replaces OAuth-style automatic rotation with operational discipline. The post doesn't disclose what "regular" means in practice or how to integrate password rotation with secret managers.
- Postgres tuning advice is generic. The `work_mem` and `autovacuum` tuning suggestions are standard Postgres practice and would apply to any Postgres deployment. They're called out here mostly to confirm that Lakebase doesn't break the standard tuning surface.
- Customer story PR-shaped framing. Phrases like "Ready to modernize your marketing stack?" and the pricing-model framing ("Lakebase customers pay only for the resources they need") are vendor positioning, not architecture. The bursty-workload framing is architecturally real, but the conclusion is a CTA.
- No comparison to coupled-compute Postgres for the bursty case. The post asserts Lakebase costs are lower for bursty workloads but doesn't quantify the delta vs a sized-for-peak coupled Postgres or vs an Aurora Serverless v2 alternative.
- Synced Tables snapshot mode is one-way. The post doesn't mention conflict handling for the Synced Tables direction (Delta → Lakebase). The implicit assumption is that Lakebase rows for synced tables are read-only from the application's perspective; this is consistent with the marketing-campaign use case (segments are computed analytically, then read from OLTP) but worth noting as a constraint.
Source
- Original: https://www.databricks.com/blog/unlock-seamless-and-cost-effective-marketing-campaigns-lakebase
- Raw markdown: `raw/databricks/2026-05-20-unlock-seamless-and-cost-effective-marketing-campaigns-with-beab8dac.md`
- Customer story: Deichmann × Lakebase
- Lakebase product page: databricks.com/product/lakebase
Related
- systems/lakebase — the OLTP system itself; this post adds Synced Tables coverage, Lakehouse Sync pipeline, LFC metrics, and the marketing-campaigns workload as a canonical fit for serverless OLTP.
- systems/lakebase-synced-tables — newly canonicalised as a first-class system page; three sync modes + 10× snapshot rule of thumb.
- systems/lakehouse-sync — newly canonicalised; CDC pipeline from Lakebase Postgres → Unity Catalog Delta.
- concepts/lakebase-local-file-cache — newly canonicalised; LFC + PREFETCH + FILECACHE.
- concepts/scale-to-zero + concepts/bursty-query-pattern — the marketing-campaign workload as canonical instance.
- concepts/change-data-capture — Lakehouse Sync is a named instance.
- patterns/snapshot-sync-mode-for-batch-rebuild — the >10% rule of thumb.
- patterns/native-postgres-roles-for-non-databricks-aware-partners — OAuth-rotation-incompatibility escape hatch.
- companies/databricks — recent articles index entry.