PATTERN Cited by 1 source
Snapshot sync mode for batch rebuild¶
Pattern. When a managed sync pipeline offers both snapshot (full replace) and triggered (incremental upsert) modes, choose snapshot mode whenever the per-cycle delta exceeds ~10% of the upstream table. Snapshot rewrites everything, but at high delta proportions its bulk-copy path is up to 10× faster than per-row incremental upsert.
The pattern is canonicalised from the 2026-05-20 Databricks marketing-campaigns post, applied to Lakebase Synced Tables:
"When more than 10% of the data is updated, we recommend snapshot mode, which delivers 10x better performance than triggered mode."
When to apply¶
The decision variable is the delta proportion per sync cycle, not the cadence:
| Workload shape | Per-cycle delta | Best mode |
|---|---|---|
| Nightly customer-segment recompute (replaces most rows) | >10% | Snapshot |
| Ad-hoc dictionary update (a few new categories) | <10% | Triggered |
| Real-time event stream (small rapid changes) | <10% but latency-critical | Continuous |
Cadence is irrelevant to the choice: a nightly recompute that replaces 80% of segments still benefits from snapshot mode, even though a nightly schedule is far from real-time.
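The decision table above can be sketched as a small helper. This is a minimal illustration, not part of any Lakebase API: `SyncMode`, `choose_sync_mode`, and `SNAPSHOT_THRESHOLD` are hypothetical names, and treating a delta of exactly 10% as "not snapshot" is an assumption, since the post does not say which side of the boundary it falls on.

```python
from enum import Enum


class SyncMode(Enum):
    SNAPSHOT = "snapshot"
    TRIGGERED = "triggered"
    CONTINUOUS = "continuous"


# ~10% is the threshold quoted for Lakebase Synced Tables; treat it as tunable.
SNAPSHOT_THRESHOLD = 0.10


def choose_sync_mode(changed_rows: int, total_rows: int,
                     latency_critical: bool = False,
                     threshold: float = SNAPSHOT_THRESHOLD) -> SyncMode:
    """Pick a sync mode from the per-cycle delta proportion (hypothetical helper)."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    if changed_rows < 0:
        raise ValueError("changed_rows must be non-negative")
    delta = changed_rows / total_rows
    if delta > threshold:
        return SyncMode.SNAPSHOT
    # Below the threshold, latency decides between the two incremental modes.
    return SyncMode.CONTINUOUS if latency_critical else SyncMode.TRIGGERED
```

Note that cadence is deliberately absent from the signature: the only inputs are the delta proportion and whether the workload is latency-critical.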
Why snapshot beats triggered at high deltas¶
The result is counterintuitive: snapshot rewrites the entire table, while triggered updates only the changed rows. Surely triggered is more efficient?
The crossover happens because:
- Triggered pays a per-row diff/merge cost. Each changed row goes through a primary-key lookup, a conflict check, and an upsert. This cost scales linearly with delta size, with significant per-row overhead.
- Snapshot is a bulk copy. Storage-compute-separated systems like Lakebase can stream a full table-load efficiently from object storage (where the snapshot reads from) without per-row conflict resolution.
For small deltas, the cost of bulk-copying the whole table outweighs a handful of per-row upserts, so triggered wins. For large deltas, the accumulated per-row diff cost outweighs the bulk copy, so snapshot wins. The ~10% threshold is where the two cost curves cross for Lakebase Synced Tables, according to the post.
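The crossover argument can be made concrete with a toy linear cost model. This is only a sketch under stated assumptions: the cost units are arbitrary, and the 10:1 ratio between per-row upsert cost and per-row bulk-copy cost is chosen for illustration so that the crossover lands at 10%. The post does not disclose the real cost structure.

```python
def triggered_cost(total_rows: int, delta_fraction: float,
                   upsert_cost_per_row: float) -> float:
    """Cost grows with the number of changed rows only."""
    return total_rows * delta_fraction * upsert_cost_per_row


def snapshot_cost(total_rows: int, bulk_cost_per_row: float) -> float:
    """Cost is paid for every row, changed or not."""
    return total_rows * bulk_cost_per_row


def cheaper_mode(total_rows: int, delta_fraction: float,
                 upsert_cost_per_row: float = 10.0,  # assumed, arbitrary units
                 bulk_cost_per_row: float = 1.0) -> str:  # assumed, arbitrary units
    t = triggered_cost(total_rows, delta_fraction, upsert_cost_per_row)
    s = snapshot_cost(total_rows, bulk_cost_per_row)
    return "snapshot" if s < t else "triggered"


if __name__ == "__main__":
    for delta in (0.01, 0.05, 0.20, 0.80):
        print(f"{delta:.0%}: {cheaper_mode(1_000_000, delta)}")
```

Under this model the crossover sits at `bulk_cost_per_row / upsert_cost_per_row`, and at a 100% delta triggered costs ten times as much as snapshot, which is consistent with the "up to 10×" figure without claiming to reproduce it.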
Canonical use case: marketing-campaign customer segments¶
From the 2026-05-20 post:
"in our case, and very often, customer segments are recomputed nightly in batch, replacing a significant portion of the dataset. When more than 10% of the data is updated, we recommend snapshot mode."
Customer segmentation pipelines typically:
- Ingest user behaviour data over a 24-hour window.
- Recompute every active segment from scratch (because criteria may overlap, segments aren't naturally incremental).
- Replace the segment-membership table with the new computation output.
The result: a large share of rows, in the extreme case nearly all of them, changes every night. Triggered mode would do per-row upserts on millions of rows, paying the full per-row tax. Snapshot mode bulk-copies the new table in one shot.
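To apply the rule, a team first needs the delta proportion for its own workload. The sketch below estimates it by comparing two versions of a segment-membership table held as in-memory mappings from primary key to row value. The function name and the choice of denominator (the union of keys across both versions, so inserts and deletes both count) are assumptions for illustration; a production pipeline would more likely compute this in SQL or from change metadata.

```python
from typing import Hashable, Mapping

_MISSING = object()  # sentinel so a stored None is not confused with "absent"


def delta_fraction(previous: Mapping[Hashable, object],
                   current: Mapping[Hashable, object]) -> float:
    """Share of keys that were inserted, updated, or deleted between versions."""
    all_keys = previous.keys() | current.keys()
    if not all_keys:
        return 0.0
    changed = sum(
        1 for key in all_keys
        if previous.get(key, _MISSING) != current.get(key, _MISSING)
    )
    return changed / len(all_keys)


# Toy data: customer id -> segment label (made up for the example).
last_night = {1: "loyal", 2: "at-risk", 3: "new", 4: "dormant"}
tonight = {1: "loyal", 2: "loyal", 3: "new", 5: "new"}
fraction = delta_fraction(last_night, tonight)
```

Here customer 2 was updated, customer 4 was deleted, and customer 5 was inserted, so three of the five keys changed and the fraction is well above the 10% threshold.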
Operational implications¶
- Customer makes the call. The post frames this as a recommendation, not an automatic decision. The customer is expected to understand their workload's delta proportion and pick the mode accordingly.
- Mode is a per-table configuration. Different synced tables can use different modes; the decision is per-pipeline.
- Cadence is a separate decision. Snapshot mode can be scheduled (nightly batch) or on-demand. The 10% rule determines mode; the freshness requirement determines cadence.
Generalisation beyond Lakebase¶
The pattern generalises wherever a sync pipeline offers a similar bulk-rebuild vs incremental tradeoff:
- Database replication tools with both full-load and incremental-load modes.
- Materialized view refresh strategies (full vs incremental).
- Search index rebuilds vs incremental updates.
The crossover threshold (here ~10%) is implementation-specific — it depends on the per-row overhead and the bulk-copy efficiency of the specific system. Lakebase's 10% number is the disclosed threshold for Synced Tables specifically; other systems will have different crossover points.
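For a system other than Lakebase, the crossover can be estimated from two benchmark runs: one full rebuild and one incremental sync of a known number of rows. The helper below is a rough sketch that assumes incremental cost scales linearly with changed rows and that the full-rebuild time is independent of the delta. The function name and the benchmark figures in the example are invented for illustration.

```python
def estimate_crossover(total_rows: int, snapshot_seconds: float,
                       triggered_rows: int, triggered_seconds: float) -> float:
    """Delta fraction at which incremental sync takes as long as a full rebuild."""
    if min(total_rows, triggered_rows) <= 0:
        raise ValueError("row counts must be positive")
    if min(snapshot_seconds, triggered_seconds) <= 0:
        raise ValueError("durations must be positive")
    per_row_upsert = triggered_seconds / triggered_rows
    # Number of changed rows whose upserts would take as long as one full rebuild.
    crossover_rows = snapshot_seconds / per_row_upsert
    # A value of 1.0 means incremental wins at every delta under this model.
    return min(crossover_rows / total_rows, 1.0)


# Made-up measurements: a 1M-row full rebuild in 60 s, 5,000 upserts in 3 s.
threshold = estimate_crossover(1_000_000, 60.0, 5_000, 3.0)
```

With these invented numbers the estimate comes out at roughly 10%, but the point is the procedure, not the figure: measure both paths on the target system before adopting a threshold.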
Trade-offs and constraints¶
| Pro | Con |
|---|---|
| 10× faster than triggered at high deltas | Discards triggered's "only update what changed" advantage at low deltas |
| Bulk copy is operationally simpler (no per-row conflict resolution) | Replaces the entire table — temporary disk/network spike during copy |
| Storage-compute separation makes bulk-copy efficient | Not suitable for real-time freshness (latency floor is the snapshot cadence) |
Anti-patterns this avoids¶
- Always-incremental. Triggered mode is appealing because "only update what changed" sounds intuitively cheaper. For workloads that recompute most of the table, this intuition can be wrong by up to 10×.
- Manual sync code. Without managed snapshot mode, the customer would write and maintain a Lakehouse → OLTP refresh pipeline, with all the operational tax that implies.
Seen in¶
- Lakebase Synced Tables for marketing-campaign customer segments (Databricks, 2026-05-20) — the canonical disclosure with the 10% / 10× quantification. (Source: sources/2026-05-20-databricks-marketing-campaigns-with-lakebase)
Related¶
- systems/lakebase-synced-tables — the system that exposes the three sync modes.
- systems/lakebase — the Postgres OLTP host.
- systems/lakehouse-sync — bidirectional companion (Postgres → Delta).
- concepts/change-data-capture — generalisation; snapshot mode is the bulk-rebuild alternative when CDC's incremental shape is inefficient.
- concepts/htap — the architectural shape that makes Lakehouse → OLTP sync necessary in the first place.