CONCEPT Cited by 1 source
Full-dump vs delta vs target (tri-layer CDC schema)
Definition
The tri-layer CDC schema decomposes a CDC ingestion job's data into three internal tables, each with a distinct role:
| Table | Role | Update cadence |
|---|---|---|
| Full-dump table | Periodic full snapshot of the source database | Periodic (slow + expensive) |
| Delta table | Captures source changes (insert / update / delete events) | Continuous (per-source-change) |
| Target table | Consumer-visible state = full-dump + applied deltas | Materialised at query time or on a fixed cadence |
"Each data ingestion job has its own internal table for a full dump of source databases (full dump), an internal table for capturing changes of source databases (delta), and the target table consumed by the data customers. All the information about job entities, including table names and table schemas, is saved and managed by the central management service." — Source: sources/2026-05-12-meta-migrating-data-ingestion-systems-at-meta-scale
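As a rough illustration of the quoted layout, a job entity held by the central management service might bundle the three table names with a schema. This is a minimal sketch under stated assumptions: `IngestionJob`, `register_job`, the suffix-based table names, and the in-memory `registry` dict are all hypothetical and are not taken from Meta's post.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class IngestionJob:
    """Hypothetical job entity: one record per ingestion job."""
    job_id: str
    full_dump_table: str   # periodic full snapshot of the source
    delta_table: str       # captured insert / update / delete events
    target_table: str      # table consumed by data customers
    schema: Dict[str, str] = field(default_factory=dict)  # column -> type


def register_job(registry: Dict[str, IngestionJob], job_id: str,
                 source_table: str, schema: Dict[str, str]) -> IngestionJob:
    """Derive the three table names for a job and store the job in the registry.

    The suffix-based naming scheme is an assumption made for this sketch.
    """
    if job_id in registry:
        raise ValueError(f"job {job_id!r} already registered")
    job = IngestionJob(
        job_id=job_id,
        full_dump_table=f"{source_table}__full_dump",
        delta_table=f"{source_table}__delta",
        target_table=source_table,
        schema=dict(schema),
    )
    registry[job_id] = job
    return job


# Usage: register one job and inspect its three tables.
registry: Dict[str, IngestionJob] = {}
job = register_job(registry, "job-1", "orders", {"id": "bigint", "status": "string"})
```

Keeping all three names on one record means a consumer of the registry can find the full-dump, delta, and target tables for a job from a single lookup.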
Why three tables, not two
Many CDC pipelines store only two tables: deltas (the change stream) + target (the materialised current state). The third, the full-dump table, is what bounds delta-replay cost. Without a periodic full-dump anchor, reconstructing target state requires replaying deltas from the beginning of time; with one, only deltas since the last full-dump need to be applied.
Concretely, the target table is computed as:
target = (latest full-dump) + (deltas applied after the full-dump's snapshot timestamp)
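The formula above can be sketched in a few lines of Python. This is an illustrative in-memory model, not Meta's implementation: the `Delta` record, its field names, and the assumption that every row has a primary key and every change carries a comparable timestamp are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class Delta:
    """Hypothetical change event captured in the delta table."""
    ts: int                    # commit timestamp of the source change
    op: str                    # "insert", "update" or "delete"
    key: str                   # primary key of the affected row
    row: Optional[Row] = None  # new row image; None for deletes


def materialise_target(full_dump: Dict[str, Row], snapshot_ts: int,
                       deltas: Iterable[Delta]) -> Dict[str, Row]:
    """Apply deltas newer than the snapshot on top of a copy of the full dump."""
    target = dict(full_dump)  # leave the full-dump table untouched
    newer = (d for d in deltas if d.ts > snapshot_ts)
    # sorted() is stable, so events with equal timestamps keep their input order.
    for d in sorted(newer, key=lambda d: d.ts):
        if d.op == "delete":
            target.pop(d.key, None)
        else:  # insert and update both upsert the new row image
            target[d.key] = d.row
    return target


full_dump = {"a": {"v": 1}, "b": {"v": 2}}
deltas = [
    Delta(ts=5, op="update", key="a", row={"v": 0}),    # older than snapshot: skipped
    Delta(ts=11, op="update", key="a", row={"v": 10}),
    Delta(ts=12, op="delete", key="b"),
    Delta(ts=13, op="insert", key="c", row={"v": 3}),
]
target = materialise_target(full_dump, snapshot_ts=10, deltas=deltas)
# target == {"a": {"v": 10}, "c": {"v": 3}}
```

The delta at `ts=5` is ignored because the full-dump at `snapshot_ts=10` already reflects it; that filter is what bounds the replay window.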
The frequency of the full-dump is a tradeoff:
- More frequent full-dumps → smaller delta-replay window → faster target-table materialisation, but higher full-dump cost (slow + expensive scans of the source).
- Less frequent full-dumps → larger delta-replay window → cheaper full-dump cost, but slower target-table materialisation + more delta-storage overhead.
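The tradeoff above can be made concrete with a toy cost model. Everything here is an assumption for illustration: the two functions, the linear cost model (rows scanned as a proxy for cost), and the row counts are invented and do not come from Meta's post.

```python
def full_dump_rows_per_day(source_rows: int, dump_interval_hours: float) -> float:
    """Rows scanned per day by full-dumps taken every `dump_interval_hours`."""
    return source_rows * (24.0 / dump_interval_hours)


def worst_case_replay_rows(changes_per_hour: int, dump_interval_hours: float) -> float:
    """Deltas to replay just before the next full-dump lands (largest window)."""
    return changes_per_hour * dump_interval_hours


# Illustrative, made-up workload.
SOURCE_ROWS = 1_000_000_000
CHANGES_PER_HOUR = 1_000_000

costs = {
    interval: (
        full_dump_rows_per_day(SOURCE_ROWS, interval),
        worst_case_replay_rows(CHANGES_PER_HOUR, interval),
    )
    for interval in (6, 24)
}
# costs[6]  -> 4e9 rows/day dumped, 6e6 deltas to replay at worst
# costs[24] -> 1e9 rows/day dumped, 24e6 deltas to replay at worst
```

Under this model, dumping four times as often quadruples the full-dump scan volume while cutting the worst-case replay window to a quarter.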
Full-dumps are slow + expensive, and migrations multiply them
Meta's post explicitly calls out full-dump cost as a major operational tax during a CDC system migration:
"Due to the system's CDC design, a new job's first snapshot was landed via a full dump, which is typically slow and expensive. If we detected data quality issues in a landed snapshot we also triggered another full dump to land a corrected snapshot after the underlying bugs were fixed."
Two distinct events trigger full-dumps during migration:
- First snapshot of a new job (one full-dump per migrated job).
- Data-quality remediation: if issues are detected in a landed snapshot, another full-dump lands a corrected snapshot once the underlying bugs are fixed (one more full-dump per affected job).
Meta's response: patterns/snapshot-reuse-from-legacy-during-migration — reuse the legacy system's snapshot output as the new system's seed snapshot, avoiding the first full-dump entirely.
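The seeding decision can be sketched as follows. This is a hypothetical illustration of the idea only: `seed_snapshot`, the `(timestamp, rows)` snapshot shape, and the `fake_full_dump` stand-in are assumptions, and the source does not describe how Meta implements the reuse.

```python
from typing import Callable, Dict, List, Tuple

Rows = Dict[str, dict]
Snapshot = Tuple[int, Rows]  # (snapshot timestamp, rows keyed by primary key)


def seed_snapshot(table: str,
                  legacy_snapshots: Dict[str, Snapshot],
                  run_full_dump: Callable[[str], Snapshot]) -> Tuple[Snapshot, str]:
    """Prefer the legacy system's snapshot; fall back to a full dump."""
    legacy = legacy_snapshots.get(table)
    if legacy is not None:
        ts, rows = legacy
        return (ts, dict(rows)), "legacy-reuse"  # no scan of the source database
    return run_full_dump(table), "full-dump"


# Stand-in for the slow + expensive source scan; it records each call.
full_dump_calls: List[str] = []


def fake_full_dump(table: str) -> Snapshot:
    full_dump_calls.append(table)
    return (100, {"k": {"v": 1}})


legacy_snapshots = {"orders": (90, {"k": {"v": 1}})}
orders_seed, orders_how = seed_snapshot("orders", legacy_snapshots, fake_full_dump)
users_seed, users_how = seed_snapshot("users", legacy_snapshots, fake_full_dump)
# Only "users" triggers a full dump; "orders" is seeded from the legacy snapshot.
```

In this sketch the new job would then apply deltas newer than the seed's timestamp, exactly as it would after a full-dump of its own.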
Distinguishing from related schemas
- vs dual-table CDC (delta + target only): simpler but no full-dump anchor, so delta-replay cost grows with time-since-start. Many warehouse-side CDC pipelines start dual-table and add periodic full-dumps as an optimisation later.
- vs log-only CDC (e.g. Kafka topics holding the change log): the change log is the delta layer; target reconstruction is pushed entirely to the consumer side. The tri-layer schema pre-materialises the target inside the ingestion system.
- vs snapshot-only ingestion (no deltas): every load is a full-dump; freshness limited by full-dump cadence; no intermediate state visible to consumers.
Seen in
- sources/2026-05-12-meta-migrating-data-ingestion-systems-at-meta-scale — Meta's data-ingestion-system architecture; canonical wiki statement of the tri-layer schema.
Related
- concepts/change-data-capture — the substrate primitive
- concepts/cdc-bad-data-propagation — the hazard this schema is subject to
- concepts/snapshot-isolation — the concept the full-dump table provides at coarse grain
- patterns/snapshot-reuse-from-legacy-during-migration — the migration optimisation that bypasses the full-dump cost
- systems/meta-data-ingestion-system — canonical wiki instance
- companies/meta — company hub