CONCEPT Cited by 1 source

Full-dump vs delta vs target (tri-layer CDC schema)¶

Definition¶

The tri-layer CDC schema decomposes a CDC ingestion job's data into three internal tables, each with a distinct role:

Table	Role	Update cadence
Full-dump table	Periodic full snapshot of the source database	Periodic (slow + expensive)
Delta table	Captures source changes (insert / update / delete events)	Continuous (per-source-change)
Target table	Consumer-visible state = full-dump + applied deltas	Materialised at query time or on a fixed cadence

"Each data ingestion job has its own internal table for a full dump of source databases (full dump), an internal table for capturing changes of source databases (delta), and the target table consumed by the data customers. All the information about job entities, including table names and table schemas, is saved and managed by the central management service." — Source: sources/2026-05-12-meta-migrating-data-ingestion-systems-at-meta-scale

Why three tables, not two¶

Many CDC pipelines store only two: deltas (the change stream) + target (the materialised current state). The third — the full- dump table — is what bounds delta-replay cost. Without a periodic full-dump anchor, reconstructing target state requires replaying deltas from the beginning of time; with one, only deltas since the last full-dump need to be applied.

Concretely, the target table is computed as:

target = (latest full-dump) + (deltas applied after the full-dump's snapshot timestamp)

The frequency of the full-dump is a tradeoff:

More frequent full-dumps → smaller delta-replay window → faster target-table materialisation, but higher full-dump cost (slow + expensive scans of the source).
Less frequent full-dumps → larger delta-replay window → cheaper full-dump cost, but slower target-table materialisation + more delta-storage overhead.

Full-dumps are slow + expensive — and migrations multiply them¶

Meta's post explicitly names the full-dump cost as the load-bearing operational tax during a CDC system migration:

"Due to the system's CDC design, a new job's first snapshot was landed via a full dump, which is typically slow and expensive. If we detected data quality issues in a landed snapshot we also triggered another full dump to land a corrected snapshot after the underlying bugs were fixed."

Two distinct events trigger full-dumps during migration:

First snapshot of a new job (one full-dump per migrated job).
Data-quality remediation if the first snapshot has issues that get fixed (another full-dump per affected job).

Meta's response: patterns/snapshot-reuse-from-legacy-during-migration — reuse the legacy system's snapshot output as the new system's seed snapshot, avoiding the first full-dump entirely.

vs dual-table CDC (delta + target only): simpler but no full-dump anchor, so delta-replay cost grows with time-since- start. Many warehouse-side CDC pipelines start dual-table and add periodic full-dumps as an optimisation later.
vs log-only CDC (e.g. Kafka topics holding the change log): the change log is the delta layer; target reconstruction is pushed entirely to the consumer side. The tri-layer schema pre-materialises the target inside the ingestion system.
vs snapshot-only ingestion (no deltas): every load is a full-dump; freshness limited by full-dump cadence; no intermediate state visible to consumers.

Seen in¶

sources/2026-05-12-meta-migrating-data-ingestion-systems-at-meta-scale — Meta's data-ingestion-system architecture; canonical wiki statement of the tri-layer schema.

concepts/change-data-capture — the substrate primitive
concepts/cdc-bad-data-propagation — the hazard this schema is subject to
concepts/snapshot-isolation — the concept the full-dump table provides at coarse grain
patterns/snapshot-reuse-from-legacy-during-migration — the migration optimisation that bypasses the full-dump cost
systems/meta-data-ingestion-system — canonical wiki instance
companies/meta — company hub

Full-dump vs delta vs target (tri-layer CDC schema)¶

Definition¶

Why three tables, not two¶

Full-dumps are slow + expensive — and migrations multiply them¶

Distinguishing from related schemas¶

Seen in¶

Related¶