CONCEPT Cited by 2 sources
Slowly-Changing Dimension (SCD)¶
Slowly-Changing Dimension (SCD) is the dimensional-modeling pattern for dimension tables whose attributes change over time — customer address, product category, employee department — at a rate much lower than the fact tables that reference them, but not zero. SCD policies define how those changes are recorded: overwrite, add a new row, or track old and new side by side.
The classic SCD types¶
- SCD Type 1 — Overwrite. Mutate the existing row in place; no history kept. Cheapest, but destroys audit trail.
- SCD Type 2 — Row versioning. Close the current row (set
valid_to,is_current = false) and insert a new row for the new value (withvalid_from= change timestamp,is_current = true). Preserves full history; joins against facts need an as-of-timestamp predicate. - SCD Type 3 — Limited history. Keep
previous_valuecolumns alongsidecurrent_value. Fixed-width history; simple but lossy.
Why SCD appears in this wiki¶
SCD is one of the canonical incremental-update workloads that
motivated row-level semantics on open table formats (concepts/open-table-format).
The natural SQL expression of an SCD Type 1 or Type 2 update is a
MERGE INTO statement:
MERGE INTO customer_dim t
USING staging_customer_updates s
ON t.customer_id = s.customer_id
WHEN MATCHED AND t.address <> s.address THEN
UPDATE SET address = s.address, updated_at = s.event_ts
WHEN NOT MATCHED THEN
INSERT *;
On systems/apache-iceberg and its peers, this is a targeted
row-level update, not a partition rewrite. Executing it as
INSERT OVERWRITE on the dimension table would rewrite the whole
partition on every small batch of updates — prohibitive on anything
larger than toy scale. See
patterns/merge-into-over-insert-overwrite for the operational
prescription. (Source: sources/2025-09-30-expedia-prefer-merge-into-over-insert-overwrite)
Relationship to CDC¶
CDC is the ingest shape — a stream of insert/update/delete deltas from a source database. SCD is the merge policy — how those deltas are resolved into the warehouse dimension. The two compose: a CDC stream of customer updates lands in a staging table; a MERGE statement applies the staging deltas to the customer dimension under an SCD Type 1 or Type 2 policy; the resulting dimension is a row-level-updated Iceberg table best stored under MOR with periodic copy-on-write compaction.
Why it's a "slowly" changing dimension¶
The name is warehouse-era — dimensions change slowly relative to
facts. Operationally this doesn't relax the correctness requirements;
it just means the workload shape is many small batches of row-level
updates, not few big partition rewrites, which is precisely why
MERGE INTO (not INSERT OVERWRITE) is the right SQL surface on
Iceberg. (Source: sources/2025-09-30-expedia-prefer-merge-into-over-insert-overwrite)
Seen in¶
-
sources/2025-09-30-expedia-prefer-merge-into-over-insert-overwrite — Expedia names SCD as one of the two canonical use cases (alongside CDC) that make
MERGE INTOthe right default on Iceberg. -
sources/2026-04-22-databricks-stop-hand-coding-change-data-capture-pipelines — SCD Types 1 and 2 as declarative API parameters, not hand-authored
MERGEscaffolding. Databricks' 2026-04-22 post canonicalises AutoCDC'sstored_as_scd_type=1/stored_as_scd_type=2flag as the entire surface for SCD-type selection; the runtime manages__START_AT/__END_ATvalidity-window columns, closes out active rows on updates, ensures exactly one__END_AT IS NULLrow per key at any time, and applies deletes correctly across both the current-state and history tables. Before/after comparison: hand-rolled SCD Type 2 takes two steps (close-out + insert-new) plusmax_bydedup — ~25 lines — while AutoCDC version is ~7 lines with a single parameter change. Out-of-order arrivals are handled automatically viasequence_by(concepts/out-of-sequence-cdc-event-handling); snapshot-source CDC is a first-class input mode so SCD-on-snapshots works without hand-rolled diff logic. Output-table examples in the source post show canonical SCD Type 1 (user 125 keeps latest Guadalajara only, deleted user 123 absent) and SCD Type 2 (user 125 has three ordered versions Tijuana → Mexicali → Guadalajara with correct validity windows despite out-of-order arrival). Reinforces the Expedia-canonicalised patterns/merge-into-over-insert-overwrite pattern as the underlying MERGE primitive, while adding the declarative authoring surface that subsumes hand-rolled SCD pipelines into the new patterns/declarative-cdc-over-hand-rolled-merge pattern. Regulated-vertical adopters (Navy Federal Credit Union, Block, Valora Group) use AutoCDC for SCD at production scale — billions of events/day in the NFCU case.