Skip to content

CONCEPT Cited by 2 sources

Slowly-Changing Dimension (SCD)

Slowly-Changing Dimension (SCD) is the dimensional-modeling pattern for dimension tables whose attributes change over time — customer address, product category, employee department — at a rate much lower than the fact tables that reference them, but not zero. SCD policies define how those changes are recorded: overwrite, add a new row, or track old and new side by side.

The classic SCD types

  • SCD Type 1 — Overwrite. Mutate the existing row in place; no history kept. Cheapest, but destroys audit trail.
  • SCD Type 2 — Row versioning. Close the current row (set valid_to, is_current = false) and insert a new row for the new value (with valid_from = change timestamp, is_current = true). Preserves full history; joins against facts need an as-of-timestamp predicate.
  • SCD Type 3 — Limited history. Keep previous_value columns alongside current_value. Fixed-width history; simple but lossy.

Why SCD appears in this wiki

SCD is one of the canonical incremental-update workloads that motivated row-level semantics on open table formats (concepts/open-table-format). The natural SQL expression of an SCD Type 1 or Type 2 update is a MERGE INTO statement:

MERGE INTO customer_dim t
USING staging_customer_updates s
ON t.customer_id = s.customer_id
WHEN MATCHED AND t.address <> s.address THEN
  UPDATE SET address = s.address, updated_at = s.event_ts
WHEN NOT MATCHED THEN
  INSERT *;

On systems/apache-iceberg and its peers, this is a targeted row-level update, not a partition rewrite. Executing it as INSERT OVERWRITE on the dimension table would rewrite the whole partition on every small batch of updates — prohibitive on anything larger than toy scale. See patterns/merge-into-over-insert-overwrite for the operational prescription. (Source: sources/2025-09-30-expedia-prefer-merge-into-over-insert-overwrite)

Relationship to CDC

CDC is the ingest shape — a stream of insert/update/delete deltas from a source database. SCD is the merge policy — how those deltas are resolved into the warehouse dimension. The two compose: a CDC stream of customer updates lands in a staging table; a MERGE statement applies the staging deltas to the customer dimension under an SCD Type 1 or Type 2 policy; the resulting dimension is a row-level-updated Iceberg table best stored under MOR with periodic copy-on-write compaction.

Why it's a "slowly" changing dimension

The name is warehouse-era — dimensions change slowly relative to facts. Operationally this doesn't relax the correctness requirements; it just means the workload shape is many small batches of row-level updates, not few big partition rewrites, which is precisely why MERGE INTO (not INSERT OVERWRITE) is the right SQL surface on Iceberg. (Source: sources/2025-09-30-expedia-prefer-merge-into-over-insert-overwrite)

Seen in

  • sources/2025-09-30-expedia-prefer-merge-into-over-insert-overwrite — Expedia names SCD as one of the two canonical use cases (alongside CDC) that make MERGE INTO the right default on Iceberg.

  • sources/2026-04-22-databricks-stop-hand-coding-change-data-capture-pipelinesSCD Types 1 and 2 as declarative API parameters, not hand-authored MERGE scaffolding. Databricks' 2026-04-22 post canonicalises AutoCDC's stored_as_scd_type=1 / stored_as_scd_type=2 flag as the entire surface for SCD-type selection; the runtime manages __START_AT / __END_AT validity-window columns, closes out active rows on updates, ensures exactly one __END_AT IS NULL row per key at any time, and applies deletes correctly across both the current-state and history tables. Before/after comparison: hand-rolled SCD Type 2 takes two steps (close-out + insert-new) plus max_by dedup — ~25 lines — while AutoCDC version is ~7 lines with a single parameter change. Out-of-order arrivals are handled automatically via sequence_by (concepts/out-of-sequence-cdc-event-handling); snapshot-source CDC is a first-class input mode so SCD-on-snapshots works without hand-rolled diff logic. Output-table examples in the source post show canonical SCD Type 1 (user 125 keeps latest Guadalajara only, deleted user 123 absent) and SCD Type 2 (user 125 has three ordered versions Tijuana → Mexicali → Guadalajara with correct validity windows despite out-of-order arrival). Reinforces the Expedia-canonicalised patterns/merge-into-over-insert-overwrite pattern as the underlying MERGE primitive, while adding the declarative authoring surface that subsumes hand-rolled SCD pipelines into the new patterns/declarative-cdc-over-hand-rolled-merge pattern. Regulated-vertical adopters (Navy Federal Credit Union, Block, Valora Group) use AutoCDC for SCD at production scale — billions of events/day in the NFCU case.

Last updated · 542 distilled / 1,571 read