CONCEPT Cited by 1 source
Slowly-Changing Dimension (SCD)¶
Slowly-Changing Dimension (SCD) is the dimensional-modeling pattern for dimension tables whose attributes change over time — customer address, product category, employee department — at a rate much lower than the fact tables that reference them, but not zero. SCD policies define how those changes are recorded: overwrite, add a new row, or track old and new side by side.
The classic SCD types¶
- SCD Type 1 — Overwrite. Mutate the existing row in place; no history kept. Cheapest, but destroys audit trail.
- SCD Type 2 — Row versioning. Close the current row (set
valid_to,is_current = false) and insert a new row for the new value (withvalid_from= change timestamp,is_current = true). Preserves full history; joins against facts need an as-of-timestamp predicate. - SCD Type 3 — Limited history. Keep
previous_valuecolumns alongsidecurrent_value. Fixed-width history; simple but lossy.
Why SCD appears in this wiki¶
SCD is one of the canonical incremental-update workloads that
motivated row-level semantics on open table formats (concepts/open-table-format).
The natural SQL expression of an SCD Type 1 or Type 2 update is a
MERGE INTO statement:
MERGE INTO customer_dim t
USING staging_customer_updates s
ON t.customer_id = s.customer_id
WHEN MATCHED AND t.address <> s.address THEN
UPDATE SET address = s.address, updated_at = s.event_ts
WHEN NOT MATCHED THEN
INSERT *;
On systems/apache-iceberg and its peers, this is a targeted
row-level update, not a partition rewrite. Executing it as
INSERT OVERWRITE on the dimension table would rewrite the whole
partition on every small batch of updates — prohibitive on anything
larger than toy scale. See
patterns/merge-into-over-insert-overwrite for the operational
prescription. (Source: sources/2025-09-30-expedia-prefer-merge-into-over-insert-overwrite)
Relationship to CDC¶
CDC is the ingest shape — a stream of insert/update/delete deltas from a source database. SCD is the merge policy — how those deltas are resolved into the warehouse dimension. The two compose: a CDC stream of customer updates lands in a staging table; a MERGE statement applies the staging deltas to the customer dimension under an SCD Type 1 or Type 2 policy; the resulting dimension is a row-level-updated Iceberg table best stored under MOR with periodic copy-on-write compaction.
Why it's a "slowly" changing dimension¶
The name is warehouse-era — dimensions change slowly relative to
facts. Operationally this doesn't relax the correctness requirements;
it just means the workload shape is many small batches of row-level
updates, not few big partition rewrites, which is precisely why
MERGE INTO (not INSERT OVERWRITE) is the right SQL surface on
Iceberg. (Source: sources/2025-09-30-expedia-prefer-merge-into-over-insert-overwrite)
Seen in¶
- sources/2025-09-30-expedia-prefer-merge-into-over-insert-overwrite
— Expedia names SCD as one of the two canonical use cases (alongside
CDC) that make
MERGE INTOthe right default on Iceberg.