Lakeflow Spark Declarative Pipelines (SDP)¶
Lakeflow Spark Declarative Pipelines (SDP) is Databricks' declarative ingestion-to-features layer on top of Spark. Pipelines are authored in Python (the `pyspark.pipelines` module, imported as `dp`) using two decorators:
- `@dp.table` — declares a streaming table (bronze-tier ingest, continuously fed).
- `@dp.materialized_view` — declares a materialised view (a derived aggregate or transformation kept in sync with upstream tables).
The declarative model handles schema evolution, late-arriving events, and continuous aggregation, so pipeline authors describe what the pipeline produces rather than how state is maintained over time.
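As a concrete sketch of that authoring surface: the source names the module and the two decorators, but everything else below (table names, the ingest path, the aggregation) is invented for illustration, and the code assumes a Databricks Lakeflow SDP runtime, where `spark` is injected.

```python
# Hedged sketch only: requires the Databricks Lakeflow SDP runtime.
# Only `pyspark.pipelines`, `dp`, and the two decorators come from the
# source; table names, the path, and the aggregation are hypothetical.
from pyspark import pipelines as dp
from pyspark.sql import functions as F


@dp.table  # bronze-tier streaming table, continuously fed
def wearables_bronze():
    return (
        spark.readStream.format("cloudFiles")   # path is hypothetical
        .option("cloudFiles.format", "json")
        .load("/Volumes/raw/wearables/")
    )


@dp.materialized_view  # derived aggregate kept in sync with upstream
def heart_rate_hourly():
    return (
        spark.read.table("wearables_bronze")
        .groupBy("patient_id", F.window("event_ts", "1 hour"))
        .agg(F.avg("heart_rate").alias("avg_heart_rate"))
    )
```

Note that neither function is called by the author: the runtime discovers the decorated definitions, wires the dependency from the view to the table, and decides when and how to refresh each output.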
Stub page. First wiki ingest naming Lakeflow SDP; the ingested source (Databricks multimodal post) uses it as the illustrative wearables-streaming tool inside the governed-Delta-tables-per-modality pattern.
Role in multimodal lakehouse architecture¶
"Wearables streams introduce operational requirements: schema evolution, late-arriving events, and continuous aggregation. Lakeflow Spark Declarative Pipelines (SDP) provides a robust ingestion-to-features pattern for streaming tables and materialized views." (Source: sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai)
Key properties:
- Declarative. The author describes outputs + their input dependencies; the runtime chooses incremental recompute vs full rebuild.
- Streaming-native. `@dp.table` on a streaming source yields a continuously updated Delta table.
- Materialised-view semantics. `@dp.materialized_view` outputs reflect upstream changes without a manual refresh.
- Schema evolution + late events are handled by the pipeline runtime, not the pipeline author.
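The refresh semantics in the list above can be illustrated with a toy version-tracking sketch in plain Python. This is not the SDP runtime and every name here is invented; it only shows the shape of "outputs reflect upstream changes without a manual refresh": a derived view recomputes lazily whenever its upstream table's version has advanced.

```python
# Toy illustration of materialised-view semantics (NOT the SDP runtime;
# all names invented): the view recomputes only when upstream changed.

class Table:
    """A table whose version advances on every append."""
    def __init__(self):
        self.rows, self.version = [], 0

    def append(self, *rows):          # simulates new streaming arrivals
        self.rows.extend(rows)
        self.version += 1


class MaterializedView:
    """Caches a transform of an upstream table, refreshed lazily."""
    def __init__(self, upstream, transform):
        self.upstream, self.transform = upstream, transform
        self._seen_version, self._cache = -1, None

    @property
    def rows(self):                   # no manual refresh call anywhere
        if self._seen_version != self.upstream.version:
            self._cache = self.transform(self.upstream.rows)
            self._seen_version = self.upstream.version
        return self._cache


bronze = Table()
peak_hr = MaterializedView(bronze, lambda rows: [max(rows)] if rows else [])

bronze.append(61, 64)
print(peak_hr.rows)   # [64]
bronze.append(72)     # a late arrival: the view reflects it on next read
print(peak_hr.rows)   # [72]
```

The real runtime additionally decides between incremental recompute and full rebuild; this sketch always rebuilds, which is the trivial (correct but slow) end of that spectrum.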
Why it matters for sysdesign¶
Lakeflow SDP is the streaming-side complement to Delta Lake's batch shape: both present a table interface, both are UC-governed, and both inherit the lakehouse's reproducibility story (time travel, lineage, MLflow integration). It lets the governed-Delta-tables-per-modality pattern cover streaming modalities (wearables, IoT, clickstreams) without introducing a separate stream-processing tier.
Syntax note from the source¶
"The `pyspark.pipelines` module (imported as `dp`) with `@dp.table` and `@dp.materialized_view` decorators follows current Databricks Lakeflow SDP Python semantics." (Source: sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai)
The explicit currency disclaimer — "current" Python semantics — flags that Databricks' decorator vocabulary has evolved and may evolve again.
Seen in¶
- sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai — Databricks names Lakeflow SDP as the wearables-streaming tool inside its multimodal lakehouse pattern; cited for schema evolution + late-event handling + continuous aggregation over wearables streams. First wiki ingest naming Lakeflow SDP.
- sources/2026-04-22-databricks-stop-hand-coding-change-data-capture-pipelines — SDP as the runtime host for AutoCDC, Databricks' declarative CDC / SCD API. Second wiki ingest naming Lakeflow SDP; canonicalises the runtime's load-bearing correctness properties that AutoCDC inherits: incremental-progress tracking, out-of-sequence arrival handling, reprocessing safety, schema evolution, failure recovery without lost or doubled changes — "Lakeflow Spark Declarative Pipelines automatically tracks incremental progress and handles out-of-sequence data. Pipelines can recover from failures, reprocess historical data, and evolve over time without double-applying or losing changes." The AutoCDC API adds CDC/SCD-specific authoring surface (`dp.create_auto_cdc_flow` with `keys`, `sequence_by`, `apply_as_deletes`, and `stored_as_scd_type` parameters) atop the SDP runtime's general streaming guarantees. It composes with the `@dp.view`, `@dp.table`, `@dp.materialized_view`, and `dp.create_streaming_table` primitives disclosed in the multimodal post. Perf gains are disclosed as Databricks Runtime improvements since Nov 2025: 71% better perf-per-dollar on SCD Type 1 workloads and 96% on SCD Type 2, propagated universally to AutoCDC pipelines because the declarative API lets engine-level optimisations apply to every AutoCDC flow without author intervention. Named regulated-vertical adopters at production scale: Navy Federal Credit Union, Block, Valora Group. First wiki source to canonicalise SDP's CDC/SCD API surface (distinct from the streaming-wearables role in the prior source).
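A hedged sketch of that AutoCDC surface, under the same caveat as any SDP code: only `dp.create_auto_cdc_flow`, `dp.create_streaming_table`, and the four parameter names come from the source; the table names, the sequencing column, and the delete predicate below are invented, and the code assumes a Databricks Lakeflow SDP runtime.

```python
# Hedged sketch: requires the Databricks Lakeflow SDP runtime.
# Table names, the sequencing column, and the delete predicate are
# hypothetical; the function and parameter names come from the source.
from pyspark import pipelines as dp

dp.create_streaming_table("customers_scd2")  # target table primitive

dp.create_auto_cdc_flow(
    target="customers_scd2",           # SCD table maintained by the runtime
    source="customers_cdc_feed",       # hypothetical upstream change feed
    keys=["customer_id"],              # identity for matching change rows
    sequence_by="commit_ts",           # orders out-of-sequence arrivals
    apply_as_deletes="op = 'DELETE'",  # which change rows mean delete
    stored_as_scd_type=2,              # keep full history (SCD Type 2)
)
```

The declarative shape is what makes the disclosed perf story possible: because the author states keys, ordering, and SCD type rather than hand-coding merge logic, engine-level improvements apply to every such flow without author changes.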