SYSTEM Cited by 1 source

DBT (data build tool)¶

DBT (data build tool) is the transformation layer of modern ELT pipelines. Engineers write SQL SELECT queries as "models"; DBT resolves the DAG between them via {{ ref('upstream_model') }} references, compiles and executes the queries against the warehouse, and materialises outputs as tables, views, or incremental tables. Documentation at https://docs.getdbt.com/docs/introduction.

Why it appears in architectures¶

It's the T in ELT — data lands in the warehouse raw, and DBT models push it through stages. Canva's counting pipeline defines deduplication, classification, and aggregation as DBT models on Snowflake; intermediate stages materialise as SQL Views. (Source: sources/2024-04-29-canva-scaling-to-count-billions)
Lineage + DAG reasoning comes for free via ref; pipeline stages aren't just "whatever SQL is in cron" — they have dependencies DBT understands.
Tests and schema contracts live alongside the models; migration hazards from upstream schema changes are visible.

Example (from Canva)¶

-- aggregate daily template usage up to per-brand counts
select
  day_id,
  template_brand,
  ...
  sum(usage_count) as usage_count
from {{ ref('daily_template_usages') }}
group by
  day_id,
  template_brand

The previous-stage model daily_template_usages is resolved by DBT; this model is what a downstream brand-level report would ref.

Operational properties¶

Separate deployable with its own CI/CD. Canva explicitly flags that the DBT codebase is a standalone service with a different release cadence from the main services, and schema-change compatibility has to be reasoned about across the two.
Materialisation choice matters. Views = cheap, always fresh, recomputed at read. Tables = cached, fast to read, need refresh. Incremental = append / merge for large tables with stable keys. Canva prefers views for intermediates: "simple and flexible to change" and no intermediary persisted state.
Rerun semantics. A full DBT run re-executes the DAG — which is what enables patterns/end-to-end-recompute when paired with an outer-join upsert at the sink.

Caveats¶

Observability lives in DBT + warehouse tooling, not the service stack — integration cost is real.
Source-schema coupling. Models are tied to upstream table shapes; source schema changes ripple through. Canva lists this as a challenge.

Seen in¶

sources/2024-04-29-canva-scaling-to-count-billions — Canva's counting pipeline is DBT models on Snowflake; dedup + aggregation as SQL; intermediate stages as views; "thousands of lines" of imperative code deleted.