Skip to content

SYSTEM Cited by 1 source

DBT (data build tool)

DBT (data build tool) is the transformation layer of modern ELT pipelines. Engineers write SQL SELECT queries as "models"; DBT resolves the DAG between them via {{ ref('upstream_model') }} references, compiles and executes the queries against the warehouse, and materialises outputs as tables, views, or incremental tables. Documentation at https://docs.getdbt.com/docs/introduction.

Why it appears in architectures

  • It's the T in ELT — data lands in the warehouse raw, and DBT models push it through stages. Canva's counting pipeline defines deduplication, classification, and aggregation as DBT models on Snowflake; intermediate stages materialise as SQL Views. (Source: sources/2024-04-29-canva-scaling-to-count-billions)
  • Lineage + DAG reasoning comes for free via ref; pipeline stages aren't just "whatever SQL is in cron" — they have dependencies DBT understands.
  • Tests and schema contracts live alongside the models; migration hazards from upstream schema changes are visible.

Example (from Canva)

-- aggregate daily template usage up to per-brand counts
select
  day_id,
  template_brand,
  ...
  sum(usage_count) as usage_count
from {{ ref('daily_template_usages') }}
group by
  day_id,
  template_brand

The previous-stage model daily_template_usages is resolved by DBT; this model is what a downstream brand-level report would ref.

Operational properties

  • Separate deployable with its own CI/CD. Canva explicitly flags that the DBT codebase is a standalone service with a different release cadence from the main services, and schema-change compatibility has to be reasoned about across the two.
  • Materialisation choice matters. Views = cheap, always fresh, recomputed at read. Tables = cached, fast to read, need refresh. Incremental = append / merge for large tables with stable keys. Canva prefers views for intermediates: "simple and flexible to change" and no intermediary persisted state.
  • Rerun semantics. A full DBT run re-executes the DAG — which is what enables patterns/end-to-end-recompute when paired with an outer-join upsert at the sink.

Caveats

  • Observability lives in DBT + warehouse tooling, not the service stack — integration cost is real.
  • Source-schema coupling. Models are tied to upstream table shapes; source schema changes ripple through. Canva lists this as a challenge.

Seen in

Last updated · 200 distilled / 1,178 read