

ELT vs ETL

ETL (Extract → Transform → Load) transforms data before loading it into the warehouse, usually in an external worker. ELT (Extract → Load → Transform) lands raw data into the warehouse first and runs transformations inside it, as SQL over warehouse compute.
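The two orderings can be sketched in a few lines. This is a minimal illustration using Python's stdlib `sqlite3` as a stand-in for the warehouse; all table, column, and value names are invented for the example.

```python
import sqlite3

# ELT shape: land raw rows first, then transform with SQL inside the engine.
con = sqlite3.connect(":memory:")

# 1. Extract + Load: raw data lands untouched.
con.execute("CREATE TABLE raw_events (user_id TEXT, event TEXT)")
con.executemany("INSERT INTO raw_events VALUES (?, ?)",
                [("u1", "view"), ("u1", "view"), ("u2", "click")])

# 2. Transform: the aggregation runs as SQL over engine compute,
#    not row-by-row in application code (the ETL shape).
rows = con.execute("""
    SELECT user_id, COUNT(*) AS n_events
    FROM raw_events
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()
print(rows)  # [('u1', 2), ('u2', 1)]
```

In the ETL shape, step 2 would instead happen in an external worker before the `INSERT`, with only the already-aggregated rows reaching the warehouse.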

The flip is powered by OLAP warehouses with separate, elastic compute: once the engine can aggregate billions of rows in minutes (see concepts/compute-storage-separation), pulling data out of the warehouse to transform it in application code becomes the slow path.

Why ELT wins for aggregation-heavy pipelines

  • Compute is where the data is. Scan + GROUP BY + JOIN stays in the warehouse engine; no per-row wire traffic to an external worker.
  • Intermediate stages become SQL views, not persisted tables. Canva called this out explicitly: intermediate DBT transformation outputs are materialised as SQL views that are "simple and flexible to change", eliminating multiple persisted intermediary tables. (Source: sources/2024-04-29-canva-scaling-to-count-billions)
  • Code shrinks. Canva removed "thousands of lines of deduplication and aggregation calculation code" by rewriting the logic in SQL, which was "surprisingly simpler" than the imperative equivalent.
  • End-to-end recompute becomes cheap. Reruns are just re-executing the DBT graph; see patterns/end-to-end-recompute.
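The "code shrinks" point is concrete: deduplication plus aggregation that would need explicit seen-sets and accumulators in imperative code collapses into one declarative statement. A minimal sketch with stdlib `sqlite3`; the schema and values are invented, not Canva's actual tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_usage (doc_id TEXT, user_id TEXT, day TEXT)")
con.executemany("INSERT INTO raw_usage VALUES (?, ?, ?)", [
    ("d1", "u1", "2024-01-01"),
    ("d1", "u1", "2024-01-01"),   # duplicate delivery from the source
    ("d1", "u2", "2024-01-01"),
    ("d2", "u1", "2024-01-02"),
])

# Dedup (inner DISTINCT) + aggregate (outer GROUP BY) in one statement,
# executed where the data already lives.
counts = con.execute("""
    SELECT doc_id, COUNT(*) AS uses FROM (
        SELECT DISTINCT doc_id, user_id, day FROM raw_usage
    ) GROUP BY doc_id ORDER BY doc_id
""").fetchall()
print(counts)  # [('d1', 2), ('d2', 1)]
```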

Typical ELT stack

  • Extract + Load: a managed replication pipeline from source databases into the warehouse. Canva uses their data-platform team's replication service to pull from DynamoDB + service DBs into Snowflake. Raw JSON fields get projected into typed columns as part of this step for query performance.
  • Transform: SQL models organised as a DAG, authored in a tool like DBT (systems/dbt). Each model selects from upstream models and emits the next; the final models are the numbers you actually serve.
  • Warehouse: OLAP engine that can scale compute for the transform runs — Snowflake, BigQuery, Redshift, Databricks SQL.
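The whole stack can be mimicked end to end: raw JSON is projected into typed columns at load time, and each transform "model" is a view selecting from the one upstream of it, forming a small DAG like a DBT project. A sketch with stdlib `sqlite3` (assumes a SQLite build with the JSON1 functions, which ships with modern Python); names like `stg_loads` and `fct_design_usage` are invented for illustration.

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")

# Extract + Load: raw JSON lands first ...
con.execute("CREATE TABLE raw_loads (payload TEXT)")
con.executemany("INSERT INTO raw_loads VALUES (?)", [
    (json.dumps({"design": "d1", "user": "u1"}),),
    (json.dumps({"design": "d1", "user": "u2"}),),
    (json.dumps({"design": "d2", "user": "u1"}),),
])
# ... and a typed projection is created as part of the load step.
con.execute("""
    CREATE VIEW stg_loads AS
    SELECT json_extract(payload, '$.design') AS design_id,
           json_extract(payload, '$.user')   AS user_id
    FROM raw_loads
""")

# Transform: the final model is another view over the staging model;
# re-running the whole graph is just re-executing these statements.
con.execute("""
    CREATE VIEW fct_design_usage AS
    SELECT design_id, COUNT(DISTINCT user_id) AS unique_users
    FROM stg_loads
    GROUP BY design_id
""")
final = con.execute(
    "SELECT * FROM fct_design_usage ORDER BY design_id").fetchall()
print(final)  # [('d1', 2), ('d2', 1)]
```

Because both stages are views, changing a definition changes all downstream results on the next run with no intermediary tables to backfill.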

Costs / caveats

  • Separate deploy cadence. The DBT codebase at Canva is a standalone service with its own CI/CD; schema changes require compatibility reasoning across two release schedules. (Source: sources/2024-04-29-canva-scaling-to-count-billions)
  • Different observability tooling. The warehouse + DBT stack has its own metrics/logs story, adding integration cost.
  • Still needs a bridge to OLTP for serving. Warehouse query latency isn't millisecond-class; low-latency reads require exporting the derived data back out. See patterns/warehouse-unload-bridge.
  • Upstream schema coupling. SQL models depend on the shapes of the extracted tables; source schema changes ripple through the DAG.
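The serving caveat above is the one step that must leave the warehouse: derived numbers get unloaded into a low-latency store for product reads. A minimal sketch, again with `sqlite3` standing in for the warehouse and a plain dict standing in for the serving store; the names and counts are made up.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fct_counts (design_id TEXT, uses INTEGER)")
warehouse.executemany("INSERT INTO fct_counts VALUES (?, ?)",
                      [("d1", 42), ("d2", 7)])

# "Unload" bridge: copy the final model out to a key-value serving layer
# so the product read path never issues warehouse queries directly.
serving = {design_id: uses
           for design_id, uses in warehouse.execute(
               "SELECT design_id, uses FROM fct_counts")}
print(serving["d1"])  # 42
```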
