CONCEPT Cited by 2 sources
Zero-ETL operational↔analytical¶
Zero-ETL operational↔analytical is the architectural property that operational data (in a transactional store) is available for analytical workloads (in an analytical store / engine) without a customer-managed ETL pipeline in between. First wiki canonicalisation 2026-05-27.
Definition¶
A zero-ETL integration between operational and analytical tiers means:
- No customer-managed pipeline. The customer does not author transform code, configure scheduling, monitor pipeline failures, or pay pipeline-engineer time on the integration.
- Real-time freshness. Data written to the operational tier is visible in the analytical tier with low lag (seconds, not hours).
- No external store of replicated data. The replication is platform-internal; the analytical tier is not a separate warehouse the customer must size and govern.
The 2026-05-27 source describes Moonlink, verbatim, as: "a real-time synchronization engine between operational and analytical formats, with zero ETL. This allows FHIR data to flow seamlessly into the analytical layer, eliminating the dependencies for pipelines, transformation, or delays."
What "zero ETL" actually means¶
The phrase is positioning, not literal. Real-world zero-ETL integrations still:
- Have a replication mechanism (CDC, logical decoding, snapshot-and-tail, copy-on-write, etc.).
- Have failure modes (replication lag, network partitions, unhandled schema drift).
- Have consistency semantics (eventual / monotonic / snapshot isolation).
The "zero" refers to customer-visible operational burden, not to the absence of replication. Specifically:
- No pipeline to author. The customer doesn't write Spark / Flink / dbt / Airflow code.
- No pipeline to schedule. The platform handles cadence.
- No pipeline to monitor. Lag and failures are surfaced as platform metrics, not on a dashboard the customer has to build and maintain.
- No pipeline to upgrade. Schema evolution is handled by the platform.
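To make concrete that a replication mechanism still exists underneath, here is a minimal sketch of one of the mechanisms listed above, snapshot-and-tail, in Python. It is a toy under stated assumptions, not any vendor's implementation: `ChangeEvent`, `AnalyticalReplica`, and the integer `lsn` log position are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChangeEvent:
    lsn: int                  # hypothetical position in the operational change log
    key: str
    value: Optional[dict]     # None marks a delete


@dataclass
class AnalyticalReplica:
    rows: dict = field(default_factory=dict)
    applied_lsn: int = 0      # highest log position reflected in `rows`

    def load_snapshot(self, snapshot: dict, snapshot_lsn: int) -> None:
        # Step 1: bulk-copy a consistent snapshot taken at `snapshot_lsn`.
        self.rows = dict(snapshot)
        self.applied_lsn = snapshot_lsn

    def apply(self, event: ChangeEvent) -> None:
        # Step 2: tail the change log; skip anything the replica already reflects.
        if event.lsn <= self.applied_lsn:
            return
        if event.value is None:
            self.rows.pop(event.key, None)
        else:
            self.rows[event.key] = event.value
        self.applied_lsn = event.lsn


def replication_lag(source_lsn: int, replica: AnalyticalReplica) -> int:
    # Lag expressed in log positions; a platform would surface this as a metric.
    return source_lsn - replica.applied_lsn


replica = AnalyticalReplica()
replica.load_snapshot({"patient/1": {"name": "A"}}, snapshot_lsn=10)
replica.apply(ChangeEvent(lsn=11, key="patient/2", value={"name": "B"}))
print(replication_lag(source_lsn=12, replica=replica))
```

In a zero-ETL integration the customer never writes or runs this loop, but the lag it produces and the order in which changes become visible remain observable properties of the analytical tier.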
Comparison to neighbouring concepts¶
| Concept | What it is |
|---|---|
| Zero-ETL operational↔analytical (this page) | Property of an integration: no customer-managed pipeline |
| CDC | A specific replication mechanism (read WAL / binlog, apply downstream) |
| HTAP (hybrid transactional + analytical processing) | A property of a single engine that serves both workloads on one storage substrate |
| Compute-storage separation | An engine architecture that lets compute attach/detach from a shared storage layer |
Zero-ETL is closer to HTAP than it might first appear: both are about removing the conventional operational↔analytical replication boundary. The difference is that HTAP unifies the engine (one engine serves both workloads) while zero-ETL unifies the integration (two engines, no customer-visible pipeline between them). HTAP is rarer; zero-ETL is the more pragmatic shape for vendors that already have separate operational and analytical engines and want to bridge them.
Wiki-canonical instances¶
- Moonlink (2026-05-27) — Databricks' real-time operational↔analytical sync engine bridging Aidbox-on-Lakebase to the Databricks analytical surface; first wiki disclosure as a named primitive distinct from Synced Tables and Lakehouse Sync.
- Lakebase Synced Tables (2026-05-20) — Delta → Postgres direction with three sync modes (snapshot / triggered / continuous).
- Lakehouse Sync (2026-05-20) — Postgres → Delta CDC-based pipeline.
- Redpanda SQL over Iceberg Topics (2026-05-27 GA) — first wiki canonicalisation where the operational substrate is a streaming broker rather than a transactional database. The "customer-managed pipeline" eliminated isn't a CDC pipeline (since records aren't replicated between tiers — they're simultaneously written to both tiers by the broker); it's the conventional warehouse-side ingestion pipeline that pulls Kafka topics into Snowflake / Databricks / BigQuery for analytical query. Redpanda SQL eliminates that pipeline by running the analytical query engine in-cluster (concepts/in-cluster-streaming-sql) over the dual-tier-write substrate (concepts/two-tier-stream-iceberg-query-bridge). Verbatim positioning: "connect a client, write a query, get results. […] No streaming pipelines to build before the data arrives." The novel substrate property: zero-ETL's "no customer-managed pipeline" property holds even when the operational tier is a streaming broker, not a transactional database.
- AWS Aurora zero-ETL → Redshift (industry, not yet wiki-ingested) — comparable shape from a different vendor.
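The dual-tier-write shape described in the Redpanda SQL entry above differs from CDC in that nothing is copied between tiers after the fact. The following Python sketch illustrates only that shape. `DualTierTopic` and its methods are hypothetical and are not Redpanda's API; a real broker would also deal with batching, table file formats, and failure handling, all of which this toy ignores.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DualTierTopic:
    log: list = field(default_factory=list)    # streaming tier: ordered records
    table: list = field(default_factory=list)  # analytical tier: queryable rows

    def produce(self, record: dict) -> int:
        # A single write path lands the record in both tiers,
        # so there is no downstream ingestion job to build.
        offset = len(self.log)
        self.log.append(record)
        self.table.append({"offset": offset, **record})
        return offset

    def consume(self, from_offset: int = 0) -> list:
        # Streaming read: records in produce order.
        return self.log[from_offset:]

    def query(self, predicate: Callable[[dict], bool]) -> list:
        # Analytical read: filter over the table tier.
        return [row for row in self.table if predicate(row)]


topic = DualTierTopic()
topic.produce({"sensor": "a", "temp": 21})
topic.produce({"sensor": "b", "temp": 35})
print(topic.query(lambda row: row["temp"] > 30))
```

The point of the sketch is that the analytical read never waits on a separate copy step; whether the two tiers become visible at exactly the same moment in a real system is a property of that system, not of this illustration.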
Why this matters for substrate design¶
In the FHIR-server-on-lakehouse-substrate pattern (Aidbox-on-Lakebase + Moonlink + Unity Catalog), zero-ETL is what makes the dual-access pattern viable as a substrate property rather than as a per-customer integration project. If every customer of a FHIR substrate had to build their own Aidbox→warehouse ETL, the substrate's promise of "one dataset, every tool" would collapse into the conventional three-component pattern with its duplication tax.
Zero-ETL is the load-bearing condition: it's what lets the same FHIR resource be queryable through both the FHIR API and Spark / SQL / ML simultaneously, with neither path requiring a customer-managed replication step.
Seen in¶
- 2026-05-27 — sources/2026-05-27-databricks-building-a-fhir-native-health-data-platform-on-databricks-lakebase — first wiki canonicalisation as a concept, instantiated by Moonlink in the FHIR-native health-data-platform shape.
- 2026-05-27 — sources/2026-05-27-redpanda-redpanda-sql-is-ga-the-query-engine-that-skips-the-pipeline — second canonical instance: streaming-broker variant via Redpanda SQL over Iceberg Topics; the substrate replaces the warehouse-side ingestion pipeline by running the analytical query engine in-cluster over a dual-tier-write substrate.
Caveats¶
- Positioning term. "Zero-ETL" is a vendor-favoured framing; the underlying mechanism still has lag, failure modes, and consistency semantics that the customer must reason about during incidents.
- Not the same as HTAP. Zero-ETL spans two engines; HTAP serves both workloads from one engine over one substrate.
- Lag bounds depend on the replication mechanism. Without a disclosed mechanism (as is the case with Moonlink as of 2026-05-27), the customer cannot know the freshness SLO.
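When the mechanism is undisclosed, freshness can still be estimated empirically by writing a marker to the operational tier and polling the analytical tier until the marker appears. The sketch below shows that probe in Python. `write_marker` and `marker_visible` are hypothetical callables the reader would implement against their own operational and analytical endpoints; nothing here is a platform API.

```python
import time
import uuid
from typing import Callable, Optional


def measure_freshness_lag(
    write_marker: Callable[[str], None],    # hypothetical: writes a marker row to the operational tier
    marker_visible: Callable[[str], bool],  # hypothetical: checks the analytical tier for that marker
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.5,
) -> Optional[float]:
    """Return seconds until the marker became visible, or None on timeout."""
    marker = uuid.uuid4().hex
    started = time.monotonic()
    write_marker(marker)
    while True:
        if marker_visible(marker):
            return time.monotonic() - started
        if time.monotonic() - started >= timeout_s:
            return None
        time.sleep(poll_interval_s)
```

A single probe overestimates the true lag by up to one poll interval plus query time, and one sample says nothing about tail behaviour; repeated probes under realistic write load give an observed distribution, which is still not a guaranteed bound.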