PATTERN Cited by 1 source
Single SQL interface over heterogeneous sources¶
A data-platform pattern in which a single SQL query engine exposes a unified query interface over multiple heterogeneous storage tiers (object-store-backed Iceberg tables, OLTP Postgres, columnar OLAP ClickHouse, BigQuery rollups, Kafka topics, etc.) — joins span sources, predicates push down into each source, and no intermediate materialisation is required between source-tier filtering and cross-tier joining.
Apache Trino is the canonical engine for this pattern; Cloudflare Town Lake is the canonical wiki instance.
The core property: predicate pushdown + cross-source join, one plan¶
From the Town Lake launch post:
"We chose Apache Trino for that: a single SQL query can join a Postgres table, a ClickHouse table, and an Iceberg table on R2 without a need to materialize the intermediate results into a different system. A query that asks 'what are the top 100 paying customers by Workers requests this week' compiles into a plan that pushes filters into ClickHouse, joins against an account dimension in Postgres, and ranks against billing rollups in R2, all in one go."
The query plan's structural shape:
SELECT TOP 100 ... FROM
(filter pushed into ClickHouse) AS req_counts
JOIN
(account dimension served by Postgres) AS dim
ON ...
JOIN
(billing rollup ranked from R2/Iceberg) AS bill
ON ...
ORDER BY bill.spend DESC
LIMIT 100
What makes this powerful:
- Filter pushdown to each source — ClickHouse does the heavy filtering, Postgres does the dimension lookup, Iceberg-on-R2 does the analytics rollup. No source is asked to scan more than it should.
- Joins span sources — the engine pulls filtered rows from each, joins in memory.
- No intermediate materialisation — there's no ETL job pre-joining the three sources into a fourth table. The cross-source join happens at query time, in the engine.
Why this beats source-side materialisation¶
The traditional alternative is to pre-join heterogeneous sources via ETL into a single warehouse table:
Postgres ────► ETL ────►
ClickHouse ──► ETL ────► unified_table ────► query
R2 ──────────► ETL ────►
Costs:
- Storage duplication — the same data lives in source + warehouse.
- ETL maintenance burden — every join shape needs its own pipeline.
- Staleness — the warehouse is some interval behind the sources.
- Schema drift — when a source table changes, ETL pipelines break silently.
Federated SQL eliminates all four costs at the price of:
- Query latency — joining at query time can be slower than reading a pre-joined table for some shapes.
- Engine complexity — the federation layer must understand predicate pushdown for each source connector.
- Cross-source consistency — the engine reads each source at a slightly different time, so joins are not perfectly point-in-time.
The Town Lake post implicitly accepts the latency trade — "top 100 customers by revenue" lands in "about three seconds in Skipper" — the consistency trade is acceptable for analytics (not for transactions).
Connector ecosystem is the load-bearing investment¶
What makes this pattern operational is the set of connectors the engine ships with:
- Postgres — OLTP source.
- MySQL — OLTP source.
- ClickHouse — analytics events.
- BigQuery — Google-cloud-resident rollups.
- Iceberg / Hudi / Delta — open-table-format on object storage.
- Kafka — streaming source.
- S3 / GCS / R2 — raw object access.
Trino's investment in the connector ecosystem is what makes "single SQL plane" practical at scale.
Composes with default-closed governance¶
The federated-SQL shape composes with the default-closed table allowlist in Town Lake: the allowlist + per-column policy is applied at the engine layer (via Lifeguard feeding Trino), so governance is consistent across all federated sources without each source needing its own allowlist enforcement.
Sibling pattern at Databricks¶
patterns/foreign-catalog-federation-for-operational-db-governance is the Databricks framing — Unity Catalog federates governance across foreign catalogs (Postgres, MySQL, etc.) so a single governance plane covers operational + analytical sources. The shape is the same; the language differs ("unified governance plane" at Databricks vs "single SQL interface" at Cloudflare).
When this pattern doesn't fit¶
- Real-time / sub-100ms latency — federated joins have per-source planning overhead; sub-100ms typically requires a single source or a pre-materialised view.
- Massive cross-source joins where one side dominates — if ClickHouse needs to send 100GB to Postgres for the join, the federation cost is unbearable; pre-materialise.
- Strict point-in-time consistency — federated reads land at slightly different timestamps; transactional consistency requires single-source-of-truth.
Seen in¶
- sources/2026-05-28-cloudflare-how-we-built-cloudflares-data-platform-and-an-ai-agent-on-top-of-it — canonical wiki instance. Town Lake on Trino.
- sources/2026-03-24-expedia-operating-trino-at-scale-with-trino-gateway — Expedia's multi-cluster Trino fleet (sibling instance of the pattern at the engine level; different platform shape).
Related¶
- systems/cloudflare-town-lake — canonical platform.
- systems/trino — canonical engine.
- systems/cloudflare-r2-data-catalog / systems/postgresql / systems/clickhouse / systems/google-bigquery — the federated sources in Town Lake.
- systems/apache-iceberg — the table format on the lake side.
- concepts/data-lakehouse — the architectural class.
- concepts/compute-storage-separation — the structural property federation depends on.
- patterns/foreign-catalog-federation-for-operational-db-governance — sibling pattern at Databricks.