Trino
Definition
Trino is an open-source distributed SQL query engine, described as "a fork of PrestoSQL" (sources/2026-03-24-expedia-operating-trino-at-scale-with-trino-gateway), that executes federated SQL queries over heterogeneous data sources (object stores, relational databases, Kafka, Cassandra, etc.) without first relocating the data. It succeeded the original Presto engine after the Presto project split, and it is one of the canonical query engines for lakehouse-style architectures over Apache Iceberg / Hive / Delta Lake tables on object storage.
Role in this wiki
Trino appears as the distributed SQL engine companies put in front of their data lake. Typical deployment shape:
- Each cluster has a coordinator that plans queries and many workers that execute them; a discovery service tracks cluster membership.
- Connectors abstract each data source; a single query can join across Iceberg / Hive / Kafka / Postgres in one plan.
- At organizational scale, a fleet of Trino clusters is operated — typically segregated by workload shape (patterns/workload-segregated-clusters) — rather than one big cluster handling every workload.
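To make the connector model concrete, here is a minimal sketch of a cross-source join issued from Python. The catalog names (iceberg, postgres), schemas, tables, and columns are hypothetical; the only Trino convention relied on is that every table is addressed as catalog.schema.table, which is what lets one statement span connectors. The helper assumes the trino Python client package is installed and that a coordinator is reachable at the given host.

```python
# Hypothetical catalogs, schemas, tables, and columns, for illustration only.
# The catalog.schema.table addressing scheme is what lets one query
# reference tables served by different connectors.
FEDERATED_QUERY = """
SELECT c.customer_name, count(*) AS order_count
FROM iceberg.sales.orders AS o
JOIN postgres.public.customers AS c
  ON o.customer_id = c.customer_id
GROUP BY c.customer_name
ORDER BY order_count DESC
LIMIT 10
"""


def run_federated_query(host: str, user: str, port: int = 8080):
    """Send the query to a Trino coordinator and return all result rows.

    Assumes the `trino` client package is installed (pip install trino).
    """
    import trino  # imported lazily so this module loads without the client

    conn = trino.dbapi.connect(host=host, port=port, user=user)
    cur = conn.cursor()
    cur.execute(FEDERATED_QUERY)
    return cur.fetchall()
```

From the client's point of view this is one ordinary SQL statement; the coordinator decides how to split the work between the two connectors.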
Deployment pattern at scale (Expedia)
Expedia runs multiple Trino clusters categorized by workload shape:
- Adhoc clusters — mixed workloads, medium concurrency; for exploratory analysis and development.
- ETL clusters — high-volume, high-complexity queries, low concurrency; heavy data processing.
- BI clusters — low-complexity queries, high concurrency; dashboards behind Tableau / Looker.
Each cluster's config is tuned to the shape of its workload; users do not address clusters directly — they point at a Trino Gateway which routes each query to the appropriate cluster based on routing rules.
Routing / observability context on the gateway side
The gateway's routing rules can inspect a handful of properties of each incoming query:
- trinoQueryProperties.getTables(): tables referenced in the query (used for table-based routing to "large-table" clusters).
- trinoQueryProperties.getBody(): raw query text (used for metadata-query detection, such as select version()).
- X-Trino-Source HTTP header: identifies the client application (Tableau, Looker, etc.); drives BI-source routing.
(Source: sources/2026-03-24-expedia-operating-trino-at-scale-with-trino-gateway)
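The routing decisions described above can be sketched as a small function. This is plain Python for illustration, not Trino Gateway's rule syntax, and it is not Expedia's actual configuration: the routing-group names, the LARGE_TABLES and BI_SOURCES sets, the QueryInfo type, and the order in which the checks run are all assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical values; a real deployment would define its own.
LARGE_TABLES = {"lake.events.clickstream"}
BI_SOURCES = {"tableau", "looker"}


@dataclass
class QueryInfo:
    body: str                                 # raw query text (cf. getBody())
    tables: set = field(default_factory=set)  # referenced tables (cf. getTables())
    source: str = ""                          # X-Trino-Source header value


def choose_routing_group(q: QueryInfo) -> str:
    """Return the (hypothetical) name of the cluster group for a query."""
    # Metadata probes such as "select version()" touch no tables.
    normalized = q.body.strip().rstrip(";").strip().lower()
    if normalized == "select version()":
        return "metadata"
    # Queries touching any known large table go to a dedicated group.
    if q.tables & LARGE_TABLES:
        return "large-table"
    # BI tools identify themselves through the source header.
    if any(name in q.source.lower() for name in BI_SOURCES):
        return "bi"
    # Everything else falls through to the general-purpose group.
    return "adhoc"
```

For example, choose_routing_group(QueryInfo(body="select 1", source="Tableau Desktop")) returns "bi", while the same query with no source header falls through to "adhoc". Checking tables before the source header means a dashboard query over a large table lands on the large-table group; whether that ordering is desirable is a policy choice for each fleet.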
Seen in
- sources/2026-03-24-expedia-operating-trino-at-scale-with-trino-gateway — Expedia on running a multi-cluster Trino fleet behind a Trino Gateway with workload-aware routing; Adhoc / ETL / BI cluster segregation; UI contributions for routing-rule management, query history, and cluster health.
- sources/2026-05-28-cloudflare-how-we-built-cloudflares-data-platform-and-an-ai-agent-on-top-of-it — Trino is the query engine of Cloudflare Town Lake, federating SQL across Postgres, ClickHouse, BigQuery, and Iceberg on R2 in a single plan. Canonical worked example: "a query that asks 'what are the top 100 paying customers by Workers requests this week' compiles into a plan that pushes filters into ClickHouse, joins against an account dimension in Postgres, and ranks against billing rollups in R2, all in one go." Town Lake's governance integrates at the engine level: Lifeguard renders a per-user JSON policy that Trino reads over HTTP, controlling table allowlist + column-level masking + opt-in PII redaction. Canonicalises patterns/single-sql-interface-over-heterogeneous-sources as the engine-level pattern. R2 SQL is the named successor Cloudflare plans to migrate parts of Town Lake's workflow to as it matures — the post is explicit that Trino is today's engine, not the permanent choice.
Related
- systems/presto — the predecessor engine Trino forked from.
- systems/trino-gateway — the proxy / load balancer in front of a Trino fleet.
- systems/apache-iceberg — a canonical table format Trino queries.
- systems/apache-hive — the legacy catalog protocol (Hive Metastore) Trino commonly federates with.
- systems/amazon-athena — AWS's managed serverless Presto/Trino offering.
- systems/cloudflare-town-lake — second canonical wiki instance: Trino as the federated SQL engine of Cloudflare's data platform.
- concepts/workload-aware-routing — the architectural pattern for routing SQL queries based on their shape, realised in Trino fleets via Trino Gateway.
- concepts/data-lakehouse — the architectural class Trino is the canonical query engine for.
- patterns/single-sql-interface-over-heterogeneous-sources — the federated-SQL pattern Trino is the canonical engine of.