SYSTEM Cited by 1 source

Trino¶

Definition¶

Trino is an open-source distributed SQL query engine, "a fork of PrestoSQL" (sources/2026-03-24-expedia-operating-trino-at-scale-with-trino-gateway), that executes federated SQL queries over heterogeneous data sources (object stores, relational databases, Kafka, Cassandra, etc.) without first copying the data into a single store. More precisely, Trino is the project formerly named PrestoSQL, which forked from the original Presto engine (PrestoDB) in 2019 and was renamed Trino in December 2020. It is one of the canonical query engines for lakehouse-style architectures over Apache Iceberg / Hive / Delta Lake tables on object storage.

Role in this wiki¶

Trino appears as the distributed SQL engine companies put in front of their data lake. Typical deployment shape:

Each cluster has a single coordinator that parses and plans queries; many workers execute them; a discovery service, run by the coordinator, tracks which workers are in the cluster.
Connectors abstract each data source; a single query can join across Iceberg / Hive / Kafka / PostgreSQL in one plan.
At organizational scale, a fleet of Trino clusters is operated — typically segregated by workload shape (patterns/workload-segregated-clusters) — rather than one big cluster handling every workload.
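
To make the connector point concrete, here is a minimal sketch of a federated query that joins an Iceberg table with a PostgreSQL table in one statement. The catalog, schema, and table names, and the host and user passed to the client, are hypothetical and not taken from the source. The `run_query` helper assumes the `trino` Python client package is installed and a coordinator is reachable; `catalogs_referenced` is a deliberately naive illustration, not a SQL parser.

```python
# Hypothetical catalogs: "iceberg" and "postgresql" would have to be
# configured on the cluster for this query to run.
FEDERATED_QUERY = """
SELECT c.region, count(*) AS order_count
FROM iceberg.sales.orders AS o
JOIN postgresql.public.customers AS c
  ON o.customer_id = c.id
GROUP BY c.region
"""


def catalogs_referenced(sql: str) -> set[str]:
    """Collect the catalog part of catalog.schema.table names after FROM/JOIN."""
    tokens = sql.split()
    catalogs = set()
    for keyword, name in zip(tokens, tokens[1:]):
        if keyword.upper() in {"FROM", "JOIN"} and name.count(".") == 2:
            catalogs.add(name.split(".")[0])
    return catalogs


def run_query(sql: str, host: str = "trino.example.internal", port: int = 8080):
    """Submit the query through the Trino Python client (assumed installed)."""
    from trino.dbapi import connect  # imported lazily so the sketch loads without it

    conn = connect(host=host, port=port, user="analyst")
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()
```

The fully qualified `catalog.schema.table` names are what let one plan span two systems: each catalog resolves to a connector, and the join itself runs on the Trino workers.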

Deployment pattern at scale (Expedia)¶

Expedia runs multiple Trino clusters categorized by workload shape:

Adhoc clusters — mixed workloads, medium concurrency; for exploratory analysis and development.
ETL clusters — high-volume, high-complexity queries, low concurrency; heavy data processing.
BI clusters — low-complexity queries, high concurrency; dashboards behind Tableau / Looker.

Each cluster's config is tuned to the shape of its workload; users do not address clusters directly — they point at a Trino Gateway which routes each query to the appropriate cluster based on routing rules.
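
The indirection described above can be sketched as a toy gateway that maps a routing group to a backend cluster. This is an illustrative model only, not Trino Gateway's implementation: the `Cluster` and `Gateway` classes, the cluster names and URLs, the fallback to an adhoc group, and the round-robin choice within a group are all assumptions.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Cluster:
    name: str
    routing_group: str  # "adhoc", "etl", or "bi"
    url: str            # hypothetical coordinator address


class Gateway:
    """Toy stand-in for a gateway: picks a backend cluster for a routing group."""

    def __init__(self, clusters, default_group="adhoc"):
        self.default_group = default_group
        by_group = {}
        for cluster in clusters:
            by_group.setdefault(cluster.routing_group, []).append(cluster)
        # Round-robin across clusters in the same group (an assumption).
        self._pools = {group: cycle(members) for group, members in by_group.items()}

    def route(self, routing_group: str) -> Cluster:
        # Unknown groups fall back to the default group's pool.
        pool = self._pools.get(routing_group, self._pools[self.default_group])
        return next(pool)


fleet = [
    Cluster("adhoc-1", "adhoc", "http://adhoc-1.example.internal:8080"),
    Cluster("etl-1", "etl", "http://etl-1.example.internal:8080"),
    Cluster("bi-1", "bi", "http://bi-1.example.internal:8080"),
    Cluster("bi-2", "bi", "http://bi-2.example.internal:8080"),
]
gateway = Gateway(fleet)
```

Because clients only ever see the gateway address, clusters can be added to or removed from a group without any client-side change.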

Routing / observability context on the gateway side¶

Trino Gateway exposes several properties of each incoming request that routing rules can inspect:

trinoQueryProperties.getTables() — tables referenced in the query (used for table-based routing to "large-table" clusters).
trinoQueryProperties.getBody() — raw query text (used for metadata-query detection like select version()).
X-Trino-Source HTTP header — identifies the client application (Tableau, Looker, etc.); drives BI-source routing.

(Source: sources/2026-03-24-expedia-operating-trino-at-scale-with-trino-gateway)
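
A rough sketch of how the three inputs listed above could combine into a routing decision. This is plain Python for illustration, not Trino Gateway's rule syntax. The `QueryRequest` class, the example table name, the "metadata" and "adhoc" group names, the substring match on the source header, and the rule ordering are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class QueryRequest:
    body: str                                 # raw query text (cf. getBody())
    tables: set = field(default_factory=set)  # referenced tables (cf. getTables())
    source: str = ""                          # value of the X-Trino-Source header


# Hypothetical rule inputs; none of these values come from the source.
LARGE_TABLES = {"iceberg.events.clickstream"}
BI_SOURCES = {"tableau", "looker"}


def routing_group(req: QueryRequest) -> str:
    """Return a routing group name; the first matching rule wins."""
    text = req.body.strip().rstrip(";").lower()
    if text == "select version()":
        return "metadata"      # metadata probe, no data scanned
    if req.tables & LARGE_TABLES:
        return "large-table"   # table-based routing
    if any(name in req.source.lower() for name in BI_SOURCES):
        return "bi"            # BI-source routing
    return "adhoc"             # fallback group (an assumption)
```

For example, `routing_group(QueryRequest("SELECT 1", source="Tableau"))` evaluates to `"bi"`, while the same query with no source header falls through to `"adhoc"`.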

Seen in¶

sources/2026-03-24-expedia-operating-trino-at-scale-with-trino-gateway — Expedia on running a multi-cluster Trino fleet behind a Trino Gateway with workload-aware routing; Adhoc / ETL / BI cluster segregation; UI contributions for routing-rule management, query history, and cluster health.

systems/presto — the predecessor engine Trino forked from.
systems/trino-gateway — the proxy / load balancer in front of a Trino fleet.
systems/apache-iceberg — a canonical table format Trino queries.
systems/apache-hive — the older data-warehouse system whose catalog service (Hive Metastore) Trino commonly federates with.
systems/amazon-athena — AWS's managed serverless Presto/Trino offering.
concepts/workload-aware-routing — the architectural pattern for routing SQL queries based on their shape, realised in Trino fleets via Trino Gateway.