
Trino Gateway

Definition

Trino Gateway is an open-source proxy and load balancer that sits in front of one or more Trino clusters. It gives clients a single connection URL for the whole fleet and routes each incoming query to the backend cluster best suited to execute it, based on routing rules that inspect the query body, the tables referenced, and/or the HTTP headers sent by the client.

Repo: https://github.com/trinodb/trino-gateway.

Origin

Trino Gateway originated at Lyft as Presto Gateway, a proxy and load balancer for PrestoDB. It was later forked and integrated into the Trino ecosystem, with contributions from various organisations and the open-source community; Expedia is one named contributor (see sources/2026-03-24-expedia-operating-trino-at-scale-with-trino-gateway).

Why it exists (architectural thesis)

As organisations scale their analytics platforms, they hit "increased query complexity, higher concurrency, and the need for specialised cluster configurations." Directing users to specific Trino cluster endpoints stops being practical as the user base grows. A gateway solves four problems simultaneously:

  1. Single connection URL for clients. Users, BI tools, and scheduled jobs target one URL; the gateway decides which backend cluster executes each query. Workload distribution is invisible to clients.
  2. Automatic routing to appropriate clusters. Routing rules match large-table queries to large-workload clusters, metadata-check queries (select version()) to a lightweight metadata cluster, BI-tool queries to BI-optimised clusters.
  3. No-downtime upgrades for backend clusters. Blue/green or canary swaps happen behind the gateway; the user-facing URL does not change.
  4. Transparent capacity changes. Add/remove backend clusters without interrupting users.

(Source: sources/2026-03-24-expedia-operating-trino-at-scale-with-trino-gateway)

Core components

RoutingManager

Decides which cluster each query goes to. Consults:

  • Cluster health — only routes to clusters in HEALTHY state (concepts/cluster-health-check); skips UNHEALTHY and PENDING.
  • Routing rules — each incoming query is passed through the rule set; rules emit a routingGroup that selects the cluster pool.

Routing rule language

Routing rules are short condition + action scripts evaluated per query. Canonical structure (from the post):

name: "large-table-query"
description: "Route queries for large tables"
actions:
  - |
    foreach (table : trinoQueryProperties.getTables()) {
      String tableSuffix = table.getSuffix();
      if (tableSuffix.contains("table1") || tableSuffix.contains("table2")) {
        result.put("routingGroup", "large-cluster");
        return;
      }
    }
condition: "true"

Key surfaces a rule can inspect:

  • trinoQueryProperties.getTables() — table references in the parsed query.
  • trinoQueryProperties.getBody() — raw query text (e.g. detect select version(), show catalogs).
  • request.getHeader("X-Trino-Source") — client application identifier (e.g. Tableau, Looker).

And writes:

  • result.put("routingGroup", "…") — set the routing-group name that maps to a backend cluster pool.
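Putting the read and write surfaces together: a rule reads from one of the surfaces above and emits a routingGroup. The following sketch of a body-inspection rule follows the MVEL-style action syntax of the example above; the rule name, group name, and exact matching logic are illustrative, not taken from the source:

```yaml
---
name: "metadata-query"
description: "Send lightweight metadata checks to a small metadata cluster"
condition: "true"
actions:
  - |
    # Inspect the raw query text for common metadata/health-check queries.
    String body = trinoQueryProperties.getBody().toLowerCase();
    if (body.contains("select version()") || body.startsWith("show catalogs")) {
      result.put("routingGroup", "metadata-cluster");
    }
```

A query matched here would be sent to whichever backend pool is registered under the "metadata-cluster" routing group; queries that no rule matches keep their default routing group.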

Three canonical rule shapes

  1. Large-table isolation — route queries against named large tables to a heavy-workload cluster so smaller queries aren't queued behind them.
  2. Metadata offload — BI tools frequently issue health-check queries (select version(), show catalogs); route these to a single-node metadata cluster so dashboard extract failures drop and user-level limits can be tuned independently.
  3. BI-source routing — detect Tableau / Looker queries via the X-Trino-Source header and route them to BI-optimised clusters.
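The third shape, header-based routing, can be sketched in the same rule language. Again the rule name, group name, and source values are illustrative, assuming the request.getHeader surface listed earlier:

```yaml
---
name: "bi-tool-query"
description: "Route Tableau and Looker traffic to BI-optimised clusters"
condition: "true"
actions:
  - |
    # Client applications identify themselves via the X-Trino-Source header.
    String source = request.getHeader("X-Trino-Source");
    if (source != null &&
        (source.equalsIgnoreCase("Tableau") || source.equalsIgnoreCase("Looker"))) {
      result.put("routingGroup", "bi-cluster");
    }
```

In Trino Gateway, a query that sets no routingGroup typically falls back to a default pool (commonly named adhoc), so rules only need to handle the traffic they want to isolate.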

Operator-UX features (Expedia's contributions, 2026-03-24)

  1. Routing rules UI (PR #433) — admins view and edit routing rules directly in the Gateway's UI. Before this, rules were managed by editing configuration files; inspection required reading files or examining the gateway environment. Rule changes persist when shared storage is configured.
  2. Source filter on history page (PR #551) — filter query history by the client application that originated the query (e.g. Tableau, Looker). Enables per-application debugging and pattern analysis without grepping history.
  3. Cluster health display (PR #601) — replaces the previous active/inactive toggle (which records operator intent, not actual cluster status) with a three-state health model:
     • HEALTHY — health-checks report ready; RoutingManager routes requests to the cluster.
     • UNHEALTHY — health-checks report not-ready; RoutingManager does not route.
     • PENDING — cluster starting up; treated as unhealthy (no routing) until it crosses to HEALTHY.
  4. Full query text window (PR #740) — removes the previous 200-character truncation on query text and lets admins open the full query in a separate window. Cuts the need to jump to the originating cluster's UI to read a long query.
