PATTERN Cited by 1 source
No-downtime cluster upgrade¶
Definition¶
No-downtime cluster upgrade is the deployment discipline of upgrading one or more clusters in a fleet without breaking the client-facing endpoint — clients see no connection failures, session drops, or connection-string changes during or after the upgrade. It depends structurally on single-endpoint abstraction: if clients connect directly to a specific cluster, any upgrade that replaces that cluster's identity is client-visible.
Two common shapes behind a gateway¶
Both shapes below assume a gateway (or equivalent routing tier) sits in front of the clusters and owns the client-facing endpoint.
Blue/green¶
- Spin up a green (new-version) cluster alongside the existing blue (old-version) cluster.
- Gateway routing rules / routing-group membership flip from blue to green atomically (or in a staged shift).
- If problems surface, flip back — blue is still warm.
- When green is proven, drain and tear down blue.
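The blue/green steps above can be sketched as a small state machine. This is a minimal illustration, not any gateway's real API: the `Gateway` class, its method names, and the example URLs are all hypothetical, and a production flip would go through the gateway's own configuration mechanism.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gateway:
    """Hypothetical in-memory stand-in for a gateway's routing state."""
    backends: dict                    # cluster name -> backend URL
    active: str                       # cluster currently receiving new queries
    previous: Optional[str] = None    # last active cluster, kept warm for rollback

    def route(self) -> str:
        """Return the backend URL a new query would be sent to."""
        return self.backends[self.active]

    def flip(self, target: str) -> None:
        """Point all new traffic at `target` in a single assignment."""
        if target not in self.backends:
            raise KeyError(f"unknown cluster: {target}")
        self.previous, self.active = self.active, target

    def rollback(self) -> None:
        """Flip back to the previously active cluster if it is still registered."""
        if self.previous is None or self.previous not in self.backends:
            raise RuntimeError("no warm cluster to roll back to")
        self.flip(self.previous)

    def retire(self, name: str) -> None:
        """Remove a cluster once the other colour is proven."""
        if name == self.active:
            raise RuntimeError("cannot retire the active cluster")
        del self.backends[name]
        if self.previous == name:
            self.previous = None
```

Rollback is only possible while blue is still registered, which is why teardown is the last step rather than part of the flip.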
Canary¶
- Spin up a small canary cluster at the new version.
- Gateway routing rules send a small fraction of traffic to the canary (by user, by query shape, by time window).
- Ramp up the fraction as confidence grows.
- When 100% of traffic is on the new version, retire the old cluster.
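One way to split traffic "by user" is a stable hash bucket, sketched below. The function names and the `"canary"` / `"stable"` labels are assumptions made for illustration; the source does not say how the gateway's rules compute the split.

```python
import hashlib


def bucket(user: str) -> float:
    """Map a user name to a stable value in [0, 1)."""
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    # Four bytes give an integer below 2**32, so the quotient is strictly below 1.
    return int.from_bytes(digest[:4], "big") / 2**32


def choose_cluster(user: str, canary_fraction: float) -> str:
    """Send a user to the canary if their bucket falls under the fraction."""
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    return "canary" if bucket(user) < canary_fraction else "stable"
```

Because each user's bucket is fixed, raising `canary_fraction` only ever adds users to the canary. Nobody bounces between versions during the ramp, and a fraction of 1.0 puts every user on the new version.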
Why a gateway makes this tractable¶
Without a gateway in front, doing either pattern requires client coordination — every client has to update its connection URL to point at the new cluster. At scale (thousands of scheduled jobs, BI dashboards, ad-hoc users, scripts in notebooks) this is an intractable migration; the cluster upgrade becomes a company-wide project rather than an SRE operation.
With a gateway:
- Clients keep their existing connection URL (the gateway URL).
- The gateway owns which backend cluster runs each query.
- Adding / draining / substituting backend clusters is a gateway config change, not a client change.
- The gateway's routing-rule engine (patterns/routing-rules-as-config) expresses the traffic shape during the transition.
This is why the post (sources/2026-03-24-expedia-operating-trino-at-scale-with-trino-gateway) lists "no-downtime upgrades for Trino clusters behind the gateway in a blue/green model or canary deployment model" as one of the four headline gateway advantages alongside single-URL, automatic routing, and transparent capacity changes.
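The indirection can be illustrated with a toy routing-group config. Everything here is hypothetical (the URL, the group names, the `scheduler-` prefix rule, and the helper functions). The sketch is only meant to show that substituting a backend touches gateway-side state and nothing a client holds.

```python
# Hypothetical value: the single URL every client is configured with.
GATEWAY_URL = "https://gateway.example.internal"


def pick_group(source: str) -> str:
    """Illustrative routing rule: scheduled jobs go to 'etl', the rest to 'adhoc'."""
    return "etl" if source.startswith("scheduler-") else "adhoc"


def resolve(source: str, groups: dict) -> str:
    """Return the backend cluster that would run a query from `source`."""
    members = groups[pick_group(source)]
    if not members:
        raise RuntimeError("routing group has no member clusters")
    return members[0]


def substitute(groups: dict, group: str, old: str, new: str) -> dict:
    """Return a new config with `old` replaced by `new` in one routing group."""
    updated = {name: list(members) for name, members in groups.items()}
    updated[group] = [new if m == old else m for m in updated[group]]
    return updated
```

With `groups = {"adhoc": ["adhoc-v1"], "etl": ["etl-v1"]}`, calling `substitute(groups, "etl", "etl-v1", "etl-v2")` changes where scheduled jobs run while `GATEWAY_URL` and every client configuration stay the same.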
Required infrastructure¶
- Substitutable backends. Clusters must be interchangeable behind the routing layer, which requires identical data access, identical catalog connectivity, and identical auth.
- Health-check integration. Three-state health (HEALTHY / UNHEALTHY / PENDING) — so the gateway doesn't route to a new cluster before it's ready, and automatically stops routing to a cluster that goes bad mid-upgrade.
- Query-level idempotency (or at least drain-safe disconnect) — a cluster being drained should not have active queries force-killed. Trino-level drain (stop accepting new, finish running) is the standard mechanism.
- Observable routing decisions. Operators need to see how traffic is actually being routed during the transition — per-query source, per-cluster load, error rates.
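The health-check and drain requirements above can be combined into one eligibility check. This is a minimal sketch under stated assumptions: the `Cluster` fields and both helper functions are hypothetical, and only the three state names come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    HEALTHY = "HEALTHY"
    UNHEALTHY = "UNHEALTHY"
    PENDING = "PENDING"


@dataclass
class Cluster:
    name: str
    health: Health = Health.PENDING   # a new cluster starts unproven
    draining: bool = False            # True: finish running queries, accept no new ones
    running_queries: int = 0


def eligible(clusters):
    """Clusters that may receive new queries: HEALTHY and not draining."""
    return [c for c in clusters if c.health is Health.HEALTHY and not c.draining]


def safe_to_tear_down(cluster):
    """A draining cluster can be removed once its in-flight queries have finished."""
    return cluster.draining and cluster.running_queries == 0
```

A freshly started green cluster stays out of `eligible` until it reports HEALTHY, and a draining blue cluster leaves it immediately but is only torn down once `running_queries` reaches zero.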
Related patterns¶
- patterns/blue-green-service-mesh-migration (AWS App Mesh → ECS Service Connect, prompted by App Mesh's discontinuation) uses the same blue/green discipline, but at the service-mesh layer.
- patterns/weighted-dns-traffic-shifting — Figma ECS → EKS cutover uses DNS-level blue/green between fleets; a weaker form than gateway-based shifting (TTL-governed lag).
- Kubernetes rolling update — handles in-cluster upgrades but not whole-cluster substitution.
Seen in¶
- sources/2026-03-24-expedia-operating-trino-at-scale-with-trino-gateway — named as a core capability of the Trino Gateway; blue/green and canary both listed.
Related¶
- concepts/single-endpoint-abstraction — the structural prerequisite.
- patterns/query-gateway — the realising pattern.
- concepts/cluster-health-check — the health-trichotomy that makes substitution safe.
- patterns/workload-segregated-clusters — a fleet of segregated clusters is the usual environment this pattern is applied in.
- patterns/blue-green-service-mesh-migration — adjacent blue/green pattern at a different layer.