
Meta Presto Gateway¶

Definition¶

Meta Presto Gateway is Meta's internal load-balancer / proxy tier sitting in front of every Presto cluster at Meta. It is the single routing plane for all Presto queries inside the company: "our Presto clusters sit behind load balancers which route every single Presto query at Meta" (sources/2023-07-16-highscalability-lessons-learned-running-presto-at-meta-scale).

It is distinct from the open-source Trino Gateway (which originated at Lyft as Presto Gateway and was later integrated into the Trino ecosystem). The Meta-authored 2023 High Scalability post does not claim any shared lineage — Meta operates its own internal Gateway built for its scale and integration needs.

Role at Meta scale¶

Every Presto query traverses the Gateway.
Routing decisions consume multiple signals: current queueing state of downstream Presto clusters, "distribution of hardware across different datacenters," and "the data locality of the tables that the query uses." This is workload-aware routing extended with data-locality awareness (see also concepts/locality-aware-scheduling).
The Gateway gives clients a single endpoint that abstracts over "tens of thousands of machines" spread across multiple regions.
Cluster lifecycle integration: new Presto clusters register with the Gateway to start receiving traffic, and clusters being decommissioned deregister before draining (see patterns/automated-cluster-standup-decommission).
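
The routing and lifecycle behaviour above can be illustrated with a small sketch. It is not Meta's implementation: the source names the signals (queue state, hardware distribution, data locality) but not how they are combined, so the `Cluster` fields, the `Gateway` class, and the scoring rule (data-local clusters first, then lowest queue depth per worker) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    region: str
    queued_queries: int  # current queue depth reported by the cluster
    workers: int         # rough stand-in for hardware capacity


class Gateway:
    """Toy registry and router. Every name here is hypothetical."""

    def __init__(self):
        self.clusters = {}

    def register(self, cluster):
        # A newly stood-up cluster registers to start receiving traffic.
        self.clusters[cluster.name] = cluster

    def deregister(self, name):
        # A cluster being decommissioned deregisters before it drains.
        self.clusters.pop(name, None)

    def route(self, table_regions):
        """Pick a cluster for a query whose tables live in table_regions."""
        if not self.clusters:
            raise RuntimeError("no clusters registered")

        def score(c):
            is_local = c.region in table_regions
            load = c.queued_queries / c.workers
            # Assumed policy: data-local first, then least queued per worker.
            # The cluster name is only a deterministic tie-breaker.
            return (not is_local, load, c.name)

        return min(self.clusters.values(), key=score)


gw = Gateway()
gw.register(Cluster("a", "east", queued_queries=40, workers=100))
gw.register(Cluster("b", "west", queued_queries=5, workers=100))
gw.register(Cluster("c", "east", queued_queries=10, workers=100))

chosen = gw.route({"east"})    # "c": local to the data and less queued than "a"
gw.deregister("c")
fallback = gw.route({"east"})  # "a": the only remaining data-local cluster
```

A strict lexicographic preference for locality is only one possible policy. A real router might instead weigh locality against queue depth so that a badly backed-up local cluster loses to an idle remote one.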

Robustness: throttling + autoscaling¶

Early in Meta's Presto scale-up, the Gateway was a single point of failure. The source describes the failure mode as "one service unintentionally bombarding the Gateway with millions of queries in a short span, resulting in the Gateway processes crashing and unable to route any queries." Two defences were added:

Throttling by dimension — the Gateway rejects queries under heavy load. The throttle knobs operate across multiple axes: "per user, per source, per IP, and also at a global level for all queries" — so a runaway batch job cannot starve an interactive dashboard user, and the global knob prevents total collapse.
Gateway autoscaling — "leaning on a Meta-wide service that supports scaling up and down of jobs, the number of Gateway instances are now dynamic." The Gateway tier scales out under load rather than maxing out CPU/mem on a fixed fleet, "thus preventing the crashing scenario described above."
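
As a rough illustration of throttling by dimension, the sketch below keeps one counter per (dimension, value) pair and rejects a query as soon as any single dimension is at its limit. The source lists the dimensions (user, source, IP, global) but not the algorithm, so the fixed-window counting, the `DimensionThrottle` class, and the limit values are assumptions.

```python
import time
from collections import defaultdict


class DimensionThrottle:
    """Fixed-window admission control over several dimensions (hypothetical)."""

    def __init__(self, limits, window_seconds=1.0, clock=time.monotonic):
        self.limits = limits            # e.g. {"user": 2, "global": 3, ...}
        self.window = window_seconds
        self.clock = clock              # injectable so the example is deterministic
        self.window_start = clock()
        self.counts = defaultdict(int)  # (dimension, value) -> admitted count

    def admit(self, user, source, ip):
        now = self.clock()
        if now - self.window_start >= self.window:
            # A new window begins: forget the previous counts.
            self.window_start = now
            self.counts.clear()

        keys = [("user", user), ("source", source), ("ip", ip), ("global", "*")]
        # Reject if any single dimension is already at its limit.
        for key in keys:
            if self.counts[key] >= self.limits[key[0]]:
                return False
        # Only admitted queries consume budget.
        for key in keys:
            self.counts[key] += 1
        return True


fake_now = [0.0]
throttle = DimensionThrottle(
    {"user": 2, "source": 10, "ip": 10, "global": 3},
    clock=lambda: fake_now[0],
)
results = [
    throttle.admit("batch", "etl", "10.0.0.1"),   # admitted
    throttle.admit("batch", "etl", "10.0.0.1"),   # admitted
    throttle.admit("batch", "etl", "10.0.0.1"),   # rejected: per-user limit
    throttle.admit("alice", "dash", "10.0.0.2"),  # admitted: other users unaffected
    throttle.admit("bob", "dash", "10.0.0.3"),    # rejected: global limit
]
fake_now[0] = 1.5
after_reset = throttle.admit("batch", "etl", "10.0.0.1")  # admitted in new window
```

The third call shows the per-user knob containing a runaway job, and the fifth shows the global knob acting as the last line of defence. A production throttle would more likely use a sliding window or token bucket to avoid bursts at window boundaries.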

Together, throttling + autoscaling make the Gateway robust against unintended DDoS-style internal traffic — a class of failure typical for internal shared-infrastructure gateways at hyperscale.
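
The autoscaling side can be sketched as a target-size calculation. The source says only that a Meta-wide service scales the number of Gateway instances up and down; the `desired_instances` function, its parameters, and the headroom-based sizing rule below are assumptions for illustration, not a description of that service.

```python
import math


def desired_instances(observed_qps, qps_per_instance, headroom=0.3,
                      min_instances=2, max_instances=100):
    """Return a Gateway instance count for the observed load (hypothetical).

    headroom adds spare capacity so a sudden burst does not saturate the tier;
    the min/max bounds keep the fleet from scaling to zero or without limit.
    """
    if qps_per_instance <= 0:
        raise ValueError("qps_per_instance must be positive")
    needed = math.ceil(observed_qps * (1 + headroom) / qps_per_instance)
    return max(min_instances, min(max_instances, needed))


steady = desired_instances(1000, 200)      # 7 instances for 1000 qps plus headroom
idle = desired_instances(0, 200)           # floor of 2 instances
flood = desired_instances(1_000_000, 200)  # capped at 100 instances
```

The cap in the last case is where throttling takes over: once the tier cannot grow further, rejecting excess queries is what keeps the remaining instances alive.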

Distinction from Trino Gateway¶

Aspect	Meta Presto Gateway	Trino Gateway
Origin	Meta-internal	Lyft-origin (Presto Gateway), now Trino OSS
Engine	PrestoDB	Trino
Code base	Proprietary	Open source (trinodb/trino-gateway)
Routing signals	Queue state, DC topology, data locality	Routing rules on query body + headers + cluster health
Admission control	Per-user, per-source, per-IP, and global throttling	Health-based cluster selection
Elasticity	Meta-wide autoscaling service	Operator-managed

Both implement the query-gateway pattern and share the "single connection URL + route-per-query" shape; the specifics above diverge.

Seen in¶

sources/2023-07-16-highscalability-lessons-learned-running-presto-at-meta-scale — primary source for this page. Named scale: "tens of thousands of machines" behind the Gateway; the Gateway routes every Presto query at Meta. Robustness story: throttling + autoscaling after initial outage class.

systems/presto — the query engine being fronted.
systems/meta-data-warehouse — the data platform this Gateway serves.
systems/trino-gateway — the open-source cousin with a related but distinct lineage.
patterns/query-gateway — the general architectural pattern.
patterns/gateway-throttling-by-dimension — the admission-control pattern applied here.
patterns/gateway-autoscaling — the elasticity pattern applied here.
concepts/workload-aware-routing — the routing discipline.
companies/meta