SYSTEM Cited by 1 source

Meta Presto Gateway

Definition

Meta Presto Gateway is Meta's internal load-balancer / proxy tier sitting in front of every Presto cluster at Meta. It is the single routing plane for all Presto queries inside the company: "our Presto clusters sit behind load balancers which route every single Presto query at Meta" (sources/2023-07-16-highscalability-lessons-learned-running-presto-at-meta-scale).

It is distinct from the open-source Trino Gateway (which originated at Lyft as Presto Gateway and was later integrated into the Trino ecosystem). The Meta-authored 2023 High Scalability post does not claim any shared lineage — Meta operates its own internal Gateway built for its scale and integration needs.

Role at Meta scale

  • Every Presto query traverses the Gateway.
  • Routing decisions consume multiple signals: current queueing state of downstream Presto clusters, "distribution of hardware across different datacenters," and "the data locality of the tables that the query uses." This is workload-aware routing extended with data-locality awareness (see also concepts/locality-aware-scheduling).
  • Gives clients a single endpoint abstraction over "tens of thousands of machines" spread over multiple regions.
  • Cluster lifecycle integration: new Presto clusters register with the Gateway to start receiving traffic; decommissioning Presto clusters deregister before draining (see patterns/automated-cluster-standup-decommission).
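The routing and lifecycle behaviour listed above can be sketched in a few lines. This is only an illustration, not Meta's implementation: the `Cluster` and `Gateway` classes, the `table_locations` metadata map, the `max_queue` field, and the equal 0.5/0.5 weighting are all hypothetical. Hardware distribution across datacenters is not modelled separately; it appears only through each cluster's `datacenter` label.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    datacenter: str
    queued_queries: int  # current queue depth reported by the cluster
    max_queue: int       # queue depth treated as "full" when scoring (assumed knob)


@dataclass
class Gateway:
    # Hypothetical metadata: table name -> datacenters that hold its data.
    table_locations: dict[str, set[str]]
    clusters: dict[str, Cluster] = field(default_factory=dict)

    def register(self, cluster: Cluster) -> None:
        """A newly stood-up cluster becomes eligible for traffic."""
        self.clusters[cluster.name] = cluster

    def deregister(self, name: str) -> None:
        """A cluster being decommissioned stops receiving new queries."""
        self.clusters.pop(name, None)

    def score(self, cluster: Cluster, tables: list[str]) -> float:
        # Fraction of the query's tables stored in the cluster's datacenter.
        local = sum(
            cluster.datacenter in self.table_locations.get(t, set())
            for t in tables
        )
        locality = local / len(tables) if tables else 0.0
        # Fraction of queue capacity still free, clamped at zero.
        headroom = max(0.0, 1.0 - cluster.queued_queries / cluster.max_queue)
        # Equal weighting is an arbitrary choice for this sketch.
        return 0.5 * locality + 0.5 * headroom

    def route(self, tables: list[str]) -> str:
        """Return the name of the best-scoring registered cluster."""
        if not self.clusters:
            raise RuntimeError("no registered clusters")
        best = max(self.clusters.values(), key=lambda c: self.score(c, tables))
        return best.name
```

A caller would `register` each cluster as it comes up, call `route` with the tables a query touches, and `deregister` a cluster before draining it. A real gateway would presumably refresh queue depth continuously rather than store it as a static field.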

Robustness: throttling + autoscaling

Early in Meta's Presto scale-up, the Gateway was a single point of failure. The source describes the incident class as "one service unintentionally bombarding the Gateway with millions of queries in a short span, resulting in the Gateway processes crashing and unable to route any queries." Two defences were added:

  1. Throttling by dimension: the Gateway rejects queries under heavy load. The throttle knobs operate across multiple axes ("per user, per source, per IP, and also at a global level for all queries"), so a runaway batch job cannot starve an interactive dashboard user, and the global knob prevents total collapse.
  2. Gateway autoscaling: "leaning on a Meta-wide service that supports scaling up and down of jobs, the number of Gateway instances are now dynamic." The Gateway tier scales out under load rather than exhausting CPU and memory on a fixed fleet, "thus preventing the crashing scenario described above."

Together, throttling + autoscaling make the Gateway robust against unintended DDoS-style internal traffic — a class of failure typical for internal shared-infrastructure gateways at hyperscale.
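As a rough illustration of throttling across the four dimensions, the sketch below uses fixed-window counters. The source does not say which algorithm Meta uses, so the fixed-window approach, the `DimensionThrottle` class, its `limits` mapping, and the injectable `clock` parameter are all assumptions made for this example.

```python
import time
from collections import defaultdict


class DimensionThrottle:
    """Fixed-window counters keyed by (dimension, value). Illustrative only."""

    def __init__(self, limits, window_s=1.0, clock=time.monotonic):
        # limits must contain the keys "user", "source", "ip" and "global",
        # each mapping to the maximum queries admitted per window.
        self.limits = limits
        self.window_s = window_s
        self.clock = clock
        self.window_start = clock()
        self.counts = defaultdict(int)

    def allow(self, user, source, ip):
        now = self.clock()
        # Start a fresh window once the current one has elapsed.
        if now - self.window_start >= self.window_s:
            self.counts.clear()
            self.window_start = now
        keys = [("user", user), ("source", source), ("ip", ip), ("global", "*")]
        # Reject if any dimension is already at its limit.
        # Rejected queries are not counted against any dimension.
        if any(self.counts[k] >= self.limits[k[0]] for k in keys):
            return False
        for k in keys:
            self.counts[k] += 1
        return True
```

With a per-user limit lower than the global limit, a single noisy user is rejected first while other users keep being admitted. The global counter then caps total admitted load regardless of who sends it.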

Distinction from Trino Gateway

Aspect            | Meta Presto Gateway                     | Trino Gateway
------------------|-----------------------------------------|-------------------------------------------------------
Origin            | Meta-internal                           | Lyft-origin (Presto Gateway), now Trino OSS
Engine            | PrestoDB                                | Trino
Code base         | Proprietary                             | Open source (trinodb/trino-gateway)
Routing signals   | Queue state, DC topology, data locality | Routing rules on query body + headers + cluster health
Admission control | Per-user/-source/-IP/-global throttling | Health-based cluster selection
Elasticity        | Meta-wide autoscaling service           | Operator-managed

Both implement the query-gateway pattern and share the "single connection URL + route-per-query" shape; the specifics above diverge.

Seen in
