VTGate
What it is
VTGate is the application-level query routing layer in a Vitess cluster. It is the MySQL-protocol-speaking proxy that applications connect to; VTGate parses the query, consults the topo-server to determine which VTTablets (and therefore which underlying MySQL instances) the query should route to, and proxies the query to the selected tablet(s).
From PlanetScale's own description:
"VTGate is an application-level query routing layer… VTGate determines available tablets and their roles via the topo server and reroutes traffic as needed." (Source: )
Role in the Vitess data path
VTGate sits at hop 3 of the five-hop PlanetScale MySQL data path:
application (hop 1) → edge load balancer (hop 2) → VTGate (hop 3) → VTTablet (hop 4) → MySQL (hop 5)
(Source: , verbatim.)
VTGate's responsibilities:
- Query routing: parse MySQL protocol, decide which shard(s) / which role (primary vs replica) the query targets, based on the VSchema and VIndex definitions.
- Cross-shard query planning: for queries spanning multiple shards, generate a scatter-gather plan (see concepts/vtgate-query-planner, concepts/scatter-gather-query).
- Read/write splitting: route writes to primary tablets, reads to replica tablets by tablet role (which VTGate learns from the topo-server).
- Failover traffic redirection: when a tablet's role or health changes (VTTablet updates topo-server), VTGate "reroutes traffic as needed" without operator intervention.
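The read/write splitting and failover bullets above can be sketched in a few lines. This is a toy model under stated assumptions, not VTGate's implementation: the `Tablet` and `RoleRouter` names, the round-robin choice among replicas, and the "starts with SELECT" read test are all hypothetical simplifications (a `SELECT ... FOR UPDATE`, for example, would be misclassified here).

```python
from dataclasses import dataclass


@dataclass
class Tablet:
    alias: str
    role: str            # "primary" or "replica"
    healthy: bool = True


class RoleRouter:
    """Hypothetical stand-in for role-based routing within a single shard."""

    def __init__(self, tablets):
        self.tablets = {t.alias: t for t in tablets}
        self._cursor = 0  # round-robin position (an assumption, not from the source)

    def update(self, alias, *, role=None, healthy=None):
        # Models a role or health change becoming visible to the router.
        tablet = self.tablets[alias]
        if role is not None:
            tablet.role = role
        if healthy is not None:
            tablet.healthy = healthy

    def route(self, sql):
        # Crude read detection: anything starting with SELECT goes to a replica.
        is_read = sql.lstrip().upper().startswith("SELECT")
        wanted = "replica" if is_read else "primary"
        candidates = sorted(
            alias for alias, t in self.tablets.items()
            if t.role == wanted and t.healthy
        )
        if not candidates:
            raise LookupError(f"no healthy {wanted} tablet")
        choice = candidates[self._cursor % len(candidates)]
        self._cursor += 1
        return choice


router = RoleRouter([
    Tablet("t1", "primary"),
    Tablet("t2", "replica"),
    Tablet("t3", "replica"),
])
```

After `router.update("t1", role="replica")` and `router.update("t3", role="primary")`, the next write routes to `t3` with no change on the caller's side, which is the "reroutes traffic as needed" behaviour in miniature.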
What VTGate does NOT do
The division of labor against VTTablet is the load-bearing architectural split (see patterns/query-routing-proxy-with-health-aware-pool):
- VTGate does not hold MySQL connections. The back-end MySQL connection pool lives in VTTablet.
- VTGate does not directly health-check MySQL instances. It consumes the health state that VTTablet publishes to the topo-server.
- VTGate is effectively stateless (inferred from the data path description: it builds its routing view from the topo-server and the health state VTTablet publishes, and keeps no durable state of its own that outlives a process restart). Multiple VTGate replicas can be run behind the edge LB without coordination, which is the enabling property for horizontal scale-out of the proxy tier itself.
Why that design
Keeping VTGate stateless + routing-only is what makes the PlanetScale MySQL platform's connection-scaling claims work: an application can open a very large number of client connections against VTGate, because VTGate doesn't need to hold a back-end MySQL connection per inbound client connection — VTTablet does connection pooling + queueing behind it. This is the mechanism Liz van Dijk's 1M-connection benchmark relies on ().
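The decoupling described above (many client connections in front, a small pooled set of MySQL connections behind) can be illustrated with a bounded pool. This is a minimal sketch, not VTTablet's pool: the `TabletPool` class, the thread-per-client model, and the string stand-in for a MySQL round trip are assumptions made for illustration.

```python
import queue
import threading


class TabletPool:
    """Hypothetical bounded pool: callers queue when every connection is busy."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(f"mysql-conn-{i}")
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak_in_use = 0
        self.served = 0

    def execute(self, sql):
        conn = self._idle.get()  # blocks here (the "queueing") if the pool is exhausted
        try:
            with self._lock:
                self.in_use += 1
                self.peak_in_use = max(self.peak_in_use, self.in_use)
            return f"{conn} ran {sql}"  # stand-in for the real MySQL round trip
        finally:
            with self._lock:
                self.in_use -= 1
                self.served += 1
            self._idle.put(conn)  # hand the connection to the next waiter


def simulate(clients, pool_size):
    # Each thread plays one client connection issuing a single query.
    pool = TabletPool(pool_size)
    threads = [
        threading.Thread(target=pool.execute, args=(f"SELECT {i}",))
        for i in range(clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool


pool = simulate(clients=200, pool_size=4)
```

All 200 simulated clients are served while `pool.peak_in_use` never exceeds 4, which is the property that lets the client-facing connection count grow independently of the back-end connection count.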
Seen in
- — Brian Morrison II (2023-08-23) pedagogy-101 restatement: "The vtgate is responsible for accepting queries and routing them to the proper tablet, breaking up the query, and dispatching it to multiple tablets if needed." Names VTGate as one of three Vitess primitives alongside VTTablet and the newly-named vtctld.
- — canonical architectural description (5-hop data path, VTGate role split vs VTTablet); co-canonical with the sibling Aurora post.
- — identical architecture paragraph against a different competitor.
- — earlier 2021 positioning post; VTGate / VTTablet named but not architecturally specified.
- — VTGate as the site of aggregation pushdown + local-global decomposition.
- — VTGate's query planner bug retrospective.
- — the benchmark that relies on VTGate's stateless-routing + VTTablet-pool decoupling.
- — Jonah Berquist's 1M-QPS sysbench-tpcc benchmark on a 40-shard Vitess cluster uses VTGate as the source of the canonical p50 + p99 latency metrics. Load-bearing observation: the saturation signal "increase in latency as we max out our throughput... particularly evident in our p99 latency" is measured from VTGate's client-facing query path, not from the backend MySQL shards. This is because VTGate sees the routing + pooling latency plus the per-shard server-time; backend-side instrumentation would miss the VTTablet pool-queue wait. Canonical wiki framing: VTGate is the right observability point for shard-pool-saturation signals because it sits at the aggregation boundary where per-request latency distribution reflects the whole-cluster saturation state, not any single shard's.
- — connection pool architecture details.
- — Brian Morrison II's 2022-10-21 pedagogy-101 altitude disclosure: "a lightweight proxy, known as VTGate, to intelligently route queries to the proper MySQL instance" + "VTGate understands the MySQL protocol and performs that intelligent query routing" + "Every client (GUI, application, etc) that connects to a Vitess instance establishes a lightweight connection to the VTGate instead of MySQL directly". Names the Go + gRPC implementation substrate ("the various Vitess components are written with Go and internally communicate with one another over gRPC") and "thousands of clients simultaneously" as the concurrency ballpark — three orders of magnitude lower than the canonical ceiling that measures this properly. Canonical beginner on-ramp.
- — VTGate as the primary autoscaling risk surface in the 2025-10-20 AWS us-east-1 incident and the primary target of the conservative bin-packing intervention. Four wiki-canonical framings: (1) VTGate as the stateless elastic tier of a Vitess cluster ("MySQL primaries don't scale-out; vtgate is the elastic tier"), which is why it's the component that autoscales, not the tablets. (2) Diurnal-autoscaling risk concentrates on VTGate: customers using diurnal autoscaling to ramp VTGate before US-East Monday peak faced "less than half the vtgate capacity they had the week prior" during the EC2 launch failure window. (3) Operator response was a scheduling-side change: "we bin-packed vtgate processes more tightly than usual, running closer to CPU capacity than is typical, in order to provide ample capacity for the US work day", canonicalised as "the most important intervention" of the phase-2 playbook. (4) A small number of VTGate processes didn't self-recover after the partial-partition window healed and needed manual restart: a canonical post-partition stuck-connection example distinct from split-brain.
- — Rafer Hazen (2024-08-14) canonicalises VTGate as the aggregation layer for per-query index-usage telemetry. Mechanism: InnoDB's handler hook records the per-query used-index set during execution → the set is bolted onto the MySQL wire-protocol response packet returned to VTGate → VTGate coalesces by query-pattern fingerprint in memory and ships to the Insights pipeline every 15 seconds. Verbatim: "With the per-query index information in VTGate, we aggregate index usage information per query-pattern and send it into the Insights pipeline every 15 seconds. This approach allows us to aggregate the time series count of indexes used for 100% of queries with negligible overhead in MySQL." Canonical wiki framing: VTGate is the right aggregation point for per-pattern telemetry because (a) it's stateless + easy to scale, so the coalescing overhead lives away from MySQL, (b) it already sees 100% of query traffic (no sampling), (c) it already holds the pattern-fingerprint canonicalisation used for other Insights surfaces. This extends VTGate's canonicalised telemetry role beyond the 2021 query-statistics primitive and the storing-time-series post into the per-query index-usage axis. See patterns/handler-hook-sidecar-telemetry for the generalised mechanism.
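The per-pattern coalescing described in the last entry above can be sketched as follows. The 15-second interval comes from the quoted post; everything else is an assumption for illustration: the regex-based `fingerprint` function is a crude stand-in for real query normalisation, and `IndexUsageAggregator` with its explicit `now` argument is a hypothetical shape chosen so the example is deterministic.

```python
import re
from collections import Counter

FLUSH_INTERVAL_SECONDS = 15  # interval quoted in the source


def fingerprint(sql):
    """Crude, hypothetical query-pattern fingerprint: literals become '?'."""
    s = re.sub(r"'[^']*'", "?", sql)      # string literals
    s = re.sub(r"\b\d+\b", "?", s)        # standalone integer literals
    return re.sub(r"\s+", " ", s).strip().lower()


class IndexUsageAggregator:
    """Counts (query pattern, index) pairs in memory between flushes."""

    def __init__(self, now=0.0):
        self._counts = Counter()
        self._last_flush = now

    def record(self, sql, used_indexes):
        # used_indexes models the index set returned alongside the query result.
        pattern = fingerprint(sql)
        for index_name in used_indexes:
            self._counts[(pattern, index_name)] += 1

    def maybe_flush(self, now):
        # Returns the batch to ship once the interval has elapsed, else None.
        if now - self._last_flush < FLUSH_INTERVAL_SECONDS:
            return None
        batch, self._counts = dict(self._counts), Counter()
        self._last_flush = now
        return batch


agg = IndexUsageAggregator(now=0.0)
agg.record("SELECT * FROM users WHERE id = 42", ["PRIMARY"])
agg.record("select * from users  where id = 7", ["PRIMARY"])
agg.record("SELECT * FROM users WHERE email = 'a@b.c'", ["idx_email"])
```

Because the two `id = ...` queries share a fingerprint, they collapse into a single counter, so the shipped batch grows with the number of distinct patterns rather than with raw query volume.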
Related
- systems/vttablet — the tablet-side middleware behind VTGate.
- systems/vitess — parent system.
- concepts/vitess-topo-server — shared state VTGate reads for routing decisions.
- concepts/vtgate-query-planner — VTGate's internal query planner.
- patterns/query-routing-proxy-with-health-aware-pool — the architectural shape VTGate embodies.
- (Andrés Taylor, PlanetScale / Vitess core, 2023-06-01) canonicalises VTGate as host of the Gen4 query-planner rewrite (old monolithic model → new step-by-step runnable-plan pipeline). Defines VTGate's planner job verbatim: "It accepts queries from users and plans how to spread the query across multiple shards and/or keyspaces. The leaf level of the VTGate query plans are routes, which are operators that will send a query to one or more shards." Pushdown discipline ("The aim is always to push as much as possible down to the much faster MySQL process") also reduces MySQL-compat risk: "This also reduces the risk of compatibility differences between Vitess and plain MySQL, since MySQL is doing most of the work." The post canonicalises three new VTGate internals on the wiki: Horizon operator (placeholder for deferred post-FROM planning), runnable-plan invariant (every optimisation step emits an inspectable runnable plan), Offset Planning (post-fixed-point pipeline stage). VTGate is the host of the full pipeline (Parse → Join Order → Horizon Planning → Offset Planning → Executable Plan) and the emitter of the final operator tree for the execution engine. The worked example final plan is a two-query shape: scatter on LHS (SELECT u.foo, u.uid, u.baz, weight_string(u.baz) FROM user AS u ORDER BY u.baz ASC) plus per-row RHS (SELECT ue.bar FROM user_extra AS ue WHERE ue.uid = :u_uid), with VTGate acting as the nested-loop-join coordinator. Canonical source for patterns/runnable-plan-pipeline.
- — Savannah Longoria (PlanetScale, 2022-12-14) restates VTGate's role at pedagogy-101 altitude in the Temporal-on-PlanetScale context: "In the case of sharding, the VTGate layer transparently routes queries to the necessary shards." No new VTGate internals; the post's VTGate relevance is that it is what makes the two-keyspace Temporal split application-invisible: the application sees one SQL surface, and VTGate decides which keyspace (sharded or unsharded) each query targets based on the VSchemas.
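The two-query plan shape from the Gen4 planner entry above (a scatter on the left-hand side, then one right-hand-side lookup per row) can be mimicked in memory. This is a sketch only: the shard names, the sample rows, and the `scatter_lhs` / `route_rhs` helpers are hypothetical, and plain Python dictionaries stand in for the shards.

```python
import heapq

# Hypothetical data: `user` rows spread over two shards, `user_extra` keyed by uid.
USER_SHARDS = {
    "-80": [{"foo": "a", "uid": 1, "baz": 3}, {"foo": "c", "uid": 3, "baz": 1}],
    "80-": [{"foo": "b", "uid": 2, "baz": 2}],
}
USER_EXTRA = {1: ["x"], 2: [], 3: ["y", "z"]}


def scatter_lhs():
    """LHS: every shard returns its rows ordered by baz; the coordinator merge-sorts."""
    per_shard = (
        sorted(rows, key=lambda r: r["baz"])  # stands in for each shard's ORDER BY
        for rows in USER_SHARDS.values()
    )
    return heapq.merge(*per_shard, key=lambda r: r["baz"])


def route_rhs(u_uid):
    """RHS: one lookup per LHS row, with the row's uid bound as :u_uid."""
    return USER_EXTRA.get(u_uid, [])


def nested_loop_join():
    joined = []
    for u in scatter_lhs():
        for bar in route_rhs(u["uid"]):
            joined.append((u["foo"], bar))
    return joined


rows = nested_loop_join()
```

With the sample data, `rows` is `[("c", "y"), ("c", "z"), ("a", "x")]`: the output keeps the left-hand side's `baz` ordering, and the user with no `user_extra` rows drops out, as an inner join would require.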
Seen in: FK-aware cascade orchestration
- — canonical wiki disclosure of VTGate as the owner of foreign-key cascade semantics. Shlomi Noach + Manan Gupta (2023-12-05) canonicalise: VTGate must not delegate FK cascades to the backing MySQL, because InnoDB applies cascaded child-row changes inside the storage engine without writing them to the binlog (see concepts/innodb-silent-cascade-in-binlog), so downstream CDC + replication consumers miss the events. Instead, VTGate plans explicit cascade orchestration: SELECT ... FOR UPDATE the affected parent rows, DELETE the child rows first (with recursion for grandchildren), then DELETE the parent. For ON UPDATE CASCADE, VTGate disables FOREIGN_KEY_CHECKS and re-validates grandchild RESTRICTs in application code. Cost: more locking, more VTGate↔MySQL round-trips, "non-zero performance impact". Scope: single-shard / unsharded only as of 2023-12-05; cross-shard is roadmap and is partially motivated by the Vitess-owns-FK design stance. See concepts/vitess-foreign-key-enforcement for the full per-action breakdown. The fuzzer surface at the VTGate planner altitude grows to include DML-level FK compatibility tests against standalone MySQL, complementing the planner-altitude fuzzer from .
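The ordering of the ON DELETE CASCADE orchestration described above (lock the parent rows, delete descendants deepest-first, then delete the parent) can be sketched as a statement planner. The three-table schema, the assumption that every table's key column is `id`, and the use of nested subqueries instead of bound row values are all hypothetical simplifications; the function only produces SQL text and executes nothing.

```python
# Hypothetical schema: table -> list of (child_table, child_fk_column).
# Assumes every table's referenced key column is named `id`.
CHILDREN = {
    "customer": [("orders", "customer_id")],
    "orders": [("order_item", "order_id")],
    "order_item": [],
}


def _delete_children_first(table, predicate, plan):
    # Recurse into each child before emitting this table's own DELETE,
    # so grandchildren are removed before children, and children before parents.
    for child, fk_column in CHILDREN[table]:
        child_predicate = f"{fk_column} IN (SELECT id FROM {table} WHERE {predicate})"
        _delete_children_first(child, child_predicate, plan)
    plan.append(f"DELETE FROM {table} WHERE {predicate}")


def plan_cascade_delete(table, predicate):
    """Return the ordered statements for an explicitly orchestrated cascade."""
    plan = [f"SELECT id FROM {table} WHERE {predicate} FOR UPDATE"]  # lock parents first
    _delete_children_first(table, predicate, plan)
    return plan


plan = plan_cascade_delete("customer", "id = 7")
```

For the sample schema the plan has four statements: the locking SELECT on `customer`, then DELETEs on `order_item`, `orders`, and `customer` in that order. Every child-row deletion is an explicit statement, which is what makes the changes visible in the binlog, at the cost of the extra round-trips the source mentions.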