
PLANETSCALE 2024-04-17 Tier 3


PlanetScale — Introducing global replica credentials

Summary

Matt Robenolt and Iheanyi Ekechukwu (PlanetScale, 2024-04-17) launch global replica credentials — a single MySQL username/password pair that automatically routes reads to the lowest-latency read replica in any PlanetScale read-only region, replacing the previous model in which each read-only region had its own credential. The launch doubles as the canonical public disclosure of the PlanetScale Global Network — a CDN-like database-connectivity layer, quietly built out "for the past few years," that terminates every MySQL connection globally at the edge closest to the client, pools / multiplexes / TLS-terminates those connections, and tunnels them back to origin clusters over a small set of long-held warm connections. Adding or removing a read-only region mutates a per-credential Route record watched via etcd; edge nodes continuously measure inter-region latency via mesh health-checking and keep the route's cluster list sorted by measured latency, so the "next hop" is always clusters[0]. The post canonicalises the three-part internal data model — Credential (authoritative, lives inside the DB cluster region) + Route (global, username → cluster-list mapping) + Endpoint (hostname, either {region}-specific or latency-DNS-resolved {provider}.connect.psdb.cloud).

Key takeaways

  • Global replica credentials swap N per-region credentials for 1 global credential. "If you have existing replica credentials for each of your read-only regions, you can now swap them out for a single global replica credential." The credential's Route carries the full cluster list (cluster=["us-east-1", "us-west-2"] for a multi-region replica setup) rather than a single cluster for a single region. The routing decision moves from "application picks a region-specific hostname" to "network picks the lowest-latency region for you, per-query, without reconnection." Canonical new pattern: patterns/global-credential-over-per-region-credentials.
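
    The routing-decision move can be sketched as below (all names and values are invented for illustration; only the Route cluster-list shape and the Optimized endpoint hostname come from the post):

    ```python
    # Before: one credential per read-only region; the application picks.
    per_region = {
        "us-east-1": ("user_east", "pw_east", "us-east-1.connect.psdb.cloud"),
        "us-west-2": ("user_west", "pw_west", "us-west-2.connect.psdb.cloud"),
    }

    def pick_before(app_region: str):
        # Application-side routing: choose credential + hostname per region.
        return per_region[app_region]

    # After: one global replica credential; its Route carries the full
    # cluster list, and the network picks the lowest-latency region per query.
    global_credential = {
        "username": "global_replica_user",
        "route_clusters": ["us-east-1", "us-west-2"],  # Route.cluster
        "endpoint": "aws.connect.psdb.cloud",          # Optimized endpoint
    }

    def pick_after():
        # Application-side routing disappears: always the same pair.
        return global_credential["username"], global_credential["endpoint"]
    ```
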

  • PlanetScale Global Network is architected like a CDN, but terminates MySQL connections at the edge rather than HTTP requests. "For the past few years, we have been quietly building and architecting a database connectivity layer structured like a CDN, this layer we call the PlanetScale Global Network." Canonical new system: systems/planetscale-global-network. Responsibilities (quoted verbatim): "Terminating every MySQL connection globally. Handling our RPC API (to support things like database-js). Connection pooling at a near infinite scale. TLS termination. Routing connections to your database."

  • Edge termination reduces TCP + TLS + MySQL handshake cost by moving the chatter close to the client. "we fully terminate the MySQL connection and TLS handshake at the outermost layer, closest to your application. And connection pooling happens here, similar to running an instance of ProxySQL in your datacenter. From this outer layer, connections are able to be multiplexed and tunneled back to your origin database over a small number of long held, encrypted connections that are already warmed up and ready to go." Canonical new concept: concepts/mysql-connection-termination-at-edge — PlanetScale's database-layer analogue of the HTTP CDN's edge-TLS-termination primitive. Structurally identical to HTTP-CDN edge termination, differs only in what's being terminated: MySQL protocol + connection-pooling instead of HTTP + request forwarding.
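
    A back-of-envelope sketch of why this helps (the RTT counts and latency figures below are illustrative assumptions, not from the post): the chatty setup round trips happen over the short client-to-edge path, while the long edge-to-origin path is spanned only by connections that are already warm.

    ```python
    def handshake_cost_ms(rtt_ms: float, round_trips: int = 4) -> float:
        # Roughly: ~1 RTT TCP + ~2 RTT TLS + ~1 RTT MySQL auth (illustrative).
        return rtt_ms * round_trips

    client_to_edge_rtt = 2.0     # client is near an edge POP
    client_to_origin_rtt = 80.0  # origin DB cluster is cross-country

    # Terminating at the edge: handshakes cost 4 short round trips; queries
    # then ride multiplexed, pre-warmed edge-to-origin connections.
    edge_setup = handshake_cost_ms(client_to_edge_rtt)      # 8 ms
    origin_setup = handshake_cost_ms(client_to_origin_rtt)  # 320 ms
    ```
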

  • The internal data model is a three-part split: Credential + Route + Endpoint. Verbatim: "A Credential to us is broken up into three pieces, the Credential, a Route, and an Endpoint. The Route is shared to every geographic region at our edge, and the Credential remains inside your unique database cluster region. While the Endpoint is the hostname you use to connect to us, typically something like aws.connect.psdb.cloud or gcp.connect.psdb.cloud." Canonical new concept: concepts/credential-route-endpoint-triple. Protobuf shapes disclosed verbatim:

    message Route {
      string branch = 1;
      repeated string cluster = 2;
      fixed64 expiration = 3;
      ...
    }
    
    enum TabletType {
      TABLET_TYPE_UNSPECIFIED = 0;
      TABLET_TYPE_PRIMARY = 1;
      TABLET_TYPE_REPLICA = 2;
      TABLET_TYPE_RDONLY = 3;
    }
    
    message Credential {
      string branch = 1;
      bytes password_sha256 = 2;
      psdb.data.v1.Role role = 3;
      fixed64 expiration = 4;
      TabletType tablet_type = 5;
    }
    

Critical framing: "this Route contains no authentication information and is not authoritative for auth, it is effectively a mapping of username to a list of clusters." The Route is replicated globally (safe — no secrets); the Credential is the authoritative auth record and lives only inside the origin DB cluster's region. Classic control/data plane split at the authz layer, applied by where the record physically lives (global vs local region) rather than by control-channel semantics.
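
The split can be sketched as follows (field names mirror the protobuf shapes above; the lookup/auth functions are invented for illustration): the Route is safe to replicate to every edge because it holds no secrets, while password verification happens only against the Credential in the origin region.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Route:               # replicated globally; no auth material
    branch: str
    cluster: list          # ordered list of clusters
    expiration: int

@dataclass
class Credential:          # stays inside the origin DB cluster region
    branch: str
    password_sha256: bytes
    expiration: int

def edge_lookup(routes: dict, username: str) -> Route:
    # Edge-side: username -> cluster list. Not authoritative for auth.
    return routes[username]

def origin_authenticate(cred: Credential, password: str) -> bool:
    # Origin-side: the only place the authoritative auth check runs.
    return hashlib.sha256(password.encode()).digest() == cred.password_sha256
```
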

  • Route / Credential are stored in etcd with near-realtime watch. "The Route and Credential are stored in etcd which we are able to watch for changes in near realtime and respond to mutations, or deletions as soon as they happen." Canonical new pattern: patterns/etcd-watched-route-mutation — the control-plane write is a single etcd record mutation; edge nodes subscribe and apply the new cluster list for that credential's Route as soon as the watch fires. This is the propagation mechanism behind adding or removing a read-only region without reconnection: "Similarly, when read-only regions are added and removed, we only need to mutate this Route with a new set of what regions your database is in, and we just maintain a sorted list ready to go."
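
    A simulated sketch of the propagation mechanism (a real implementation would use an etcd client's watch API; here a plain in-process callback stands in so the shape is visible without a running etcd, and all class names are invented):

    ```python
    class RouteStore:
        """Holds Routes keyed by username; notifies watchers on mutation."""
        def __init__(self):
            self.routes = {}
            self.watchers = []

        def watch(self, fn):
            self.watchers.append(fn)

        def put(self, username, clusters):
            # The control-plane write: mutate a single record.
            self.routes[username] = list(clusters)
            for fn in self.watchers:
                fn(username, list(clusters))

    class EdgeNode:
        """Applies Route mutations as soon as the watch fires."""
        def __init__(self, store):
            self.local_routes = {}
            store.watch(self.on_route_change)

        def on_route_change(self, username, clusters):
            self.local_routes[username] = clusters

    store = RouteStore()
    edge = EdgeNode(store)
    store.put("global_replica_user", ["us-east-1"])
    # Adding a read-only region is one mutation; the edge sees it via watch.
    store.put("global_replica_user", ["us-east-1", "us-west-2"])
    ```
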

  • Two endpoint shapes: "Direct" per-region vs "Optimized" latency-DNS. Verbatim: "Direct has the form of {region}.connect.psdb.cloud" — edge node in that specific region, for when the client is next door to the DB cluster. "The Optimized endpoint is backed by a latency-based DNS resolver. In AWS, for example, this is their Route53 latency-based routing policy. Which is most of the magic to resolve aws.connect.psdb.cloud to the nearest edge region to you." Canonical new concept: concepts/latency-based-dns-routing — Route 53's latency-based routing policy as the DNS-layer half of the CDN shape, complementing the edge-termination half. "This means whether you're connecting from your local machine with pscale connect or from the datacenter next to your database, you get routed through the closest region to you, which gives us the CDN effect."
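
    The two hostname shapes reduce to simple templates (string forms only; the latency-based resolution itself happens in DNS, e.g. Route 53, not in application code):

    ```python
    def direct_endpoint(region: str) -> str:
        # Direct: a specific edge region, for clients next door to the cluster.
        return f"{region}.connect.psdb.cloud"

    def optimized_endpoint(provider: str) -> str:
        # Optimized: latency-based DNS resolves this to the nearest edge region.
        return f"{provider}.connect.psdb.cloud"
    ```
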

  • Inter-region latency is measured continuously via mesh health-checking; Routes are kept sorted by that measurement. "we maintain warm connections between all of our regions ready to go, we utilize these to measure latency continuously as a part of regular health checking. So, for example, the us-east-1 edge node is continuously pinging its peers, similar to a mesh network and measuring their latency. Once a Route is seen over the etcd watcher, before it's accessible to being used, we are able to simply sort the list of clusters based on their latency times we already are tracking. We periodically re-sort every Route if/when latency values change. This keeps the 'next hop' decision always clusters[0] in practice." Canonical new concept: concepts/mesh-latency-health-check — peering-style continuous-measurement + sorted-preference data-structure for converting raw latency telemetry into a routing decision that composes cleanly with a mutation-watched cluster list.
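
    Turning that telemetry into a routing decision is a sort (latency numbers below are invented): each edge node tracks peer latencies from regular health checks, and every Route seen over the watcher is sorted against them before use, so the preferred next hop is always clusters[0].

    ```python
    measured_latency_ms = {    # continuously refreshed by mesh health checks
        "us-east-1": 1.2,      # this edge node's own region
        "us-west-2": 62.0,
        "eu-west-1": 78.5,
    }

    def sort_route(clusters, latencies):
        # Unmeasured / unreachable peers sort to the bottom (infinite latency).
        return sorted(clusters, key=lambda c: latencies.get(c, float("inf")))

    route_clusters = sort_route(["eu-west-1", "us-west-2", "us-east-1"],
                                measured_latency_ms)
    next_hop = route_clusters[0]
    ```
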

  • Per-query routing without reconnection. "because the connection is already established with us during all of this, the Route is utilized on a per-query basis, thus without needing to reconnect or anything, we can route you to the lowest latency next hop in realtime." The caller's MySQL session stays open at the edge; per-query, the edge picks clusters[0] via the warmed mesh connection. When the sorted order changes (peer latency shift, region added/removed), the next query picks the new clusters[0] automatically — no client-visible reconnection, no SDK retry logic, no exponential backoff. Structural composition of the four new concepts: edge termination (where the session terminates) + credential-route-endpoint split (what gets replicated vs what stays local) + etcd-watched Route (how cluster-list changes propagate) + mesh latency (how clusters[0] is picked) = per-query lowest-latency routing with zero caller churn.
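
    The per-query decision can be sketched like this (all names invented): the client's MySQL session stays pinned to the edge, and each query independently consults the current sorted cluster list, so a re-sort or region change takes effect on the very next query with no client-visible reconnect.

    ```python
    class EdgeSession:
        def __init__(self, route_clusters):
            self.route = route_clusters     # kept sorted by measured latency

        def execute(self, query):
            # Per-query routing: always the current head of the sorted list.
            target = self.route[0]
            return (target, query)

    session = EdgeSession(["us-east-1", "us-west-2"])
    first = session.execute("SELECT 1")          # routed to us-east-1
    session.route = ["us-west-2", "us-east-1"]   # re-sort or region change
    second = session.execute("SELECT 1")         # routed to us-west-2, same session
    ```
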

  • Failover as an emergent property. "In the event of a hard failure (if for some reason this entire region were down), we could go over to the next option if there were multiple choices." When clusters[0] fails hard, the edge falls through to clusters[1]. No separate failover protocol — the sorted cluster-list structure that handles latency-sensitivity also handles unreachability, since an unreachable peer reports infinite latency via the mesh health-check and drops to the bottom of the sorted list naturally.
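
    The fall-through can be sketched as below (the try/except fallthrough and the detection mechanism are illustrative assumptions; the post does not specify timeouts or retry policy): a hard-down clusters[0] simply drops through to clusters[1].

    ```python
    def route_query(clusters, send):
        # Walk the latency-sorted list; first reachable cluster wins.
        for cluster in clusters:
            try:
                return send(cluster)
            except ConnectionError:
                continue            # hard failure: fall through to next hop
        raise ConnectionError("all clusters in the Route are unreachable")

    def fake_send(cluster):
        if cluster == "us-east-1":  # simulate the whole region being down
            raise ConnectionError
        return f"served-by:{cluster}"

    result = route_query(["us-east-1", "us-west-2"], fake_send)
    ```
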

  • Canonical new pattern: the CDN-like database-connectivity layer. patterns/cdn-like-database-connectivity-layer — the composition of edge termination + latency-DNS + control-plane-watched routing table + mesh latency + warmed multiplexed backhaul, applied to database wire protocols rather than HTTP. Architectural template applicable to any stateful-protocol edge-termination service (Postgres, Redis, Cassandra, gRPC-long-session, MQTT), not MySQL-specific. Sibling of patterns/caching-proxy-tier (centralised connection multiplexing at the origin) at the opposite topology: this pattern distributes the multiplexer globally and puts it between the client and the pool.

Systems / concepts / patterns extracted

New systems

  • systems/planetscale-global-network

New concepts

  • concepts/mysql-connection-termination-at-edge
  • concepts/credential-route-endpoint-triple
  • concepts/latency-based-dns-routing
  • concepts/mesh-latency-health-check

New patterns

  • patterns/global-credential-over-per-region-credentials
  • patterns/etcd-watched-route-mutation
  • patterns/cdn-like-database-connectivity-layer

Extended pages

  • systems/planetscale — new Seen-in entry framing the Global Network as the connection-admission substrate for every PlanetScale MySQL product; complements the pool/credential story from the 2023-03-27 Gangal vttablet connection-pool post + the 2023-01-04 Robenolt HTTP/3 post.
  • systems/planetscale-connect — Global-Network context added (Connect's gRPC API is one of the protocols terminated at the edge alongside MySQL).
  • systems/mysql — new Seen-in entry for edge-termination of the MySQL wire protocol, distinct from in-DC ProxySQL usage.
  • systems/vitess — frontmatter sources appended (TabletType enum is Vitess-derived).
  • systems/etcd — new Seen-in entry canonicalising the watched-record mutation pattern at a CDN-control-plane altitude (complements existing Kubernetes + Figma stateful-workload framings).
  • systems/amazon-route53 — new Seen-in entry for latency-routing-policy use as the CDN-effect DNS half.
  • concepts/connection-multiplexing — new Seen-in entry: Global-Network edge is a globally-distributed multiplexer tier, sibling of Figma's systems/figcache single-DC instance.
  • concepts/edge-to-origin-database-latency — new Seen-in entry: the Global Network is PlanetScale's canonical mitigation (edge handshake + multiplexed tunnel), complementing the Cloudflare/Hyperdrive framings.
  • concepts/anycast — new Seen-in entry framing PlanetScale's latency-DNS-to-POP shape as an explicit alternative to anycast that achieves the same CDN effect through DNS rather than BGP.
  • concepts/read-write-splitting — new Seen-in entry framing global replica credentials as the "network-side" automation of read-side routing, removing the per-region hostname from the application's routing concern.
  • companies/planetscale — Recent-articles entry prepended; frontmatter sources + tags + related extended.

Caveats

  • No production numbers. No latency deltas (edge termination vs origin termination), no per-region RTT distribution, no replica-region count, no claim about what fraction of customers are on global vs per-region credentials.
  • "Next hop" failover semantics not fully specified. The post says "we could go over to the next option if there were multiple choices" but doesn't disclose the unreachability detection timeout, retry policy, or caller-visible error semantics when all clusters in the Route are down.
  • Read-your-writes not discussed. Because the global credential routes reads to any replica, a write followed by a read on the same MySQL session may hit different regions with different replication lag. The post elides this entirely — global replica credentials are marketed to "utilize this extra compute", not for RYW-critical paths.
  • Replication-lag-visible staleness inherent. Replicas are always eventually consistent behind the primary; global replica credentials don't change the lag semantics, only the which replica question.
  • Mesh latency measurement cost not quantified. Continuous pinging between all edge-node pairs scales O(N²) in edge-node count; no disclosure of cadence, payload size, or bandwidth cost.
  • Endpoint hostnames are AWS / GCP specific. The canonical Optimized endpoints shown are aws.connect.psdb.cloud / gcp.connect.psdb.cloud; Azure or cross-cloud semantics not discussed.
  • Control-plane fault handling elided. etcd outage / watch disconnection / stale-cache semantics not specified.
  • pscale connect CLI workflow named but not deep-dived. The pscale shell --replica command is mentioned as the try-it affordance; its interaction with the Global Network (direct vs Optimized endpoint selection, CLI-side config) not disclosed.
  • URL-slug authority. Raw filename slug is introducing-global-replica-credentials (same as URL slug — no truncation in this case). URL field verbatim from raw file.

Cross-source continuity

  • Canonical companion to the 2023-03-27 Gangal Connection pooling in Vitess post. That post canonicalises the in-region connection-pool substrate (vttablet, lockless pool, settings pool); this post canonicalises the global connection-admission substrate (Global Network, edge termination, credential/route/endpoint split). Together they bracket PlanetScale's connection-handling architecture across two altitudes: client → edge (this post) and edge → DB cluster (Gangal's post).
  • Canonical companion to the 2023-01-04 Robenolt Faster MySQL with HTTP/3 post. Same author (Matt Robenolt) on the same architectural theme (connection setup cost, edge termination, multiplexing); HTTP/3 post makes the client-library case for multiplexing over HTTP/QUIC to the public gRPC API, this post makes the network-architecture case for the edge-terminated MySQL-protocol path. Both are manifestations of the Global Network; this post discloses the Global Network itself.
  • Complements sources/2026-04-16-cloudflare-deploy-postgres-and-mysql-databases-with-planetscale-workers and sources/2026-04-21-planetscale-faster-planetscale-postgres-connections-with-cloudflare-hyperdrive — those posts canonicalise concepts/edge-to-origin-database-latency mitigation through Cloudflare's Worker-side proxy (Hyperdrive); this post shows PlanetScale's own-network-side answer (edge termination in PlanetScale's POPs). Two architectural answers to the same problem at different network layers.
  • Complements sources/2026-04-21-planetscale-comparing-awss-rds-and-planetscale — Jarod Reyes' 2021 post named PlanetScale's "connect to any region" capability without specifying the mechanism; this post is the canonical mechanism disclosure ~3 years later.
  • No existing-claim contradictions — strictly additive.
