PlanetScale — Temporal Workflows at scale: Part 2 — Sharding in production¶
Summary¶
Part 2 of Savannah Longoria's two-part PlanetScale × Temporal pairing tutorial (2022-12-14). Where Part 1 framed Temporal's cluster + persistence-layer model and the SQL-vs-NoSQL trade-off, Part 2 concretises the PlanetScale-as-Temporal-backing-store story with a worked production VSchema + the shard-count constraint a Temporal operator actually has to plan for. Four durable wiki disclosures Part 1 did not canonicalise:
- Temporal serialises all updates on a single shard ("Temporal serializes all updates belonging to the same shard, so all updates are sequential. As a result, the latency of a database operation limits the maximum theoretical throughput of a single shard."). This is the concrete correctness constraint driving Temporal's per-shard throughput ceiling — canonicalised as concepts/serialized-per-shard-updates + concepts/single-shard-throughput-ceiling.
- numHistoryShards is immutable after initial cluster deployment ("the value is immutable after the initial cluster deployment. You must set this value high enough to scale with this Cluster's worst-case peak load"). This is a hard operational constraint distinct from any other sharded system the wiki covers — canonicalised as concepts/num-history-shards-immutability.
- Worked production VSchema for Temporal showing the canonical two-keyspace split: small metadata tables (namespaces, cluster membership, queue metadata, buffered events) live in an unsharded keyspace; large tables (executions, history_node, history_tree, tasks, replication_tasks, etc.) live in a sharded keyspace with xxhash as the Vindex on shard_id or range_hash columns. Canonicalised as patterns/split-sharded-plus-unsharded-keyspaces (a hedged VSchema sketch follows this list).
- xxhash as the named Vitess Primary Vindex function — first wiki source naming the concrete hash function used in a production Vitess VSchema ("xxhash": {"type": "xxhash"}).
- Production QPS ledger for a Temporal-on-PlanetScale customer: "QPS consistently fluctuates between 40k min and 200k max" across Black Friday / Cyber Monday (peaks: 100k sustained, jumps to 120–180k at end-of-day). Empirical anchor for Temporal-on-PlanetScale at retail-peak load.
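For orientation, a minimal sketch of the sharded-keyspace half of that split in standard Vitess VSchema JSON. Only the structural elements come from the post: "sharded": true, the xxhash vindex definition, and shard_id / range_hash as the Primary Vindex columns. The trimmed table list and the per-table column assignments here are illustrative and should be checked against the actual Temporal MySQL schema.

```json
{
  "sharded": true,
  "vindexes": {
    "xxhash": { "type": "xxhash" }
  },
  "tables": {
    "executions":   { "column_vindexes": [{ "column": "shard_id",   "name": "xxhash" }] },
    "history_node": { "column_vindexes": [{ "column": "shard_id",   "name": "xxhash" }] },
    "history_tree": { "column_vindexes": [{ "column": "shard_id",   "name": "xxhash" }] },
    "tasks":        { "column_vindexes": [{ "column": "range_hash", "name": "xxhash" }] },
    "task_queues":  { "column_vindexes": [{ "column": "range_hash", "name": "xxhash" }] }
  }
}
```

The unsharded metadata keyspace needs no vindexes at all; its VSchema is just "sharded": false with a plain list of the small tables.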
Key takeaways¶
- Temporal's single-shard throughput is latency-bound, not bandwidth-bound. "Temporal serializes all updates belonging to the same shard, so all updates are sequential. As a result, the latency of a database operation limits the maximum theoretical throughput of a single shard." This makes the per-shard ceiling a function of the persistence-layer operation latency, not the disk or network throughput of the backing store. Adding more cores or IOPS to one shard doesn't help; you need more shards (a back-of-envelope sketch appears under Operational numbers below). (Source: this post.)
- numHistoryShards is a one-shot decision. "Tuning the History Shard Count (numHistoryShards) in Temporal is a critical and required configuration of a Temporal cluster. The configured value assigned in this step directly impacts the system's throughput, latency, and resource utilization. … the value is immutable after the initial cluster deployment. You must set this value high enough to scale with this Cluster's worst-case peak load." Immutable-after-deploy is distinct from Vitess's resharding-is-a-revolving-door property on the storage side — a Temporal operator cannot resolve an undersized-shards mistake by resharding the backing MySQL; the shard-id hash is baked into Temporal's code path. (Source: this post.)
- Horizontally-sharded storage composes cleanly with Temporal's per-shard-serialised update discipline. Because Temporal shard-addresses every row it writes (shard_id or range_hash), a Vitess VSchema that uses the same column as the Primary Vindex ensures each row's owning shard stays on one MySQL primary — which preserves Temporal's single-shard-serialisation invariant trivially: "For most Temporal tables, you will find either shard_id or range_hash defined within the Primary Key. This maps directly to the Primary Vindex we use as our Sharding Key when we create our VSchema." (Source: this post.)
- The canonical two-keyspace Temporal VSchema splits the table set by traffic / size. Unsharded keyspace tables: buffered_events, cluster_membership, cluster_metadata, cluster_metadata_info, namespace_metadata, namespaces, queue, queue_metadata, request_cancel_info_maps, schema_update_history, schema_version, signal_info_maps, signals_requested_sets, timer_info_maps. Sharded keyspace tables (Vindex on shard_id or range_hash): activity_info_maps, current_executions, executions, history_node, history_tree, replication_tasks, replication_tasks_dlq, shards, task_queues, tasks, timer_tasks, transfer_tasks, visibility_tasks. Sharding function: xxhash. (Source: this post.)
- Write-intensive workload profile. "Temporal is a very write-intensive application; it's easy to accumulate several terabytes of bin logs during application updates and upserts. In addition to this, some tables grow faster and have more traffic than others." The per-table traffic skew is what drives the two-keyspace split — metadata tables (small, low-traffic) don't need sharding overhead; the write-hot tables benefit from it immediately.
- PlanetScale uses Temporal themselves. "Our PlanetScale Infrastructure team has also recently implemented Temporal workflows internally to automate manual human tasks for Vitess releases. In addition, we have customers using it in production in sharded environments." First wiki disclosure that Temporal is PlanetScale's internal release-automation substrate for Vitess — self-composition of product primitives.
- Migration path for existing Temporal-on-PlanetScale customers is via MoveTables + SwitchTraffic (see the command sketch after this list). "we started the MoveTables process for all tables using the Vindex syntax. Once the MoveTables process completes, the routing rules stay in place. Then we manually switched traffic from the unsharded source keyspace to the new sharded keyspace using the SwitchTraffic command." Canonical wiki instance of the routing-rule-swap cutover pattern applied to an unsharded → sharded migration on a production Temporal workload.
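The post quotes the cutover narrative but not the exact commands. A minimal sketch of the same unsharded-to-sharded move using open-source Vitess vtctldclient syntax (recent releases); the workflow name, keyspace names, and table list are placeholders, and PlanetScale's internal tooling may differ.

```sh
# Create the MoveTables workflow: stream the write-hot tables from the
# unsharded source keyspace into the sharded keyspace whose VSchema already
# defines the xxhash Primary Vindex for these tables.
vtctldclient MoveTables --workflow temporal_reshard --target-keyspace temporal_sharded \
  create --source-keyspace temporal \
  --tables "executions,current_executions,history_node,history_tree,tasks,task_queues,transfer_tasks,timer_tasks,replication_tasks,replication_tasks_dlq,visibility_tasks,activity_info_maps,shards"

# Watch the copy and replication phase until the workflow is caught up.
vtctldclient MoveTables --workflow temporal_reshard --target-keyspace temporal_sharded show

# Routing-rule swap: point reads and writes at the sharded keyspace.
vtctldclient MoveTables --workflow temporal_reshard --target-keyspace temporal_sharded switchtraffic

# Once satisfied, finalize the move and clean up the workflow artifacts.
vtctldclient MoveTables --workflow temporal_reshard --target-keyspace temporal_sharded complete
```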
Operational numbers¶
- Weekly baseline QPS (Nov 23 – Nov 29): "fluctuates between 40k min and 200k max" — single customer's Temporal database on PlanetScale.
- Black Friday (Thursday): starts 100k, dips to 30k, rises to 100k second-half.
- Black Friday (Friday): starts 100k, dips to 40k, rises to 100k second-half.
- Cyber Monday (Monday): ~100k steady, jumps to 120k end-of-day.
- Cyber Monday (Tuesday): ~100k steady, jumps to 180k end-of-day.
- Sustained peak with "no interruptions"; specific shard count, MySQL instance size, and Temporal numHistoryShards value not disclosed.
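A back-of-envelope sketch connecting these figures to the latency-bound per-shard ceiling quoted in the takeaways. Both the persistence latency and the numHistoryShards value below are invented for illustration (the post discloses neither), and database QPS is not one-to-one with Temporal shard updates, so treat the output as orientation rather than capacity planning.

```go
package main

import "fmt"

func main() {
	// Assumed numbers for illustration only; the post does not disclose them.
	persistenceLatency := 0.005 // seconds per database operation (5 ms, assumed)
	numHistoryShards := 4096.0  // a common Temporal sizing, assumed

	// Updates on one Temporal shard are serialized, so one shard can commit
	// at most one update per persistence round-trip.
	perShardCeiling := 1.0 / persistenceLatency // updates/sec per shard

	// Theoretical cluster-wide ceiling is per-shard ceiling times shard count,
	// assuming traffic spreads evenly across shards (it never quite does).
	clusterCeiling := perShardCeiling * numHistoryShards

	fmt.Printf("per-shard ceiling: %.0f updates/sec\n", perShardCeiling) // 200
	fmt.Printf("cluster ceiling:   %.0f updates/sec\n", clusterCeiling)  // 819200

	// The customer's observed 200k peak QPS sits under that ceiling, but
	// halving numHistoryShards or doubling latency halves the headroom.
	observedPeakQPS := 200_000.0
	fmt.Printf("headroom at peak:  %.1fx\n", clusterCeiling/observedPeakQPS)
}
```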
Architectural framing: Vitess side (pedagogical)¶
Part 2 re-introduces the Vitess runtime architecture at pedagogy-101 altitude — this material is already canonicalised in more detail on systems/vitess, systems/vtgate, systems/vttablet, concepts/keyspace, and concepts/keyspace-id. The novel content is the Temporal-specific VSchema mapping and the two-keyspace split pattern.
Caveats¶
- Customer details elided: no shard count, numHistoryShards value, MySQL instance size, or pt-heartbeat lag budget disclosed. The QPS numbers are substrate-capability anchors, not SLO baselines.
- xxhash is a choice, not a recommendation: Longoria notes "we approach sharding differently for each use case, and this customer example isn't how we shard all our customers today." xxhash is the standard Vitess Primary Vindex for most workloads (fast, even distribution), but PlanetScale's customer-consultation voice retains flexibility to use other Vindex types.
- Single-shard throughput ceiling is Temporal-specific: the serialise-per-shard discipline is a property of Temporal's consistency model, not a Vitess / MySQL limitation. A Vitess cluster serving a non-Temporal workload can parallelise writes within a shard; the ceiling only bites on Temporal-shape workloads.
- numHistoryShards vs Vitess shard count are independent knobs: Temporal hashes workflow executions into N shard_id buckets (where N = numHistoryShards, immutable); Vitess shards the shard_id range into M MySQL primaries (where M is mutable via Reshard). The two-axis relationship (Temporal shard count × Vitess shard count) is only partially explored in the post — it doesn't name whether a single Vitess shard holds multiple Temporal shards or vice versa, though the xxhash(shard_id) VSchema implies N Temporal shards distributed across M Vitess shards by hash-range ownership (a stand-in illustration follows this list).
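To make the two independent axes concrete, a small stand-in sketch of how a row's destination could be computed on each axis. The hash functions, shard counts, and identifiers below are placeholders (FNV stands in for both Temporal's internal hash and the xxhash Vindex), so this shows only the shape of the mapping, not the real implementations.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// Stand-in for the hash Temporal applies when assigning a workflow execution
// to one of its N history shards. The real function differs; only the
// "hash mod numHistoryShards" shape matters here.
func temporalHistoryShard(namespaceID, workflowID string, numHistoryShards int) int {
	h := fnv.New32a()
	h.Write([]byte(namespaceID))
	h.Write([]byte(workflowID))
	return int(h.Sum32())%numHistoryShards + 1 // assumed 1-based shard IDs
}

// Stand-in for the Vitess side: the Primary Vindex turns the shard_id column
// value into an 8-byte keyspace ID, and whichever Vitess shard owns that
// keyspace-ID range receives the row. FNV-64 stands in for xxhash64.
func vitessShardIndex(temporalShardID int, numVitessShards int) int {
	var key [8]byte
	binary.BigEndian.PutUint64(key[:], uint64(temporalShardID))
	h := fnv.New64a()
	h.Write(key[:])
	keyspaceID := h.Sum64()
	// Assume M equal-width shard ranges (e.g. -40, 40-80, 80-c0, c0-): the
	// owning shard is determined by the top bits of the keyspace ID.
	rangeWidth := ^uint64(0)/uint64(numVitessShards) + 1
	return int(keyspaceID / rangeWidth)
}

func main() {
	const numHistoryShards = 512 // N: Temporal axis, immutable after deploy (value invented)
	const numVitessShards = 4    // M: Vitess axis, mutable via Reshard (value invented)

	tShard := temporalHistoryShard("default", "order-12345", numHistoryShards)
	vShard := vitessShardIndex(tShard, numVitessShards)

	// Many Temporal shards hash into each Vitess shard's range, so N and M
	// can move independently.
	fmt.Printf("temporal shard %d -> vitess shard index %d of %d\n", tShard, vShard, numVitessShards)
}
```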
Source¶
- Original: https://planetscale.com/blog/temporal-workflows-at-scale-sharding-in-production
- Raw markdown: raw/planetscale/2026-04-21-temporal-workflows-at-scale-part-2-sharding-in-production-3a7d8b9a.md
Related¶
- systems/temporal
- systems/planetscale
- systems/vitess
- systems/vtgate
- systems/vttablet
- systems/vitess-vreplication
- systems/vitess-movetables
- systems/mysql
- concepts/horizontal-sharding
- concepts/keyspace
- concepts/keyspace-id
- concepts/vindex
- concepts/hash-sharding
- concepts/temporal-persistence-layer
- concepts/serialized-per-shard-updates
- concepts/single-shard-throughput-ceiling
- concepts/num-history-shards-immutability
- patterns/split-sharded-plus-unsharded-keyspaces
- patterns/reshard-online-via-vreplication
- patterns/routing-rule-swap-cutover
- companies/planetscale
- sources/2026-04-21-planetscale-temporal-workflows-at-scale-with-planetscale-part-1