PlanetScale
PlanetScale Engineering blog is a Tier-3 source on the sysdesign-wiki. PlanetScale is a managed relational-database vendor originally built on top of Vitess (MySQL-protocol sharding substrate spun out of YouTube) and now offering both MySQL and Postgres products. The PlanetScale system page covers product scope, integrations, and positioning separately.
Tier classification
Tier 3 — PlanetScale's blog straddles marketing / developer-education / engineering deep-dive. A substantial fraction of posts are product-announcement / launch / pricing content that fails scope. A significant minority are genuine database-internals education written by authors like Ben Dicken (ex–Turso, ex–Pure Storage) and Sam Lambert (CEO, ex-GitHub) — these consistently clear the Tier-3 bar via detailed coverage of storage engines, index structures, query optimisation, contention patterns, and production war stories from previous employers.
Skip rules specific to PlanetScale
- "Introducing…" / "Now available…" / pricing announcements — skip unless there is a substantive architecture section alongside the announcement.
- Migration-guide marketing ("Moving from $competitor to PlanetScale") — skip unless backed by a concrete architectural narrative.
- Generic cloud-database industry commentary — skip.
- Dicken database-internals / Burlacu architecture / Lambert ex-GitHub war-story posts — default include. These are Tier-3-clearing by construction.
Recent articles
- 2025-10-01 — Larger than RAM Vector Indexes for Relational Databases (Vicent Martí, 2025-10-01) — engineering deep-dive companion to the 2024-10-22 vector-beta announcement and the 2026-03-25 GA post. Martí, who led the two-year PlanetScale vector-index project, exposes the mechanism behind three earlier claims. The article articulates why none of HNSW, stock SPANN, or DiskANN fit a relational database — "The majority of existing publications discuss data structures that must fit in RAM — a non starter for any relational database! Many of them expect the indexes to be static, and indexes in a relational database very much are not." PlanetScale ships two variants: an in-memory HNSW with transactional guarantees (99.9% recall; expensive to update incrementally) as an opt-in, and a hybrid larger-than-RAM index built on SPFresh-inside-InnoDB.
Load-bearing new wiki primitives extracted:
- LSM-emulation on B-tree via composite index — the core invention that avoids forcing users onto MyRocks. Posting-list appends become `(head_vector_id, sequence)` composite-key inserts; lookups are B-tree range scans. "B-trees are very good at range scans, because adjacent keys are stored next to each other in the pages."
- Vector versioning for deletion — 1-byte version counter per vector + tiny in-memory versions table makes reassignment and deletion append-only flag flips. "Deletes are performed by marking a delete flag in the versions table — the same table that is used to mark a vector stale because it's been moved to another posting list. It's really that simple!"
- WAL-tied in-memory index mutation — in-memory HNSW head index mutations are journalled to a WAL tied to the same InnoDB transaction as the posting-list changes; recovery replays WAL on top of the last serialised snapshot; compaction pauses background jobs (not user traffic) because user operations don't mutate the head index.
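The composite-key idea in the first primitive above can be illustrated with a toy model. This is a minimal sketch, not PlanetScale's implementation: a sorted Python list stands in for the InnoDB B-tree, head vector ids are assumed to be integers, and the names `PostingStore`, `append`, and `scan` are hypothetical.

```python
import bisect
from itertools import count


class PostingStore:
    """Toy stand-in for a B-tree keyed by (head_vector_id, sequence)."""

    def __init__(self):
        self._keys = []      # sorted list of (head_vector_id, sequence) keys
        self._values = {}    # key -> vector payload
        self._seq = count()  # ever-increasing sequence number (assumption)

    def append(self, head_vector_id, vector):
        # "Appending" to a posting list is an ordinary insert of a new composite key.
        key = (head_vector_id, next(self._seq))
        bisect.insort(self._keys, key)
        self._values[key] = vector
        return key

    def scan(self, head_vector_id):
        # Reading a posting list is a range scan over adjacent keys.
        lo = bisect.bisect_left(self._keys, (head_vector_id, -1))
        hi = bisect.bisect_left(self._keys, (head_vector_id + 1, -1))
        return [self._values[k] for k in self._keys[lo:hi]]


store = PostingStore()
store.append(7, "vec-a")
store.append(3, "vec-b")
store.append(7, "vec-c")
posting_for_7 = store.scan(7)
```

Because all keys sharing a `head_vector_id` sort next to each other, the scan touches one contiguous slice, which is the property the quoted B-tree remark relies on.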
Algorithmic extensions PlanetScale made to the SPFresh paper:
- HNSW-on-centroids instead of BK-Tree / K-D Tree. "An optimized implementation of HNSW beats these other data structures in every possible metric (performance, recall, construction time)." Resolves the ambiguity left by the 2026-03-25 GA post between "tree structure" vs SPANN's tree+graph.
- Random sampling over BK-Tree clustering for centroid selection. "The law of large numbers ensures that our random sampling is representative." Simpler to implement and tunes with head-index size.
- K-way split instead of LIRE's 2-way split when a posting list grows very large under insert load — "being able to immediately split it into multiple smaller postings is an important optimization."
- Defragment op added on top of the SPFresh paper's Split / Reassign / Merge — compacts underlying B-tree rows without merging postings or removing stale vectors, for heavy-insertion regimes.
- Updates lifted to the SQL-row level rather than index-internal — a delete + insert at InnoDB level transparently maps through, exploiting the cheap delete path.
Architectural thesis verbatim: "approximate" applies only to recall, not to transactional invariants. Committing a thousand vectors means the next SELECT considers all of them. Aborting means none appear. Invariants hold across crashes and failovers.
Operational numbers: head index = 20% of dataset (tunable); total memory ceiling = 30% of dataset; HNSW (in-memory variant) recall = 99.9%; version counter = 1 byte. OpenAI embedding-dimension context: 1,536 / 3,072.
Caveats: no production performance numbers (those land in beta/GA posts); 1-byte version-counter wraparound protocol not documented; head-index WAL-compaction pause duration not quoted; K-way-split clustering cost under high insert load acknowledged but not quantified; reassignment heuristic "maybe (but not necessarily!) the vector is in a wrong posting list" is probabilistic.
Cross-source continuity: engineering-altitude companion to the beta (2024-10-22) and GA (2026-03-25) marketing-altitude announcements; directly cites for B-tree background; references the SPANN + HNSW + SPFresh papers by direct link. Martí closes with PlanetScale's relational-database-is-the-future positioning: "relational databases are the past, the present and the future of data storage, and they're long overdue for a well tailored solution for vector indexing and approximate nearest neighbor search."
-
2022-04-28 — Consensus algorithms at scale: Part 5 — Handling races (Sugu Sougoumarane, originally 2022-04-28, re-fetched 2026-04-21) — twenty-sixth PlanetScale first-party ingest and sixth canonical consensus-series instalment on the wiki (the full Parts 3–8 set is now complete; Parts 1–2 remain the only gap at the series opening). Part 5 is the taxonomic hinge of the series: layered on Part 4's revoke+establish split, it introduces the third orthogonal concern — race handling between multiple electors — and canonicalises the two race-resolution families: lock-based ("first elector wins" via exclusive lock) and lock-free ("newest elector wins" via proposal numbers). Load-bearing taxonomic move: Raft is lock-based in substance even though it doesn't name a lock — "the act of obtaining a lock is shadowed by other actions it takes. If you subtract out the other actions (revoke, establish, and propagate) in the code that performs an election, it will be evident that what is left is the act of obtaining a distributed lock." Making the lock explicit is the architectural unlock that allows independent tuning of the lock vs the revoke/establish steps. Elector/candidate split introduced: "It is not necessary for a candidate to elect itself as leader. A separate agent can perform all the necessary steps to promote a candidate as leader." YouTube's 15-primaries-per-shard, one-elector-per-region operational shape is the motivating example; Vitess's VTOrc is named as the elector component. Vitess+etcd canonicalised as external-coordinator shape: "In Vitess, we use an external system like etcd to obtain such a lock. The decision to rely on another consensus system to implement our own may seem odd. But the difference is that Vitess itself can complete a massive number of requests per second. Its usage of etcd is only for changing leadership, which is in the order of once per day or week." 
Cost asymmetry (5–6 orders of magnitude between per-sec requests and per-day leadership changes) is the load-bearing justification for running a separate consensus system for the lock. Proposal number unification: Paxos's proposal numbers and Raft's term numbers are unified under a single proposal number concept; the lock-free family's race-resolution primitive; required property is total order + monotonic-in-time; safety rule is "every elector must assume there may be another elector with an older timestamp attempting a leadership change. It must therefore attempt to revoke leadership from all potential candidates, not just the current known leader." Forward-progress vs time-component trade-off canonicalised: lock-free gets concepts/forward-progress for free (newer-proposal-number supersedes stalled elector); lock-based manufactures it via a time-component (auto-release on timeout); the same time-component doubles as the leader lease, which enables direct-to-leader consistent reads without a quorum RTT. Clock-skew safety margin: "typically in the milliseconds" drift → "many seconds" sequencing granularity → safety margin of ~10³. Verdict (Part 5 short form): "the elegance of a lock-free approach may seem tempting, but the lack of a stable leader complicates everything else. Having to reach a quorum for consistent reads is a major drawback for scaling systems. Weighing these options, a lock-based system should be preferred for large scale consensus systems." The full four-advantage enumeration (graceful demotion, node membership, consistent reads, anti-flapping) is deferred to Part 8 — Part 5 is the original source; Part 8 is the capstone restatement. 
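The two race-resolution families described above can be contrasted in a toy, single-process sketch. It is illustrative only: real deployments obtain the lock from an external system such as etcd, and the class names, the `ttl` parameter, and the explicit `now` argument are assumptions made here for determinism.

```python
class LeaseLock:
    """Lock-based family: the first elector wins until its lease times out."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.holder = None
        self.expires_at = 0.0

    def acquire(self, elector, now):
        # The time component: an expired lease is treated as released, which is
        # how the lock-based family gets forward progress if the holder stalls.
        free = self.holder is None or now >= self.expires_at
        if free or self.holder == elector:
            self.holder = elector
            self.expires_at = now + self.ttl
            return True
        return False


class Node:
    """Lock-free family: the newest proposal number wins."""

    def __init__(self):
        self.highest_seen = -1

    def accept(self, proposal_number):
        # A newer proposal supersedes any older (possibly stalled) elector.
        if proposal_number > self.highest_seen:
            self.highest_seen = proposal_number
            return True
        return False


lock = LeaseLock(ttl=10)
first = lock.acquire("elector-a", now=0)           # first elector wins
blocked = lock.acquire("elector-b", now=5)         # lease still held
after_timeout = lock.acquire("elector-b", now=10)  # lease expired

node = Node()
accepted = [node.accept(n) for n in (1, 3, 2)]     # the stale proposal 2 is rejected
```

The same `expires_at` value is what a leader lease would reuse: while the lease is valid, no other elector can succeed, so the current leader can serve consistent reads without a quorum round trip.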
Canonical-source extensions: concepts/elector (canonical introduction; YouTube + Vitess/VTOrc motivating examples), concepts/proposal-number (canonical introduction; Paxos/Raft unification), concepts/forward-progress (canonical introduction; lock-free free vs lock-based manufactured), concepts/leader-lease (canonical introduction; Vitess topology-publish pattern), concepts/revoke-and-establish-split (Part 4 is the canonical intro; Part 5 layers race-resolution as third concern). Pages created (4): sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-5-handling-races; 3 new patterns (patterns/lock-based-leader-election, patterns/lock-free-leader-election, patterns/external-coordinator-for-leadership-lock). 0 new concept pages — all 5 referenced concepts already canonical on the wiki from the earlier Part 6/7/8 ingests that forward-referenced Part 5. 0 new system pages — all referenced systems already canonical (systems/vitess, systems/vtorc, systems/etcd, systems/mysql, systems/planetscale). Scope disposition: on-scope Tier-3 — Sugu-Vitess-co-creator / PlanetScale-CTO byline; foundational-taxonomic instalment that anchors five downstream Seen-in entries already on the wiki. Return contract:
ingested: wiki/sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-5-handling-races.md — 9 wiki pages touched.
-
2020-09-26 — Consensus algorithms at scale: Part 3 — Use cases (Sugu Sougoumarane, originally 2020-09-26, re-fetched 2026-04-21) — Foundational instalment of the eight-part Consensus algorithms at scale series. Takes the framework established in Parts 1-2 (durability-agnostic rules, single-leader narrowing) and exercises it on four concrete production deployment shapes — fifteen-primary-three-DC, four-zone-single-node, six-node-three-zone, and two-region-two-zones-per-region — all "uncomfortable for a majority based consensus system." Canonicalises four new concepts (concepts/durability-as-use-case-dependent, concepts/intersecting-quorums, concepts/failure-tolerance-envelope, concepts/availability-vs-data-loss-tradeoff) and two new patterns (patterns/optimize-for-common-case-frequency-asymmetry, patterns/single-ack-completion-with-wider-election). Load-bearing claim: "Durability is use-case dependent, we made it an abstract requirement requiring the consensus algorithms to assume nothing about the durability requirements." Worked example: "if durability is achieved with 2/5 nodes, then the election algorithm needs to reach 4/5 nodes to intersect with the durability criteria. In the case of a majority quorum, both of these are 3/5. But our generalization will work for any arbitrary property." — the FlexPaxos generalisation stated in plain English, 2 years before Part 8 cites Howard/Malkhi/Spiegelman 2017 by name. YouTube production datum: "at YouTube, although the quorum size was big, a single ack from a replica was sufficient for a request to be deemed completed. On the other hand, the leader election process had to chase down all possible nodes that could have acknowledged the last transaction. We did consciously trade off on the number of ackers to avoid going on a total wild goose chase." — canonical wiki instance of patterns/single-ack-completion-with-wider-election with the rare first-person disclosure that election-scan width was bounded by design.
Request-vs-election frequency asymmetry framing as the load-bearing optimisation axis: "hundreds of requests per second" vs "more than one election per day would be surprising" → orders-of-magnitude gap → bias toward minimum durability on the request path. Canonicalises k = 2 as the observed production upper bound on durability rules. "This is exactly what many users have done with Vitess" — the first explicit Vitess naming in the series, 5 instalments before Part 8's explicit pluggable-durability recommendation. Sixth canonical Sugu-series ingest (after Parts 4, 5, 6, 7, 8); fills the gap between the series' opening (Parts 1-2, not yet ingested) and the revocation/establishment material (Part 4). Tier-3-clearing by architectural density and pedagogical load-bearing-ness.
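The quoted 2/5 versus 4/5 example follows from a simple counting rule: an election must reach enough nodes that it cannot miss every node that acknowledged a write. The sketch below covers only this count-based case (the post's generalisation also allows arbitrary durability properties), and the function name `election_reach` is hypothetical.

```python
def election_reach(total_nodes, durability_acks):
    """Smallest number of nodes an election must reach so that it overlaps
    every possible set of `durability_acks` acknowledging nodes."""
    if not 1 <= durability_acks <= total_nodes:
        raise ValueError("durability_acks must be between 1 and total_nodes")
    # Any two sets of sizes r and k drawn from n nodes must intersect when r + k > n.
    return total_nodes - durability_acks + 1


two_of_five = election_reach(5, 2)     # the post's 2/5 durability example
majority = election_reach(5, 3)        # majority quorum: both sides are 3/5
single_ack = election_reach(5, 1)      # single-ack completion needs the widest election
```

Lowering the ack count makes every request cheaper and every election wider, which is the trade the request-versus-election frequency asymmetry justifies.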
-
2024-07-23 — The state of online schema migrations in MySQL (Shlomi Noach, originally 2024-07-23, re-fetched 2026-04-21) — Canonical wiki taxonomic survey of the three MySQL online schema-change mechanism classes as of 2024: (1) native `ALGORITHM=INPLACE` — technically non-blocking on the primary but creates replica lag equal to operation duration (a 3-hour primary rebuild = 3-hour replica lag; "the replication issue is a deal breaker for most"); the `SQL_LOG_BIN=0`-then-per-replica-replay workaround introduces consistency risk and O(n) operator burden; canonical rejection framing "`INPLACE` is not a good option for non-blocking changes". (2) Native `ALGORITHM=INSTANT` — metadata-only, runs instantly on replicas, zero resource cost, but narrow eligibility envelope ("there's a long way to go before `INSTANT` DDL can satisfy the common needs of schema changes"); not revertible at the mechanism altitude; `DROP COLUMN` destructive with metadata loss — "Not only data was lost, but also metadata. What was the column type? Length? Was it nullable?"; invisible columns are partial mitigation but "does not help" with explicit-column callers. (3) Third-party shadow-table tools (pt-osc, gh-ost, "recent newcomer" spirit, Vitess) — six-property operational profile canonicalised here: mimic-alter + slower + extra-disk + binlog-bloat + throttle-respecting + batched-interruptible. Shadow-table is the 2024 default "for the (still vast) majority of changes"; `INSTANT` is a short-circuit inside that default, not a replacement for it. Load-bearing operational-complexity framing: "if you already have to use one of the 3rd party solutions, you may as well use it all the time" — mixing two execution models doubles operational surface. [[patterns/auto-detect-instant-ddl-eligibility|Canonical new pattern]]: Vitess and spirit auto-detect `INSTANT` eligibility at submission time, short-circuiting to the fast path when eligible without forcing operator choice — "you don't need to think about it or be aware of which particular version supports which changes."
Vitess uniquely preserves first-class revertibility via [[patterns/instant-schema-revert-via-inverse-replication|inverse-replication kept alive post-cutover]] — "not only revert back to the original schema, but also to preserve the would-be lost data." 6 canonical new wiki pages: source + 2 concepts (concepts/mysql-algorithm-inplace, concepts/schema-change-revertibility-asymmetry) + 2 systems (systems/pt-online-schema-change, systems/spirit) + 1 pattern (patterns/auto-detect-instant-ddl-eligibility). Extends 8 pages: concepts/online-ddl (new top-of-Seen-in canonicalising three-mechanism-class taxonomy + "if you already have to use 3rd-party, use all the time" synthesis), [[concepts/instant-ddl-mysql]] (new top-of-Seen-in canonicalising 2024 taxonomy altitude + destructive-risk amplifiers + no-forecastable-eligibility), concepts/non-revertible-schema-change (new top-of-Seen-in canonicalising two-loss data-plus-metadata structure on `DROP COLUMN`), concepts/mysql-invisible-column (new top-of-Seen-in canonicalising explicit-column-caller limitation), systems/gh-ost (new top-of-Seen-in canonicalising 2024 taxonomic-peer framing + no-auto-detect gap vs Vitess/spirit), [[patterns/shadow-table-online-schema-change]] (new top-of-Seen-in canonicalising six-property operational profile + mixing-doubles-surface framing), plus frontmatter source appended on systems/mysql, systems/vitess, systems/planetscale. Shlomi Noach's ninth wiki ingest. Tier-3 on-scope — Noach default-include voice per companies/planetscale skip rules; architecture density ~85% on ~2,500-word post; taxonomic-survey voice (not a new mechanism disclosure) but resolves definitional gaps — `INPLACE` was implicit across ~dozen prior ingests without a dedicated concept page; `pt-online-schema-change` was named across ~dozen prior pages without a system anchor; `spirit` first wiki mention; revertibility asymmetry framed but never canonicalised. Canonical cross-source continuity: companion to 2022-05-09 [[sources/2026-04-21-planetscale-the-operational-relational-schema-paradigm|paradigm manifesto]] (axiom-layer charter); companion to 2024-09-04 [[sources/2026-04-21-planetscale-instant-deploy-requests|Instant Deploy Requests]] (product-layer opt-in on top of this taxonomy); companion to 2022-10 [[sources/2026-04-21-planetscale-behind-the-scenes-how-schema-reverts-work|schema reverts internals]] (mechanism-depth canonicalisation of Vitess's first-class revertibility). Caveats: taxonomic not incident-retrospective; no quantitative decision matrix; partitioning treatment thin; spirit mechanism hand-waved; declarative-schema-migration axis orthogonal and deferred to [[patterns/declarative-schema-management]]; `INSTANT` envelope is a moving target (post predates MySQL 8.4 extensions).
-
2024-04-24 — The MySQL adaptive hash index (Ben Dicken, originally 2024-04-24, re-fetched 2026-04-21) — Canonical wiki disclosure of MySQL InnoDB's Adaptive Hash Index (AHI) as an in-memory hash table built at runtime as an optional overlay on top of B+tree indexes. For index values InnoDB observes being looked up repeatedly, it materialises a hash entry whose key is the (full or prefix of) index value and whose value is a direct pointer into the target page inside the buffer pool. Future lookups of the same value short-circuit the B+tree descent: a single O(1) hash probe replaces the O(log N) tree walk. Three canonical InnoDB-specific disclosures: (1) InnoDB does not support on-disk `HASH` indexes — "If you try to create an index with `USING HASH` on an InnoDB-powered table, MySQL will instead create a B-tree index" — the AHI is the in-memory-only workaround that recovers hash-lookup performance for the hot subset of keys. (2) AHI pointers go only into buffer-pool-resident pages — "the buffer pool needs to be sufficiently large for the AHI to kick in. If it is small and there are a lot of evictions taking place, it is not worth using it" — so AHI and buffer pool form a two-tier dependency where the buffer pool is the substrate and the AHI is the overlay. (3) AHI is adaptive both up and down — "MySQL is able to automatically adjust its use of the AHI based on the behavior it observes in the buffer pool. If conditions are not right for its use (few repeated lookups, small buffer pool, etc), MySQL will reduce or eliminate its use" — canonicalising the new runtime-adaptive in-memory index pattern. Benchmarked speed-ups on a 390M-row `user` table with 4-level `username` B+tree: +16% QPS on single-value repeated lookups (14,044 → 16,701 QPS; `SHOW ENGINE INNODB STATUS` shows 350,953 hash searches/s + 50,985 non-hash searches/s with AHI on vs 0 hash / 418,334 non-hash with AHI off), +20% QPS on 1000-value hot-set random lookups (9,232 → 11,562 QPS). Canonical scaling claim: "workloads using deeper B-tree indexes may see even more performance improvement" — overlay benefit scales with underlying tree depth. Config surface: `innodb_adaptive_hash_index` (default on, flip to 0 to disable). Observation surface: `SHOW ENGINE INNODB STATUS \G;` → `INSERT BUFFER AND ADAPTIVE HASH INDEX` section exposes `Hash table size` and the load-bearing metric `hash searches/s` vs `non-hash searches/s`. Wiki pages created (3): source + 1 concept (concepts/adaptive-hash-index) + 1 pattern (patterns/runtime-adaptive-in-memory-index).
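The overlay shape described in this entry can be sketched as a toy model. It is not InnoDB's algorithm: a sorted list searched with `bisect` stands in for the B+tree descent, a `dict` stands in for the hash table, and the fixed `promote_after` threshold is a hypothetical simplification of InnoDB's heuristics. The sketch also omits buffer-pool eviction and the scale-down behaviour.

```python
import bisect
from collections import Counter


class AdaptiveIndex:
    """Sorted-list 'tree' with a hash overlay for repeatedly looked-up keys."""

    def __init__(self, rows, promote_after=3):
        ordered = sorted(rows, key=lambda kv: kv[0])
        self._keys = [k for k, _ in ordered]
        self._vals = [v for _, v in ordered]
        self._hits = Counter()
        self._hash = {}
        self._promote_after = promote_after  # hypothetical promotion threshold
        self.hash_searches = 0
        self.non_hash_searches = 0

    def lookup(self, key):
        if key in self._hash:                      # O(1) probe for hot keys
            self.hash_searches += 1
            return self._hash[key]
        self.non_hash_searches += 1                # fall back to the O(log N) search
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            return None
        self._hits[key] += 1
        if self._hits[key] >= self._promote_after:
            self._hash[key] = self._vals[i]        # promote the hot key into the overlay
        return self._vals[i]


index = AdaptiveIndex([("bob", 2), ("alice", 1), ("carol", 3)], promote_after=2)
found = [index.lookup("alice") for _ in range(3)]
```

The two counters mirror the `hash searches/s` versus `non-hash searches/s` split: in this run the first two lookups walk the sorted list and the third is served from the overlay.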
Wiki pages extended (4): concepts/innodb-buffer-pool (AHI as first-class Related-caches bullet + top-of-Seen-in canonicalising buffer-pool-as-AHI-substrate + frontmatter source + related), concepts/b-plus-tree (top-of-Seen-in canonicalising hash-overlay-on-B+tree framing + frontmatter source + related), systems/innodb (AHI elevated from sidebar mention to first-class Architectural-shape bullet with benchmarked numbers + top-of-Seen-in + AHI bullet in Transactional / durability mechanisms + frontmatter source + related), systems/mysql (frontmatter source + related). Plus companies/planetscale (this entry + frontmatter) + index.md + sources/index.md + companies/index.md + log.md + raw frontmatter flip. Ben Dicken's canonical database-internals pedagogy voice, his thirteenth+ wiki ingest (after B-trees, slotted counter, vectors, I/O devices, EBS failure rates, Go interpreters, PlanetScale-for-Postgres, caching, Postgres 17-vs-18, processes-and-threads, profiling-memory-in-MySQL, memory-profiling, PgBouncer-scaling, sharding-workflows, identifying-profiling-queries, AI-index-suggestions, increase-IOPS-with-sharding, faster-backups-with-sharding, graceful-degradation-in-postgres). Architecture density ~95% — two worked benchmarks + config surface + observation surface + workload-fit reasoning. Tier-3 on-scope unconditionally — Dicken database-internals pedagogy is default-include per the skip rules above. Post is ~1,300 words and was published April 24, 2024 — roughly four and a half months before his B-trees and database indexes pedagogy post (2024-09-09) but assumes buffer-pool + B+tree lookup mechanics as background; the B-trees post is the foundational reference readers need first. Cross-source continuity: sibling pedagogy-voice post to the B-trees post (the foundational reference); complements Dicken 2025-07-08 Caching at a finer altitude — where the caching post frames the InnoDB buffer pool as MySQL's analog of Postgres's shared_buffers + page cache stack, this post canonicalises the AHI as a third layer on top of that buffer pool, adaptive to workload. Also sibling in "runtime-adaptive overlay" shape to which canonicalises the same pattern at VM-interpreter altitude (type-specialised bytecode + quickening + deoptimisation) — see patterns/runtime-adaptive-in-memory-index's Related-patterns section for the two-altitude sibling framing. No existing-claim contradictions — strictly additive. Caveats: pedagogy voice (no production incident retrospective); no coverage of AHI `btr_search_latch` contention (the canonical historical failure mode before MySQL 5.7's partitioning fix); no mention of `innodb_adaptive_hash_index_parts`; `Hash table size 276707` surfaced without sizing-mechanism explanation; benchmark harness (`python3 same_query.py` / `load.py`) not open-source-linked; single-benchmark regime (500k iterations of same or 1000-rotated query) acknowledged as unrealistic; MEMORY-engine `HASH`-index support elided in the "MySQL doesn't support `HASH` indexes" framing; prefix-keyed AHI mechanics named but not expanded (no prefix-length selection disclosure). URL rule compliance: raw file's `url:` field verbatim `https://planetscale.com/blog/the-mysql-adaptive-hash-index` (filename slug matches — no truncation risk); source page `url:` field + body `## Source` section both use verbatim URL. Raw-markdown link uses actual raw filename `2026-04-21-the-mysql-adaptive-hash-index-af850dba.md` (hash-suffixed). H1 + wikilink syntax compliance observed. Return contract: `ingested: wiki/sources/2026-04-21-planetscale-the-mysql-adaptive-hash-index.md — 12 wiki pages touched.`
-
2026-03-13 — Scaling Postgres connections with PgBouncer (Ben Dicken, 2026-03-13, re-fetched 2026-04-21) — Canonical wiki disclosure of the PgBouncer configuration surface. Dicken's twelfth wiki ingest; operator-facing field manual for tuning PgBouncer on PlanetScale for Postgres. Walks the full connection chain `max_client_conn` → `default_pool_size` → `max_db_connections` / `max_user_connections` → Postgres-side `max_connections` + `superuser_reserved_connections` and formalises sizing as `num_pools × default_pool_size ≤ max_db_connections ≤ max_connections − reserved_slack`. Three canonical PlanetScale deployment topologies (concepts/pgbouncer-deployment-topology): local (same server as primary, port `6432`, same credentials, shares failure domain); dedicated primary on separate nodes ("client connections persist through resizes, upgrades, and most failovers"; `|pgbouncer-name` username-suffix invocation; stitches through local bouncer); dedicated replica routing to replicas, bypassing local bouncer. Three pooling modes with unequivocal verdict "Transaction pooling is the only sensible option" — session pooling is 1:1 and "does little to reduce Postgres connection count"; statement pooling breaks multi-statement transactions ("not useful in 99% of cases"); transaction pooling is the default with known-unsupported features (LISTEN, session-level `SET`/`RESET`, SQL `PREPARE`/`DEALLOCATE`) falling back to direct connections. PlanetScale only supports transaction pooling. Canonical `query_wait_timeout` default: 120 seconds (queue-then-disconnect admission control). Canonical 5+ MB RAM per Postgres connection datum — specific number behind the general "memory overhead" framing in Dicken's sister 2025-09-24 Processes and Threads post.
Three worked sizing scenarios: PS-80 (1 vCPU/8 GB, 1 DB × 3 users: `app`/`analytics`/`export`): `max_connections=50`, `max_client_conn=500`, `default_pool_size=30`, `max_user_connections=30`, `max_db_connections=40`, 10-slot admin reserve; M-2650 (32 vCPU/256 GB, same multi-tenant shape at 32× scale): `max_connections=500` (10× not 32× — "we don't want to increase direct Postgres connections by 32x"), `max_client_conn=10000`, `default_pool_size=200`, `max_user_connections=200`, `max_db_connections=450`, 50-slot reserve; M-1280 (16 vCPU/128 GB, single-tenant 200 DBs × 200 roles 1:1 mapping): `max_connections=400`, `max_client_conn=5000`, `default_pool_size=2` (load-bearing when `num_pools=200`), `max_user_connections=2`, `max_db_connections=2`, 200 pools × 2 = 400 upstream connections exactly matching `max_connections`. The single-tenant inversion: "`default_pool_size` becomes the load-bearing dial" rather than `max_user_connections`, which dominates multi-tenant shapes. Two canonical multi-PgBouncer patterns: layered (app-side funnel + DB-side funnel, "especially useful when you need connection pooling both close to compute and close to the database") and isolated per workload (dedicated PgBouncer per web / background workers / analytics class, so a spike in one class doesn't saturate the shared pool; "creates independent funnels with their own limits, pool sizing, and failure domains"). Canonical compose-with-Traffic-Control framing: "PgBouncer manages connections, Traffic Control manages resource consumption. The two approaches complement each other well." Neki referenced as future-generation Postgres-scaling substrate: "Though there are upcoming systems like Neki which will solve this problem in a more robust way, PgBouncer has proven itself an excellent connection pooler for Postgres."
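The three scenarios can be cross-checked with a small worst-case calculation. This is a simplified model, not PgBouncer's accounting: it assumes one pool per (database, user) pair, that `default_pool_size` caps each pool, and that `max_db_connections` caps each database. It ignores `max_user_connections` and reserve pools. The helper name `worst_case_upstream` and the `num_dbs` / `pools_per_db` fields are hypothetical.

```python
def worst_case_upstream(num_dbs, pools_per_db, default_pool_size, max_db_connections):
    """Upper bound on PgBouncer-to-Postgres connections under the stated assumptions."""
    per_db = min(pools_per_db * default_pool_size, max_db_connections)
    return num_dbs * per_db


# Values taken from the three worked scenarios in this entry.
scenarios = {
    "PS-80": dict(max_connections=50, num_dbs=1, pools_per_db=3,
                  default_pool_size=30, max_db_connections=40),
    "M-2650": dict(max_connections=500, num_dbs=1, pools_per_db=3,
                   default_pool_size=200, max_db_connections=450),
    "M-1280": dict(max_connections=400, num_dbs=200, pools_per_db=1,
                   default_pool_size=2, max_db_connections=2),
}

# Slots left for direct/admin connections once every pool is full.
headroom = {}
for name, s in scenarios.items():
    upstream = worst_case_upstream(s["num_dbs"], s["pools_per_db"],
                                   s["default_pool_size"], s["max_db_connections"])
    headroom[name] = s["max_connections"] - upstream
```

Under these assumptions the remaining headroom reproduces the reserves quoted above for the two multi-tenant shapes, and comes out at zero for the single-tenant shape whose pools exactly fill `max_connections`.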
9 new wiki pages: source + 5 concepts (concepts/pgbouncer-connection-chain, concepts/pgbouncer-pool-sizing-formula, concepts/query-wait-timeout, concepts/reserved-admin-connection-budget, concepts/pgbouncer-deployment-topology) + 3 patterns (patterns/layered-pgbouncer-deployment, patterns/isolated-pgbouncer-per-workload, patterns/three-pool-size-budget-allocation). Extends systems/pgbouncer (first canonical configuration-surface disclosure + three topologies + three scenarios + multi-PgBouncer framing + Neki forward-reference), systems/postgresql, systems/planetscale-for-postgres, systems/planetscale-traffic-control, systems/neki, concepts/max-connections-ceiling, concepts/memory-overcommit-risk, concepts/connection-pool-exhaustion, concepts/reserved-connection, patterns/two-tier-connection-pooling. Tier-3 on-scope — Ben Dicken canonical database-internals pedagogy voice; architecture density ~90%. Caveats: operator-facing tutorial (not Vitess/PlanetScale internals); transaction-pooling unsupported-features list incomplete; `query_wait_timeout` tuning elided; sizing guidance illustrative not prescriptive; layered/multi-PgBouncer topology-altitude only (no per-tier configs); Neki forward-looking (waitlist-only); no benchmarked numbers (unlike sibling 1M-connections post); PgBouncer HA mechanism not detailed.
-
2024-04-11 — Profiling memory usage in MySQL (Ben Dicken, originally 2024-04-11, re-fetched 2026-04-21) — Direct companion to the 2024-03-29 Identifying and profiling problematic MySQL queries post by the same author, on the orthogonal memory (space) axis. Canonicalises four new wiki primitives: (1) MySQL memory instrumentation — the `memory/*` subset of `performance_schema.setup_instruments` (several hundred categories out of ~1,255 total instruments) and the five `memory_summary_*` tables; (2) memory profiling granularity — the five-way grain split (account / host / thread / user / global) and the canonical per-query grain gap: "Notice that there is no specific tracking for memory usage at a per-query level. However, this does not mean we cannot profile the memory usage of a query!" (3) per-thread memory profiling — the substitute for per-query profiling, exploiting MySQL's 1:1 connection↔thread binding for the query's lifetime; (4) `CONNECTION_ID()` vs `thread_id` — the canonical SQL mapping via `SELECT thread_id FROM performance_schema.threads WHERE PROCESSLIST_ID=@cid`. Three new patterns: patterns/periodic-sampling-memory-profiler (fixed-interval polling of instantaneous-state counters with a sliding window), patterns/two-connection-profiling-setup (one connection runs the workload, a second observes — required because MySQL's client protocol is synchronous), patterns/live-visualization-of-sampled-metrics (matplotlib `stackplot` + `plt.ion()` + `canvas.start_event_loop` for in-process live dashboards). Worked example: 100M-row `SELECT alias FROM chat.message ORDER BY alias DESC LIMIT 100000` dominated by `memory/sql/Filesort_buffer::sort_keys` (203,488 bytes) + `memory/innodb/memory` (169,800) + `memory/sql/THD::main_mem_root` (46,176) + `memory/innodb/ha_innodb` (35,936). The `Filesort_buffer::sort_keys` instrument is undocumented (`documentation IS NULL` in `setup_instruments`) — canonical datum that MySQL's memory catalogue is partially self-describing-by-name-only. A second visualisation shows a `FULLTEXT` index build growing into "hundreds of megabytes".
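The periodic-sampling pattern above can be sketched independently of any database driver. This is not Dicken's script: `sample_memory` and its parameters are hypothetical, the `%s` placeholder style is an assumption about the driver in use, and `fetch_top` stands for whatever callable runs `TOP_MEMORY_SQL` on the second (observer) connection.

```python
from collections import deque

# Maps the workload connection's CONNECTION_ID() to its Performance Schema thread.
THREAD_ID_SQL = (
    "SELECT thread_id FROM performance_schema.threads "
    "WHERE processlist_id = %s"
)

# Top four memory instruments for one thread, by bytes currently in use.
TOP_MEMORY_SQL = (
    "SELECT event_name, current_number_of_bytes_used "
    "FROM performance_schema.memory_summary_by_thread_by_event_name "
    "WHERE thread_id = %s "
    "ORDER BY current_number_of_bytes_used DESC LIMIT 4"
)


def sample_memory(fetch_top, samples, interval_s, sleep, window=50):
    """Poll `fetch_top` at a fixed interval, keeping only the newest `window` samples."""
    history = deque(maxlen=window)  # sliding window: old samples fall off the front
    for _ in range(samples):
        history.append(fetch_top())
        sleep(interval_s)
    return history


# Dry run with a fake fetcher and a no-op sleep, so no MySQL server is needed.
fake_readings = iter(range(60))
history = sample_memory(lambda: next(fake_readings), samples=60,
                        interval_s=0.25, sleep=lambda s: None)
```

In a real run `sleep` would be `time.sleep`. Each sample is an instantaneous reading, so a spike shorter than the interval can be missed entirely.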
Two Python scripts: minimal 250-ms sampler printing top-4 categories to stdout; full matplotlib monitor with configurable `--frequency` (default 500 ms), 50-sample sliding window, 12-colour palette, legend-cardinality reduction via underscore-prefix label trick (categories < total/1024/50 suppressed from legend). Wiki pages created (7): source + 3 concepts + 3 patterns. Extends concepts/mysql-performance-schema (new Seen-in entry framing memory axis as companion to time/rows axis), systems/mysql (new Seen-in), frontmatter sources on both. Tier-3 on-scope — Ben Dicken pedagogy post, default-include per companies/planetscale skip rules; architecture density ~95% of body. Caveats: pedagogy voice (no production fleet-wide numbers); sampling aliasing (sub-interval spikes missed); short queries under-sampled (Dicken flags explicitly); observer overhead real but unquantified; script has a typo (`connection.close()` references undefined variable); no per-query peak aggregation (only instantaneous samples); MySQL-only.
-
2025-09-24 — Processes and Threads (Ben Dicken, 2025-09-24, re-fetched 2026-04-21) — Interactive pedagogical article on OS process + thread fundamentals that lands, in the final third, on the architectural pay-off: Postgres is process-per-connection, MySQL is [[patterns/thread-per-connection-database|thread-per-connection]], and both need connection pooling at scale. Canonical operational numbers: ~5 μs per process context switch (tens of thousands of instructions), ~1 μs per thread switch (~5× faster — shared address space, no TLB flush), hundreds of switches per second → tens of millions of instructions/sec of bookkeeping. Verbatim canonical framing of the Postgres architectural choice: "Postgres is implemented with a process-per-connection architecture. Each time a client makes a connection, a new Postgres process is created on the server's operating system. There is a single 'main' process (PostMaster) that manages Postgres operations, and all new connections create a new Process that coordinates with PostMaster." And the MySQL contrast: "MySQL is a great contrast, designed to run as a single process (mysqld). However, it is also capable of handling thousands of queries per-second, hundreds of connections, and utilizing multi-core CPUs. It achieves this via threads." Structural criticism of process-per-connection canonicalised: "Processes are heavy: there is memory overhead and a time overhead for managing them." Universal connection-pooling mitigation canonicalised at OS-fundamentals altitude, with the 5–50 direct-connections-to-DB sizing datum: "A pooler … maintains its own pool of direct connections to the database, typically between 5 and 50. This is a small enough number that the database server is not negatively impacted by too many connections. The pooler … acts as a funnel: pushing the queries from thousands of connections into tens of connections." 4 canonical new concept pages: concepts/process-os, concepts/thread-os, concepts/context-switch, concepts/fork-execve — the OS primitives this wiki previously referenced without a definitional home. 2 canonical new pattern pages: patterns/process-per-connection-database, patterns/thread-per-connection-database — the architectural-choice duality canonicalised as peer patterns.
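The "funnel" behaviour of a pooler can be illustrated with a small in-process simulation. This is a toy sketch under stated assumptions: `FunnelPool` and `simulate` are hypothetical names, "direct connections" are just integer ids handed out from a queue, and no real database or pooler (pgbouncer, ProxySQL, VTTablet) is involved. It only shows that many concurrent clients never hold more than `pool_size` direct connections at once.

```python
import queue
import threading
from typing import Callable, List, Tuple


class FunnelPool:
    """Hands a fixed set of 'direct connections' to many concurrent clients."""

    def __init__(self, size: int) -> None:
        self._idle: "queue.Queue[int]" = queue.Queue()
        for conn_id in range(size):
            self._idle.put(conn_id)
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak_in_use = 0

    def run(self, work: Callable[[int], object]) -> object:
        conn_id = self._idle.get()  # blocks while every direct connection is busy
        with self._lock:
            self.in_use += 1
            self.peak_in_use = max(self.peak_in_use, self.in_use)
        try:
            return work(conn_id)
        finally:
            with self._lock:
                self.in_use -= 1
            self._idle.put(conn_id)  # return the connection to the pool


def simulate(clients: int = 200, pool_size: int = 10) -> Tuple[int, int]:
    """Run `clients` threads through the pool; return (peak in use, completed)."""
    pool = FunnelPool(pool_size)
    done: List[object] = []
    done_lock = threading.Lock()

    def client(i: int) -> None:
        result = pool.run(lambda conn_id: (i, conn_id))
        with done_lock:
            done.append(result)

    threads = [threading.Thread(target=client, args=(i,)) for i in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool.peak_in_use, len(done)


if __name__ == "__main__":
    peak, completed = simulate()
    print(f"completed={completed} peak direct connections={peak}")
```

The database side only ever sees `pool_size` connections, so its per-connection process or thread cost is bounded regardless of how many clients are attached to the pooler.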
Extends: systems/postgresql (top-of-Seen-in entry canonicalising process-per-connection + ~5–10 MB/backend steady-state memory + pooler-as-mitigation framing), systems/mysql (top-of-Seen-in entry canonicalising thread-per-connection architectural choice + OS-substrate why beneath the lower per-connection memory cost), systems/pgbouncer (frontmatter + related extended — connection-pooling-as-OS-substrate-mitigation framing), patterns/two-tier-connection-pooling (frontmatter + related extended — Dicken's canonicalisation at the OS-fundamentals altitude is the why beneath van Dijk's production-scale benchmark canonicalisation of the two-tier pattern), concepts/connection-pool-exhaustion (frontmatter + related extended — the natural-threshold throttling signal is a symptom of the process/thread-per-connection memory cost Dicken canonicalises here). Tier-3 on-scope — Ben Dicken database-internals-voice post (his eleventh wiki ingest after B-trees / slotted-counter / IO-devices / caching / benchmarking-Postgres-17-vs-18 / increase-IOPS-sharding / graceful-degradation / identifying-slow-queries / faster-backups / dealing-with-large-tables / database-sharding / introducing-sharding-on-PlanetScale-with-workflows — default-include per companies/planetscale.md skip rules). Architecture density ~40% concentrated in the final third of the post (the first two-thirds are pure OS pedagogy), but the canonical contribution is the OS-substrate why beneath the already-canonicalised database connection-scaling + pooling wiki corpus: Dicken canonicalises the OS-level memory-economics of process-per-connection vs thread-per-connection that van Dijk's 1M-connections benchmark, Gangal's VTTablet deep-dive, Reyes's RDS-vs-PlanetScale comparison, and Noach's throttler-anatomy post all reference implicitly.
Borderline-case test ("Only skip if architecture content is <20% of the body") passes decisively on substance — 4 canonical new concept pages + 2 canonical new pattern pages resolve pre-existing definitional gaps on the wiki: concepts/threads-running-mysql was on the wiki but concepts/thread-os was not; [[concepts/fork-execve]] was referenced in a dozen Unix-substrate ingests (Fly machines, Firecracker, Figma's fork-model posts) without a canonical home; concepts/context-switch was implicit across ~40 ingests but had no dedicated page. Post belongs in the same interactive-article pedagogy genre as Dicken's 2024-09-09 B-trees and database indexes and 2025-03-13 IO devices and latency (both canonicalised on the wiki) — the database + OS fundamentals as substrate for architectural discussion shape. Cross-source continuity: canonical OS-substrate why beneath the Reyes / Gangal / van Dijk / Noach / Dicken connection-scaling quadrilateral — (1) 2021-09-30 Reyes names the 16k RDS-MySQL ceiling vs "nearly limitless" PlanetScale; (2) 2022-11-01 van Dijk benchmarks 1M connections via two-tier pooling; (3) 2023-03-27 Gangal canonicalises VTTablet's three-era pool mechanism; (4) 2024-08-29 Noach canonicalises connection-pool-exhaustion as natural-threshold throttling signal; (5) this 2025-09-24 post canonicalises the OS-substrate memory-economics (process-per-connection vs thread-per-connection + shared-page-table vs forked-page-table) that makes all four preceding framings load-bearing. The five posts span 4 years, 5 authors, 4 altitudes of the same architectural problem. No existing-claim contradictions — strictly additive.
Caveats recorded on source page: interactive-article animation-rich; single-core assumption throughout; virtual memory deferred; Postgres per-backend memory not quantified; pooler ecosystem (pgbouncer / ProxySQL / VTTablet) not named; MySQL's thread-pool-plugin refinement of strict "thread per connection" elided; pedagogical voice, no production retrospective. -
2025-09-12 — Postgres High Availability with CDC (Sam Lambert, 2025-09-12, re-fetched 2026-04-21) — Substrate-level design critique arguing Postgres's logical-replication design makes HA and CDC operationally coupled in a way that MySQL's design does not. The load-bearing object is the logical replication slot: a durable, primary-local catalog object that pins WAL via restart_lsn + confirmed_flush_lsn and whose post-failover eligibility on any standby depends on whether the subscriber has been observed advancing it while that standby was following. Postgres 17 failover slots serialise slot state into WAL but preserve an eligibility gate by design to preserve exactly-once CDC semantics — a lagging or offline subscriber leaves no standby eligible. Three canonical failure scenarios: CDC quiet-period forced failover breaks the stream; fresh pg_basebackup replicas remain ineligible until the next subscriber poll (6 hours in the worked example); dormant physical standby with a physical slot pins restart_lsn and can fill the primary's WAL volume. Canonical closing framing: "slot progress is a single-node concern that must be coordinated across the cluster at failover time, and eligibility depends on subscriber behavior outside your control." MySQL's counter-design canonicalised verbatim: "MySQL's binary log is an action log. Every transaction carries a GTID. Replicas with log_replica_updates=ON re-emit transactions they apply into their own binlogs, preserving GTID continuity. A CDC connector records the last committed GTID set. On reconnect it tells any suitable server, 'resume from this GTID.' If the binlog containing that GTID still exists, streaming continues with no slot object and no eligibility gate." Failover: "Promote a replica. Point the connector at any replica and it resumes from it's GTID position." CDC availability bounded by binlog retention, not by subscriber polling cadence. 4 canonical new concept pages: concepts/postgres-wal-level-logical, concepts/postgres-logical-replication-slot, concepts/postgres-failover-slot, concepts/ha-cdc-coupling. 1 canonical new pattern: patterns/action-log-vs-state-log-replication — the generalised design-space view (action-log + consumer-local progress vs state-log + primary-local catalog + eligibility gate).
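The contrast between the two designs can be made concrete with a deliberately simplified toy model. Everything here is an assumption made for illustration: positions are single integers rather than real GTID sets or LSNs, `ActionLogServer` and `SlotStandby` are hypothetical types, and the `slot_synced` flag stands in for whatever bookkeeping decides slot eligibility in a real Postgres cluster. The sketch only captures where the progress state lives and who gates resumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ActionLogServer:
    """Toy MySQL-like server retaining a contiguous range of transaction ids."""
    name: str
    oldest_retained: int  # oldest transaction still present in the log
    newest: int           # newest transaction applied on this server

    def can_resume_from(self, last_committed: int) -> bool:
        # The consumer carries its own position. The server only needs to
        # still retain the next transaction and not be behind the consumer.
        return self.oldest_retained <= last_committed + 1 and last_committed <= self.newest


@dataclass
class SlotStandby:
    """Toy Postgres-like standby whose eligibility is gated by slot state."""
    name: str
    slot_synced: bool  # assumed flag: subscriber seen advancing the slot


def resumable_servers(servers: List[ActionLogServer], last_committed: int) -> List[str]:
    """Action-log model: any server still holding the position will do."""
    return [s.name for s in servers if s.can_resume_from(last_committed)]


def eligible_standbys(standbys: List[SlotStandby]) -> List[str]:
    """Slot model: only standbys that passed the eligibility gate qualify."""
    return [s.name for s in standbys if s.slot_synced]


if __name__ == "__main__":
    replicas = [
        ActionLogServer("replica-a", oldest_retained=100, newest=500),
        ActionLogServer("replica-b", oldest_retained=450, newest=500),
    ]
    print(resumable_servers(replicas, last_committed=300))  # bounded by retention

    standbys = [SlotStandby("standby-a", True), SlotStandby("fresh-basebackup", False)]
    print(eligible_standbys(standbys))  # bounded by subscriber behaviour
```

In the first model the limiting factor is log retention on the server; in the second it is a flag that depends on subscriber behaviour, which mirrors the post's closing framing.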
Extends systems/postgresql + systems/mysql + concepts/change-data-capture + concepts/binlog-replication + concepts/gtid-position + concepts/logical-replication. Canonical substrate-critique altitude — first wiki ingest framing CDC as a design-critique lens rather than a capability-feature lens. Caveats: PlanetScale-CEO voice favourable to MySQL's model + implicitly positioning PlanetScale for Postgres as the answer (without disclosing its mitigation mechanism); Postgres 17 failover slots are young in production (2024-09-26 GA, ~1 year at publication); MySQL-side operational burdens (binlog retention horizon, log_replica_updates=ON disk-cost amplification) elided as counterweight; no production numbers on how often the HA-CDC coupling bites operators in practice. -
2023-10-05 — — co-canonical with same-day Aurora sibling; canonical architectural description of 5-hop Vitess data path (app → edge LB → VTGate → VTTablet → MySQL) + VTGate/VTTablet/topo-server role split. Trigger for promoting systems/vtgate, systems/vttablet, concepts/vitess-topo-server + patterns/query-routing-proxy-with-health-aware-pool to dedicated pages.
-
2024-09-04 — Instant deploy requests (Shlomi Noach, originally 2024-09-04, re-fetched 2026-04-21) — Canonical wiki disclosure of the fast-path-vs-safe-path deploy-request architecture on PlanetScale MySQL. Shlomi Noach announces instant deployments as an opt-in fast-path for deploy requests whose every statement qualifies for MySQL native ALGORITHM=INSTANT. The canonical three-part composition rule (canonicalised as concepts/instant-deploy-eligibility): a deploy request qualifies iff every statement is (a) an ALTER TABLE that individually qualifies for MySQL INSTANT, (b) any number of CREATE/DROP TABLE statements, or (c) any number of CREATE/MODIFY/DROP VIEW statements. One non-qualifying statement disqualifies the whole deploy request — all-or-nothing at the request level. Performance delta qualitative but dramatic: "an Online DDL operation may take hours to deploy a schema change to a large tables, where an instant deployment may take just a few seconds." Two operator-visible caveats canonicalised: (1) instant deployments are non-revertible because they skip the shadow-table build that produces the 30-minute inverse-replication revert window as an emergent property — canonicalised as concepts/non-revertible-schema-change with the canonical wiki framing "revertibility is a property of the execution mechanism, not of the schema change." (2) the change can induce "a multi-second (or more) lock on the migrated table" under write-heavy workloads, because the MySQL metadata lock is below the vtgate routing layer and not hidden by query buffering. Canonical opt-in shape canonicalised as patterns/instant-deploy-opt-in: automatic pre-evaluation of eligibility (via Vitess schemadiff — ties to the 2026-04-21 Vitess 21 release notes' "more INSTANT DDL scenario analysis" framing), conditional opt-in UI only when eligible, safe Online DDL default. "PlanetScale continues to run Online DDL as the default strategy, and users are asked to make an explicit choice when opting for instant deployments." Wiki pages created (4): source + 3 concepts (concepts/instant-ddl-mysql, concepts/instant-deploy-eligibility, concepts/non-revertible-schema-change) + 1 pattern (patterns/instant-deploy-opt-in).
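The all-or-nothing composition rule can be sketched as a small predicate. This is an illustration only and not PlanetScale's or Vitess's implementation: the `Kind` enum, the `Statement` type, and the `instant_capable` flag are hypothetical, and the hard part (deciding whether a given ALTER TABLE qualifies for MySQL INSTANT, which the post attributes to schemadiff analysis) is taken as an input rather than computed.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Sequence


class Kind(Enum):
    ALTER_TABLE = auto()
    CREATE_TABLE = auto()
    DROP_TABLE = auto()
    CREATE_VIEW = auto()
    MODIFY_VIEW = auto()
    DROP_VIEW = auto()
    OTHER = auto()


# Statement kinds that qualify unconditionally under parts (b) and (c).
ALWAYS_QUALIFYING = {
    Kind.CREATE_TABLE, Kind.DROP_TABLE,
    Kind.CREATE_VIEW, Kind.MODIFY_VIEW, Kind.DROP_VIEW,
}


@dataclass(frozen=True)
class Statement:
    kind: Kind
    # Hypothetical input: whether this ALTER TABLE individually qualifies for
    # INSTANT. Only consulted for ALTER_TABLE statements.
    instant_capable: bool = False


def statement_qualifies(stmt: Statement) -> bool:
    if stmt.kind is Kind.ALTER_TABLE:
        return stmt.instant_capable  # part (a)
    return stmt.kind in ALWAYS_QUALIFYING  # parts (b) and (c)


def deploy_request_is_instant_eligible(statements: Sequence[Statement]) -> bool:
    """All-or-nothing: one non-qualifying statement disqualifies the request.

    Treating an empty request as ineligible is an assumption of this sketch.
    """
    return bool(statements) and all(statement_qualifies(s) for s in statements)


if __name__ == "__main__":
    request = [
        Statement(Kind.ALTER_TABLE, instant_capable=True),
        Statement(Kind.CREATE_TABLE),
        Statement(Kind.DROP_VIEW),
    ]
    print(deploy_request_is_instant_eligible(request))
    print(deploy_request_is_instant_eligible(request + [Statement(Kind.ALTER_TABLE)]))
```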
Wiki pages extended (7): concepts/online-ddl (new Seen-in entry framing instant deployments as a fast-path parallel to Online DDL, not a replacement); concepts/deploy-request (new Seen-in entry on the pre-evaluation + opt-in addition to the lifecycle); patterns/instant-schema-revert-via-inverse-replication (new Seen-in entry canonicalising non-applicability to instant deployments); patterns/shadow-table-online-schema-change (new Seen-in entry on the instant-DDL fast-path as complement to the pattern); systems/mysql, systems/vitess, systems/planetscale (frontmatter sources appended). Tier-3 on-scope — Shlomi Noach named-voice post on a product feature backed by architectural substance. Short post (~1,900 chars body) but clears the "Only skip if architecture content is <20% of the body" test decisively: ~85% of body advances a specific architectural primitive (eligibility composition rule, non-revertibility structural coupling, lock-caveat workload-dependence, default-safe opt-in posture). Thirty-second PlanetScale first-party ingest. Caveats: short post (mechanism not deep-dived); lock-duration unquantified; no adoption metrics; MySQL-only scope; re-surface date (2024-09-04 original — eligibility envelope has likely widened since via Vitess 21's expanded schemadiff analysis). -
2024-08-19 — Increase IOPS and throughput with sharding (Ben Dicken, originally 2024-08-19, re-fetched 2026-04-21) — Canonical wiki introduction of IOPS + throughput as first-class database-sizing parameters alongside vCPU / RAM / storage, positioning horizontal sharding as the architectural lever that makes I/O cost scale linearly rather than super-linearly as a database grows. Small-DB baseline ($1,649-$3,370/mo across RDS / Aurora / PlanetScale) vs 8× scale: unsharded RDS with io1 provisioned-IOPS = $20,520-$24,197/mo (11-13× cliff); 8-shard PlanetScale = $13,992/mo (linear 8×). Load-bearing claim: "Sharding is an excellent technique to run huge databases efficiently, without needing to pay an EBS premium. … our IO and throughput requirements are distributed across these instances, allowing each to use a more affordable gp2 or gp3 EBS volume." Opens with a post-publication Note that Metal (March 2025, ~7 months after the original publication) offers the substrate-level alternative — "Metal databases give you unlimited IOPS and ultra low latency reads and writes" — so the IOPS-cost-cliff the article sells sharding as the mitigation for is, on Metal, structurally mitigated by local-NVMe substrate. Seven canonical new pages: source + 5 concepts (IOPS, throughput vs IOPS, sequential vs random I/O on SSD/EBS, EBS IOPS burst bucket, linear vs super-linear cost scaling) + 1 pattern (sharding as IOPS scaling). Extends systems/aws-ebs, systems/aws-rds, systems/mysql, systems/vitess, systems/planetscale-metal, concepts/iops-throttle-network-storage, concepts/horizontal-sharding, concepts/write-throughput-ceiling, patterns/exhaust-simpler-scaling-first. Ben Dicken's Nth PlanetScale first-party ingest (canonical database-internals educator voice; 2024-era). Tier-3 on-scope — default-include per PlanetScale skip rules; architecture density ~85% of body (pedagogy voice + vendor cost-comparison angle). Caveats: vendor-marketing angle, no measured per-shard IOPS distribution, hot-shard scenario not quantified, no benchmark data ("Performance benchmarks are not included here" — explicit in body), sharding-overhead (VTGate latency tax + cross-shard queries) elided, August-2024 pricing snapshot. -
2024-03-29 — Identifying and profiling problematic MySQL queries (Ben Dicken, originally 2024-03-29, re-fetched 2026-04-21) — Canonical wiki field manual for native MySQL query diagnosis via performance_schema + sys schema + stage-timing profiling. Two-phase workflow canonicalised: (1) identify via digest-based query prioritization — sort events_statements_summary_by_digest or sys.statements_with_runtimes_in_95th_percentile / sys.statements_with_full_table_scans by the axis matching your fix (SUM_TIMER_WAIT for cumulative burn; AVG_TIMER_WAIT for per-execution cost; rows_examined_avg for missing-index candidates); (2) drill via stage-timing profiling (three-toggle + thread-id + event-id-bracket workflow) + EXPLAIN ANALYZE. Per-table complement: table_io_waits_summary_by_index_usage surfaces the unindexed-read-vs-indexed-read ratio. Production-relevance datum: 2.57 × 10⁹ unindexed reads vs 164,500 indexed on game.message = a "this table needs an index" signal at scale. Stage-timing worked datum: 735.3 ms in stage/sql/executing out of ~736 ms total — common execution-bound shape. Positions PlanetScale Insights as the productised UX over the same digest data: "gleaning this information can be tedious. Getting exactly what you want requires significant poking around… Many of these same observations can be gathered much easier using PlanetScale Insights." Ben Dicken's sixth+ wiki ingest (after B-trees, slotted-counter-adjacent appearances, IO devices + latency, EBS reliability, caching, Postgres 17-vs-18 benchmarks, faster backups with sharding, dealing with large tables, graceful degradation — one of the wiki's most canonical PlanetScale database-internals educator voices). Wiki pages created (8): this source + 4 concepts (concepts/mysql-performance-schema, concepts/mysql-sys-schema, concepts/query-digest, concepts/query-stage-profiling) + 3 patterns (patterns/digest-based-query-prioritization, patterns/stage-level-query-profiling, patterns/index-usage-per-table-diagnostic). Wiki pages extended (4): concepts/mysql-explain-analyze, systems/mysql, systems/planetscale-insights, concepts/observability. -
2026-03-31 — Graceful degradation in Postgres (Ben Dicken, originally 2026-03-31, re-fetched 2026-04-21) — canonical graceful-degradation framing of Traffic Control, reframing the mechanism from the mixed-workload contention axis (2026-04-11 queue-health post) to the survive-a-spike-with-partial-degradation axis. Canonical social-media worked example: partition traffic into critical (auth, post creation, post fetch, profiles) / important (comments, search, DMs) / best-effort (like+impression+bookmark counts, trending, notifications, analytics) tiers — new query priority classification three-tier scheme canonicalised. Tagged at the query layer via SQLCommenter metadata ("category='viewPost', priority='critical'") — first canonical wiki page for SQLCommenter as a standalone primitive, previously referenced implicitly across Insights/Traffic-Control/query-tag-filter pages. Three-tier budget recipe verbatim from the post: critical = no server-share cap, no burst, 2-sec per-query max; important = 25% server share + moderate concurrency; best-effort = 20% server share + low concurrency + live-disable-able under spike. Canonical new warn mode vs enforce mode concept — "There's no need to get the tunings above perfect from day one. You can start every budget in warn mode. This will not kill any queries that exceed the budget. Rather, it will warn, and you can click into the budget to see how many queries are exceeding it over time." Four-stage budget-tuning lifecycle: comment → warn → monitor → enforce. Canonical wiki datum for the [PGINSIGHTS] Traffic Control: warning channel — over-budget events surfaced as messages returned inside the Postgres query response so applications can observe budget pressure "from within your application without any user-facing effects" — a diagnostic piggyback on the query wire protocol delivered at the extension tier. New patterns/shed-low-priority-under-load pattern: user-facing graceful-degradation-as-infrastructure — classify traffic by user-perceived priority, budget each class, live-disable the lowest class under spike. Under a 10× viral-event / bad-deploy / DDoS spike, operators "click into the best-effort-budget and completely disable this traffic. Changes to budgets happen live, so we would immediately see the impact of this." Architectural reframe: "What could have been a huge lost-opportunity (your app becomes unusable) is now only a temporary degradation of non-critical functionality. We've kept our users happy and avoided an application outage."
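Query-layer tagging of the kind described above can be sketched as a helper that appends a SQLCommenter-style comment to a statement. This is a minimal sketch, not the official SQLCommenter library: `add_sqlcommenter_tags` is a hypothetical function, and the serialisation (URL-encoded keys and values, values in single quotes, pairs sorted by key and joined with commas, no space after the comma) reflects my reading of the SQLCommenter format and should be checked against the specification. The `category` and `priority` keys are the ones quoted in the post.

```python
from typing import Mapping
from urllib.parse import quote


def add_sqlcommenter_tags(sql: str, tags: Mapping[str, str]) -> str:
    """Append key='value' pairs to `sql` as a trailing SQL comment."""
    if not tags:
        return sql
    pairs = ",".join(
        f"{quote(str(key), safe='')}='{quote(str(value), safe='')}'"
        for key, value in sorted(tags.items())  # sorted for stable output
    )
    return f"{sql.rstrip()} /*{pairs}*/"


if __name__ == "__main__":
    tagged = add_sqlcommenter_tags(
        "SELECT * FROM posts WHERE id = 1",
        {"priority": "critical", "category": "viewPost"},
    )
    print(tagged)
    # SELECT * FROM posts WHERE id = 1 /*category='viewPost',priority='critical'*/
```

With tags attached at the call site, a budget can then be keyed on the `priority` value, which is what makes the comment → warn → monitor → enforce lifecycle possible without changing the queries themselves.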
Two orthogonal framings of the same Traffic Control mechanism on the wiki now: the 2026-04-11 Griggs queue-health post canonicalises the mixed-workload contention → MVCC-horizon-protection framing; this Dicken post canonicalises the user-priority → spike-survival framing. Wiki pages created (5): source + 3 concepts (concepts/query-priority-classification, concepts/warn-mode-vs-enforce-mode, concepts/sqlcommenter-query-tagging) + 1 pattern (patterns/shed-low-priority-under-load). Wiki pages extended: systems/planetscale-traffic-control (user-facing graceful-degradation framing + warn-mode lifecycle + PGINSIGHTS in-band warning channel), [[patterns/workload-class-resource-budget]] (second canonical instance — user-priority axis), concepts/graceful-degradation (canonical database-tier instance), systems/planetscale-insights (warn-mode observability surface + PGINSIGHTS piggyback), systems/planetscale-for-postgres (Traffic Control as vendor differentiator), systems/postgresql (Postgres as substrate for extension-layer load-shedding). Scope disposition: on-scope Tier-3 — Ben Dicken named voice (default-include per PlanetScale skip rules), architecture density ~90% of body (every section advances a Traffic-Control primitive or a worked tier/budget/warn-mode application). Caveats: no production numbers (illustrative social-media scenario, not customer retrospective); threshold-picking heuristics thin (25% / 20% given as example without derivation); retry-storm avoidance not elaborated; PGINSIGHTS wire-protocol mechanics not fully specified (NoticeResponse? Extension-specific channel?). URL rule compliance: raw file's url: field is https://planetscale.com/blog/graceful-degradation-in-postgres (matches filename slug); source page url: + body ## Source section both use verbatim URL. -
2026-02-19 — Faster PlanetScale Postgres connections with Cloudflare Hyperdrive (Simeon Griggs, originally 2026-02-19, re-fetched 2026-04-21) — demo-app narrative walking through a real-time prediction-market on PlanetScale Postgres Metal + full Cloudflare edge stack (Workers + Hyperdrive + Durable Objects + WebSockets). Four load-bearing architectural calls made explicit: (1) DO not on write path — "Durable Objects are single-threaded and hosted in a single location, making them a bad candidate for the write path. Instead the Workers will send transactions to the database via the Hyperdrive connection"; the DO is broadcast-coordinator only (canonical patterns/single-region-do-fanout-from-distributed-writers). (2) Postgres is authoritative, WebSockets are fast — "decide what is authoritative and what is just fast. Postgres is the source of truth and WebSockets are the low-latency notification layer... updates are immediate most of the time, and eventually correct all of the time" — canonical wiki statement of concepts/authoritative-vs-fast-notification and the shape of patterns/db-authoritative-with-websocket-notify. (3) Smart placement deliberately NOT used — Workers stay user-adjacent because Hyperdrive already closes the edge-to-origin DB latency gap — the wiki's canonical "when NOT to use explicit placement hint" datapoint. (4) Stale-quote rejection at DB level — each option-purchase request carries "expected price/version data and slippage tolerance, so the backend can reject stale quotes while successful writes immediately broadcast over WebSockets" — canonical concepts/stale-quote-rejection. Architecture decomposition datapoints: Hyperdrive = two-component (edge-global TLS/handshake pre-negotiator for "the 7 round-trip steps of creating a connection" + origin-co-located warm connection pool); write flow = Worker → Hyperdrive → Postgres → Worker pings DO over WebSocket → DO fans out to all connected browsers; multi-environment pattern = wrangler.jsonc env.development.hyperdrive block + CLOUDFLARE_ENV=development + per-branch PlanetScale DB. Production hardening deliberately deferred with named work items: replay-on-reconnect (cursor-based event replay), queue-backed fanout (Cloudflare Queues between write commit + broadcast), polling reconciliation (periodic DB poll to catch missed WebSocket pushes). Single-DO scaling ceiling flagged with named mitigation ("scale Durable Objects horizontally — sharded with a key" — e.g. per-market DO). Post also positions Query Insights + PlanetScale MCP server + database-skills.com agent skills as the correctness-discipline loop for LLM-authored query code ("I'm anticipating that you're building by describing what you want to an LLM"). Operational numbers: $50/month smallest Metal cluster used for demo; $5/month non-Metal tier viable; 330+ Cloudflare POPs as network positioning; "~50ms of 95% of the world's internet-connected population" Cloudflare-network positioning line; 7 TLS/handshake round-trip steps pre-negotiated by Hyperdrive's edge tier. Five new wiki entities: systems/cloudflare-websockets, concepts/authoritative-vs-fast-notification, concepts/stale-quote-rejection, patterns/db-authoritative-with-websocket-notify, patterns/single-region-do-fanout-from-distributed-writers. Extends existing systems/hyperdrive with the two-component architecture disclosure, systems/cloudflare-workers + systems/cloudflare-durable-objects with the "DO-not-on-write-path" datapoint, systems/planetscale-metal + systems/planetscale-for-postgres with the customer-facing real-time positioning, concepts/edge-to-origin-database-latency with the "when NOT to place" datapoint, patterns/explicit-placement-hint + patterns/partner-managed-service-as-native-binding with additional instances. -
2024-07-30 — Faster backups with sharding (Ben Dicken, originally 2024-07-30, re-fetched 2026-04-21) — canonical wiki disclosure of PlanetScale's production backup architecture and the shard-parallel backup property. Seven-step per-shard choreography on an ephemeral VTBackup instance spun up by PlanetScale Singularity: (1) internal API initiates, (2) Singularity spins up fresh compute, (3) VTBackup restores previous backup from S3/GCS via builtin engine (decrypted on arrival), (4) spins up MySQL on restored data, (5) connects to primary VTGate, requests checkpoint-in-time, replicates catchup delta, (6) stops catchup MySQL, (7) takes new full backup to S3/GCS. Measured production scaling: unsharded 161 GB = 30 min 40 s (~176 MB/s); sharded 20 TB / 32 shards = 1 h 39 min 4 s (~6.7 GB/s aggregate, ~210 MB/s per shard — ~38× faster than naïve extrapolation's 63 h); sharded ~230 TB / 256 shards = 3 h 37 min 11 s (~35 GB/s aggregate, ~137 MB/s per shard). Per-shard throughput approximately constant; aggregate scales ~linearly with shard count. Primary-as-replication-source for catchup is a named production choice (Source: concepts/primary-vs-replica-as-replication-source) — justified by: primary is already replicating to 2 replicas, catchup delta is small post-first-backup (12–24 h of binlog events, not full DB). Mitigation: "backups can be scheduled to happen during lower traffic hours." Restore inherits the same shard-parallel property — "the restoration of a massive database [takes] mere hours rather than days or weeks." Non-obvious load-bearing backup roles: (a) new-replica seeding after primary failover — the new replica restores from backup then catches up, rather than re-replicating the full DB from the primary; (b) PITR via Vitess native feature; (c) accidental-deletion recovery via backup restored to dev branch for cherry-pick — canonical production case study (Dub customer): hard-delete + shard-parallel backup = backup-as-escape-hatch (Source: concepts/soft-delete-vs-hard-delete). Backups are encrypted at rest (Source: concepts/backup-encryption-at-rest). Scope disposition: on-scope Tier-3 — Ben Dicken pedagogical-voice architectural-density post. Per companies/planetscale skip rules: Dicken database-internals posts default-include. Architecture density ~90% of body (7-step choreography + 3 measured production instances + 4 operational-uses taxonomy + primary-vs-replica trade-off).
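The "~38× faster than naïve extrapolation's 63 h" figure can be reproduced from the durations quoted above. The sketch below is plain arithmetic under two assumptions of mine: sizes are in decimal units (20 TB = 20,000 GB), and "naïve extrapolation" means scaling the unsharded backup time linearly with data size. The helper names are hypothetical.

```python
def hms_to_seconds(hours: int, minutes: int, seconds: int) -> int:
    return hours * 3600 + minutes * 60 + seconds


def naive_extrapolation_seconds(base_gb: float, base_seconds: float, target_gb: float) -> float:
    """Time a single-stream backup would take if duration scaled with size."""
    return base_seconds * target_gb / base_gb


# Figures quoted in the entry above.
UNSHARDED_GB = 161
UNSHARDED_SECONDS = hms_to_seconds(0, 30, 40)
SHARDED_GB = 20_000  # 20 TB, decimal units assumed
SHARDED_SECONDS = hms_to_seconds(1, 39, 4)

naive_seconds = naive_extrapolation_seconds(UNSHARDED_GB, UNSHARDED_SECONDS, SHARDED_GB)
naive_hours = naive_seconds / 3600
speedup = naive_seconds / SHARDED_SECONDS

if __name__ == "__main__":
    print(f"naive extrapolation: {naive_hours:.1f} h")  # roughly 63 h
    print(f"speedup from 32-way shard parallelism: {speedup:.1f}x")  # roughly 38x
```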
Canonical new wiki pages (12): [[sources/2026-04-21-planetscale-faster-backups-with-sharding]], 3 systems (systems/vtbackup, [[systems/planetscale-singularity]], systems/google-cloud-storage stub), 2 patterns ([[patterns/dedicated-backup-instance-with-catchup-replication]], [[patterns/shard-parallel-backup-and-restore]]), 6 concepts (concepts/shard-parallel-backup, concepts/primary-vs-replica-as-replication-source, concepts/point-in-time-recovery, concepts/replica-creation-from-backup, concepts/backup-encryption-at-rest, concepts/soft-delete-vs-hard-delete). Extends: systems/vitess (tenth canonical Vitess-internals disclosure — backup / restore operational-primitive axis), systems/vitess-mysqlshell-backup (framed as logical alternative to PlanetScale's builtin production default), systems/planetscale, systems/planetscale-metal (Metal inherits same backup architecture), [[patterns/snapshot-plus-catchup-replication]] (new backup-to-object-storage composition at object-storage altitude), concepts/logical-vs-physical-backup (production datum: PlanetScale uses builtin physical). -
2026-04-21 — Consensus algorithms at scale: Part 8 — Closing thoughts (Sugu Sougoumarane, originally 2022-07-07, re-fetched 2026-04-21) — capstone essay closing the 8-part Consensus algorithms at scale series. Consolidates Parts 1–7 into two architectural recommendations: [[patterns/pluggable-durability-rules|pluggable durability rules]] (durability as a plugin over node-set predicates; FlexPaxos cited as the theoretical basis, "majority quorum is just a special case of intersecting quorums"; topology-elasticity as the structural consequence) + [[patterns/lock-based-over-lock-free-at-scale|lock-based over lock-free at scale]] (four-advantage argument: graceful demotion, node membership coordination, direct-to-leader consistent reads, anti-flapping). Closes Part 5's forward-reference on preferring lock-based for large-scale systems. Rejects the framing that Paxos and Raft are conceptually foundational ("They are foundational from a historical perspective, but they are not conceptually foundational"). Vitess as canonical worked composition — four-way mapping from lock-based advantages to Vitess features (VTOrc durability plugin; Vitess Operator graceful failover; direct-to-leader reads; inherited Orchestrator anti-flapping). VTOrc full-auto-pilot roadmap disclosed. Out-of-scope topics named: failure detection, consistent reads, node membership changes. Intellectual humility: "It is possible that consensus could be generalized using a different set of rules. But I personally find the approach presented in this series to be the easiest to reason about." Wiki pages created (3): source + 2 patterns (patterns/pluggable-durability-rules, [[patterns/lock-based-over-lock-free-at-scale]]).
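"Durability as a plugin over node-set predicates" can be read as: a rule is any function that takes the set of nodes that acknowledged a request and says whether the request is durable. The sketch below illustrates that reading and is not VTOrc's plugin interface. `DurabilityRule`, `majority`, `leader_plus_cross_zone`, and the zone map are all hypothetical, and the cross-zone rule is an invented example of a non-majority policy.

```python
from typing import Callable, FrozenSet, Iterable, Mapping

# A durability rule is a predicate over the set of acknowledging nodes.
DurabilityRule = Callable[[FrozenSet[str]], bool]


def majority(cluster: Iterable[str]) -> DurabilityRule:
    """Classic majority quorum, expressed as just one possible rule."""
    members = frozenset(cluster)
    needed = len(members) // 2 + 1
    return lambda acked: len(acked & members) >= needed


def leader_plus_cross_zone(leader: str, zone_of: Mapping[str, str]) -> DurabilityRule:
    """Hypothetical rule: the leader plus at least one node in another zone."""
    def rule(acked: FrozenSet[str]) -> bool:
        if leader not in acked:
            return False
        return any(
            zone_of[node] != zone_of[leader]
            for node in acked
            if node in zone_of
        )
    return rule


def is_durable(acked: Iterable[str], rule: DurabilityRule) -> bool:
    return rule(frozenset(acked))


if __name__ == "__main__":
    zones = {"n1": "a", "n2": "a", "n3": "b", "n4": "b", "n5": "c"}
    print(is_durable({"n1", "n3"}, majority(zones)))                      # 2 of 5
    print(is_durable({"n1", "n3"}, leader_plus_cross_zone("n1", zones)))  # cross-zone ack
```

Swapping the rule changes the durability policy without touching the code that collects acknowledgements, which is the sense in which the topology becomes elastic.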
Wiki pages extended (12): systems/vitess (ninth canonical Vitess-internals disclosure + capstone-architectural-recommendations axis), systems/vtorc (durability plugin + full-auto-pilot sections), systems/orchestrator, systems/mysql, systems/planetscale, concepts/anti-flapping, concepts/request-propagation, concepts/revoke-and-establish-split, concepts/leader-revocation, concepts/leader-establishment, concepts/elector, concepts/no-distributed-consensus. Twenty-fifth PlanetScale first-party ingest; fifth canonical consensus-series instalment on the wiki after Parts 4, 5, 6, 7.
-
2026-04-21 — Consensus algorithms at scale: Part 7 — Propagating requests (Sugu Sougoumarane, originally 2022-07-01, re-fetched 2026-04-21) — canonical wiki disclosure of the final load-bearing concern of Sugu's consensus framework: [[concepts/request-propagation|request propagation]]. "We have saved the most difficult part for last." Two regimes: the planned-change regime (canonical new [[patterns/graceful-propagation-before-demotion]] pattern extending Part-4's graceful-demotion with a propagation-completion invariant) + the failure regime (the elector indirectly revokes by fencing enough followers; the same operation simultaneously discovers all previously-completed requests). Seven canonical failure modes of propagation enumerated — each a distinct interaction between elector knowledge (complete vs partial discovery) and request lifecycle (incomplete vs tentative vs durable). Resolution rule: canonical new [[patterns/version-per-request-to-resolve-conflicts]] — attach a time-based version to every request; later versions supersede earlier; propagation assigns a new version. Paxos proposal numbers + Raft term numbers are canonical implementations, generalised to a single concept at the request layer (one level below the election layer where Part 5 canonicalised the same primitive). Production shortcut: canonical new concepts/anti-flapping concept — rate-limiting rule on leadership changes. Primary purpose: failure-loop avoidance ("leadership change usually due to a deeper underlying problem"); serendipitous second-order effect: makes per-request versioning operationally optional. Canonical production instance: MySQL binlog's faithful-propagation of GTID + timestamp metadata formally violates the new-version-on-propagation rule, but Orchestrator's built-in anti-flapping rules compensate operationally. Vitess's VTOrc is a customised fork of Orchestrator — architectural-lineage disclosure not made in prior series instalments. Canonical new patterns/external-metadata-for-conflict-resolution pattern.
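The version-per-request resolution rule can be sketched in a few lines. This is an illustrative model only: the `Request` type, its `slot` field (a position in the log), and the integer `version` are hypothetical simplifications of whatever time-based version a real system would use (a Paxos proposal number or Raft term being the instances the post names).

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable


@dataclass(frozen=True)
class Request:
    slot: int     # hypothetical: which log position the request targets
    version: int  # time-based version; a larger value means a later request
    payload: str


def resolve(candidates: Iterable[Request]) -> Dict[int, Request]:
    """For each slot, keep the candidate with the latest version.

    On a tie the first candidate seen is kept; a real system would have to
    guarantee versions are unique so ties cannot occur.
    """
    winners: Dict[int, Request] = {}
    for request in candidates:
        current = winners.get(request.slot)
        if current is None or request.version > current.version:
            winners[request.slot] = request
    return winners


def propagate(request: Request, new_version: int) -> Request:
    """Propagation re-issues the request under a strictly newer version."""
    if new_version <= request.version:
        raise ValueError("propagation must assign a later version")
    return replace(request, version=new_version)


if __name__ == "__main__":
    discovered = [
        Request(slot=7, version=3, payload="set x=1"),  # from an old leader
        Request(slot=7, version=5, payload="set x=2"),  # from a later leader
    ]
    winner = resolve(discovered)[7]
    print(propagate(winner, new_version=6))
```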
Wiki pages created (9): source + 4 concepts (concepts/request-propagation, [[concepts/request-versioning]], concepts/incomplete-request, [[concepts/anti-flapping]]) + 3 patterns ([[patterns/version-per-request-to-resolve-conflicts]], [[patterns/external-metadata-for-conflict-resolution]], [[patterns/graceful-propagation-before-demotion]]) + 2 systems ([[systems/orchestrator]], systems/vtorc — VTOrc previously referenced from Part 5 + elector pages but no dedicated page until now, resolving a dangling-link). Wiki pages extended: systems/vitess (Part-7 Seen-in + frontmatter + new patterns + new concepts + new systems), systems/mysql (Part-7 Seen-in framing GTID + timestamp as external-metadata pattern substrate + frontmatter extended), concepts/proposal-number (Part-7 Seen-in canonicalising proposal-number-at-request-layer framing + Part-5's forward-reference closed), concepts/gtid-position (Part-7 Seen-in canonicalising dual role data-motion vs consensus-versioning), concepts/revoke-and-establish-split (Part-7 Seen-in + removes prior "not yet ingested" flag on Part 7), concepts/elector (Part-7 Seen-in on VTOrc-is-Orchestrator-fork lineage + elector-capability invariant), concepts/split-brain (Part-7 Seen-in on anti-flapping-as-split-brain-mitigation at MySQL scale). Scope: on-scope Tier-3 — Sugu Sougoumarane Vitess-architect theoretical voice, same series as Part 4, Part 5, Part 6 (all ingested same session). Per companies/planetscale.md skip rules: Sugu architectural voice default-include. Cross-source continuity: eighth canonical Vitess-internals disclosure on the wiki; extends Part 5's proposal-number primitive to the request layer; extends Part 4's graceful-demotion with propagation-completion invariant. No existing-claim contradictions — strictly additive. Caveats: no production numbers; no anti-flapping window guidance in time units; VTOrc "tightened corner cases" stated as intent not disclosure; Orchestrator architecture named not walked-through. URL rule compliance: raw file's
`url:` field is `https://planetscale.com/blog/consensus-algorithms-at-scale-part-7`; source page + body `## Source` section both use verbatim URL. Return contract: ingested: wiki/sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-7-propagating-requests.md — 17 wiki pages touched.
- 2026-04-21 — Consensus algorithms at scale: Part 6 — Completing requests (Sugu Sougoumarane, originally 2022-06-21, re-fetched 2026-04-21) — twenty-fourth PlanetScale first-party ingest and the third canonical wiki disclosure in the consensus-algorithms series (after Parts 4 and 5). Sugu derives the per-request commit-path protocol: the leader first transmits each request as tentative, collects durability acks to reach the implicit durable stage, then sends complete messages that cause followers to materialise the effect. The three-stage lifecycle is canonicalised on the wiki along with its load-bearing safety invariant: "Completion and cancellation are mutually exclusive. A request that was completed will never be canceled, and a request that was canceled will never be completed." Cross-instalment payoff: the lock-based election + leader lease recommended in Part 5 composes with the early-ack-on-durability commit path from Part 6 to give both cheap writes and cheap leader-local consistent reads. MySQL semi-sync critique canonicalised on the wiki for the first time: MySQL semi-sync lacks the two-phase shape (replicas apply-on-receive; no tentative state) and a restarting primary completes in-flight work without re-checking durability — both produce semi-sync split-brain, the specific production hazard Vitess-on-MySQL manages operationally via PRS / ERS + vttablet lameduck + vtgate query buffering. Optimisation named: once a request is durable, the leader may skip the `tentative` step for lagging followers and send `complete` directly (patterns/skip-completion-for-late-followers). Wiki pages created (9): source + 6 concepts (concepts/two-phase-completion-protocol, concepts/tentative-request, concepts/durable-request, concepts/request-cancellation, concepts/mysql-semi-sync-split-brain, concepts/quorum-read) + 3 patterns (patterns/two-phase-tentative-then-complete, patterns/skip-completion-for-late-followers, patterns/early-ack-on-durability). Wiki pages extended (6): concepts/consistent-read (commit-path-vs-read-path coupling), concepts/leader-lease (early-ack composition), concepts/forward-progress (retry-not-cancel commit-path sub-invariant), concepts/split-brain (semi-sync commit-path instance), systems/mysql (semi-sync hazard Seen-in), systems/vitess (commit-path framing Seen-in). Scope disposition: on-scope Tier-3 → clears. Architecture content 100% of body (pedagogy series, no marketing). Publication-date ambiguity: frontmatter `published:` is 2026-04-21 (re-fetch); body byline is June 21, 2022 — architectural content still current. Next in series: Part 7 (Propagating requests) will tie the Part 4 revoke/establish + Part 5 lock resolution + Part 6 two-phase completion together at the leadership-change altitude.
- 2026-04-21 — Consensus algorithms at scale: Part 4 — Establishment and revocation (Sugu Sougoumarane, originally 2022-04-06, re-fetched 2026-04-21) — twenty-second PlanetScale first-party ingest and first canonical wiki disclosure of Vitess's leader-election operational primitives (`PlannedReparentShard` / `EmergencyReparentShard`) and the algorithmic framing (revoke-then-establish) behind them. Sugu Sougoumarane — Vitess co-creator, PlanetScale co-founder — publishes Part 4 of a multi-part consensus-algorithms-at-scale series. Load-bearing claim: every leader-based consensus algorithm performs two distinct actions when electing a new leader — revoke the previous leadership and establish the new one. Traditional majority-quorum algorithms (Paxos, Raft) conflate these into a single atomic action (a successful proposal-number push to a majority simultaneously revokes the old leader and establishes the new one), and this conflation hides the fact that the two concerns could be separated. Once you separate them, you can use different mechanisms for each step, optimise them independently, and accommodate practical scenarios where majority-quorum-as-single-action doesn't fit. Worked production instance: Vitess's two reparent operations — `PlannedReparentShard` (PRS) for software-rollout-class planned changes and `EmergencyReparentShard` (ERS) for crash / network-partition-class unplanned failover — expose the separation at the operational-primitive level. PRS uses the graceful-demotion path: ask the current leader to step down → in-flight transactions complete under a vttablet-level lameduck → new transactions buffered at the vtgate proxy tier → once PRS completes the buffered transactions flush to the new primary, and the application sees no errors. ERS uses the fence-the-followers path: the old leader is unreachable so revocation is achieved by telling the followers to stop accepting its writes, then establishing a new leader. Design principle canonicalised: "It is important that we optimise for the common case." Software rollouts are daily; crashes happen monthly or less; PRS-gracefully-is-the-common-path is structurally correct because failure-mode frequencies make the optimisation axis obvious.
Canonicalising contribution: (a) 3 new concept pages — concepts/leader-revocation, concepts/leader-establishment, concepts/lameduck-mode; (b) 2 new pattern pages — patterns/separate-revoke-from-establish (the higher-level algorithm-design principle) and patterns/graceful-leader-demotion (the planned-transition shape composing lameduck + query buffering + step-down); (c) extends concepts/query-buffering-cutover from being the migration-cutover primitive to the general application-transparency primitive under graceful leader demotion; (d) provides the algorithmic foundation for the existing patterns/zero-downtime-reparent-on-degradation pattern (PlanetScale EBS-failure-rate post) — the pattern's "seconds" impact window depends on the two-tier PRS mechanism canonicalised here. Seventh canonical Vitess-internals disclosure on the wiki after the disclosures on expression evaluation, data motion, cross-shard writes, load admission (Throttler trilogy), release notes, and public CDC — this post fills the leader election / reparenting axis of Vitess's control plane. Introduces Sugu Sougoumarane as the Vitess-architecture-authorial voice on the wiki (co-creator of Vitess and PlanetScale co-founder), complementing the existing Vitess-maintainer roster (Martí / Gangal / Sigireddi / Noach / Gupta / Lord / Guevara / Raju). Scope disposition: on-scope Tier-3, theoretical / conceptual voice (not a production retrospective) — no PRS latency numbers, no lameduck drain-duration distributions, no ERS success-rate telemetry, no vtgate buffer-depth statistics. Architecture density ~95% of body (every paragraph advances the revoke / establish separation argument + its Vitess instantiation). Part 5 (race handling, forward progress) forthcoming in the series; not yet on the wiki.
Pages created (6): sources/2026-04-21-planetscale-consensus-algorithms-at-scale-part-4-establishment-and-revocation; 3 concepts (concepts/leader-revocation, concepts/leader-establishment, concepts/lameduck-mode); 2 patterns (patterns/separate-revoke-from-establish, patterns/graceful-leader-demotion). Pages extended: systems/vitess (new Seen-in entry + frontmatter; seventh canonical Vitess-internals disclosure framing); systems/mysql (frontmatter — semi-sync replication as revocation primitive); systems/planetscale (frontmatter); concepts/query-buffering-cutover (new Seen-in entry generalising the primitive from migration cutover to reparent cutover); concepts/no-distributed-consensus (new Seen-in entry framing this as the structural contrast — yes-consensus with explicit separation vs no-consensus with gossip + CRDT); patterns/zero-downtime-reparent-on-degradation (new Seen-in entry framing this source as the algorithmic foundation for why the pattern works).
- 2026-04-21 — Benchmarking Postgres (Ben Dicken, originally 2025-07-01 per page timestamp, re-fetched 2026-04-21) — nineteenth PlanetScale first-party ingest and canonical wiki disclosure of PlanetScale's benchmarking methodology. Methodology / disclosure-voice post accompanying the 2025-07-01 PlanetScale for Postgres launch's "consistently outperform every Postgres product on the market" claim. Six architectural axes: (1) Canonicalises Telescope as PlanetScale's internal benchmarking harness — "We built an internal tool, 'Telescope', to be our go-to tool for creating, running, and assessing benchmarks." First canonical wiki instance of patterns/custom-benchmarking-harness at the multi-vendor-comparative-OLTP-benchmarking altitude. (2) Three benchmarks, three questions: latency (`SELECT 1;`), TPCC (Percona `sysbench-tpcc` at 500 GB, `TABLES=20, SCALE=250`), OLTP read-only (`sysbench oltp_read_only` at 300 GB). (3) Reference target: `i8g M-320` (4 vCPU, 32 GB, 937 GB NVMe) with primary + 2 replicas across 3 AZs. (4) RAM:CPU-asymmetry rule: match RAM first, let CPU asymmetry cut in the vendor's favour when ratios differ — "we opted to match the RAM, giving them double the CPU count used by PlanetScale. This is an unfair advantage to them". (5) Availability-posture equalisation: competitor cost models include replicas to match PlanetScale's default 3-AZ posture. Canonical wiki instance of the new concepts/price-performance-ratio concept. (6) Reproducibility commitment: full configs, reproduction instructions at `/benchmarks/instructions/tpcc500g` and `/benchmarks/instructions/oltp300g`, feedback address `benchmarks@planetscale.com`. Methodology-voice acknowledgment of bias caps: AZ-placement not controllable across vendors; RAM:CPU asymmetry called out explicitly. Canonical wiki pages created (4): sources/2026-04-21-planetscale-benchmarking-postgres, systems/telescope-planetscale, concepts/price-performance-ratio, patterns/reproducible-benchmark-publication. Scope: on-scope Tier-3 — Ben Dicken database-vendor-methodology voice, architecture density ~80% of body (methodology + configs + accounting rules + bias acknowledgment), clears Tier-3 bar per "Dicken database-internals / Lambert ex-GitHub war-story posts — default include" skip-rule. The post is pedagogy-meets-methodology-disclosure, not a raw results post — companion to the 2025-10-14 Postgres 17 vs 18 post which supplies the actual comparative numbers.
- 2026-04-21 — Behind the scenes: How schema reverts work (Holly Guevara + Shlomi Noach, originally 2022-03-24, re-fetched 2026-04-21) — eighteenth PlanetScale first-party ingest and canonical wiki walkthrough of PlanetScale's instant-schema-revert mechanism. The post is the complementary "emergency escape hatch" to the earlier expand-migrate-contract "discipline" framing: the disciplined approach avoids needing a revert; PlanetScale's mechanism rescues the team that didn't follow it. Architectural load-bearing claim: VReplication "does not terminate upon migration completion" — uniquely among online-DDL tools. At cut-over the shadow table + replication stream stay alive, the stream is re-primed in the inverse direction (new → old) so the old-schema table keeps up with every post-cut-over write, and a revert is a second freeze-point swap of two already-in-sync tables — not a data copy, not a backup restore. Canonical new concepts concepts/shadow-table, concepts/cutover-freeze-point, and concepts/pre-staged-inverse-replication and patterns patterns/shadow-table-online-schema-change + patterns/instant-schema-revert-via-inverse-replication originate here. Four-step online-DDL shape canonicalised: build empty shadow with new schema, apply DDL to shadow, backfill + track concurrent writes through binlog, cut over under a brief write lock — shared by `pt-online-schema-change`, `gh-ost`, and Vitess. Five VReplication design properties named as distinguishing factors: copy-and-changelog progress both tracked (not just backfill); per-transaction GTID mapping; GTID-driven interleaving; transactional sidecar-state coupling; non-termination after cut-over. Walked example (`ALTER TABLE users DROP COLUMN title`) makes the column-projection asymmetry explicit: post-cut-over rows ("Savannah") survive the revert but reappear with `NULL` in the restored column — "expected and something you can clean up after the revert, if necessary." Cross-source continuity: pairs with the 2026-02-16 zero-downtime-migrations post (same VReplication substrate, data-motion scale) and the patterns/reverse-replication-for-rollback pattern from it — one architectural principle ("keep the inverse replication alive past cut-over so nothing is a one-way door") at two scales (data-motion cutover + online DDL). Introduces Holly Guevara as a new named PlanetScale voice (joining Dicken, Lambert, Van Wiggeren, Martí, Lord, Gangal, Sigireddi, Hazen, Noach, Barnett, Gupta, Raju). Wiki pages created (6): source + 3 concepts + 2 patterns. Wiki pages extended (10): systems/vitess-vreplication, systems/vitess (eighth Vitess-internals axis: online-DDL + schema revert), systems/mysql, systems/planetscale, concepts/online-ddl, concepts/gtid-position, concepts/consistent-non-locking-snapshot, concepts/binlog-replication, patterns/snapshot-plus-catchup-replication, this companies/planetscale. Scope disposition: on-scope Tier-3 → clears. Architecture content is 90% of the body; zero marketing content; retrospective-product post with deep internals. URL rule + H1 + wikilink compliance all observed.
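The column-projection asymmetry in the schema-revert walkthrough can be illustrated with a toy model. This is a minimal sketch, not VReplication: tables are plain dictionaries, and the class name `RevertableMigration`, the helper `project`, and the sample row "Ada" are hypothetical names introduced for illustration. Only the `users` table shape, the dropped `title` column, and the "Savannah" row come from the post.

```python
# Toy model (assumption: in-memory dicts stand in for MySQL tables).
OLD_COLUMNS = ("id", "name", "title")  # schema before DROP COLUMN title
NEW_COLUMNS = ("id", "name")           # schema after DROP COLUMN title


def project(row, columns):
    """Project a row onto a column list; columns the row lacks become None (NULL)."""
    return {col: row.get(col) for col in columns}


class RevertableMigration:
    def __init__(self, old_rows):
        self.old_table = {r["id"]: project(r, OLD_COLUMNS) for r in old_rows}
        # Backfill: the shadow table starts as a projection of the old table.
        self.new_table = {r["id"]: project(r, NEW_COLUMNS) for r in old_rows}
        self.live = "old"

    def cut_over(self):
        # First freeze-point swap: the new-schema table becomes live.
        self.live = "new"

    def revert(self):
        # Second freeze-point swap: no data copy, both tables are already in sync.
        self.live = "old"

    def write(self, row):
        if self.live == "old":
            # Forward stream: old -> new, dropping the removed column.
            self.old_table[row["id"]] = project(row, OLD_COLUMNS)
            self.new_table[row["id"]] = project(row, NEW_COLUMNS)
        else:
            # Inverse stream: new -> old. The old table has a column the
            # new schema cannot supply, so new rows carry NULL there.
            new_row = project(row, NEW_COLUMNS)
            self.new_table[row["id"]] = new_row
            existing = self.old_table.get(row["id"], project({}, OLD_COLUMNS))
            self.old_table[row["id"]] = {**existing, **new_row}

    def live_rows(self):
        return self.old_table if self.live == "old" else self.new_table


m = RevertableMigration([{"id": 1, "name": "Ada", "title": "Dr"}])
m.cut_over()
m.write({"id": 2, "name": "Savannah"})  # written after cut-over
m.revert()
print(m.live_rows()[2])  # {'id': 2, 'name': 'Savannah', 'title': None}
```

Rows that existed before cut-over keep their `title`; only rows written while the new schema was live come back with `NULL`, which matches the clean-up caveat quoted from the post.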
- 2026-04-21 — Anatomy of a Throttler, part 3 (Shlomi Noach) — fourteenth PlanetScale first-party ingest and closing instalment of the three-part Vitess-throttler series. Part 3 moves from the deployment-topology + self-cost axis of part 2 to the client-side axis: who is asking the throttler, why it matters, and how to differentiate between them. Six canonical new wiki framings: (1) client identity — Vitess's canonical 4-level hierarchical identifier `<uuid>:<flow>:<subsystem>:<job-category>` (e.g. `d666bbfc_169e_11ef_b0b3_0a43f95f28a3:vcopier:vreplication:online-ddl`) supporting both specific (this one job) and categorical (all online-DDL jobs) rules, plus observability slicing; (2) client starvation — failure mode where one unthrottled or low-rejection client pins the metric above threshold, continuously rejecting every other client for minutes to hours; three structural causes (rogue clients, exemption, differential metrics); (3) exemption — the risky-in-general prioritisation lever and its three justified cases (short transient starvation, incident-fix tasks, essential system components); (4) probabilistic rejection prioritisation — per-client rejection ratio applied as a dice roll independent of the metric check; prioritises without exemption because "both clients still play by the rules: none is given permission to act if the database has unhealthy metrics. It's just that one sometimes doesn't even get the chance to check those metrics"; (5) de-prioritise all except target — the operationally simpler dual framing (high global rejection ratio + zero on favoured identity) equivalent to per-client ratios but reducing rule-count; (6) time-bounded throttler rule — every exemption / prioritisation / de-prioritisation carries a TTL so incident-response and rush-hour adjustments can't become permanent stale policy. Rogue-client structural hole of the cooperative model — "A rogue client might neglect to connect to the throttler and just go ahead and send some massive workload" — closed only by a barrier / proxy-shaped throttler, instantiated by the new Vitess transaction throttler that sits in VTTablet's query-execution path and actively delays database queries under degradation; "Clients cannot bypass the throttler, and may not even be aware of its existence."
Enforcement cost: client identification becomes an inference problem on SQL comments / connection attributes / session variables / auth scope rather than self-reported identity. The cooperative tablet throttler (parts 1-2) and the transaction throttler (part 3) coexist in production — cooperative for internal batch subsystems where identity is free, enforcement for OLTP application traffic where cooperation can't be assumed. Explicit differential-metrics-as-exemption insight: "While the second client throttles based on load average, the first client is effectively exempted from checking load average" — per-client metric sets are exemption in disguise and have the same starvation risk. Closing prescriptive claim: "Dynamic control of the throttler is absolutely critical, and the ability to prioritize or push back specific requests or jobs is essential in production systems." New pages (8): source + 3 concepts (concepts/throttler-client-identity, concepts/throttler-client-starvation, concepts/throttler-exemption) + 4 patterns (patterns/probabilistic-rejection-prioritization, patterns/deprioritize-all-except-target, patterns/time-bounded-throttler-rule, patterns/enforcement-throttler-proxy) + 1 system (systems/vitess-transaction-throttler). Extends systems/vitess-throttler (new client-identity + prioritisation-levers + relationship-to-transaction-throttler sections; Seen-in + Related + frontmatter extended) and concepts/database-throttler (new part-3 Seen-in entry + frontmatter extended with enforcement / client-identity / exemption axes). Tier-3 scope: on-scope uncontested — Vitess-internals content by a Vitess core maintainer; architectural density ~100% of body; no "Introducing" / "Announcing" / pricing content. Caveats: no production tuning numbers (specific rejection ratios, time-window defaults, rule-TTL policies not disclosed); no starvation-detection mechanism named; transaction-throttler internals (control loop, threshold logic, queue management, VTTablet connection-pool integration) described abstractly only; client-identification inference mechanisms under enforcement named as a category but not enumerated (SQLCommenter / session variables / auth scope / query shape plausible but unspecified); probabilistic-rejection pattern conflates short-circuit-metric vs additional-reject-on-green implementations. Series closes without a production-retrospective instalment — numbers from a running PlanetScale fleet are a future-ingest surface.
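Three of the part-3 levers (probabilistic rejection, de-prioritise all except target, and time-bounded rules) compose naturally in a few lines. The following is a minimal sketch under stated assumptions, not the Vitess implementation: the `Throttler` class, its method names, the "first matching identity segment wins" lookup, and the short-circuit-before-metric ordering are all hypothetical choices made for illustration. The post does not disclose rejection ratios, TTL defaults, or rule-matching semantics.

```python
import random


class Throttler:
    """Toy cooperative throttler with per-identity rejection ratios and TTLs."""

    def __init__(self, threshold, default_ratio=0.0, rng=None):
        self.threshold = threshold          # metric must stay at or below this
        self.default_ratio = default_ratio  # global rejection ratio
        self.rules = {}                     # identity segment -> (ratio, expires_at)
        self.rng = rng or random.Random()

    def add_rule(self, segment, ratio, ttl, now):
        # Every rule carries a TTL so it cannot become permanent stale policy.
        self.rules[segment] = (ratio, now + ttl)

    def _ratio_for(self, identity, now):
        # Assumption: walk <uuid>:<flow>:<subsystem>:<job-category> left to
        # right and let the first segment with an unexpired rule decide.
        for segment in identity.split(":"):
            rule = self.rules.get(segment)
            if rule is not None and now < rule[1]:
                return rule[0]
        return self.default_ratio

    def check(self, identity, metric, now):
        # Dice roll first: a de-prioritised client sometimes never gets to
        # look at the metric at all.
        if self.rng.random() < self._ratio_for(identity, now):
            return False
        # No exemption: even a favoured client is rejected on unhealthy metrics.
        return metric <= self.threshold


# De-prioritise everyone (ratio 1.0) except online-DDL jobs, for 60 seconds.
throttler = Throttler(threshold=5.0, default_ratio=1.0)
throttler.add_rule("online-ddl", ratio=0.0, ttl=60, now=0)
job = "d666bbfc_169e_11ef_b0b3_0a43f95f28a3:vcopier:vreplication:online-ddl"
print(throttler.check(job, metric=1.0, now=10))  # True
```

Keying a rule on the UUID segment would give the "this one job" form, while keying on the job-category segment gives the categorical form; both are the same lookup in this sketch.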
- 2026-04-21 — Anatomy of a Throttler, part 2 (Shlomi Noach) — thirteenth PlanetScale first-party ingest and fourth canonical Vitess-internals axis on the wiki after expression evaluation (2025-04-05 Vicent Martí), data motion (2026-02-16 Matt Lord), and query routing + transactional writes (2026-04-21 Harshit Gangal + Deepthi Sigireddi). Shlomi Noach — Vitess maintainer, now at PlanetScale — opens the load-admission-control axis via a three-part series on the throttler. Part 2 covers deployment topology (singular vs distributed, per-AZ, active-passive, singular-plus-host-agent, fully-distributed per-`vttablet`) and self-cost (fail-open vs fail-closed HA semantics, busy-loop avoidance, free-pass windows, throttler hibernation). The Vitess tablet throttler is the canonical working example throughout: one throttler per `vttablet` mapping 1:1 to a MySQL server; shard-primary's throttler aggregates every replica throttler's metrics for shard-scope queries (max replication lag); no cross-shard throttler communication so fan-out stays bounded. Canonical new wiki framings for: (1) fail-open vs fail-closed — the client-side decision when the throttler is unreachable; the pragmatic bounded-wait-then-proceed compromise; (2) host-scope vs shard-scope metrics — "This introduces the concept of a metric's scope, which can be an entire shard or a specific host"; different workloads consult different scopes against the same throttler hierarchy; (3) throttler hibernation — slow or stop metric collection + heartbeat injection during idle periods; re-ignite on first client request; first few checks reject on stale data; client retry is the load-bearing compensation; (4) replication-lag heartbeats — canonical wiki source for the `pt-heartbeat`-style timestamp-injection technique that dominates MySQL lag measurement, and its binlog-volume cost ("It is not uncommon to see MySQL deployments where the total size of binary logs is larger than the actual data set"); (5) layered polling-interval staleness — agent-mediated throttler sees metrics up to 2 s stale (1 Hz agent + 1 Hz throttler) vs 1 s for direct access. New canonical patterns: patterns/singular-vs-distributed-throttler (the full topology design space); patterns/host-agent-metrics-api (per-host daemon exposes metrics over HTTP; consumer polls one API per host); patterns/throttler-per-shard-hierarchy (per-host + shard-primary rollup; Vitess tablet throttler as canonical instance); patterns/idle-state-throttler-hibernation (hibernate metric collection + heartbeat generation during idle; coordinated re-ignition across distributed peers on first-touch). Extends systems/vitess with a fourth non-storage subsystem disclosure; extends systems/mysql with canonical heartbeat + binlog-cost framing. Tier-3 scope: on-scope uncontested — Vitess-internals content by a Vitess core maintainer, architectural density ~100% of body; no "Introducing" / "Announcing" / pricing content. Caveats: no latency/throughput numbers for throttler itself; no comparative topology benchmarks; no PlanetScale-specific production numbers (vendor-educational tone); active-passive metric-collection-while-passive described but not evaluated; part 1 referenced but not ingested at this time; part 3 forward-referenced but not yet published. Client-retry is load-bearing throughout — free-pass windows after successful checks, cold-start windows after hibernation, bounded wait under throttler unavailable — canonical wiki framing of the throttler-client contract as an asymmetric cost share (throttler fails safe, client retries to compensate). Opens ten new wiki pages (5 concepts + 4 patterns + 1 source) and extends 4 existing system/company pages.
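The bounded-wait-then-proceed compromise between fail-open and fail-closed can be sketched from the client's side. This is an illustrative sketch only: the function `wait_for_clearance`, the exception `ThrottlerUnreachable`, and the return strings are hypothetical names, and the post gives no concrete wait bound or retry interval. The clock and sleep functions are injected so the behaviour can be exercised without real waiting.

```python
import time


class ThrottlerUnreachable(Exception):
    """Raised by a check callable when the throttler cannot be contacted."""


def wait_for_clearance(check, max_unreachable_s, retry_interval_s,
                       clock=time.monotonic, sleep=time.sleep):
    """Retry `check` until approved; proceed anyway if the throttler stays
    unreachable for `max_unreachable_s` (bounded wait, then fail open)."""
    unreachable_since = None
    while True:
        try:
            approved = check()
        except ThrottlerUnreachable:
            now = clock()
            if unreachable_since is None:
                unreachable_since = now
            if now - unreachable_since >= max_unreachable_s:
                return "proceeded-fail-open"
        else:
            # The throttler answered, so the outage timer resets. A rejection
            # is a normal answer: the client retries to compensate.
            unreachable_since = None
            if approved:
                return "approved"
        sleep(retry_interval_s)


class FakeClock:
    """Deterministic clock for demonstration: sleeping just advances time."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


def always_down():
    raise ThrottlerUnreachable()


fake = FakeClock()
print(wait_for_clearance(always_down, 3, 1, clock=fake, sleep=fake.sleep))
# proceeded-fail-open
```

A purely fail-closed client would drop the `max_unreachable_s` branch and keep waiting; a purely fail-open client would return on the first exception. The bound is the knob between the two.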
- 2026-04-21 — AI-Powered Postgres index suggestions (Rafer Hazen) — twelfth PlanetScale first-party ingest and first canonical wiki treatment of LLM-generated database changes + hypothetical-index evaluation. PlanetScale Insights now ships an AI-powered index suggestion surface on Postgres databases: the feature "monitors your database workload and periodically suggests new indexes to improve query execution speed and overall database performance." Two architectural pillars: (1) workload-aware LLM prompting — Insights' per-query-pattern telemetry narrows the candidate query set via three index-candidate filter gates (rows-read:rows-returned ratio, ≥ 0.1% of aggregated workload runtime, minimum execution count) before the LLM sees anything, so the LLM is never asked the untrustworthy "does anything need to change?" question; schema is narrowed to just the tables referenced by surviving queries; the LLM is asked to propose `CREATE INDEX` statements and cite which queries each index is designed to help; (2) LLM-plus-planner validation — every candidate passes (a) a syntactic parse of the `CREATE INDEX` statement and (b) a HypoPG + `EXPLAIN` comparison: register the candidate as a hypothetical index, re-cost each cited query, discard any candidate that doesn't produce a "substantial improvement" on at least one. Insights surfaces only the survivors, with the estimated query-cost reduction attached. Canonical wiki framing of two LLM failure modes: (i) LLMs almost always find an answer even when the right answer is nothing (mitigated by workload-telemetry pre-filter); (ii) LLMs produce plausibly-correct-but-wrong output (mitigated by planner-based validation). Establishes concepts/llm-generated-database-changes as the umbrella class of workflows this post is the canonical instance of. Strictly a Tier-3-clearing launch post per PlanetScale skip-rules — architectural content ~85% of body (validation pipeline + workload pre-filter + HypoPG integration + two-phase validation + prompt structure), not a pure pricing / feature-list announcement. No quantitative efficacy disclosed (no hit rate, acceptance rate, or workload-level before/after numbers) + no model/provider named + no adoption figures — but these are caveats on the measurement, not on the architectural substance. Adds one new system (systems/hypopg), three new concepts (concepts/hypothetical-index-evaluation, concepts/llm-generated-database-changes, concepts/index-candidate-filtering), and two new patterns (patterns/llm-plus-planner-validation, patterns/workload-aware-llm-prompting). Extends systems/planetscale-insights with AI index suggestions as a new Insights capability alongside query-performance telemetry and Traffic Control. Extends systems/postgresql with `EXPLAIN` + HypoPG as validation-oracle framing.
Extends concepts/secondary-index with canonical production instance of the "write-amplification must be earned back by read-side wins" discipline.
- 2025-07-08 — Caching (Ben Dicken) — seventh PlanetScale first-party ingest and first canonical caching primer on the wiki covering the full stack from CPU cache to CDN to database buffer pool. Pedagogical deep-dive from Ben Dicken's database-internals voice on caching as "the most elegant, powerful, and pervasive innovation in computing" — the one idea appearing at every tier: pair a small amount of expensive fast storage with a large amount of cheap slow storage, keeping frequently-accessed data in the fast tier. Canonical wiki introduction of the foundational caching pattern across its full hierarchy of applications. Substance across six orthogonal axes: (1) hit-rate as load-bearing metric — formula + cache-size effect + economic intuition; (2) [[concepts/cpu-cache-hierarchy|CPU cache hierarchy]] canonicalised (L1 → L2 → L3 → RAM, size-vs-speed trade-off, "faster lookup means more cost or size limitations due to how physically close the data needs to be to the requester"); (3) temporal locality / recency bias canonicalised via X.com timeline worked example with the Karpathy tweet trendline ("after a few days, are rarely viewed") — first canonical wiki statement that "these websites store much of their content in 'slow' storage (like Amazon S3 or similar), but cache recent content in faster, in-memory stores (like CloudFront, Redis or Memcached)"; (4) spatial locality + spatial-prefetch-on-access canonicalised via the photo-album worked example ("when one photo is loaded, we can predict which ones we think they will want to see next, and prefetch those into the cache as well"); (5) CDN edge cache over central origin canonicalised with concrete latency stratification (east-coast ~10-20 ms, west-coast ~50-100 ms, across-the-world 250+ ms) — "we live on a big spinning rock 25,000 miles in circumference, and we are limited by 'physics' for how fast data can move from point A to B"; (6) four cache-replacement-policy pages introduced — FIFO, LRU, time-aware LRU, LFRU (dual-queue) — with Dicken's explicit ranking: FIFO is simplest but "isn't optimal for most caching scenarios because it doesn't consider usage patterns," LRU is "the industry standard for many caching systems … aligns well with temporal locality in real-world data access patterns." Closing database-caching section canonicalises Postgres's two-layer stack (`shared_buffers` on top of OS page cache, canonical 25%-of-RAM rule) and positions the MySQL InnoDB buffer pool as the single-layer counterpart. Explicit acknowledgement of out-of-scope axes: "We completely avoided the subject of handling writes and updates in caching systems… We didn't address consistency issues, sharded caches" — wiki already has those on other source pages (concepts/write-through-cache, concepts/invalidation-based-cache, concepts/cache-ttl-staleness-dilemma). Borderline-case ingest that passes the Tier-3 bar decisively per skip rules (Dicken database-internals post = default include) despite its announcement-voice framing — architectural density ~90% of body, every section advances a reusable cache primitive with worked example + trade-off + generalisation.
- 2025-03-13 — IO devices and latency (Ben Dicken) — pedagogical history of storage media (tape → HDD → SSD → NVMe → cloud-era network-attached) framed as a sequence of latency step-changes, published to announce PlanetScale Metal. Core numbers: RAM ~100 ns; local NVMe ~50 μs; network-attached SSD (EBS) ~250 μs — a 5× regression the cloud-database industry accepted for elasticity + durability. Metal inverts the trade: direct-attached NVMe + primary-and-two-replicas replication + frequent backups + automated node replacement restores local-NVMe latency while closing the durability gap that drove the industry to network-attached storage in the first place. Canonical wiki introduction of concepts/storage-latency-hierarchy, concepts/nand-flash-page-block-erasure (targets / blocks / pages with per-page read-write + per-block erase), concepts/ssd-parallelism-via-targets (dedicated lines per target, host data layout determines parallelism extraction), concepts/ssd-garbage-collection (dirty-page reclamation copies live pages + erases block — hidden tail-latency tax), concepts/network-attached-storage-latency-penalty (5× hop the cloud default pays), concepts/iops-throttle-network-storage (GP3 default 3,000 IOPS cap, GP2 burst bucket, direct NVMe uncapped), concepts/storage-replication-for-durability (1% × 1% × 1% = 0.0001% 3-replica math), and concepts/tape-storage-sequential-access (tape as archive tier — CERN's 400 PB + AWS Storage Gateway VTL). New system: systems/nvme-ssd (PCIe-native SSD interface) + systems/planetscale-metal (the product embodiment). New pattern: patterns/direct-attached-nvme-with-replication.
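The two pieces of arithmetic in the IO-devices entry (the 5× latency regression and the 3-replica durability math) can be checked directly. This is a small worked sketch: the function names are hypothetical, the latency figures are the approximate ones quoted in the entry, the durability calculation assumes replica failures are independent, and the serial-throughput figure assumes one outstanding IO at a time (queue depth 1), which is a simplification not taken from the post.

```python
import math

NVME_LATENCY_US = 50   # local NVMe, approximate figure from the entry
EBS_LATENCY_US = 250   # network-attached SSD, approximate figure from the entry


def latency_ratio(slow_us, fast_us):
    """How many times slower the slow device is per round trip."""
    return slow_us / fast_us


def serial_ops_per_second(latency_us):
    """Upper bound on dependent, one-at-a-time IOs per second (assumption: QD1)."""
    return 1_000_000 / latency_us


def all_replicas_fail(per_node_probability, replicas):
    """Probability that every replica fails, assuming independent failures."""
    return per_node_probability ** replicas


print(latency_ratio(EBS_LATENCY_US, NVME_LATENCY_US))   # 5.0
print(serial_ops_per_second(EBS_LATENCY_US))            # 4000.0
print(serial_ops_per_second(NVME_LATENCY_US))           # 20000.0
# 1% x 1% x 1% is one in a million, i.e. 0.0001%.
print(math.isclose(all_replicas_fail(0.01, 3), 1e-6))   # True
```

The independence assumption is what makes the multiplication valid; correlated failures (same rack, same AZ) would make the real figure worse, which is one reason the entry pairs replication with backups and automated node replacement.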
Key systems / related pages¶
- systems/planetscale — PlanetScale as a managed relational DB product (MySQL + Postgres).
- systems/planetscale-portals — regional read replicas (2022-era launch, Taylor Barnett): one writing region + N read-only regions each holding a full dataset close to the application tier. Canonical latency delta ~90 ms → ~3 ms per query when the replica region matches the app region.
- systems/planetscale-for-postgres — the Postgres product (private preview 2025-07-01, later GA); real Postgres v17 under a proprietary operator with a proprietary proxy layer + PgBouncer + query buffering + online imports from v13+.
- systems/planetscale-metal — direct-attached NVMe + replication product tier (March 2025); canonical wiki instance of patterns/direct-attached-nvme-with-replication; engine-agnostic at the cluster-shape layer (MySQL in March 2025 + Postgres from July 2025).
- systems/neki — PlanetScale's upcoming horizontal-sharding system for Postgres (neki.dev, waitlist-only); canonical wiki instance of patterns/architect-sharding-from-first-principles-per-engine — architected from first principles rather than porting Vitess, because "Vitess' achievements are enabled by leveraging MySQL's strengths and engineering around its weaknesses."
- systems/pgbouncer — open-source Postgres connection pooler integrated inside PlanetScale's proprietary proxy layer for PlanetScale for Postgres.
- systems/convex — reactive-database / BaaS named as launch-window production customer migrating to PlanetScale for Postgres.
- systems/nvme-ssd — storage medium underneath Metal; canonical NVMe page added by 2025-03-13 ingest.
- systems/vitess — sharding substrate under PlanetScale MySQL; composes with transactional SPFresh for sharded vector indexes and with Metal's direct-NVMe tablets. Explicitly NOT the Postgres sharding solution (see Neki).
- systems/vitess-evalengine — Vitess's SQL expression evaluation engine in `vtgate`; canonical wiki instance of a Go VM catching up with C++ via bytecode-less callback-slice design + static type specialization (2025-04-05 Martí post).
- systems/innodb — MySQL storage engine anchored by Dicken's 2024 B-tree post + Lambert's 2022 slotted-counter post (InnoDB lock-contention diagnosis); also hosts PlanetScale's transactional SPFresh vector index (2024-10-22).
- systems/mysql / systems/postgresql — the two engines PlanetScale hosts.
- systems/aws-ebs — named counterweight to Metal; Dicken's 2025-03-13 post frames EBS's 250 μs round-trip + 3,000 IOPS default cap as the architectural problem Metal solves.
- systems/spann — hybrid tree + graph, SSD-resident ANN algorithm PlanetScale builds on.
- systems/spfresh — continuously-updatable extension of SPANN; PlanetScale extends it further with transactional semantics.
- systems/hnsw / systems/diskann — rejected structural alternatives to SPFresh, with PlanetScale's explicit reasoning on each.
- systems/github — deployment context for the slotted counter pattern (several PlanetScale founders, including Sam Lambert, previously worked at GitHub).
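The slotted counter pattern referenced in the InnoDB and GitHub entries above can be sketched in memory. This is a minimal illustration, not the schema from the original post: the slot count, the `SlottedCounter` class, and the `"repo-42"` identifier are all hypothetical, and each dictionary entry stands in for one table row that would carry its own row lock.

```python
import random
from collections import defaultdict

N_SLOTS = 16  # hypothetical slot count; the real value is a tuning choice


class SlottedCounter:
    """One logical counter spread across several rows ("slots")."""

    def __init__(self, n_slots: int = N_SLOTS, seed=None):
        self.n_slots = n_slots
        self._rng = random.Random(seed)
        # (counter_id, slot) -> partial count; each key models one row.
        self._rows = defaultdict(int)

    def increment(self, counter_id: str, by: int = 1) -> int:
        # Writers pick a random slot, so concurrent writers rarely
        # contend for the same row.
        slot = self._rng.randrange(self.n_slots)
        self._rows[(counter_id, slot)] += by
        return slot

    def value(self, counter_id: str) -> int:
        # Readers pay for the spread by summing every slot.
        return sum(c for (cid, _), c in self._rows.items() if cid == counter_id)


counter = SlottedCounter(seed=0)
slots_hit = {counter.increment("repo-42") for _ in range(1000)}
total = counter.value("repo-42")
print(f"total={total}, distinct slots written={len(slots_hit)}")
```

The trade is explicit: writes get cheaper because contention is divided across slots, while reads become an aggregate over up to `N_SLOTS` rows.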
Key concepts introduced¶
- concepts/b-tree — foundational data structure.
- concepts/b-plus-tree — MySQL / InnoDB variant.
- concepts/clustered-index — InnoDB table = primary-key B+tree.
- concepts/secondary-index — separate B+tree keyed on column, valued on primary key.
- concepts/innodb-buffer-pool — in-memory page cache.
- concepts/uuid-primary-key-antipattern — why random PKs hurt clustered-index databases.
- concepts/disk-block-size-alignment — why B-tree nodes match disk block sizes.
- concepts/row-level-lock-contention — InnoDB `X` record lock serialisation on hot rows.
- concepts/hot-row-problem — data-shape pattern behind lock-contention disasters.
- concepts/hnsw-index / concepts/diskann-index / concepts/spann-index / concepts/spfresh-index — the four ANN-index families named by the 2024-10-22 post, with explicit structural comparison.
- concepts/transactional-vector-index — new wiki architectural category canonicalised by PlanetScale's vector beta.
- concepts/incremental-vector-index — the update-shape requirement that disqualifies HNSW for OLTP-adjacent use.
- concepts/storage-latency-hierarchy — five-tier RAM→NVMe→EBS→HDD→tape framing from the 2025-03-13 Dicken post.
- concepts/nand-flash-page-block-erasure — target / block / page hierarchy with asymmetric read-write-erase semantics.
- concepts/ssd-parallelism-via-targets — dedicated-line-per-target parallelism + host-layout-dependent throughput.
- concepts/ssd-garbage-collection — firmware-level dirty-page reclamation + write-amplification + tail-latency tax.
- concepts/network-attached-storage-latency-penalty — 5× hop EBS-class storage pays vs local NVMe.
- concepts/iops-throttle-network-storage — GP3 3,000 IOPS default cap, GP2 burst bucket, direct NVMe uncapped.
- concepts/storage-replication-for-durability — 1% × 1% × 1% = 0.0001% independent-failure math behind 3-replica durability.
- concepts/tape-storage-sequential-access — historical / archive tier (CERN 400 PB, AWS VTL).
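The UUID-primary-key antipattern and the sequential-key alternative listed above come down to where inserts land in a sorted structure. The sketch below is a rough model under stated assumptions: a sorted Python list stands in for the leaf level of a clustered B+tree, and random 128-bit integers stand in for UUIDv4 values. It counts how many inserts append at the right-hand edge versus somewhere in the middle.

```python
import bisect
import random


def tail_insert_fraction(keys) -> float:
    """Fraction of inserts that land at the right-hand edge of a sorted index."""
    index = []
    tail = 0
    for k in keys:
        pos = bisect.bisect_left(index, k)
        if pos == len(index):
            tail += 1  # append: only the rightmost "page" is touched
        index.insert(pos, k)  # otherwise an arbitrary middle position is touched
    return tail / len(index)


rng = random.Random(0)
n = 2000
sequential_keys = list(range(n))
random_keys = [rng.getrandbits(128) for _ in range(n)]  # stand-in for UUIDv4

seq_frac = tail_insert_fraction(sequential_keys)
rand_frac = tail_insert_fraction(random_keys)
print(f"sequential keys: {seq_frac:.1%} of inserts hit the tail")
print(f"random keys:     {rand_frac:.1%} of inserts hit the tail")
```

In a real engine the non-tail inserts are the ones that scatter writes across pages and evict useful pages from the buffer pool; this model only shows the access pattern, not the I/O cost.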
Key patterns introduced¶
- patterns/sequential-primary-key — canonical B+tree locality pattern.
- patterns/slotted-counter-pattern — split one hot counter row into `N` slots; GitHub's `github.downloads` workload is the canonical deployment.
- patterns/vector-index-inside-storage-engine — put the ANN index inside the durable engine; PlanetScale transactional SPFresh inside InnoDB is the canonical wiki instance.
- patterns/hybrid-tree-graph-ann-index — the algorithmic family (SPANN + SPFresh) that makes the storage-engine-hosted shape viable.
- patterns/direct-attached-nvme-with-replication — primary + 2 replicas on local NVMe + auto-failover + frequent backups; the Metal architecture in pattern form. Inverts the cloud-database default of network-attached storage.
- patterns/shared-nothing-storage-topology — each node owns its own storage; no shared fabric. The structural answer (2025-03-18 post) to EBS fleet-scale correlated-AZ-failure.
- patterns/automated-volume-health-monitoring — in-app heuristic monitor (read/write latency + idle % + synthetic write-file smoke test) that classifies volume degradation in seconds (2025-03-18 post). Customer-side mitigation while running on EBS.
- patterns/zero-downtime-reparent-on-degradation — trigger downstream of the health monitor: promote replica + fence primary + auto-provision replacement volume in seconds (2025-03-18 post). Clamps the impact window.
- patterns/callback-slice-vm-go — Vitess evalengine's Go-specific bytecode-less VM design: compile each instruction into a closure pushed onto a `[]func(*VM) int` slice; the VM loop is a single indirect call per opcode (2025-04-05 post). Canonical wiki instance of a high-performance Go interpreter.
- patterns/static-type-specialized-bytecode — the compile-time specialization that makes the Vitess VM allocate zero memory on most opcodes; static types derived from MySQL's information schema by the Vitess semantic analyzer (2025-04-05 post).
- patterns/vm-ast-dual-interpreter-fallback — retain the AST interpreter permanently alongside the VM as deoptimization fallback for value-dependent type promotions (canonical case: `-BIGINT_MIN → DECIMAL`) + one-shot evaluator for constant folding (2025-04-05 post).
- patterns/fuzz-ast-vs-vm-oracle — differential-fuzz both interpreters against each other + against MySQL's C++ reference; surfaces bugs in MySQL itself that Vitess upstreams (2025-04-05 post).
- patterns/architect-sharding-from-first-principles-per-engine — canonical new wiki pattern introduced by the 2025-07-01 Postgres launch. PlanetScale's stance that sharding layers are engine-specific by construction and not portable across engines: "Vitess' achievements are enabled by leveraging MySQL's strengths and engineering around its weaknesses. To achieve Vitess' power for Postgres we are architecting from first principles." Canonical wiki instance: Neki.
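The callback-slice VM shape listed above can be illustrated outside Go. The following is a Python analogue, not the Vitess code: a list of closures plays the role of the `[]func(*VM) int` slice, the expression format and helper names are hypothetical, and treating the returned integer as a program-counter offset is an assumption about what the `int` in that signature means.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VM:
    stack: List[int] = field(default_factory=list)


# One instruction = one closure. It mutates the VM and returns how far
# to advance the program counter (1 means "next instruction").
Instr = Callable[[VM], int]


def push_int(value: int) -> Instr:
    def run(vm: VM) -> int:
        vm.stack.append(value)
        return 1
    return run


def binary_int(op: Callable[[int, int], int]) -> Instr:
    def run(vm: VM) -> int:
        right = vm.stack.pop()
        left = vm.stack.pop()
        vm.stack.append(op(left, right))
        return 1
    return run


OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def compile_expr(node) -> List[Instr]:
    """Compile a nested tuple like ("+", 1, ("*", 2, 3)) into closures."""
    code: List[Instr] = []

    def emit(n) -> None:
        if isinstance(n, int):
            code.append(push_int(n))
            return
        op, left, right = n
        emit(left)
        emit(right)
        code.append(binary_int(OPS[op]))

    emit(node)
    return code


def execute(code: List[Instr]) -> int:
    vm = VM()
    pc = 0
    while pc < len(code):
        pc += code[pc](vm)  # the whole dispatch loop: one indirect call per step
    return vm.stack.pop()


result = execute(compile_expr(("+", 1, ("*", 2, 3))))
print(result)
```

There is no opcode table and no `switch`: the compile step decides which closure runs, so the execution loop has nothing to decode. Operands such as the literal in `push_int` are captured by the closure rather than encoded in a byte stream.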
Key concepts introduced (continued — 2025-03-18 EBS reliability post)¶
- concepts/partial-failure — canonical wiki concept; PlanetScale's cache-miss thundering-herd worked example.
- concepts/slow-is-failure — customer-centric framing: for OLTP, a 200–500 ms/op latency spike is indistinguishable from a full outage.
- concepts/performance-variance-degradation — the gp3-SLO-as-variance-floor framing (14 min/day, 86 h/year of potential degraded operation).
- concepts/blast-radius-multiplier-at-fleet-scale — 768 gp3 volumes → 99.65% active-event probability at any given moment.
- concepts/correlated-ebs-failure — fabric-level correlated AZ failure, observed "even on io2".
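The fleet-scale multiplier above follows the usual "at least one of n" shape. The sketch below does not reproduce the source's derivation, since its inputs are not stated in this section. It assumes independent volumes and shows two things: how a small per-volume probability compounds across 768 volumes, and what per-volume probability the quoted 99.65% figure would imply under that assumption.

```python
def p_any_degraded(p_single: float, n_volumes: int) -> float:
    """P(at least one volume degraded), assuming independent volumes."""
    return 1.0 - (1.0 - p_single) ** n_volumes


def implied_p_single(p_fleet: float, n_volumes: int) -> float:
    """Per-volume probability that would produce p_fleet under independence."""
    return 1.0 - (1.0 - p_fleet) ** (1.0 / n_volumes)


N_VOLUMES = 768  # figure quoted above

implied = implied_p_single(0.9965, N_VOLUMES)
roundtrip = p_any_degraded(implied, N_VOLUMES)

print(f"implied per-volume probability: {implied:.4%}")
print(f"fleet-level probability from that input: {roundtrip:.2%}")
```

The useful takeaway is the direction of the effect: a per-volume probability well under 1% is enough to make "something in the fleet is degraded right now" the normal state.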
Key concepts introduced (continued — 2025-04-05 interpreter post)¶
- concepts/bytecode-virtual-machine — the interpreter family the Vitess evalengine VM lives in.
- concepts/ast-interpreter — predecessor and permanent fallback interpreter.
- concepts/jit-compilation — the rejected next step, with the wiki's first dispatch-overhead-share threshold (<20% → don't JIT).
- concepts/static-type-specialization — eliminates runtime type switches at compile time via planner + information-schema integration.
- concepts/quickening-runtime-bytecode-rewrite — Brunthaler's runtime-observation alternative that Vitess considered and rejected in favour of static specialization.
- concepts/callback-slice-interpreter — the Go-specific VM design that sidesteps big-switch and tail-call designs.
- concepts/vm-deoptimization — the bail-out mechanism for value-dependent type promotions; first wiki canonicalisation at the static-typing (not just JIT) layer.
- concepts/instruction-dispatch-cost — the performance dimension that determines whether JIT is worthwhile.
- concepts/tail-call-continuation-interpreter — the C/C++/Python 3.14 state-of-the-art; rejected for Go because tail calls aren't guaranteed.
- concepts/go-compiler-optimization-gap — load-bearing constraint that shapes every Vitess evalengine design choice; fast-compile-over-fast-runtime.
- concepts/jump-table-vs-binary-search-dispatch — Go switch codegen issue that makes big-switch VMs unreliable.
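One way to read the dispatch-overhead threshold above is as an Amdahl-style bound. The sketch below makes a deliberately narrow assumption, labelled here and in the code: that a JIT removes dispatch overhead entirely and changes nothing else. Under that assumption the best possible speedup depends only on the share of time spent in dispatch.

```python
def max_speedup_from_removing_dispatch(dispatch_share: float) -> float:
    """Upper bound on speedup if dispatch cost dropped to zero.

    Assumption: the JIT eliminates dispatch and nothing else. Real JITs
    can also unlock other optimisations, which this bound ignores.
    """
    if not 0.0 <= dispatch_share < 1.0:
        raise ValueError("dispatch_share must be in [0, 1)")
    return 1.0 / (1.0 - dispatch_share)


bound_at_20 = max_speedup_from_removing_dispatch(0.20)
bound_at_50 = max_speedup_from_removing_dispatch(0.50)

print(f"20% dispatch share: at most {bound_at_20:.2f}x")
print(f"50% dispatch share: at most {bound_at_50:.2f}x")
```

At a 20% share the ceiling is about a quarter faster, which has to be weighed against the complexity of maintaining a JIT.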
Key concepts introduced (continued — 2025-07-01 Postgres post)¶
- concepts/proprietary-database-operator — architectural category for "real open-source engine + proprietary control plane + proprietary proxy layer" — distinct from forking the engine (Aurora DSQL extension-based) and from compute-storage separation (Lakebase / Neon). PlanetScale for Postgres is the canonical wiki instance.
- concepts/online-database-import — zero-downtime vendor-boundary database migration; PlanetScale for Postgres imports from any Postgres v13+ + supports automatic zero-downtime Postgres version upgrades. Canonical wiki instance of vendor-boundary migration as an advertised capability.
Key concepts introduced (continued — 2025-07-08 Caching post)¶
- concepts/cache-hit-rate — canonical definition + hit-rate-vs-size framing + production-metric pointers (Postgres `pg_stat_database`, MySQL `SHOW ENGINE INNODB STATUS`, Redis `INFO stats`, CloudFront `CacheHitRate`).
- concepts/cpu-cache-hierarchy — L1/L2/L3/RAM canonical vocabulary + physics of wire-delay / SRAM-vs-DRAM density / power + cache-line framing + coherence / false-sharing pointers.
- concepts/temporal-locality-recency-bias — X.com worked example + the "slow storage + in-memory recency cache" architectural consequence + when it breaks down (scans, uniform-random, adversarial).
- concepts/spatial-locality-prefetching — photo-album worked example + cross-tier applicability table (cache line / disk block / database range scan / OS readahead / application prefetch / HDD sequential).
- concepts/fifo-cache-eviction — simplest policy; the baseline LRU improves on; second-chance variants (SIEVE / Clock) as production-reality form.
- concepts/lru-cache-eviction — industry-default; hash-map + doubly-linked-list canonical shape; midpoint-insertion refinement (InnoDB); known failure modes (scan pollution, adversarial workload, concurrency tax).
- concepts/time-aware-lru-cache-eviction — LRU + per-entry TTL with the three Dicken workload examples (48-hour social posts, daily weather, weekly email); TTL jitter pointer.
- concepts/lfru-cache-eviction — dual-queue (high-priority LRU + low-priority aggressive) with promotion/demotion; sibling of InnoDB midpoint-insertion, ARC, W-TinyLFU, 2Q in the recency+frequency family.
- concepts/postgres-shared-buffers-double-buffering — Postgres's two-layer caching stack (`shared_buffers` over OS page cache) + canonical 25%-of-RAM sizing rule + the philosophical split from MySQL InnoDB's single-layer design + the "more complex because of ACID" framing.
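The LRU eviction and cache-hit-rate concepts above fit in one small sketch. This is a minimal illustration rather than any production cache: `collections.OrderedDict` supplies the hash-map-plus-ordering behaviour, and the `LRUCache` class, its `load` callback, and the access sequence are all hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache that tracks its own hit rate."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = load(key)  # fall through to the slow tier
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return value

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = LRUCache(capacity=2)
for key in ["a", "b", "a", "c", "b", "a"]:
    cache.get(key, load=str.upper)

print(f"hits={cache.hits} misses={cache.misses} hit_rate={cache.hit_rate:.2f}")
```

The access sequence is chosen to show a weak spot: with capacity 2 and three keys in rotation, each key is evicted just before it is needed again, so the hit rate stays low even though every key is reused.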
Key patterns introduced (continued — 2025-07-08 Caching post)¶
- patterns/pair-fast-small-cache-with-slow-large-storage — the foundational caching pattern at every tier (CPU cache / RAM / database buffer pool / application in-memory cache / CDN edge / warehouse query cache). Canonical wiki anchor for the "single idea applied at every layer" framing.
- patterns/spatial-prefetch-on-access — speculatively load neighbours on every access, exploits spatial locality. Photo-album is the canonical application-tier instance; cache-line + OS readahead + database range-scan are cross-tier relatives.
- patterns/cdn-edge-cache-over-central-origin — classic CDN architecture: edges absorb reads near users, origin remains the single source of truth + write target. CloudFront + S3 is the canonical deployment; Cloudflare and the Tigris counterexample both live on this axis.
Key systems introduced (continued — 2025-10-14 Postgres 17 vs 18 post)¶
- systems/linux-io-uring — Linux's async-I/O kernel interface; one of Postgres 18's three `io_method` options. Kernel 5.1+ (May 2019); shared-memory ring buffers for submission + completion. Loses on Postgres's workload shape for structural reasons (index scans don't use AIO; checksum/memcpy remain CPU-serial).
- systems/sysbench — open-source Lua-scriptable OLTP benchmark tool (akopytov/sysbench) used by Dicken for the 96-run sweep. `oltp_read_only` workload is the read-path proxy.
Key concepts introduced (continued — 2025-10-14 Postgres 17 vs 18 post)¶
- concepts/postgres-async-io — Postgres 18's new `io_method` configuration with three modes (`sync`, `worker`, `io_uring`). Scope limited to reads in 18.x; index scans don't use AIO; post-I/O work remains synchronous.
- concepts/async-io-concurrency-threshold — async I/O only pays off above a concurrency / I/O-rate threshold; below it, sync wins on simpler code path. Measured explicitly in Dicken's data: `io_uring` loses at 1 connection, narrows at 10, wins only at 50 on local NVMe with large range scans.
Key patterns introduced (continued — 2025-10-14 Postgres 17 vs 18 post)¶
- patterns/background-worker-pool-for-async-io — dedicated background worker processes handle I/O on behalf of calling backends; distributes post-I/O CPU work across workers. Postgres 18's `io_method=worker` default. Deliberately chosen over `io_uring` for CPU-distribution + interface-portability + tuneability.
Key concepts introduced (continued — 2026-04-21 Consistent Lookup Vindex post)¶
- concepts/vindex — Vitess's row-to-`keyspace_id` mapping primitive. Primary Vindex drives sharding; Secondary Vindex is a cross-shard index materialising `column → keyspace_id` for non-sharding-key lookups.
- concepts/consistent-lookup-vindex — a Secondary Vindex variant that avoids 2PC on every DML via three-connection ordered commit, tolerating orphan lookup rows because user queries always filter on the authoritative user table.
- concepts/keyspace-id — `binary(8)` shard-address identifier; the output of any Vindex lookup.
- concepts/orphan-lookup-row — a lookup-Vindex row whose referenced user row no longer exists; safe (user queries still return correct results) and lazily reclaimed on next colliding unique insert.
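The Vindex concepts above can be sketched as two mappings: a Primary Vindex that hashes the sharding column to an 8-byte `keyspace_id`, and a Secondary (lookup) Vindex that remembers `column → keyspace_id` for a non-sharding column. Several things here are assumptions for illustration: `blake2b` is only a stand-in hash (Vitess ships its own hash functions), the two-shard layout is hypothetical, and the lookup is a plain dictionary rather than a table.

```python
import hashlib


def keyspace_id(sharding_key: int) -> bytes:
    """Primary Vindex stand-in: hash the sharding column to 8 bytes."""
    raw = sharding_key.to_bytes(8, "big")
    return hashlib.blake2b(raw, digest_size=8).digest()


# Hypothetical two-shard keyspace split at 0x80: (name, start, end).
# An empty bound means "unbounded" on that side.
SHARDS = [("-80", b"", b"\x80"), ("80-", b"\x80", b"")]


def shard_for(ksid: bytes) -> str:
    """Route a keyspace_id to the shard whose key range contains it."""
    for name, start, end in SHARDS:
        if ksid >= start and (end == b"" or ksid < end):
            return name
    raise LookupError("no shard covers this keyspace_id")


# Secondary Vindex stand-in: non-sharding column value -> keyspace_id.
email_lookup = {}


def insert_user(user_id: int, email: str) -> str:
    ksid = keyspace_id(user_id)
    email_lookup[email] = ksid
    return shard_for(ksid)


def shard_for_email(email: str) -> str:
    # Without the lookup, a query by email would have to hit every shard.
    return shard_for(email_lookup[email])


home_shard = insert_user(42, "a@example.com")
print(home_shard, shard_for_email("a@example.com"))
```

The lookup turns a scatter query into a single-shard query, at the price of keeping a second structure in step with the user table.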
Key patterns introduced (continued — 2026-04-21 Consistent Lookup Vindex post)¶
- patterns/ordered-commit-without-2pc — three independent MySQL connections (`Pre`/`Main`/`Post`) committed in a fixed order (rollback in same order on failure) as an alternative to 2PC for the cross-shard-write case where one participant is authoritative and user-facing queries always route through it. Weaker guarantee than 2PC but dramatically cheaper on the happy path and self-healing on the error path (via lazy orphan reclamation on duplicate-key errors).
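The ordered-commit idea above can be modelled with three toy transactions over in-memory tables. This is a simulation under stated assumptions, not Vitess code: the `Conn` class, the `fail_at` injection hook, and the assignment of lookup inserts to `Pre` and lookup deletes to `Post` are illustrative choices. It shows why a failure between commits leaves only a harmless orphan.

```python
class CommitFailed(Exception):
    pass


class Conn:
    """Toy transaction: buffers writes, applies them to a table on commit."""

    def __init__(self, name: str, table: dict):
        self.name = name
        self.table = table
        self.pending = []

    def put(self, key, value) -> None:
        self.pending.append((key, value))

    def commit(self) -> None:
        for key, value in self.pending:
            self.table[key] = value
        self.pending.clear()

    def rollback(self) -> None:
        self.pending.clear()


def ordered_commit(pre: Conn, main: Conn, post: Conn, fail_at=None) -> None:
    """Commit Pre, then Main, then Post. No prepare phase, no coordinator log."""
    conns = [pre, main, post]
    for i, conn in enumerate(conns):
        if conn.name == fail_at:  # injected failure, for illustration only
            for rest in conns[i:]:
                rest.rollback()
            raise CommitFailed(conn.name)
        conn.commit()


users = {}   # authoritative table: user_id -> email
lookup = {}  # lookup table: email -> user_id


def insert_user(user_id: int, email: str, fail_at=None) -> None:
    pre, main, post = Conn("Pre", lookup), Conn("Main", users), Conn("Post", lookup)
    pre.put(email, user_id)   # lookup row is written ahead of the user row
    main.put(user_id, email)
    ordered_commit(pre, main, post, fail_at)


def find_by_email(email: str):
    user_id = lookup.get(email)
    # Always confirm against the authoritative table, so orphans stay invisible.
    if user_id is not None and users.get(user_id) == email:
        return user_id
    return None


insert_user(1, "a@example.com")
try:
    insert_user(2, "b@example.com", fail_at="Main")
except CommitFailed:
    pass

orphans = [email for email, uid in lookup.items() if users.get(uid) != email]
print(orphans, find_by_email("a@example.com"), find_by_email("b@example.com"))
```

The failed insert leaves a lookup row with no user row behind it, yet `find_by_email` still answers correctly because it filters through `users`. That asymmetry is what lets the pattern skip 2PC.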
Key concepts introduced (continued — 2022-12-14 Temporal Part 2 post)¶
- concepts/serialized-per-shard-updates — Temporal's correctness discipline: all updates to workflows on a single shard are applied sequentially. Direct citation: "Temporal serializes all updates belonging to the same shard, so all updates are sequential."
- concepts/single-shard-throughput-ceiling — direct consequence of serialisation: per-shard throughput ≈ 1 / persistence_latency. Latency-bound rather than bandwidth-bound; upsizing a shard's instance class gives diminishing returns past the latency floor — only more shards raise the ceiling.
- concepts/num-history-shards-immutability — Temporal's `numHistoryShards` cannot be changed after initial cluster deployment. Operators must size for worst-case peak load on day zero; no reshard escape hatch. Closest peer is Kafka topic `num.partitions` (also immutable in the shard-hash sense). Distinct from Vitess shard count, which is mutable via `Reshard`.
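The throughput ceiling and the day-zero sizing problem above reduce to simple arithmetic. The sketch below applies the `1 / persistence_latency` relation stated in the entry; the 5 ms latency and the 50,000 updates/s target are hypothetical inputs, and latency is kept in integer microseconds to avoid floating-point rounding in the shard count.

```python
US_PER_SECOND = 1_000_000


def per_shard_ceiling(persistence_latency_us: int) -> float:
    """Updates per second one shard can apply when updates are serialized."""
    return US_PER_SECOND / persistence_latency_us


def shards_needed(target_updates_per_s: int, persistence_latency_us: int) -> int:
    """Minimum shard count to reach the target, rounded up."""
    work_us = target_updates_per_s * persistence_latency_us
    return -(-work_us // US_PER_SECOND)  # integer ceiling division


LATENCY_US = 5_000        # hypothetical 5 ms persistence latency
PEAK_TARGET = 50_000      # hypothetical worst-case updates per second

ceiling = per_shard_ceiling(LATENCY_US)
shards = shards_needed(PEAK_TARGET, LATENCY_US)

print(f"per-shard ceiling: {ceiling:.0f} updates/s")
print(f"shards needed for peak: {shards}")
```

Because the shard count cannot be changed later, the target fed into this calculation has to be the worst-case peak, not the expected average.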
Key patterns introduced (continued — 2022-12-14 Temporal Part 2 post)¶
- patterns/split-sharded-plus-unsharded-keyspaces — canonical production-validated VSchema shape for running Temporal on Vitess. Two keyspaces: the write-hot tables (`executions`, `history_node`, `history_tree`, `tasks`, `replication_tasks`, `timer_tasks`, `transfer_tasks`, `visibility_tasks`, `task_queues`, `shards`, …) live in a sharded keyspace with `xxhash` Primary Vindex on `shard_id`/`range_hash`; the small metadata tables (`namespaces`, `cluster_metadata`, `queue`, `schema_version`, `buffered_events`, `signal_info_maps`, …) live in an unsharded keyspace. Applies more generally whenever an application has sharply skewed per-table traffic + the hot tables share a natural shard-addressing column.
Queued¶
- PlanetScale Blog (tier 3) — ~295 raw articles downloaded / 283 pending ingestion. Dicken + Lambert + team's database-internals content makes up a meaningful minority; the bulk is product marketing and general how-to content that needs tier-1 scope filtering on ingest.
Related¶
- systems/planetscale
- systems/planetscale-for-postgres
- systems/planetscale-metal
- systems/neki
- systems/pgbouncer
- systems/convex
- systems/nvme-ssd
- systems/vitess
- systems/vitess-evalengine
- systems/vitess-vreplication
- systems/vitess-vdiff
- systems/vitess-movetables
- systems/mysql
- systems/postgresql
- systems/innodb
- systems/spann
- systems/spfresh
- systems/hnsw
- systems/diskann
- systems/aws-ebs
- systems/amazon-cloudfront
- systems/redis
- systems/linux-io-uring
- systems/sysbench
- systems/github
- patterns/snapshot-plus-catchup-replication
- patterns/vdiff-verify-before-cutover
- patterns/reverse-replication-for-rollback
- patterns/routing-rule-swap-cutover
- patterns/read-replica-as-migration-source
- patterns/ordered-commit-without-2pc
- patterns/singular-vs-distributed-throttler
- patterns/host-agent-metrics-api
- patterns/throttler-per-shard-hierarchy
- patterns/idle-state-throttler-hibernation
- patterns/probabilistic-rejection-prioritization
- patterns/deprioritize-all-except-target
- patterns/time-bounded-throttler-rule
- patterns/enforcement-throttler-proxy
- systems/vitess-throttler
- systems/vitess-transaction-throttler
- concepts/online-database-import
- concepts/consistent-non-locking-snapshot
- concepts/gtid-position
- concepts/binlog-replication
- concepts/query-buffering-cutover
- concepts/reverse-replication-workflow
- concepts/schema-routing-rules
- concepts/fault-tolerant-long-running-workflow
- concepts/vindex
- concepts/consistent-lookup-vindex
- concepts/keyspace-id
- concepts/orphan-lookup-row
- concepts/throttler-fail-open-vs-fail-closed
- concepts/throttler-metric-scope
- concepts/throttler-hibernation
- concepts/throttler-client-identity
- concepts/throttler-client-starvation
- concepts/throttler-exemption
- concepts/replication-heartbeat
- concepts/metric-staleness-from-polling-layers
- concepts/shadow-table
- concepts/cutover-freeze-point
- concepts/pre-staged-inverse-replication
- concepts/online-ddl
- patterns/shadow-table-online-schema-change
- patterns/instant-schema-revert-via-inverse-replication
- systems/vtbackup
- systems/planetscale-singularity
- systems/google-cloud-storage
- systems/aws-s3
- concepts/shard-parallel-backup
- concepts/primary-vs-replica-as-replication-source
- concepts/point-in-time-recovery
- concepts/replica-creation-from-backup
- concepts/backup-encryption-at-rest
- concepts/soft-delete-vs-hard-delete
- patterns/dedicated-backup-instance-with-catchup-replication
- patterns/shard-parallel-backup-and-restore
- patterns/per-shard-replica-set
- companies/index
- systems/vitess-vstream
- systems/planetscale-connect
- systems/debezium
- concepts/vgtid
- concepts/unified-change-stream-across-shards
- concepts/change-data-capture
- concepts/oltp-vs-olap
- patterns/cdc-driver-ecosystem
- concepts/generated-column-mysql
- concepts/generated-hash-column
- concepts/blob-text-index-prefix-requirement
- concepts/redundant-condition-query-hint
- concepts/functional-index-mysql
- patterns/generated-hash-column-for-equality-lookup
- patterns/composite-hash-uniqueness-constraint
- patterns/redundant-hash-plus-value-predicate
- systems/bref
- systems/laravel
- systems/laravel-octane
- concepts/shared-nothing-php-request-model
- concepts/ssl-handshake-as-per-request-tax
- patterns/persistent-process-for-serverless-php-db-connections