PLANETSCALE 2025-09-24

PlanetScale — Processes and Threads

Summary

Ben Dicken (PlanetScale, 2025-09-24, re-fetched 2026-04-21) publishes an interactive-article pedagogical piece on operating-system process and thread abstractions that lands, in the final third of the body, on the architectural trade-off between Postgres's process-per-connection and MySQL's thread-per-connection models, and canonicalises connection pooling as the universal mitigation for both. The opening two-thirds covers OS fundamentals (CPU + RAM, a made-up instruction set, multitasking via processes + [[concepts/context-switch|context switching]], threads + pthread_create, fork / execve / clone system calls); the final third is the database architectural pay-off — MySQL's single-process mysqld with thread-per-connection handling, explicitly positioned as the architectural response to Postgres's process-per-connection cost. The post is tutorial-first, but its production-architecture framing of Postgres vs MySQL concurrency models + connection pooling is canonical-substrate content the wiki previously referenced without a definitional home.

Key takeaways

  1. A process is an instance of a program being executed (concepts/process-os); the OS's fundamental abstraction for isolating executing code and sharing CPU + RAM. Each process holds its own address space (code + data + stack + heap) and the OS multiplexes the CPU across many processes via time slicing — typically milliseconds-scale per slice.

  2. Context switch cost is ~5 μs on modern CPUs ([[concepts/context-switch]]) — Dicken canonicalises the per-switch cost: "The full time of a context switch takes ~5 microseconds on modern CPUs (1 microsecond = 1 millionth of a second). Though this sounds fast (and it is!) it requires executing tens of thousands of instructions, and this happens hundreds of times per second." At billions of instructions/sec, managing switching consumes tens of millions of instructions/sec — the "small performance penalty" of multi-processing. Canonical wiki datum. (Source: [[sources/2026-04-21-planetscale-processes-and-threads]])

  3. Thread switching is ~5× faster than process switching (~1 μs). Threads share the process's memory + code (except their own stacks), so switching between threads doesn't require virtual-memory-management or full register-state machinery. This is the structural reason threaded application models can sustain more concurrent work than process-per-work models on the same hardware.

  4. fork() + execve() are the two canonical process-creation system calls (concepts/fork-execve). fork() clones the calling process into a child; execve() replaces the current program image with a new one loaded from disk. The typical spawn-a-program pattern: fork() then, in the child, execve(path_to_binary, ...). pthread_create() is the POSIX call for threads; both fork and pthread_create are thin wrappers around the underlying clone() system call, which takes flags (CLONE_VM, CLONE_FILES, …) controlling what the child/new thread shares with the parent.

  5. Postgres is process-per-connection; MySQL is thread-per-connection — the canonical database-architecture pay-off of the process-vs-thread distinction. Dicken verbatim: "Postgres is implemented with a process-per-connection architecture. Each time a client makes a connection, a new Postgres process is created on the server's operating system. There is a single 'main' process (PostMaster) that manages Postgres operations, and all new connections create a new Process that coordinates with PostMaster." And: "MySQL is a great contrast, designed to run as a single process (mysqld). However, it is also capable of handling thousands of queries per-second, hundreds of connections, and utilizing multi-core CPUs. It achieves this via threads." Canonical new patterns: patterns/process-per-connection-database + [[patterns/thread-per-connection-database]].

  6. The process-per-connection model has a structural memory + time overhead that the thread-per-connection model avoids — Dicken frames it verbatim as the standard criticism: "Processes are heavy: there is memory overhead and a time overhead for managing them." This is the same memory-overhead property canonicalised on the wiki by Liz van Dijk's 2022-11-01 One million connections post ([[sources/2026-04-21-planetscale-one-million-connections]]) as MySQL's max_connections ceiling (concepts/max-connections-ceiling) — Dicken's post extends the framing backward one abstraction layer to the OS-level process-vs-thread memory cost.

  7. Connection pooling is the universal mitigation for both models (concepts/connection-pool-exhaustion). Dicken verbatim: "both MySQL and Postgres suffer from performance issues when the connection counts get too high. Even with threads, each connection requires dedicated memory resources to manage connection state. MySQL, Postgres, and many other databases use a technique known as connection pooling to help." Canonical framing: "Connection poolers sit between clients and the database. All connections from the client are made to the pooler, which is designed to be able to handle thousands at a time. It maintains its own pool of direct connections to the database, typically between 5 and 50. This is a small enough number that the database server is not negatively impacted by too many connections. The pooler then intelligently distributes incoming queries/transactions across the fixed set of connections. It acts as a funnel: pushing the queries from thousands of connections into tens of connections." Canonical connection-pool-size datum: typical pooler-to-DB pool 5–50 connections vs arbitrary client fan-in. This is the OS-fundamentals altitude of the same architectural lever canonicalised by the 2022-11-01 van Dijk benchmark (patterns/two-tier-connection-pooling): the process-vs-thread memory cost per connection is the why; the two-tier pooler is the how.

  8. Virtual memory is the unacknowledged-in-this-post substrate that makes process context switches affordable. Dicken notes it as an advanced topic ("OSs use virtual memory. This is a subject for another day") — the per-process page table + TLB means the "copy all of RAM" picture in the simplified visual isn't what actually happens; only a register-state save/restore plus a page-table-root swap (with its attendant TLB flush) is required at context-switch time.

Operational numbers

  • Context switch cost: ~5 μs (modern CPUs) = tens of thousands of instructions per switch.
  • Thread switch cost: ~1 μs (5× faster than process switch).
  • Typical connection pool size: 5–50 direct connections to the DB fronting thousands of upstream client connections (Dicken verbatim).
  • Context switches per second: hundreds per second (Dicken's framing); at billions of instructions/sec, bookkeeping consumes tens of millions of instructions/sec — the "small performance penalty" of multi-processing.

Canonical framings

  • "Processes are heavy" — the structural criticism of process-per-connection architecture that motivates MySQL's contrasting single-process design.
  • "A pooler acts as a funnel: pushing the queries from thousands of connections into tens of connections" — the canonical metaphor for connection pooling's role.
  • "With lots of data, this would be a big mess" is not in this post (that was the JOIN tutorial); the canonical framing here is the pooler as the architectural answer to the connection-memory-cost problem inherent to both process-per-connection (Postgres) and thread-per-connection (MySQL) models.

Caveats

  • Interactive-article constraints: the post is built around interactive CPU simulations in the browser (play button, instruction-set visualiser, process-swap buttons). The scraped markdown captures text but not the interactive widgets; readers of the raw markdown miss the animated context-switch visualisations that carry much of the pedagogical load.
  • Single-core assumption: Dicken explicitly assumes a one-core CPU for the whole article ("we're going to assume you only have one core in your CPU for the rest of this article"). The multi-core regime is where thread-per-connection's advantage over process-per-connection compounds — threads in a single address space can share work across cores without cross-process IPC — but this is not walked through.
  • Virtual memory deferred: the simplified visuals show "copy the RAM" at context-switch time which is misleading; real OSs use per-process page tables + TLB so the switch cost is much less than it looks. Dicken acknowledges this ("a subject for another day") but the simplified model stays in readers' heads.
  • Postgres connection cost under-quantified: Dicken names "memory overhead and a time overhead" for Postgres processes without numbers. Typical production Postgres backend ~5–10 MB plus shared buffers; this is the load-bearing datum driving pgbouncer adoption, but it's not in the post.
  • Connection pooler ecosystem elided: pgbouncer + ProxySQL + PlanetScale's VTTablet two-tier model are not named. The post stays at the generic "connection poolers sit between clients and the database" altitude.
  • Thread-per-connection in MySQL is not strictly literal: MySQL uses a thread pool with thread-caching (thread_cache_size, thread-pool plugin in Enterprise / MariaDB). The post's "it achieves this via threads" is directionally correct but simplifies the real concurrency model.
  • Pedagogical voice: no production numbers, no customer retrospective, no scaling war-story; the post is in the same genre as Dicken's 2024-09-09 B-trees and database indexes and 2025-03-13 IO devices and latency — OS/database fundamentals with an interactive layer.
  • Date: original publication 2025-09-24, re-fetched 2026-04-21; ~7 months old at ingest time.
