
CONCEPT Cited by 1 source

Pre-fork copy-on-write

Definition

Pre-fork copy-on-write (COW) is the pattern in which a long-running server process imports and initialises its application state before calling fork(2). The forked worker processes then share the parent's physical memory pages, and the kernel allocates a new physical page only when a process actually writes to one.

Copy-on-write is a fundamental kernel optimisation (every fork(2) is copy-on-write on Linux / BSD), but it only yields memory savings if the parent has something worth inheriting, hence the "pre-fork" qualifier. In practice, pre-fork COW is the dominant shape of multi-worker Python / Ruby HTTP servers (gunicorn with preload_app = True or --preload, uWSGI, Unicorn, Puma in cluster mode).

Why it works mechanically

On fork(2), Linux does not duplicate the parent's resident pages. Instead:

  1. It marks the parent's private writable pages read-only in both the parent's and the child's page tables.
  2. If either process writes to such a page, the kernel traps the write, allocates a new physical page, copies the content, and remaps the writer's virtual page to the new physical page.
  3. Pages that neither process writes to stay shared for as long as both processes keep them mapped.
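The observable effect of step 2 can be sketched in a few lines of Python. This is a minimal illustration, assuming a Unix-like system where os.fork() is available; the function name child_write_is_private is hypothetical. It shows only the isolation that copy-on-write guarantees (a write in the child never reaches the parent), not the page sharing itself, which has to be measured through the kernel.

```python
import os


def child_write_is_private(size=1 << 20):
    """Fork, overwrite a buffer in the child, and report what each side sees."""
    buf = bytearray(b"p" * size)           # allocated before the fork
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                           # child process
        os.close(read_fd)
        buf[:] = b"c" * size               # first write: the kernel copies the touched pages
        os.write(write_fd, bytes(buf[:1])) # tell the parent what the child now sees
        os._exit(0)
    os.close(write_fd)                     # parent process
    child_saw = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return child_saw, bytes(buf[:1])       # parent's buffer is untouched
```

Calling child_write_is_private() returns the first byte seen by the child after its write and the first byte the parent still sees afterwards.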

For a Python HTTP worker, "read-only" ends up being a surprisingly large fraction of the process image:

  • imported .pyc bytecode,
  • string / int intern tables for frequently-seen values,
  • singleton framework objects (ORM metadata, route tables),
  • anything loaded into module-level globals during import.

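Writes come mostly from per-request state (local variables, newly allocated objects, cache writes), whose volume does not scale with the size of the static image. Reference-count updates are the exception, because they write into otherwise static objects; see the write-amplification section below.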

Quantifying the win — Lyft

Lyft (gunicorn + Python service, sources/2025-12-15-lyft-from-python38-to-python310-memory-leak):

Fork mode                                                Worker PSS (proportional set size)
No preload (each worker imports separately)              ~203 MB
preload_app = True (leader imports once, workers COW)    ~41 MB

That's a ~5× reduction per worker, with N workers per pod, giving ~(203 − 41) × N MB of free headroom per pod. On a 16-worker pod, that's ~2.6 GB reclaimed without writing a line of application code — purely by shifting when the import happens relative to fork.
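The arithmetic above, and one way to obtain a PSS figure on Linux, can be sketched as follows. This is an illustration under stated assumptions: the source does not say how Lyft collected its numbers, the helper names pss_mb and reclaimed_mb are hypothetical, and /proc/<pid>/smaps_rollup is only present on reasonably recent Linux kernels.

```python
def pss_mb(smaps_rollup_text):
    """Return the Pss value from the text of /proc/<pid>/smaps_rollup, in MB."""
    for line in smaps_rollup_text.splitlines():
        if line.startswith("Pss:"):            # skips Pss_Anon:, Pss_File:, ...
            return int(line.split()[1]) / 1024 # the kernel reports the value in kB
    raise ValueError("no Pss line found")


def reclaimed_mb(no_preload_mb, preload_mb, workers):
    """Headroom freed per pod when every worker shrinks by the same amount."""
    return (no_preload_mb - preload_mb) * workers


print(reclaimed_mb(203, 41, 16))  # 2592 MB, i.e. roughly 2.6 GB
```

On a live system the text would come from something like open(f"/proc/{pid}/smaps_rollup").read(); PSS is the right metric here because it divides each shared page among the processes that map it.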

The mechanism is the same one that lets Redis' BGSAVE snapshot a live dataset from a forked child, that keeps fork() cheap enough for the Unix fork-then-exec idiom, and that underpins container-start techniques like concepts/async-clone-hydration and concepts/block-level-async-clone one level up the stack.

The invisible cost — write amplification

Pages that are almost read-only but take even a single write per worker cost a full private page in each worker. The canonical offender in CPython ≤ 3.11 was reference-count bumps: every time code takes a new reference to an object, even one it only reads, the interpreter writes to the refcount field in that object's header. That write dirties the whole page holding the object (typically 4 KiB) and forces a COW of that page in the worker that touched it. This silently erodes the nominal COW win over time: a long-running worker usually has a larger private footprint than a freshly forked one.

Python 3.12's "immortal objects" (PEP 683) fix the refcount of objects such as None, True and False so that referencing them no longer writes to their pages; the PEP names copy-on-write savings in pre-fork servers among its motivations. The per-interpreter GIL (PEP 684) and free-threading (PEP 703) work relies on the same immortality mechanism.
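The write hidden inside a "read" can be made visible with sys.getrefcount. This is a small sketch, assuming CPython and an ordinary (non-immortal) object; the function name refcount_delta is hypothetical.

```python
import sys


def refcount_delta():
    """Show that taking a reference changes a counter stored in the object."""
    obj = object()                 # ordinary object, not immortal
    before = sys.getrefcount(obj)
    alias = obj                    # no visible mutation of obj ...
    after = sys.getrefcount(obj)   # ... yet the counter in its header has changed
    del alias
    return after - before
```

In a forked worker, that counter update is a memory write like any other, so the page containing the object stops being shared even though the application never modified the object's value.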

The footgun — anything done pre-fork is "done" for every worker

The flip side of "everything imported pre-fork is shared" is that everything registered pre-fork is registered by the leader — not by the workers. This includes:

  • Signal handlers registered via signal.signal(). See concepts/signal-handler-fork-inheritance — the worker may inherit the handler table or the framework may reset it; either way, the handler's closure may close over leader-local state.
  • Open file descriptors / sockets. Always inherited across fork(2); the CLOEXEC flag only closes a descriptor on exec, not on fork. This can lead to surprise sharing of TCP connections, DB cursors, and connection-pool sockets.
  • Threads. A fork after thread creation is famously unsafe in POSIX (the child process starts with only the forking thread alive; held locks may be in an unrecoverable state). Python's os.register_at_fork() hook exists to mitigate this class of bug.
  • Loaded C-extension state. Anything the extension cached in module globals at import time is copied into every worker, including connections, random-number-generator seeds (the standard library's random module has reseeded itself in the child since Python 3.7, but an extension's own generator may not), and thread pools.

The cardinal rule: if it touches the outside world, register it post-fork. Gunicorn exposes a post_fork(server, worker) hook for exactly this; the equivalent for uWSGI is the @postfork decorator, for Ruby's Unicorn it is after_fork, and for Puma in cluster mode it is on_worker_boot.
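A Gunicorn configuration file following this rule might look like the sketch below. preload_app, workers and post_fork are real Gunicorn settings; worker_state and open_connection are hypothetical placeholders standing in for whatever client the application really uses.

```python
# gunicorn.conf.py (sketch)
import os
import random

preload_app = True      # import the application in the leader, before forking
workers = 4             # illustrative value

worker_state = {}       # hypothetical per-worker registry, filled after fork


def open_connection():
    """Hypothetical stand-in for a real database or cache client constructor."""
    return {"owner_pid": os.getpid()}


def post_fork(server, worker):
    """Gunicorn calls this in each worker just after it has been forked."""
    random.seed()                                   # fresh seed per worker
    worker_state["connection"] = open_connection()  # never shared with siblings
    server.log.info("worker pid %s opened its own connection", worker.pid)
```

The split is deliberate: imports and other pure in-memory setup stay at module level so they are shared through COW, while anything holding a socket or descriptor is created inside post_fork so each worker owns its own copy.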
