Pre-fork copy-on-write¶
Definition¶
Pre-fork copy-on-write (COW) is the pattern of a long-running
server process importing / initialising its application state
before fork(2), so that the forked worker processes inherit
the parent's memory pages read-only and the kernel only allocates
new physical pages when a worker actually writes to a page.
It's a fundamental Linux kernel optimisation (any fork(2) is
copy-on-write on Linux / BSD) but it only yields memory savings if
the parent has something worth inheriting — hence the "pre-fork"
qualifier. In practice, pre-fork COW is the dominant shape of
multi-worker Python / Ruby HTTP servers (gunicorn preload=True,
uWSGI, Unicorn, Puma cluster-mode with preload_app!).
Why it works mechanically¶
On fork(2), Linux does not duplicate the parent's resident pages.
Instead:
- It marks the parent's pages read-only in both the parent's and the child's page tables.
- If either process writes to a page, the kernel traps on the write, allocates a new physical page, copies the content, and remaps the writer's virtual page to the new physical page.
- Read-only pages stay shared for the entire lifetime of the processes.
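The semantics are observable directly from Python: after fork(2), a write in the child faults in a private copy of the page, and the parent's view never changes. A minimal sketch (Unix-only, since it uses os.fork):

```python
import os

# One page's worth of "parent state", built before the fork.
data = bytearray(b"x" * 4096)

pid = os.fork()
if pid == 0:
    # Child: this write triggers a COW fault. The kernel allocates a
    # fresh physical page for the child and copies the content; the
    # parent's mapping still points at the original page.
    data[:5] = b"child"
    os._exit(0)

os.waitpid(pid, 0)
# Parent: its view of the page never changed.
print(bytes(data[:5]))  # b'xxxxx'
```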
For a Python HTTP worker, "read-only" ends up being a surprisingly large fraction of the process image:
- imported .pyc bytecode,
- string / int intern tables for frequently-seen values,
- singleton framework objects (ORM metadata, route tables),
- anything loaded into module-level globals during import.
Writes come mostly from per-request state (local variables, GC reference-count bumps, cache writes), which doesn't scale with the static image size.
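As a sketch of the preload-friendly shape (module and handler names are illustrative, not from the source): build the big static structures at module level so they exist before the fork, and keep per-request work in locals:

```python
# app.py (hypothetical module). State built at import time is created
# once in the pre-fork leader and inherited read-only by every worker.
ROUTES = {f"/item/{i}": f"handler_{i}" for i in range(10_000)}

def handle(path: str) -> str:
    # Per-request work: only locals are written. ROUTES is only read,
    # so the pages holding it stay shared across all workers.
    return ROUTES.get(path, "404")
```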
Quantifying the win — Lyft¶
Lyft (gunicorn + Python service, sources/2025-12-15-lyft-from-python38-to-python310-memory-leak):
| Fork mode | Worker PSS |
|---|---|
| No preload (each worker imports separately) | ~203 MB |
| preload=True (leader imports once, workers COW) | ~41 MB |
That's a ~5× reduction per worker, with N workers per pod, giving
~(203 − 41) × N MB of free headroom per pod. On a 16-worker pod,
that's ~2.6 GB reclaimed without writing a line of application
code — purely by shifting when the import happens relative to fork.
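In gunicorn terms the shift is a one-line config change. A sketch of a gunicorn.conf.py (the worker count here is illustrative, matching the 16-worker pod example above):

```python
# gunicorn.conf.py
# Import the application in the master process, before fork(2), so
# workers inherit the imported modules via copy-on-write.
preload_app = True
workers = 16  # illustrative worker count
```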
The mechanism is the same one that makes Redis' BGSAVE possible,
that makes fork() cheap in Unix philosophy, and that underpins
container-start techniques like concepts/async-clone-hydration
and concepts/block-level-async-clone one level up the stack.
The invisible cost — write amplification¶
Pages that are almost read-only but take even a single write per worker cost a full page each. The canonical offender in CPython ≤ 3.11 was reference-count bumps: every time any code so much as reads an interned object, its ob_refcnt field (part of the object header, hence on the same 4 KiB page as the object itself) is mutated, forcing a COW of that page in every worker. This silently erodes the nominal COW win over time — a long-running worker is always more resident than a freshly-forked one.
Python 3.12's "immortal objects" (PEP 683), the per-interpreter GIL (PEP 684), and the free-threading work (PEP 703) are all partly motivated by preserving pre-fork COW savings over the lifetime of a process.
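CPython also ships an application-level mitigation: gc.freeze() (available since 3.7) moves everything currently tracked into a permanent generation, so the cyclic collector stops rewriting GC headers in the to-be-shared pages. A minimal sketch of the pre-fork call order:

```python
import gc

# ... import the application here, building all shared state ...

gc.disable()  # avoid a collection between now and the fork
gc.freeze()   # move current objects to the permanent generation:
              # the collector will no longer write to their headers

# fork workers here; each worker re-enables collection with
# gc.enable() for its own per-request garbage.
```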
The footgun — anything done pre-fork is "done" for every worker¶
The flip side of "everything imported pre-fork is shared" is that everything registered pre-fork is registered by the leader — not by the workers. This includes:
- Signal handlers registered via signal.signal(). See concepts/signal-handler-fork-inheritance — the worker may inherit the handler table or the framework may reset it; either way, the handler's closure may close over leader-local state.
- Open file descriptors / sockets. Inherited by default unless CLOEXEC is set; can lead to surprise sharing of TCP connections, DB cursors, connection-pool sockets.
- Threads. A fork after thread creation is famously unsafe in POSIX (the child process starts with only the forking thread alive; held locks may be in an unrecoverable state). Python's os.register_at_fork() hook exists to mitigate this class of bug.
- Loaded C-extension state. Anything the extension cached in module globals at import time is visible to all workers — including connections, random-number-generator seeds (prior to Py3.9's per-worker reseed), and thread pools.
The cardinal rule: if it touches the outside world, register it
post-fork. Gunicorn exposes a post_fork(server, worker) hook for
exactly this; the uWSGI equivalent is the @postfork decorator,
Unicorn's is after_fork, and Puma's is on_worker_boot.
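A hedged sketch of the gunicorn side (the post_fork hook name and signature are gunicorn's; make_pool is a hypothetical stand-in for a real connection-pool factory):

```python
# gunicorn.conf.py
import os

preload_app = True

def make_pool():
    # Hypothetical stand-in for e.g. a psycopg2 / redis connection pool.
    return {"created_in_pid": os.getpid()}

def post_fork(server, worker):
    # Runs inside each worker, after fork(2): connections, sockets, and
    # RNG seeds created here are private to the worker, never shared
    # with the leader or with sibling workers.
    worker.db_pool = make_pool()
```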
Related¶
- systems/gunicorn — the Python pre-fork server where this concept matters most in practice.
- concepts/signal-handler-fork-inheritance — the most common way pre-fork COW trips teams up.
- patterns/signal-triggered-heap-snapshot-diff — profiling pattern whose correct wiring depends on understanding pre-fork COW.
- concepts/copy-on-write-merge — different concept with the same name in the data-lake compaction world; not the OS-level fork mechanism.
Seen in¶
- sources/2025-12-15-lyft-from-python38-to-python310-memory-leak — Lyft quantifies the Python / gunicorn case: ~203 MB → ~41 MB worker PSS when preload=True is flipped on; the same article then demonstrates the signal-handler footgun that inevitably follows.