
Postgres shared_buffers + OS page cache (double buffering)

Definition

PostgreSQL does not manage the full disk-cache hierarchy itself. Instead, it uses a two-layer caching strategy: its own in-process shared_buffers cache for recently accessed relation pages, sitting on top of the operating system's filesystem page cache. The same disk page can exist in both layers simultaneously, hence the "double buffering" label. (Source: sources/2025-07-08-planetscale-caching.)

Ben Dicken's framing

From the 2025-07-08 Caching post:

Postgres implements a two-layer caching strategy. First, it uses shared_buffers, an internal cache for data pages that store table information. This keeps frequently read row data in memory while less-frequently accessed data stays on disk. Second, Postgres relies heavily on the operating system's filesystem page cache, which caches disk pages at the kernel level. This creates a double-buffering system where data can exist in both Postgres's shared_buffers and the OS page cache. Many deployments set shared_buffers to around 25% of available RAM and let the filesystem cache handle much of the remaining caching work. The shared_buffers value can be configured in the postgres config file.
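The knob from the quote lives in postgresql.conf. A sketch with illustrative values for a 16 GB host (effective_cache_size is a planner hint, not an allocation; changing shared_buffers requires a restart):

# postgresql.conf, illustrative values for a 16 GB host
shared_buffers = 4GB          # ~25% of RAM; requires a server restart
effective_cache_size = 12GB   # planner's estimate of shared_buffers + OS page cache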

Why two layers

  • shared_buffers holds 8 KB Postgres pages keyed by (relfilenode, blocknum). Access is Postgres-protocol-native — the buffer manager can pin pages, track dirty bits for WAL, run MVCC snapshots, and drive eviction with Postgres-specific policies (clock-sweep with usage counts).
  • OS page cache holds raw 4 KB filesystem pages keyed by (file, offset). Access is via read(2) / pread(2): when shared_buffers misses, Postgres asks the kernel for the underlying blocks, and the kernel serves them from its page cache if they are resident.
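The two keys can be inspected from SQL. A sketch, assuming a table named accounts exists (the table name is hypothetical):

-- Postgres addresses a buffer by (relfilenode, blocknum);
-- the kernel caches the same bytes by (file, offset).
SELECT relfilenode,
       pg_relation_filepath(oid) AS os_file
  FROM pg_class
 WHERE relname = 'accounts';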

Splitting the work:

  • shared_buffers optimises hit-path work inside Postgres (no syscall, no deserialisation overhead).
  • OS page cache optimises miss-path work — when the Postgres buffer pool misses but the OS already has the page from an earlier read, the "miss" still doesn't hit the disk.
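The split shows up directly in EXPLAIN (ANALYZE, BUFFERS) output. A sketch against the same hypothetical accounts table:

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM accounts;
-- Buffers: shared hit=... read=...
--   hit  = page found in shared_buffers (no syscall)
--   read = page pulled in via read(2); may still be an OS-cache hit, not a disk read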

The 25% rule

Many deployments set shared_buffers to around 25% of available RAM.

sources/2025-07-08-planetscale-caching

Why 25% rather than a higher fraction?

  • shared_buffers too small → Postgres misses more often, paying the syscall/deserialisation cost even on pages that are technically still in RAM (just in the OS layer, not the Postgres layer).
  • shared_buffers too large → the OS page cache gets starved, every Postgres miss really does hit the disk, and the duplicated pages (the same page held in RAM twice) burn memory for nothing.
  • Around 25% of RAM (roughly 1–8 GB on practical servers) is the empirical sweet spot across most workloads: big enough that the hot working set fits, small enough that the OS page cache has room for the rest.

Outside the 25% convention:

  • Small memory servers (< 1 GB RAM) — 128 MB or 15% rules sometimes apply.
  • Dedicated high-memory DB hosts (64+ GB RAM) — many teams push to 40% or explicit GB values, but returns diminish and the tuning burden rises.
  • Workload-specific tuning — large sequential scans prefer OS-cache dominance (it handles readahead better); point-heavy OLTP prefers shared_buffers dominance.
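Applying the rule of thumb from SQL rather than by editing the config file directly, a sketch (the 4GB value is illustrative, 25% of a 16 GB host):

ALTER SYSTEM SET shared_buffers = '4GB';  -- written to postgresql.auto.conf
-- shared_buffers cannot be changed at runtime; restart the server, then verify:
SHOW shared_buffers;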

Read measurement

Postgres hit rate against shared_buffers:

SELECT sum(blks_hit)::float / NULLIF(sum(blks_hit) + sum(blks_read), 0) AS hit_ratio
  FROM pg_stat_database;

Note: this is the Postgres hit rate, not the end-to-end disk-miss rate. A "miss" at the Postgres layer (counted in blks_read) may still hit the OS page cache with no disk I/O. To measure the full stack you need pg_buffercache + /proc/meminfo + iostat together.
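The pg_buffercache contrib extension mentioned above exposes one row per shared buffer. A sketch that summarises which relations occupy the pool (joining on relfilenode alone is a simplification that ignores databases and relation forks):

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname,
       count(*) AS buffers,                      -- 8 KB buffers held
       pg_size_pretty(count(*) * 8192) AS cached
  FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = c.relfilenode
 GROUP BY c.relname
 ORDER BY buffers DESC
 LIMIT 10;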

Difference vs MySQL InnoDB

MySQL's equivalent is the InnoDB buffer pool — a single in-process cache that stores pages, handles doublewrite / change-buffer / adaptive-hash-index mechanics, and does not rely on the OS page cache (in its common O_DIRECT configuration, it explicitly bypasses it).

Two philosophies:

Axis                          | Postgres                | MySQL InnoDB
Relies on OS page cache       | Yes (heavily)           | No (common config)
DB-owned cache rule of thumb  | ~25% of RAM             | 50–75% of RAM
Double-buffered pages in RAM  | Yes (acceptable)        | No (considered waste)
Hit-rate measurement          | pg_stat_database        | SHOW ENGINE INNODB STATUS
Preferred I/O mode            | Buffered + OS readahead | O_DIRECT

Both approaches work. The Postgres approach leans on the OS being a good generalist; the MySQL approach leans on the DB knowing its workload better than the OS.
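The MySQL side of the comparison can be checked with standard InnoDB variables:

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';  -- typically 50-75% of RAM on dedicated hosts
SHOW VARIABLES LIKE 'innodb_flush_method';      -- O_DIRECT = bypass the OS page cache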

Why ACID-aware caching is harder

Dicken's closing observation (Source: sources/2025-07-08-planetscale-caching):

Arguably, these are more complex than a "regular" cache as they also have to be able to operate with full ACID semantics and database transactions. Both databases have to take careful measures to ensure these pages contain accurate information and metadata as the data evolves.

Key complexities:

  • Dirty page eviction must write back to WAL before overwriting, or crash-recovery loses durability.
  • MVCC snapshot visibility — multiple transactions may read conflicting versions of the same page.
  • Locking protocols (content locks, index locks) must be respected by the buffer manager.
  • Checkpoint synchronisation — the buffer pool is the handoff between in-memory dirty state and durable-on-disk state.
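Several of these pressures are visible in the stock statistics views; a sketch (column names per Postgres 16 and earlier; Postgres 17 moves the checkpoint counters into pg_stat_checkpointer):

SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint,  -- dirty buffers written at checkpoints
       buffers_clean,       -- written ahead of need by the background writer
       buffers_backend      -- written by backends forced to evict a dirty page
  FROM pg_stat_bgwriter;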

These constraints are orthogonal to the eviction policy itself (clock-sweep, LRU, or otherwise), but they restrict which pages can be evicted at which moment.
