CONCEPT

InnoDB page fill factor¶

The page fill factor is the fraction of a B+tree leaf page InnoDB fills with row data before it splits the page and allocates a new one. InnoDB makes an architectural assumption about primary-key shape that determines this ratio: ~94% on sequential inserts, as low as ~50% on random inserts. The 2× ratio compounds into 2× page count, 2× buffer-pool pressure, and 2× I/O to scan the same logical data. (Source: .)

Why the asymmetry¶

InnoDB's insert algorithm specialises on an assumption:

InnoDB assumes that the primary key will increment predictably, either numerically or lexicographically. If true, InnoDB will fill the pages to about 94% of the page size before creating a new page. When the primary key is random, the amount of space utilized from each page can be as low as 50%.

—

Sequential path (94%): Each new insert lands on the right-most leaf of the B+tree. Leaves grow on the right edge; splits happen only when the right-most leaf fills. When the split happens, InnoDB creates a new leaf and keeps the old leaf packed at ~94% of its capacity — there's no need to "balance" the pages by moving rows out, because the next inserts will all land in the new right-most leaf anyway.

Random path (~50%): A new insert lands in an unpredictable leaf. Each leaf fills at a random rate. When a random leaf fills, InnoDB 50/50 splits — the full leaf is split into two half-full leaves. Successive inserts into both halves continue at random positions. The steady-state distribution of leaf fill across random inserts is approximately uniform from 50% to ~100% — with a mean near the 50% floor.

Mechanical consequences¶

1. Page count doubles¶

If a sequentially-keyed table fits in N pages, the same data randomly keyed takes ~2N pages. "Using UUIDs that incorporate randomness can lead to excessive use of pages to store the index."

2. Buffer-pool pressure doubles¶

The InnoDB buffer pool caches pages, not rows. Doubling the page count doubles the bytes the buffer pool must hold to cache the same logical working set. Under memory pressure, buffer-pool hit rate halves and page-fault latency dominates.

3. I/O amplification on range scans¶

A SELECT ... WHERE k BETWEEN a AND b on a sequentially- keyed table reads contiguous leaves at 94% density. The same query on a randomly-keyed table reads 2× the pages at 50% density each — same number of rows but 2× the I/O.

4. Split cost concentrated vs distributed¶

Sequential: splits concentrated at the right edge. Predictable, amortised across fewer pages, cache-warm.
Random: splits distributed across the entire tree. Unpredictable, each split pulls in cold pages from disk.

Configuration¶

MySQL's innodb_fill_factor system variable (introduced in MySQL 5.7.4 for rebuild operations; applies to ALTER TABLE ... ALGORITHM=COPY / OPTIMIZE TABLE / online-DDL rebuilds, not steady-state inserts). Default 100 (= 100% fill target during rebuild). Does not override the runtime 94% vs 50% behaviour on live inserts — that's baked into InnoDB's insert path.

Mitigation¶

Use a sequential-like primary key so every insert hits the right edge of the tree. See patterns/sequential-primary-key.

BIGINT AUTO_INCREMENT — canonical choice.
UUIDv7 — time-ordered UUID with random bits only within the same millisecond; inserts are monotonic at the byte level.
UUIDv1 + UUID_TO_BIN(@u, 1) swap-flag — reorder byte layout to put timestamp bits first. See concepts/uuid-to-bin-swap-flag.
Snowflake / ULID — non-UUID alternatives with native byte-ordering by generation time.

Caveats¶

50% is a worst-case floor, not an average. Random- insert page fill is a distribution; the 50% number is the split-immediately-after-split case. Real-world averages with mature tables sit closer to 70–75% for UUIDv4 workloads.
innodb_fill_factor doesn't help steady-state. It only governs bulk-rebuild operations.
Updates cause splits too. The 94%/50% framing is insert-only. An UPDATE that grows a variable-length column beyond its original space also triggers a split.
Not MySQL-specific. Any clustered-index B+tree database (SQL Server clustered indexes, Oracle index-organised tables) exhibits the same asymmetry; the 94% constant is InnoDB-specific.
Unrelated to fillfactor in Postgres. Postgres uses a storage parameter called fillfactor on heap tables + secondary indexes — it means something different (leaves headroom for HOT updates). Postgres's heap-organised tables don't have the clustered-index PK- shape dependency.

Seen in¶

— Brian Morrison II (PlanetScale, 2024-03-19) canonicalises the 94% / 50% asymmetry as the InnoDB-specific storage cost of random primary keys. Pairs with the CHAR(36) vs BINARY(16) storage-width framing and the secondary-index PK-amplification framing to give the full storage cost of a random-UUID PK.