CONCEPT Cited by 1 source
BSON document overhead¶
Definition¶
BSON document overhead is the per-document + per-field bytes that
MongoDB's on-the-wire / on-disk BSON binary
encoding pays regardless of user data: a document length prefix, a
terminating null byte, and per-field (type marker, field name,
null terminator, value length or value) headers.
These bytes are invisible in a JSON.stringify() view of the same
document but are real on disk and real in the
WiredTiger cache. They shape the
economics of every schema-design trade-off.
Per-field overhead breakdown¶
For a scalar field "status": 1 in a BSON document:
- 1 byte: type tag (0x10 = int32)
- N bytes: field name ("status" = 6 bytes)
- 1 byte: field-name null terminator
- 4 bytes: int32 value
→ 12 bytes total for a 4-byte value. The field name alone costs more than the value it carries; the type tag + null bytes add another 2.
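The byte count above can be checked by encoding the element by hand. The following is a minimal sketch using only Python's standard library; the helper names encode_int32_element and encode_document are hypothetical and are not part of any MongoDB driver.

```python
import struct

def encode_int32_element(name: str, value: int) -> bytes:
    """Encode one BSON int32 element: type tag, null-terminated name, value."""
    type_tag = b"\x10"                          # 0x10 = int32
    field_name = name.encode("utf-8") + b"\x00"  # name plus null terminator
    payload = struct.pack("<i", value)          # 4-byte little-endian int32
    return type_tag + field_name + payload

def encode_document(elements: bytes) -> bytes:
    """Wrap encoded elements in the 4-byte length prefix and trailing null."""
    total_length = 4 + len(elements) + 1
    return struct.pack("<i", total_length) + elements + b"\x00"

element = encode_int32_element("status", 1)
document = encode_document(element)

print(len(element))   # 12: 1 + 6 + 1 + 4
print(len(document))  # 17: 4-byte prefix + 12-byte element + 1 trailing null
```

Only 4 of those 12 element bytes carry the value itself; the rest is the type tag, the name, and its terminator.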
Field names are stored once per occurrence, not interned.
An array of 1,000 subdocuments shaped like {status: 1} pays
the "status" name tax 1,000 times.
Per-document overhead¶
Each BSON document pays:
- 4 bytes document length header
- 1 byte trailing null terminator
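- Plus, as a stored document, the 12-byte _id ObjectId value if _id is not overridden (17 bytes once the field's type tag, "_id" name, and null terminator are counted).
→ ~17 bytes minimum per document regardless of content, or 22 bytes when the _id field's own header is included.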
For a million-document collection with tiny payloads, per-document overhead dominates; for a collection of few large documents, it's negligible.
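As a rough check on the per-document floor, the sketch below hand-encodes a document that contains nothing but an ObjectId _id. It assumes the standard BSON layout (type tag 0x07 for ObjectId, null-terminated field name); minimal_document is a hypothetical helper and the all-zero ObjectId is a placeholder. Counting only framing plus the 12-byte ObjectId value gives the ~17-byte figure; counting the _id field's type tag and name as well gives 22 bytes.

```python
import struct

OBJECT_ID_TAG = b"\x07"  # BSON type tag for ObjectId

def minimal_document(object_id: bytes) -> bytes:
    """Encode a document holding only an ObjectId _id field."""
    if len(object_id) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    element = OBJECT_ID_TAG + b"_id\x00" + object_id
    total_length = 4 + len(element) + 1
    return struct.pack("<i", total_length) + element + b"\x00"

doc = minimal_document(bytes(12))  # placeholder all-zero ObjectId

framing = 4 + 1                  # length prefix + trailing null
id_value = 12                    # ObjectId payload
id_header = 1 + len("_id") + 1   # type tag + name + null terminator

print(framing + id_value)              # 17
print(framing + id_value + id_header)  # 22
print(len(doc))                        # 22
```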
Why it shapes schema design¶
Three direct consequences:
- Favor fewer larger documents over many small ones. The concepts/bucket-pattern leans on this: collapse 100 per-event documents into one 100-event bucket and save about 99 × 17 ≈ 1,700 bytes of per-document overhead alone (the one remaining bucket still pays its own).
- Short field names at scale. "approved": 10 costs 1 + 8 + 1 + 4 = 14 bytes; "a": 10 costs 1 + 1 + 1 + 4 = 7 bytes, half as much. At 500 M events this compounds into gigabytes. MongoDB Cost of Not Knowing Part 1 (not yet ingested) walks through this as its first optimization.
- Dynamic schemas move a value into a field-name position. Instead of {date: "0605", ...} paying "date" every time, the field name is the date. The per-element "date" tax vanishes. Measured result in the case study: appV5R3 → appV6R0 document size dropped from 385 B to 125 B (67.5 %).
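The field-name arithmetic in the list above can be reproduced in a few lines. This is a sketch under one stated assumption: each of the 500 M events carries exactly one such int32 field. int32_field_size is a hypothetical helper, not a driver API.

```python
def int32_field_size(name: str) -> int:
    """Bytes for one BSON int32 element: type tag + name + null + value."""
    return 1 + len(name.encode("utf-8")) + 1 + 4

long_cost = int32_field_size("approved")
short_cost = int32_field_size("a")
print(long_cost, short_cost)  # 14 7

# Assumption: one such field per event, 500 M events.
events = 500_000_000
saved_bytes = (long_cost - short_cost) * events
print(saved_bytes / 1e9)  # 3.5 (GB of uncompressed BSON)

# Document-size reduction reported in the case study (385 B to 125 B).
reduction = (385 - 125) / 385
print(round(reduction * 100, 1))  # 67.5
```

The saving is in uncompressed bytes, which is the figure that matters for the cache.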
Compression interaction¶
WiredTiger's default snappy compression operates on whole pages (~32 KB blocks by default); repeated field names compress extremely well. So:
- Uncompressed BSON overhead: what in-cache documents cost (WiredTiger cache holds uncompressed pages).
- Compressed on-disk storage: typically 3–4× smaller than the uncompressed form for repetitive schemas. MongoDB's collStats reports both as size (uncompressed data) and storageSize (compressed).
The cache budget is about uncompressed overhead; the disk I/O + storage-cost budgets are about compressed bytes. A schema that wins on one axis can still lose on the other. See concepts/document-storage-compression.
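The gap between the two budgets can be illustrated with a toy experiment. The sketch below makes two loud assumptions: it uses zlib from Python's standard library as a stand-in for snappy (which is not in the standard library and yields different ratios), and it compresses a bare stream of hand-encoded elements rather than a real WiredTiger page. The absolute numbers are therefore not representative; only the direction of the effect is.

```python
import struct
import zlib

def int32_elements(name: str, count: int) -> bytes:
    """Concatenate `count` BSON int32 elements that all share one field name."""
    encoded_name = name.encode("utf-8") + b"\x00"
    return b"".join(
        b"\x10" + encoded_name + struct.pack("<i", i % 7) for i in range(count)
    )

raw_long = int32_elements("approved", 1000)
raw_short = int32_elements("a", 1000)

# zlib is an assumption standing in for snappy.
packed_long = zlib.compress(raw_long)
packed_short = zlib.compress(raw_short)

raw_gap = len(raw_long) - len(raw_short)
packed_gap = abs(len(packed_long) - len(packed_short))

print(len(raw_long), len(raw_short), raw_gap)  # 14000 7000 7000
print(len(packed_long), len(packed_short), packed_gap)
```

Uncompressed, the long field name costs an extra 7,000 bytes; after compression the difference between the two encodings shrinks to a small fraction of that, which is why shortening field names helps the cache far more than it helps disk usage.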
Why document size didn't scale linearly with bucketing range¶
In MongoDB's case study, widening the bucketing window from month (appV6R0) to quarter (appV6R1) stored 3× the data per bucket but documents grew only ~2× in size (125 B → 264 B). The missing factor is BSON overhead amortization:
- Per-document overhead (length prefix, _id, trailing null) is paid once per bucket regardless of how many inner elements it carries.
- Per-field overhead inside items is paid once per encoded day, but the outer items field name itself is paid once per document.
- Denser documents pay these fixed costs less often.
The author's initial arithmetic-based prediction (3× data → 3× document) was therefore pessimistic; measurement showed a better ratio because overhead stayed flat as density grew.
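One way to make the amortization concrete is a two-term model: document size = fixed overhead + variable bytes per unit of data. The sketch below fits that model to the two measured sizes. The linear model is an assumption introduced here for illustration, not something the case study states, and two points always fit a line exactly, so the split is indicative rather than measured. fit_fixed_and_variable is a hypothetical helper.

```python
def fit_fixed_and_variable(
    size_1x: float, size_kx: float, k: float
) -> tuple[float, float]:
    """Solve size = fixed + variable * units from sizes at 1 and k units."""
    variable = (size_kx - size_1x) / (k - 1)
    fixed = size_1x - variable
    return fixed, variable

# Measured sizes: month bucket (1 unit) = 125 B, quarter bucket (3 units) = 264 B.
fixed, variable = fit_fixed_and_variable(125, 264, 3)
print(fixed, variable)  # 55.5 69.5

growth = (fixed + 3 * variable) / (fixed + variable)
print(round(growth, 2))  # 2.11, versus the naive prediction of 3.0
```

Under this model, a little under half of the month-sized document is fixed cost that does not grow with the bucketing window, which is why tripling the data roughly doubles the document.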
Seen in¶
- sources/2025-10-09-mongodb-cost-of-not-knowing-mongodb-part-3-appv6r0-to-appv6r4: document-size scaling from appV6R0 (125 B) to appV6R1 (264 B) at 3× the data is the canonical wiki illustration of BSON per-document overhead amortizing. The Part 1 / Part 2 articles in the same series also exercise this lever (field-name shortening, bucket pattern) but are not yet ingested on this wiki.