MongoDB — The Cost of Not Knowing MongoDB, Part 3: appV6R0 to appV6R4¶
Summary¶
Third and final installment of MongoDB's senior-developer-authored case study on iteratively tuning a document schema by load-testing against a fixed hardware budget (an event-counter application inserting 500M events + serving five date-range report aggregations). Parts 1 and 2 (not yet ingested on this wiki) established a baseline and applied two named MongoDB schema patterns — the Bucket Pattern (many events → one time-windowed bucket document) and the Computed Pattern (pre-aggregate status totals into fields). The Part-2 winner appV5R3 bucketed by quarter with per-day sub-documents, giving 385 B documents / 11.96 GB data / 1.11 GB index across 33.4 M documents.
Part 3's load-test observation from appV5R3 was that disk throughput
on the MongoDB server was the remaining bottleneck, so the goal was
to shrink the document further. The unconventional lever is a
dynamic schema where
field names encode data: the items field is promoted from an array
of {date, a, n, p, r} objects to a sub-document whose field names
are the day (or month+day) part of the event date and whose values
are the status totals. The year/month (or year/quarter) already lives
in the document _id, so repeating it in each item's date field is
wasted bytes.
Five revisions are proposed (title promises appV6R0 through appV6R4;
the raw we captured covers appV6R0 + appV6R1 fully and has a partial
table for appV6R2 before truncation — see Caveats). The headline
finding: the bottleneck shape changed between revisions. appV6R0
(month bucket) shrank documents dramatically (125 B vs appV5R3's 385 B,
a 67.5 % reduction) but didn't deliver a proportional performance
win because the index size (3.13 GB) exceeded WiredTiger's 1.5 GB
cache allocation on the test machine's 4 GB RAM — the new bottleneck
was working-set-in-cache, not disk throughput. appV6R1 (quarter
bucket, same dynamic-schema trick with MMDD field names) walked the
index size back down to 1.22 GB while preserving document-size wins
(264 B document, 31.4 % smaller than appV5R3), yielding a 28 %
total-size-per-event improvement.
Headline method — not finding — for the wiki: observe the bottleneck empirically, apply the smallest schema change that moves it, re-measure, repeat. The appV6R0 → appV6R1 pivot from shrink-the-document to shrink-the-index is a clean instance of patterns/schema-iteration-via-load-testing where the load test, not the developer, chose the next move.
Key takeaways¶
- Dynamic schemas encode information in field names, not values. The core Part-3 trick: instead of an `items` array whose elements carry a `date` field, make `items` a sub-document whose field names are the date discriminator (e.g. `"05"` for day 5 in appV6R0's monthly bucket, `"0605"` for June 5 in appV6R1's quarter bucket). Year + month (or year + quarter) already live in the document `_id`, so repeating it inside each item is wasted bytes. The author notes the technique "isn't very common to see" — it only works when the encoding key has bounded cardinality and the application understands the convention. Trade-off: harder to query with conventional matchers (range filtering inside `items` requires `$objectToArray` + `$reduce` in an aggregation pipeline, not a simple `$match` on `items.date`).
- appV6R0 (monthly bucket, DD-keyed items): 67.5 % smaller document, no proportional speedup. 500 M events → 95,350,319 documents, 11.1 GB data, 125 B average document size, 3.33 GB storage, 3.13 GB index. Data size / event fell to 23.8 B (from appV5R3's 25.7 B, a 7.4 % reduction per event). But load-test Get Reports + Bulk Upsert rates were only slightly better than appV5R0 — "the performance improvement was not as substantial as expected."
- The new bottleneck is index-size vs WiredTiger cache. appV6R0's 3.13 GB index lives on a 4 GB-RAM machine where WiredTiger allocates ~1.5 GB for its cache by default; the index cannot stay resident, so every query incurs disk reads for index pages. "The limiting factor in this case is memory/cache rather than document size, which explains the lack of a significant performance improvement." The schema change moved the bottleneck from disk throughput on data pages to disk reads on index pages — an improvement in document size is not automatically an improvement in the relevant cache residency.
- appV6R1 pivots to quarterly bucketing, MMDD-keyed items, to shrink the index. Going from month-granularity buckets to quarter-granularity cuts document count roughly 3× (monthly 95.3 M → quarterly 33.4 M) and proportionally the index. Result: 33,429,366 documents, 8.19 GB data, 264 B average document, 2.34 GB storage, 1.22 GB index — index now fits in the 1.5 GB WiredTiger cache. Per-event metrics: 17.6 B data / 2.6 B index / 20.2 B total vs appV5R3's 28.1 B total — a 28.1 % per-event total-size reduction.
- Document size didn't scale linearly with bucketing range. Author's initial assumption — tripling the bucketing window (month → quarter) triples the document — was wrong: appV6R1 docs were only ~2× appV6R0 docs despite holding 3× the data. BSON document overhead (field name headers, length prefixes, small fixed offsets) amortizes better over denser sub-documents, and WiredTiger storage compression (snappy default) compounds on the repeated small-integer status values. Lesson: predict schema-size effects from measurement, not arithmetic.
- `$objectToArray` + `$reduce` is the dynamic-schema cost at read time. Get Reports on the appV6RX family does not have a conventional `$match` on an indexed date field — the date is split between `_id` (year ± month ± quarter) and the items sub-document's field names. Aggregation pipeline pattern: `{$match: docsFromKeyBetweenDate}` (range-filter by `_id`) → `{$addFields: buildTotalsField}` (convert items to an array with `$objectToArray`, then `$reduce` with per-day date construction and in-range accumulation) → `{$group: groupSumTotals}` → `{$project}`. This pays compute cycles per matched document to amortize fewer / smaller documents over the collection — a compute-for-storage trade-off.
- Single `_id` index across all revisions. appV4 onwards never adds a secondary index; the compound `key + year + (month | quarter)` inside `_id` carries all query predicates. This keeps index size tied strictly to document count, making the month → quarter bucket-width change a 1:3 lever on index pressure — the simplest possible knob to turn.
- Methodology = load test first, then decide. The article's implicit lesson isn't any one schema: it's that each appV6RX revision is proposed based on the previous revision's load-test observation of which resource (disk, index cache, memory) was the first to saturate. Schema iteration is driven by the load test's waterfall chart of bottlenecks, not by an architect guessing. Canonical instance of patterns/schema-iteration-via-load-testing.
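The key-derivation trick behind these takeaways can be sketched in a few lines of JavaScript. The article names `buildId` but never shows its body, so the concatenation order and lack of separators here are assumptions; the MMDD field-name convention is from the article:

```javascript
// Sketch of appV6R1 key derivation. The _id packs key + year + quarter
// (exact format assumed); item field names are MMDD strings.
function buildId(key, date) {
  const quarter = Math.floor(date.getUTCMonth() / 3) + 1; // 1..4
  return `${key}${date.getUTCFullYear()}Q${quarter}`;
}

function buildItemKey(date) {
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `${mm}${dd}`;
}

const d = new Date(Date.UTC(2022, 5, 5)); // 2022-06-05
console.log(buildId("user123", d)); // "user1232022Q2"
console.log(buildItemKey(d));       // "0605"
```

Because the year and quarter live once in `_id`, each item entry carries only two to four characters of date information instead of a full BSON date.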
Architecture (as described)¶
Hardware envelope (load test):
- MongoDB server on a machine with 4 GB RAM, ~1.5 GB of that
WiredTiger-allocated for its cache (default: the larger of 50 % of (RAM − 1 GB) or 256 MB).
- Single _id index only across the entire appV4+ family (no secondary).
- Collection sized at 500 million events inserted per revision.
Document shape evolution (Part 3):
appV5R3 (Part 2 winner — quarter bucket, per-day computed totals):
_id: <key+year+quarter>
items: [
{ date: 2022-06-05, a: 10, n: 3 },
{ date: 2022-06-16, p: 1, r: 1 },
...
]
→ 385 B / doc, 33.4M docs, 11.96 GB data, 1.11 GB index
appV6R0 (dynamic monthly bucket, DD-keyed):
_id: <key+year+month>
items: {
"05": { a: 10, n: 3 },
"16": { p: 1, r: 1 },
"27": { a: 5, r: 1 },
"29": { p: 1 }
}
→ 125 B / doc, 95.3M docs, 11.1 GB data, 3.13 GB index
Bottleneck: index > WiredTiger cache.
appV6R1 (dynamic quarter bucket, MMDD-keyed):
_id: <key+year+quarter>
items: {
"0605": { a: 10, n: 3 },
"0616": { p: 1, r: 1 },
"0627": { a: 5, r: 1 },
"0629": { p: 1 }
}
→ 264 B / doc, 33.4M docs, 8.19 GB data, 1.22 GB index
Index fits in cache; data-size/event = 17.6 B (best so far).
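To make the appV5R3 → appV6R1 shape change concrete, here is a hypothetical one-bucket conversion (the helper is ours; the field shapes follow the article's examples above):

```javascript
// Convert one bucket's items from the appV5R3 array shape (elements carry
// a date field) to the appV6R1 dynamic shape (the date becomes the field name).
function toDynamicItems(items) {
  const out = {};
  for (const { date, ...totals } of items) {
    const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
    const dd = String(date.getUTCDate()).padStart(2, "0");
    out[mm + dd] = totals; // per-day status totals keyed by MMDD
  }
  return out;
}

const v5r3Items = [
  { date: new Date(Date.UTC(2022, 5, 5)), a: 10, n: 3 },
  { date: new Date(Date.UTC(2022, 5, 16)), p: 1, r: 1 },
];
console.log(toDynamicItems(v5r3Items));
// structure: { "0605": { a: 10, n: 3 }, "0616": { p: 1, r: 1 } }
```

Each element sheds a full BSON date value plus its `date` field-name header, which is where the per-document byte savings come from.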
Upsert surface (appV6R1 example):
updateOne({
filter: { _id: buildId(event.key, event.date) }, // key + year + quarter
update: { $inc: {
[`items.${MMDD}.a`]: event.approved,
[`items.${MMDD}.n`]: event.noFunds,
[`items.${MMDD}.p`]: event.pending,
[`items.${MMDD}.r`]: event.rejected,
} },
upsert: true,
})
Relies on `$inc` treating missing fields as zero (creating them on the
first write) and on `upsert: true` creating the bucket document the
first time a key + window is seen. Both properties are what make the
dynamic-schema trick safe under concurrent writers — no separate
"initialize the bucket" path.
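A toy in-memory model of why no initialization path is needed — this is not driver code, just a simulation of the two server-side behaviors named above (`$inc` starting missing fields at zero, upsert creating the document on a miss):

```javascript
// Simulate updateOne(filter, { $inc: ... }, { upsert: true }) against a Map.
function applyIncUpsert(collection, id, incs) {
  const doc = collection.get(id) ?? { _id: id }; // upsert: create on miss
  for (const [path, amount] of Object.entries(incs)) {
    if (amount == null) continue; // statuses absent from the event
    const parts = path.split(".");
    let node = doc;
    for (const p of parts.slice(0, -1)) node = node[p] ??= {}; // build sub-docs
    const leaf = parts.at(-1);
    node[leaf] = (node[leaf] ?? 0) + amount; // $inc: missing field starts at 0
  }
  collection.set(id, doc);
  return doc;
}

const coll = new Map();
applyIncUpsert(coll, "key2022Q2", { "items.0605.a": 1 });
applyIncUpsert(coll, "key2022Q2", { "items.0605.a": 2, "items.0605.n": 1 });
console.log(coll.get("key2022Q2"));
// items["0605"] accumulates to { a: 3, n: 1 } with no explicit init step
```

The real server applies these semantics atomically per document, which is what makes interleaved writers safe.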
Read surface — aggregation pipeline for one of five report ranges:
[
{ $match: <range-filter on _id by key+year+(month|quarter)> },
{ $addFields: <convert items to [[DD|MMDD, status], ...] array,
$reduce over it constructing Date(YYYY,MM,DD) per entry,
accumulate totals only for in-range entries> },
{ $group: <$sum the per-document totals> },
{ $project: { _id: 0 } }
]
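The `$addFields` stage can be mirrored in plain JavaScript to show what the server computes per matched document. This is a semantic sketch only: the real schema packs the year into `_id`, which is unpacked here as a plain `year` field for clarity.

```javascript
// Mirror of the $objectToArray + $reduce stage for an appV6R1 document:
// turn the items sub-document into entries, rebuild each entry's real date
// from year + MMDD, and sum the a/n/p/r totals only for in-range days.
function totalsInRange(doc, from, to) {
  const entries = Object.entries(doc.items); // ~ $objectToArray
  return entries.reduce((acc, [mmdd, v]) => { // ~ $reduce
    const date = new Date(Date.UTC(
      doc.year, Number(mmdd.slice(0, 2)) - 1, Number(mmdd.slice(2))));
    if (date < from || date > to) return acc; // out-of-range day: skip
    for (const s of ["a", "n", "p", "r"]) acc[s] += v[s] ?? 0;
    return acc;
  }, { a: 0, n: 0, p: 0, r: 0 });
}

const doc = { year: 2022,
  items: { "0605": { a: 10, n: 3 }, "0616": { p: 1, r: 1 } } };
console.log(totalsInRange(doc,
  new Date(Date.UTC(2022, 5, 1)), new Date(Date.UTC(2022, 5, 10))));
// → { a: 10, n: 3, p: 0, r: 0 }
```

This per-document date reconstruction is exactly the compute cost the takeaways describe: it runs for every matched bucket, in exchange for smaller documents and a smaller index.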
Operational numbers (disclosed in raw)¶
Collection-level stats per revision (500 M events inserted each):
| Revision | Docs | Data | Doc size | Storage | Index |
|---|---|---|---|---|---|
| appV5R0 | 95,350,431 | 19.19 GB | 217 B | 5.06 GB | 2.95 GB |
| appV5R3 | 33,429,492 | 11.96 GB | 385 B | 3.24 GB | 1.11 GB |
| appV6R0 | 95,350,319 | 11.1 GB | 125 B | 3.33 GB | 3.13 GB |
| appV6R1 | 33,429,366 | 8.19 GB | 264 B | 2.34 GB | 1.22 GB |
| appV6R2 | 33,429,207 | 9.11 GB | 293 B | 2.80 GB | 1.26 GB |
Per-event stats:
| Revision | Data/event | Index/event | Total/event |
|---|---|---|---|
| appV5R0 | 41.2 B | 6.3 B | 47.5 B |
| appV5R3 | 25.7 B | 2.4 B | 28.1 B |
| appV6R0 | 23.8 B | 6.7 B | 30.5 B |
| appV6R1 | 17.6 B | 2.6 B | 20.2 B |
| appV6R2 | 19.6 B | 2.7 B | 22.3 B |
Hardware: 4 GB RAM, 1.5 GB WiredTiger cache allocation (canonical default: "WiredTiger uses the larger of either 50 % of (RAM - 1 GB) or 256 MB"; 4 GB RAM → 1.5 GB cache).
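The cache figure follows directly from that default formula, and also shows why the "index > cache" finding is envelope-dependent:

```javascript
// WiredTiger default cache: the larger of 50% of (RAM − 1 GB) or 256 MB.
function wiredTigerCacheGB(ramGB) {
  return Math.max(0.5 * (ramGB - 1), 0.25);
}

console.log(wiredTigerCacheGB(4));  // 1.5  — the test rig; 3.13 GB index overflows
console.log(wiredTigerCacheGB(64)); // 31.5 — appV6R0's index would fit trivially
```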
Numbers not disclosed: concrete Bulk Upsert throughput, Get Reports latency percentiles, or absolute rates in the load-test graphs (the raw references Figures 1–8 but those are image files we don't have); machine storage type (SSD vs NVMe), filesystem, MongoDB version, compression algorithm actually used (snappy default presumed but not explicitly named in the Part-3 raw); CPU count; whether this is a standalone or replica-set deployment (aggregation pipeline code targets a single instance); connection-pool sizing.
Caveats¶
- The raw is truncated. The article title promises appV6R0 through appV6R4 — five revisions — but the captured raw ends mid-table during appV6R2's statistics section with no appV6R2 prose (only its rows in the stats tables), and appV6R3 / appV6R4 are absent entirely. Additionally, appV6R1's entire section appears twice back-to-back in the raw, strongly suggesting a scraping or rendering glitch. The takeaways above rest on appV6R0 + appV6R1 (fully present); the appV6R2 shape is inferable only from its statistics rows (293 B / doc and 9.11 GB data — larger per document and more total data than appV6R1 despite a similar doc count — suggesting appV6R2 adds fields back or changes compression the other way). Do not cite appV6R2 / R3 / R4 structural claims from this source; re-fetch if needed.
- Part 1 and Part 2 of the series are not yet ingested. The article assumes familiarity with appV3 / appV4 / appV5RX (field concatenation / data-type tightening / field-name shortening / bucket pattern / computed pattern). The concepts/bucket-pattern and concepts/computed-pattern wiki pages created alongside this source distill what we can infer from Part 3's recap; they should be expanded when Parts 1 + 2 are ingested.
- Single-machine, single-node load test. The 4 GB-RAM / 1.5 GB-cache envelope is a deliberately constrained test rig to make storage / cache trade-offs visible at reasonable dataset sizes; it is not a replica-set benchmark and not a production topology. The "index > cache" finding is hardware-envelope-dependent. On a 64 GB-RAM machine, appV6R0's 3.13 GB index fits in cache trivially and the appV6R0 → appV6R1 pivot would likely be unnecessary — the schema winner would be appV6R0 (smaller documents, better disk efficiency). The general lesson — the bottleneck shape changes as you change the schema — is envelope-independent; the specific "quarter wins over month" conclusion is not.
- No WiredTiger compression-algorithm comparison in Part 3. The intro paragraph mentions "modifying the storage compression algorithm" as a planned Part-3 lever alongside the dynamic schema; the raw we have only shows the dynamic-schema side. One or more of the missing appV6R2 / R3 / R4 revisions presumably swap snappy → zstd (higher compression ratio, more CPU) or disable compression; without the raw, we can't cite the finding.
- Single-tenant workload shape. The event-counter pattern (high-cardinality `key` × time-bucketed counters) fits the bucket-per-key-per-window shape extremely well. Workloads with unbounded sub-document cardinality (thousands of distinct days within a bucket) or without a natural time dimension will not see the same dynamic-schema wins; the technique is more specialized than the two named MongoDB patterns it builds on.
- Benchmark methodology isn't fully disclosed. The raw references "Get Reports rates", "Bulk Upsert rates", and "Desired rates" plots without publishing the target-rate numeric values, client concurrency, think-time distribution, or how "Desired" was derived. Only qualitative claims ("slight advantage", "clear edge") are available.
Source¶
- Original: https://www.mongodb.com/company/blog/technical/cost-of-not-knowing-mongodb-part-3-appv6r0-appv6r4
- Raw markdown:
raw/mongodb/2025-10-09-the-cost-of-not-knowing-mongodb-part-3-appv6r0-to-appv6r4-505e4781.md
Related¶
- systems/mongodb-server — target database.
- systems/wiredtiger — the storage-engine component whose cache sizing drives the appV6R0 → appV6R1 pivot.
- concepts/bucket-pattern — MongoDB's named pattern this article's dynamic schema is an advanced variation of.
- concepts/computed-pattern — MongoDB's pattern for pre-aggregating status totals (a / n / p / r) at write time; load-bearing on Part 2's appV5R3 baseline that Part 3 improves on.
- concepts/dynamic-schema-field-encoding — the Part-3 novel concept: field names are data.
- concepts/wiredtiger-cache — the specific resource the appV6R0 index overflowed.
- concepts/document-storage-compression — WiredTiger snappy / zstd compression; referenced in intro as a Part-3 lever.
- concepts/disk-throughput-bottleneck — the Part-2-inherited bottleneck the dynamic schema attacks.
- concepts/working-set-memory — the cache-fit framing that replaced disk throughput as the appV6R0 bottleneck.
- concepts/bson-document-overhead — why document size didn't scale linearly with bucketing range.
- concepts/aggregation-pipeline — `$objectToArray` + `$reduce` is Part 3's read-side cost of the dynamic schema.
- patterns/dynamic-schema-field-name-encoding — the Part-3 pattern.
- patterns/schema-iteration-via-load-testing — the methodology.
- companies/mongodb