CONCEPT Cited by 1 source
Bucket Pattern¶
Definition¶
Bucket Pattern — MongoDB's named schema-design pattern where many fine-grained events (one event = one would-be document) are grouped into a single bucket document by a shared key + a time window (day / month / quarter / year). Instead of 500 M event documents, the collection holds ~one bucket document per key per window, each bucket carrying an internal array (or, in the dynamic-schema variation, sub-document) of the events that fell inside its window.
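The grouping described above can be sketched in memory. This is a minimal illustration, not the source's implementation: the `key:window` `_id` format, the `bucket_id` and `group_into_buckets` helpers, and the event tuples are all hypothetical names chosen for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_id(key, ts, window):
    """Build a hypothetical bucket _id from a shared key and a time window."""
    if window == "day":
        suffix = ts.strftime("%Y%m%d")
    elif window == "month":
        suffix = ts.strftime("%Y%m")
    elif window == "quarter":
        suffix = f"{ts.year}Q{(ts.month - 1) // 3 + 1}"
    elif window == "year":
        suffix = str(ts.year)
    else:
        raise ValueError(f"unknown window: {window}")
    return f"{key}:{suffix}"


def group_into_buckets(events, window):
    """Group (key, timestamp, payload) events into bucket documents."""
    buckets = defaultdict(list)
    for key, ts, payload in events:
        buckets[bucket_id(key, ts, window)].append({"ts": ts, **payload})
    # One document per key per window, each carrying its events in `items`.
    return [{"_id": bid, "items": items} for bid, items in buckets.items()]


events = [
    ("user-1", datetime(2025, 1, 15, tzinfo=timezone.utc), {"status": "a"}),
    ("user-1", datetime(2025, 2, 3, tzinfo=timezone.utc), {"status": "n"}),
    ("user-1", datetime(2025, 4, 1, tzinfo=timezone.utc), {"status": "a"}),
]
docs = group_into_buckets(events, "quarter")
```

Here three would-be event documents collapse into two quarter buckets; widening the window to `"year"` would collapse them into one.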
The trade-off rebalances storage, indexing, and write amplification:
- Fewer documents ⇒ smaller index. A per-event collection with 500 M docs needs 500 M `_id` entries; a bucketed collection with 33 M quarter-buckets needs ~15× fewer. systems/mongodb-server indexes every document in the `_id` B-tree; shrinking the index is a direct lever on WiredTiger cache pressure.
- Denser documents ⇒ better BSON overhead amortization. Per-document overhead (field name headers, length prefixes, `_id`) is ~dozens of bytes; a 100-event bucket amortizes that over 100 events where a per-event collection pays it 100 times.
- Each write becomes an upsert + `$inc`/`$push`. Write amplification trade-off: an `updateOne` with `upsert: true` against a time-bucket `_id` is one network round-trip, but the server may rewrite the whole document as it grows. The legacy MMAPv1 engine used power-of-two allocation and moved documents that outgrew their slot; WiredTiger never updates in place and writes a new version of the document on every update.
- Queries filter the bucket then project the slice. Reads need both a `$match` on the bucket `_id` range and an in-document filter on the inner array/sub-document. An aggregation pipeline built from `$filter`/`$reduce`/`$objectToArray` is the standard shape.
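The write and read shapes in the last two bullets can be sketched as plain Python dictionaries. This is a sketch under assumptions: the `items` array name follows the source's appV5R0 description, while the `count` field, the `ts` event field, the `key:window` `_id` strings, and both helper functions are hypothetical. The range `$match` also assumes the `_id` strings sort lexicographically in time order.

```python
from datetime import datetime, timezone


def build_upsert(bucket_id, event):
    """Return (filter, update) that appends one event to its bucket."""
    flt = {"_id": bucket_id}
    update = {
        "$push": {"items": event},  # append the raw event to the inner array
        "$inc": {"count": 1},       # hypothetical per-bucket event counter
    }
    return flt, update


def build_read_pipeline(first_id, last_id, start, end):
    """Return a pipeline: select buckets by _id range, then slice the array."""
    return [
        # Step 1: narrow to the buckets whose window can overlap the query.
        {"$match": {"_id": {"$gte": first_id, "$lte": last_id}}},
        # Step 2: keep only the inner events inside [start, end).
        {"$project": {"items": {"$filter": {
            "input": "$items",
            "as": "e",
            "cond": {"$and": [
                {"$gte": ["$$e.ts", start]},
                {"$lt": ["$$e.ts", end]},
            ]},
        }}}},
    ]


ts = datetime(2025, 2, 3, tzinfo=timezone.utc)
flt, update = build_upsert("user-1:2025Q1", {"ts": ts, "status": "a"})
pipeline = build_read_pipeline(
    "user-1:2025Q1", "user-1:2025Q1",
    datetime(2025, 2, 1, tzinfo=timezone.utc),
    datetime(2025, 3, 1, tzinfo=timezone.utc),
)
```

With PyMongo, these would typically be passed as `collection.update_one(flt, update, upsert=True)` and `collection.aggregate(pipeline)`; the upsert creates the bucket on the first event of a window and appends to it afterwards.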
When to use¶
- Time-bucketed counters / metrics / events. Per-key status counts binned by day / month / quarter — the MongoDB "Cost of Not Knowing" event-counter running example fits exactly.
- IoT sensor streams. A single device writes samples at regular intervals; one document per device per hour is a common starting point.
- Log aggregation where queries are "last hour per source" style, not "this one specific event."
- Pre-aggregation surfaces. Combined with the concepts/computed-pattern — bucket by window, pre-aggregate per-sub-bucket status totals at write time.
When not to use¶
- Point-lookup workloads on individual events. If the primary query is "fetch event 12345," bucketing adds an extraction step with no corresponding read savings.
- Unbounded bucket cardinality. MongoDB's per-document limit is 16 MB; buckets that grow without bound hit the ceiling and require either finer time-bucketing or a dynamic re-bucketing step.
- Workloads without a natural bucketing dimension. Events without a time axis or without a high-cardinality grouping key don't benefit; the bucket becomes a meaningless wrapper.
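The 16 MB ceiling lends itself to a quick sizing check before choosing a window. The sketch below is back-of-the-envelope arithmetic only: the 64-byte event size, the one-event-per-second rate, and the 50% headroom factor are assumptions for illustration, not measurements from the source.

```python
MAX_BSON_BYTES = 16 * 1024 * 1024  # MongoDB's per-document limit (16 MiB)


def max_events_per_bucket(event_bytes, overhead_bytes=0):
    """Upper bound on events in one bucket before the size limit is hit."""
    return (MAX_BSON_BYTES - overhead_bytes) // event_bytes


def fits(events_per_window, event_bytes, headroom=0.5):
    """True if a full window stays under the limit with the given headroom."""
    return events_per_window * event_bytes <= MAX_BSON_BYTES * headroom


EVENT_BYTES = 64            # assumed average encoded size of one event
PER_DAY = 24 * 60 * 60      # assumed rate: one event per second per key

ceiling = max_events_per_bucket(EVENT_BYTES)
day_ok = fits(PER_DAY, EVENT_BYTES)
week_ok = fits(PER_DAY * 7, EVENT_BYTES)
```

Under these assumed numbers a day-sized bucket fits comfortably while a week-sized one does not, which is the signal to pick a finer window or store computed totals instead of raw events.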
Relationship to other MongoDB schema patterns¶
- concepts/computed-pattern: often applied on top of Bucket. Each bucket stores pre-aggregated status counters (e.g. `{a: 10, n: 3, p: 0, r: 1}`) rather than raw events.
- patterns/dynamic-schema-field-name-encoding: further shrinks the bucket's inner array to a sub-document whose field names encode data (day-of-month, day-of-quarter). MongoDB's Cost of Not Knowing Part 3 treats this as the natural next step after Bucket + Computed.
- Attribute Pattern: a cousin that denormalizes variable attributes into an array of `{k, v}` objects. It works on a different axis (attribute heterogeneity, not temporal grouping) but shares the denormalize-into-nested-structure move.
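Combining Bucket with the Computed and dynamic-schema patterns turns each write into a counter increment on a field whose name encodes the day. The sketch below assumes a month bucket with a two-digit day-of-month field name; the `_id` format, the `computed_increment` helper, and the `apply_inc` in-memory stand-in for the server's `$inc` are all hypothetical and only illustrate the resulting document shape.

```python
from datetime import date


def computed_increment(key, day, status):
    """Return (filter, update) bumping a per-day status counter in a bucket."""
    bucket = f"{key}:{day:%Y%m}"        # hypothetical month-bucket _id
    field = f"items.{day:%d}.{status}"  # day-of-month encoded as a field name
    return {"_id": bucket}, {"$inc": {field: 1}}


def apply_inc(doc, update):
    """In-memory stand-in for $inc on dotted paths (illustration only)."""
    for path, delta in update["$inc"].items():
        *parents, leaf = path.split(".")
        node = doc
        for name in parents:
            node = node.setdefault(name, {})  # create sub-documents on demand
        node[leaf] = node.get(leaf, 0) + delta
    return doc


flt, update = computed_increment("user-1", date(2025, 3, 5), "a")
doc = dict(flt)
apply_inc(doc, update)
apply_inc(doc, update)
apply_inc(doc, computed_increment("user-1", date(2025, 3, 5), "n")[1])
```

After the three increments the bucket holds per-day totals rather than three raw events, so its size depends on the number of distinct days and statuses, not on the event count.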
Seen in¶
- sources/2025-10-09-mongodb-cost-of-not-knowing-mongodb-part-3-appv6r0-to-appv6r4: baseline of the whole `appV5RX`/`appV6RX` family. Part 2 (not yet ingested) introduced the Bucket + Computed combination; Part 3 builds the dynamic-schema variation on top. appV5R0 bucketed by year-in-`_id` + `items` array; appV5R1 through appV5R4 varied the bucketing granularity (year / quarter / month) and the per-element aggregation (raw event vs computed totals). Quarter-bucketing with per-day computed totals (appV5R3) was the Part-2 winner: 33 M documents, 11.96 GB data, 1.11 GB index, 385 B avg document from 500 M events.