CONCEPT Cited by 1 source
Dynamic schema: field names as data
Definition
A document-database schema in which field names are not predefined constants but carry information (for example a day of month, a user ID, or a status code). The set of keys in a document is therefore part of its content, and those keys never appear as fixed paths in the query schema.
The author of MongoDB's "The Cost of Not Knowing MongoDB, Part 3" describes it as "a Dynamic Schema, where field names encode information and are not predefined" and notes that "it isn't very common to see" outside schemas designed by senior engineers.
Contrast with a conventional schema where the set of field names
is small, fixed, and documented (e.g. {date, amount, status}):
in a dynamic schema the field-name domain is the same size as the
value domain it encodes.
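Here is a minimal Python sketch of that contrast. It assumes each element of a conventional `items` array holds a `datetime` under `date` plus some counters, and that the enclosing document already fixes the year and month. The helper name `to_dynamic` and the counter names are illustrative and do not come from the source.

```python
from datetime import datetime


def to_dynamic(items: list[dict]) -> dict[str, dict]:
    """Turn [{date, ...counters}, ...] into {"DD": {...counters}, ...}.

    Assumes every element shares the enclosing bucket's year and month,
    so the zero-padded day of month alone identifies an element.
    """
    dynamic: dict[str, dict] = {}
    for element in items:
        day_key = f"{element['date'].day:02d}"  # e.g. day 5 -> "05"
        if day_key in dynamic:
            raise ValueError(f"duplicate day {day_key}")
        # Everything except the date becomes the value under the day key.
        dynamic[day_key] = {k: v for k, v in element.items() if k != "date"}
    return dynamic


conventional = [
    {"date": datetime(2022, 6, 5), "a": 3, "r": 1},
    {"date": datetime(2022, 6, 16), "a": 7, "r": 0},
]
print(to_dynamic(conventional))
# {'05': {'a': 3, 'r': 1}, '16': {'a': 7, 'r': 0}}
```

The keys `"05"` and `"16"` are now data: a reader has to know the convention to recover the dates they stand for.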
When it makes sense
- Bounded key cardinality. Encoding day-of-month works because there are at most 31 values. Encoding full timestamps as field names would create unbounded-cardinality documents (bad for BSON traversal cost, bad for query planners).
- Within-document grouping with shared index predicate. The encoding works when the outer `_id` (or another indexed field) is the coarse-grained filter and the field name encodes the fine-grained discriminator. In MongoDB's case study, `_id = key + year + quarter` filters the coarse range, and `items["0605"]` within each document is the fine slice.
- Small-integer-value workloads. The pattern pairs best with concepts/computed-pattern, where the value at each encoded field name is a small document of pre-aggregated counters; WiredTiger's default snappy compression benefits further from the repeated shape.
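The coarse `_id` filter, the bounded fine-grained key, and computed counters can be sketched together in Python. The `_id` string layout (`key|year|Qn`), the helper names, and the single-letter status counters are hypothetical; the source only states that `_id` combines key, year, and quarter. `mongo_update` only builds a filter/update pair in the shape accepted by pymongo's `update_one(filter, update, upsert=True)`. It does not contact a server.

```python
from datetime import date


def bucket_id(key: str, day: date) -> str:
    """Hypothetical _id layout: key + year + quarter, e.g. "user42|2022|Q2"."""
    quarter = (day.month - 1) // 3 + 1
    return f"{key}|{day.year}|Q{quarter}"


def item_key(day: date) -> str:
    """Month-day field name, e.g. June 5 -> "0605" (at most 92 keys per quarter)."""
    return f"{day.month:02d}{day.day:02d}"


def record(buckets: dict[str, dict], key: str, day: date, status: str) -> None:
    """Computed-pattern ingest: bump a pre-aggregated counter instead of storing the event."""
    doc = buckets.setdefault(bucket_id(key, day), {"items": {}})
    counters = doc["items"].setdefault(item_key(day), {})
    counters[status] = counters.get(status, 0) + 1


def mongo_update(key: str, day: date, status: str) -> tuple[dict, dict]:
    """The same increment as a filter/update pair, e.g. for update_one(..., upsert=True)."""
    return (
        {"_id": bucket_id(key, day)},
        {"$inc": {f"items.{item_key(day)}.{status}": 1}},
    )


buckets: dict[str, dict] = {}
record(buckets, "user42", date(2022, 6, 5), "a")  # "a": hypothetical status counter
record(buckets, "user42", date(2022, 6, 5), "a")
record(buckets, "user42", date(2022, 4, 1), "r")
print(buckets)
# {'user42|2022|Q2': {'items': {'0605': {'a': 2}, '0401': {'r': 1}}}}
```

Because a quarter has at most 92 days, the dynamic key set per document stays bounded, which is the cardinality condition from the first bullet.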
Why it saves space
- Eliminates repeated per-element headers. A conventional `items: [{date: ..., ...}, {date: ..., ...}]` array pays a BSON type byte, the `date` field-name string, and the date value in every element. It also pays an array-index key (`"0"`, `"1"`, ...) per element, because BSON stores arrays as documents keyed by index. Promoting the date into the field-name position of a sub-document replaces both with one short key per element.
- Reuses information already in `_id`. If `_id` already encodes `key + year + month`, then storing `date: 2022-06-05` inside each element repeats the `2022-06` part. Encoding just the day (`"05"`) as the field name drops the repetition.
- Measured reduction: appV6R0 went from appV5R3's 385 B average document to 125 B, a 67.5 % per-document shrink at equivalent semantics (Source: sources/2025-10-09-mongodb-cost-of-not-knowing-mongodb-part-3-appv6r0-to-appv6r4).
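To see where the bytes go, here is a hand-rolled BSON size calculator. It follows the BSON specification for the few types used here (int32/int64, double, string, UTC datetime, embedded document, array). It is a sketch, not pymongo's encoder. The one-day documents are toy examples, so the figures they produce are not the source's 385 B and 125 B measurements.

```python
from datetime import datetime


def bson_size(doc: dict) -> int:
    # Document = int32 total length + elements + trailing 0x00 byte.
    return 4 + sum(element_size(name, value) for name, value in doc.items()) + 1


def element_size(name: str, value) -> int:
    # Element = 1 type byte + field name as a NUL-terminated cstring + payload.
    return 1 + len(name.encode("utf-8")) + 1 + payload_size(value)


def payload_size(value) -> int:
    if isinstance(value, bool):  # check before int: bool subclasses int
        return 1
    if isinstance(value, int):
        return 4 if -(2**31) <= value < 2**31 else 8  # int32 vs int64
    if isinstance(value, (float, datetime)):
        return 8  # double, or UTC datetime stored as int64 milliseconds
    if isinstance(value, str):
        return 4 + len(value.encode("utf-8")) + 1  # int32 length + bytes + NUL
    if isinstance(value, dict):
        return bson_size(value)
    if isinstance(value, list):
        # A BSON array is a document whose keys are "0", "1", "2", ...
        return bson_size({str(i): v for i, v in enumerate(value)})
    raise TypeError(f"unsupported type: {type(value).__name__}")


counters = {"a": 1, "n": 0, "p": 0, "r": 0}  # hypothetical counter names
conventional = {"_id": "k1", "items": [{"date": datetime(2022, 6, 5), **counters}]}
dynamic = {"_id": "k1", "items": {"05": dict(counters)}}
print(bson_size(conventional), bson_size(dynamic))  # 79 66
```

In this toy layout, every additional day costs 13 to 14 bytes more in the array form. Most of that is the 14-byte `date` element. The array-index key and the new day key roughly cancel out.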
Costs
- Indexing loses granularity. You cannot declare a regular field-path index on the inside of `items` keyed by its dynamic field name. A compound index on `items.${DD}.a` cannot exist because `${DD}` isn't a known path. The entire bucket is either in the cache or not.
- Read queries need `$objectToArray`. MongoDB's aggregation pipeline can range-filter on the date synthesized from a field name only after converting the dynamic sub-document into an array of `{k, v}` documents with `$objectToArray`, then iterating over it with `$reduce`. For each document matched by the outer `$match`, compute cost grows linearly with the number of keys in its bucket.
- Query opacity to tooling. Compass and `mongosh` display the dynamic field names as literal keys. Users unfamiliar with the convention can't tell what `"0605"` means without reading the ingest code.
- Aggregation-framework arithmetic on strings. Reconstructing `Date(YYYY, MM, DD)` inside a `$reduce` accumulator means `$substr`, slicing, and string-to-number conversion at query time. This spends CPU per matched document to save storage bytes per event.
- Schema validation can't express it directly. JSON Schema validation is ergonomic for enumerating known fields. Validating dynamic keys requires `patternProperties` or leaving the `items` field unvalidated.
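The read path can be emulated in plain Python to show the per-document work. The sketch expands `items` into `{k, v}` pairs (the shape `$objectToArray` returns), parses each key back into a date, and folds matching counters as a `$reduce` would. It assumes the hypothetical `MMDD` key format of the quarter-bucketed layout and a year already known for the bucket. The validator dict at the end is an illustrative `$jsonSchema` that uses `patternProperties`; it is not taken from the source.

```python
import re
from datetime import date

# Hypothetical key format: zero-padded month + day, e.g. June 5 -> "0605".
ITEM_KEY_RE = re.compile(r"^(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])$")


def object_to_array(obj: dict) -> list[dict]:
    """Same output shape as MongoDB's $objectToArray: [{"k": key, "v": value}, ...]."""
    return [{"k": k, "v": v} for k, v in obj.items()]


def sum_in_range(bucket: dict, year: int, start: date, end: date) -> dict[str, int]:
    """Sum per-day counters whose reconstructed date lies in [start, end).

    Emulates a $reduce over {"$objectToArray": "$items"}. The year is assumed
    to be known for the whole bucket (e.g. decoded from _id upstream).
    """
    totals: dict[str, int] = {}
    for pair in object_to_array(bucket["items"]):
        match = ITEM_KEY_RE.match(pair["k"])
        if match is None:
            raise ValueError(f"unexpected item key {pair['k']!r}")
        # String slicing + int conversion: per-key CPU paid at read time.
        day = date(year, int(match.group(1)), int(match.group(2)))
        if start <= day < end:
            for name, count in pair["v"].items():
                totals[name] = totals.get(name, 0) + count
    return totals


# Illustrative $jsonSchema validator reusing the same key pattern.
ITEMS_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "properties": {
            "items": {
                "bsonType": "object",
                "patternProperties": {ITEM_KEY_RE.pattern: {"bsonType": "object"}},
                "additionalProperties": False,
            }
        },
    }
}

bucket = {"items": {"0604": {"a": 2}, "0605": {"a": 1, "r": 1}, "0630": {"a": 5}}}
print(sum_in_range(bucket, 2022, date(2022, 6, 5), date(2022, 7, 1)))
# {'a': 6, 'r': 1}
```

Every key in a matched bucket is parsed, even the ones that fall outside the range. That is why read cost scales with bucket density rather than with the size of the result.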
Relationship to adjacent techniques
- concepts/computed-pattern + concepts/bucket-pattern — the standard duo the dynamic schema further compresses. Bucket groups events into time windows; Computed pre-aggregates inside the bucket; Dynamic schema compresses the pre-aggregated shape.
- Bit-packed encoding — cousin on a different axis: instead of field-name carrying information, a single binary field's bit positions do. Both trade query transparency for byte efficiency.
- EAV (entity-attribute-value) schemas — opposite move: EAV pushes the attribute name into the value column of a relational table; dynamic schema pushes it into the key name of a document sub-object. EAV is traditionally an anti-pattern in relational databases for the same reason dynamic schemas are tricky in MongoDB: indexing and query planning lose the attribute's name.
Seen in
- sources/2025-10-09-mongodb-cost-of-not-knowing-mongodb-part-3-appv6r0-to-appv6r4: appV6R0 uses `items: { "05": {a,n,p,r}, "16": {...}, ... }` with day-of-month as the field name. appV6R1 switches to `items: { "0605": {a,n,p,r}, ... }` with month-day as the field name to support quarter bucketing. Load-test finding: the 67.5 % document-size reduction did not translate into proportional throughput gains once index size relative to available cache became the new bottleneck.
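The appV6R0 to appV6R1 key change amounts to prefixing each day key with its month. The source does not say how data moved between versions, so this migration helper and its input shape are hypothetical. `merge_months_into_quarter` takes a dict that maps a month number to that month's `items` sub-document.

```python
def merge_months_into_quarter(monthly: dict[int, dict[str, dict]]) -> dict[str, dict]:
    """Re-key appV6R0-style day keys ("05") as appV6R1-style month-day keys ("0605").

    `monthly` maps a month number (1-12) to that month's `items` sub-document.
    Hypothetical helper: the source does not describe a migration.
    """
    months = sorted(monthly)
    # All months must fall in the same calendar quarter.
    if len({(m - 1) // 3 for m in months}) > 1:
        raise ValueError("months span more than one quarter")
    quarter_items: dict[str, dict] = {}
    for month in months:
        for day_key, counters in monthly[month].items():
            quarter_items[f"{month:02d}{day_key}"] = counters
    return quarter_items


print(merge_months_into_quarter({6: {"05": {"a": 1}}, 4: {"16": {"r": 2}}}))
# {'0416': {'r': 2}, '0605': {'a': 1}}
```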