
CONCEPT Cited by 1 source

Dynamic schema — field names as data

Definition

A document-database schema in which field names are not predefined constants but carry information themselves (e.g. the day-of-month, a user ID, a status code), so part of a document's content lives in its keys rather than in values stored under a small, pre-declared set of field names.

The MongoDB Cost of Not Knowing Part 3 author describes it as "a Dynamic Schema, where field names encode information and are not predefined" and flags that "it isn't very common to see" outside of senior-engineer-designed schemas.

Contrast with a conventional schema where the set of field names is small, fixed, and documented (e.g. {date, amount, status}): in a dynamic schema the field-name domain is the same size as the value domain it encodes.
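
To make the contrast concrete, here is a minimal Python sketch that folds a conventional array of per-day elements into a sub-document keyed by day-of-month. The field names date and a, the sample values, and the helper to_dynamic are illustrative assumptions, not the source's code.

```python
from datetime import date

def to_dynamic(elements):
    """Fold [{date, ...counters}] into {"DD": {...counters}}.

    Assumes at most one element per day; a later duplicate would overwrite.
    """
    items = {}
    for element in elements:
        day_key = f"{element['date'].day:02d}"  # the field name carries the day
        counters = {k: v for k, v in element.items() if k != "date"}
        items[day_key] = counters
    return items

# Conventional shape: fixed field names, the date stored as a value.
conventional = {
    "items": [
        {"date": date(2022, 6, 5), "a": 10},
        {"date": date(2022, 6, 16), "a": 3},
    ]
}

# Dynamic shape: the day-of-month becomes the field name.
dynamic = {"items": to_dynamic(conventional["items"])}
print(dynamic)  # {'items': {'05': {'a': 10}, '16': {'a': 3}}}
```

The year and month are dropped from the elements here on the assumption that the enclosing document's _id already carries them.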

When it makes sense

  • Bounded key cardinality. Encoding day-of-month works because there are at most 31 values. Encoding full timestamps as field names would create unbounded-cardinality documents (bad for BSON traversal cost, bad for query planners).
  • Within-document grouping with shared index predicate. The encoding works when the outer _id (or indexed field) is the coarse-grained filter and the field-name encodes the fine-grained discriminator. MongoDB's case study: _id = key + year + quarter filters the coarse range; items["0605"] within each document is the fine slice.
  • Small-integer-value workloads. The pattern pairs best with the concepts/computed-pattern where the value at each encoded-field-name is a small document of pre-aggregated counters — WiredTiger's default snappy compression compounds on the repeated shape.
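
The three conditions above can be sketched as a single write path: a coarse bucket _id, a bounded month-day field name, and small counters incremented in place. This is a hedged Python sketch that only builds the filter and update documents; the _id layout ("key-year-Qn"), the helper name bucket_update, and the sample key are hypothetical, and only the counter names a and p come from the source's {a,n,p,r} shape.

```python
from datetime import date

def bucket_update(key, day, counters):
    """Return (filter, update) documents for an upsert into a quarter bucket."""
    quarter = (day.month - 1) // 3 + 1
    bucket_id = f"{key}-{day.year}-Q{quarter}"      # hypothetical _id layout
    field = f"items.{day.month:02d}{day.day:02d}"   # e.g. items.0605
    # One $inc path per counter; the encoded field name is the fine-grained key.
    increments = {f"{field}.{name}": amount for name, amount in counters.items()}
    return {"_id": bucket_id}, {"$inc": increments}

flt, upd = bucket_update("user42", date(2022, 6, 5), {"a": 1, "p": 2})
print(flt)  # {'_id': 'user42-2022-Q2'}
print(upd)  # {'$inc': {'items.0605.a': 1, 'items.0605.p': 2}}
```

With a driver such as PyMongo, these two documents would typically be passed to update_one(flt, upd, upsert=True). Key cardinality stays bounded because a calendar quarter has at most 92 days.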

Why it saves space

  • Eliminates repeated per-element headers. A conventional items: [{date: ..., ...}, {date: ..., ...}] array pays a BSON type byte, a field name, and a value for the date field in every element, on top of each element's own array-index key. Promoting the date to the field-name position in a sub-document folds it into the key each element needs anyway, so no separate date field is stored.
  • Reuses information already in _id. If _id already encodes key + year + month, then storing date: 2022-06-05 inside each element repeats the 2022-06 part. Encoding just the day ("05") as the field name drops the repetition.
  • Measured reduction: appV6R0 went from appV5R3's 385 B avg document to 125 B — a 67.5 % per-document shrink at equivalent semantics (Source: sources/2025-10-09-mongodb-cost-of-not-knowing-mongodb-part-3-appv6r0-to-appv6r4).
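
The per-element saving can be estimated with a small BSON size calculator. The sketch below follows the public BSON layout for the few types it handles (documents, arrays, UTC datetimes, strings, 32-bit integers) and ignores compression; the sample documents are toy values and are not meant to reproduce the 385 B and 125 B figures measured in the source.

```python
from datetime import datetime

def bson_size(value):
    """Estimate the BSON-encoded size in bytes of a limited set of types."""
    if isinstance(value, dict):
        # Each element: type byte + field name + NUL terminator + value.
        elements = sum(
            1 + len(name.encode("utf-8")) + 1 + bson_size(v)
            for name, v in value.items()
        )
        return 4 + elements + 1   # int32 length prefix + elements + trailing NUL
    if isinstance(value, list):
        # Arrays are encoded as documents keyed "0", "1", ...
        return bson_size({str(i): v for i, v in enumerate(value)})
    if isinstance(value, datetime):
        return 8                  # UTC datetime is an int64
    if isinstance(value, str):
        return 4 + len(value.encode("utf-8")) + 1
    if isinstance(value, int):
        return 4                  # assumes the value fits in an int32
    raise TypeError(f"unsupported type: {type(value).__name__}")

counters = {"a": 1, "n": 2, "p": 3, "r": 4}
conventional = {"items": [
    {"date": datetime(2022, 6, 5), **counters},
    {"date": datetime(2022, 6, 16), **counters},
]}
dynamic = {"items": {"05": dict(counters), "16": dict(counters)}}

print(bson_size(conventional), bson_size(dynamic))
```

Under these assumptions the dynamic shape is smaller because each element no longer stores a named date field.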

Costs

  • Indexing loses granularity. A regular index needs a fixed path, so a single index on items.${DD}.a cannot be declared: ${DD} varies from key to key. Covering the keys would take one index per possible field name or a wildcard index, both of which add their own index overhead. In practice the entire bucket is either in the cache or not.
  • Read queries need $objectToArray. MongoDB's aggregation pipeline can range-filter on the date synthesized from the field name only after converting the dynamic sub-document to an array of {k, v} documents via $objectToArray and then $reduce-ing over it. Compute cost is linear in bucket density for every document matched by the outer $match.
  • Query opacity to tooling. Compass / mongosh show the dynamic field names as literal values; users unfamiliar with the convention can't guess what "0605" means without reading the ingest code.
  • Aggregation framework arithmetic on strings. Reconstructing Date(YYYY, MM, DD) inside a $reduce accumulator means $substr / slice / string-to-number conversion at query time, which spends CPU on every matched document in exchange for the storage bytes saved per event.
  • Schema validation can't express it directly. JSON Schema validation is most ergonomic when it enumerates known fields; validating a dynamic sub-document requires patternProperties or leaving the items field unvalidated.
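
The read-side cost is easier to see in code. The following Python sketch builds an aggregation pipeline that sums one counter over a month-day range using $objectToArray and $reduce, plus a pure-Python equivalent of the same fold. It is an assumption-laden illustration rather than the source's query: it compares zero-padded "MMDD" strings instead of reconstructing dates, the function names and sample document are hypothetical, and the pipeline is only constructed here, not executed against a server.

```python
def range_sum_pipeline(bucket_ids, lo, hi, counter="a"):
    """Build a pipeline summing items.<MMDD>.<counter> for lo <= MMDD <= hi."""
    in_range = {"$and": [
        {"$gte": ["$$this.k", lo]},   # zero-padded strings sort like dates
        {"$lte": ["$$this.k", hi]},
    ]}
    return [
        {"$match": {"_id": {"$in": bucket_ids}}},       # coarse filter, index-backed
        {"$project": {"total": {"$reduce": {
            "input": {"$objectToArray": "$items"},      # yields [{k: ..., v: ...}, ...]
            "initialValue": 0,
            "in": {"$cond": [
                in_range,
                {"$add": ["$$value", f"$$this.v.{counter}"]},
                "$$value",
            ]},
        }}}},
    ]

def range_sum_local(doc, lo, hi, counter="a"):
    """Pure-Python equivalent: visits every key in the bucket, matching or not.

    Assumes every entry carries the requested counter.
    """
    pairs = [{"k": k, "v": v} for k, v in doc["items"].items()]
    total = 0
    for pair in pairs:
        if lo <= pair["k"] <= hi:
            total += pair["v"][counter]
    return total

doc = {"_id": "user42-2022-Q2",
       "items": {"0420": {"a": 7}, "0605": {"a": 10}, "0616": {"a": 3}}}
pipeline = range_sum_pipeline([doc["_id"]], "0601", "0630")
print(range_sum_local(doc, "0601", "0630"))  # 13
```

The loop in range_sum_local touches all three keys even though only two fall in range, which is the linear-in-bucket-density cost described above.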

Relationship to adjacent techniques

  • concepts/computed-pattern + concepts/bucket-pattern — the standard duo the dynamic schema further compresses. Bucket groups events into time windows; Computed pre-aggregates inside the bucket; Dynamic schema compresses the pre-aggregated shape.
  • Bit-packed encoding — cousin on a different axis: instead of field-name carrying information, a single binary field's bit positions do. Both trade query transparency for byte efficiency.
  • EAV (entity-attribute-value) schemas — opposite move: EAV pushes the attribute name into the value column of a relational table; dynamic schema pushes it into the key name of a document sub-object. EAV is traditionally an anti-pattern in relational databases for the same reason dynamic schemas are tricky in MongoDB: indexing and query planning lose the attribute's name.

Seen in

  • sources/2025-10-09-mongodb-cost-of-not-knowing-mongodb-part-3-appv6r0-to-appv6r4: appV6R0 uses items: { "05": {a,n,p,r}, "16": {...}, ... } with day-of-month as the field name; appV6R1 switches to items: { "0605": {a,n,p,r}, ... } with month-day as the field name to support quarter-bucketing. Load-test finding: the 67.5 % document-size reduction did not translate into proportional throughput wins once index size relative to cache became the new bottleneck.