MongoDB¶
MongoDB Engineering blog (mongodb.com/blog). Tier-2 source on the sysdesign-wiki. MongoDB the company is the operator of MongoDB the document database, Atlas (managed cloud), and historically the Mobile SDK / Atlas Device Sync (sunset). The engineering blog mixes product posts, conference talks, and deeper infra content; the deeper infra content — especially around the MongoDB server's consensus / replication / storage internals, and around application-side migration patterns (PL/SQL → Java + Atlas) — is on-topic.
MongoDB the company is notable on the wiki's axis of specification-driven development and lightweight formal verification: the server team has a long practice of writing TLA+ specs of its own consensus and replication protocols, continues to produce peer-reviewed work on how to keep specs + implementations aligned (VLDB 2020 eXtreme Modelling in Practice; VLDB 2025 Design and Modular Verification of Distributed Transactions in MongoDB — with a 2026-02 follow-up blog Towards model-based verification of a key-value storage engine describing how the compositional structure of the transactions spec enables test-case generation against the WiredTiger storage interface at 87,143 tests / ~40 min scale for a 2×2 finite model), and actively sponsors academic research on conformance checking.
Separately, MongoDB publishes legacy-modernization playbooks for customers migrating off RDBMS + PL/SQL stacks — most visibly the 2025-09-18 batch-optimization framework for insurance customers (bulk writes + prefetching + parallel execution + configurable batch sizes to recover from post-migration batch-job regressions).
MongoDB engineering leadership (2025-09-25 manifesto by Ashish Agrawal + Akshat Vig) articulates three principles as non-negotiable design constraints: resilience + intelligence + simplicity, with an explicit gate: "if a change widens blast radius, breaks adaptive performance, or adds operator toil, it doesn't ship." Concrete realisations span architectural isolation (Atlas dedicated clusters, anti-shared-wall stance), defense-in-depth security (five-question stack), TLA+-verified protocols (logless reconfiguration distilled to four invariants; multi-shard transactions), single-replica-set cross-cloud topologies, and Vector Search integrated into MQL as the query-engine-level remediation for the three-database problem.
Key systems¶
- systems/mongodb-server — the C++ document database; replica-set deployments use a Raft-like consensus for leader election + commit-point tracking.
- systems/wiredtiger — the B-tree-based storage engine MongoDB has used as default since 3.2 (2015); provides the on-disk format, the in-memory cache, MVCC document-level locking, per-collection compression (snappy default / zstd / zlib), and journal / checkpoint durability. The default ~1.5 GB cache on a 4 GB-RAM box is the specific resource the 2025-10-09 Cost of Not Knowing Part 3 load test made famous as the appV6R0 → appV6R1 pivot driver.
- systems/mongodb-atlas — MongoDB's managed cloud database service across AWS/GCP/Azure; deployment target for modernization migrations; host of the bulkWrite + MongoDB 8 multi-collection bulk transactions primitives underpinning patterns/bulk-write-batch-optimization; canonical target of the five-phase managed-service migration playbook; the platform whose dedicated-cluster isolation + cross-cloud single replica set + Global Clusters / zone sharding realize the 2025-09-25 manifesto's "resilience + simplicity" pillars.
- systems/atlas-vector-search — MongoDB's native vector-similarity-search capability integrated directly into the MongoDB query engine (same MQL, same drivers); positioned as the query-engine-level answer to the three-database problem; Voyage AI embedding + reranking integration ongoing.
- systems/atlas-hybrid-search — MongoDB's native hybrid-search functions (GA on Atlas, public preview on Community Edition + Enterprise Server) composing Atlas Search (BM25 on Lucene) + Atlas Vector Search into a single MQL aggregation-pipeline primitive with engine-side RRF / RSF fusion; 2025-09-30 release and the canonical in-wiki instance of the native hybrid-search function productization trend.
- systems/atlas-queryable-encryption — MongoDB-Research-developed industry-first searchable encryption; equality + range queries on server-encrypted fields with client-held keys; the "what if someone gets in?" layer of MongoDB's defense-in-depth stance.
- systems/atlas-global-clusters — Atlas's zone-sharding feature built on the core sharding system; declarative data-residency (e.g. DE, FR, ES → EU_Zone) for GDPR + low-latency-per-user without app rewrites.
- systems/mongodb-search-nodes — dedicated Atlas compute tier for search + vector-search workloads, scaled independently of database nodes; the leading edge of MongoDB's broader compute/storage decoupling direction for Atlas.
- systems/mongodb-application-modernization-platform — MongoDB's agentic tooling + AI-based frameworks for migrating legacy RDBMS applications onto Atlas + microservices at scale; strategic bet on breaking the bodies-per-project scaling of migration engagements. Insurance-sector case (2025-09-25): 1:1 schema translation performed competitively with the legacy RDBMS + multi-hour stored-procedure batch jobs completed in <5 minutes on MongoDB thanks to native parallelism.
- systems/mongodb-community-edition — MongoDB's free self-managed distribution; canonical source side of self-managed → managed migrations (BharatPE: 45 TB / 3-sharded-cluster → Atlas, 2025-09-21).
- systems/mongosync — MongoDB's continuous-replication migration tool; powers the Migrate phase of Community Edition → Atlas moves.
- systems/arangodb — named in the Cars24 case study (2025-10-12) as the legacy multi-model database Cars24 left for MongoDB Atlas on its geospatial workload; reported ~50% cost reduction post-migration. MongoDB-authored framing treats ArangoDB's transaction model + ecosystem size + performance fit as the three limitations; not a neutral technical assessment of ArangoDB.
- systems/mongodb-mobile-sdk — mobile-local document store + sync protocol to a central server; conflict resolution via Operational Transform; later known as Atlas Device Sync, now sunset.
- systems/tla-plus — Lamport's specification language; MongoDB engineers have written specs of the server's consensus protocol, WiredTiger storage layer, transactions, and other components.
- systems/tlc-model-checker — TLA+'s finite-state model-checker; used both to verify MongoDB specs and, in the 2020 conformance-checking experiment (sources/2025-06-02-mongodb-conformance-checking-at-mongodb-testing-that-our-code-matches-our-tla-specs), as the oracle for trace-checking and test-case generation.
- systems/raft-mongo-tla — the MongoDB-specific TLA+ spec of the commit-point-propagation protocol.
- systems/voyage-ai — MongoDB's embedding + reranking model line (acquired 2025). Powers native embedding generation + cross-encoder reranking for Atlas Vector Search + Atlas Hybrid Search. Documented serving-infra post (2025-12-18) reveals a production vLLM + Redis-Lua-batcher pipeline for the query side of embedding inference, delivering 50 % GPU-inference-latency reduction with 3× fewer GPUs on voyage-3-large.
- systems/voyage-3-large — flagship embedding model in the voyage-3 family; canonical target of Voyage AI's token-count-batching serving stack.
- systems/vllm — production inference engine Voyage AI standardised on (replaced HF Inference). Padding removal is the load-bearing primitive that enables token-count batching.
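The "~1.5 GB cache on a 4 GB-RAM box" figure in the systems/wiredtiger entry follows from the default sizing rule in MongoDB's documentation: the larger of 50% of (RAM − 1 GB) and 256 MB. The sketch below works through that arithmetic and applies a deliberately crude "does the index alone fit in the cache" check to the two `_id` index sizes this page reports for appV6R0 and appV6R1. The function names are hypothetical, and the check is an illustrative assumption: real cache pressure also depends on the documents and other indexes sharing the cache.

```python
def default_wiredtiger_cache_gb(ram_gb: float) -> float:
    """Documented default: the larger of 50% of (RAM - 1 GB) and 0.25 GB."""
    return max(0.5 * (ram_gb - 1.0), 0.25)


def index_fits_in_cache(index_gb: float, ram_gb: float) -> bool:
    """Crude check (assumption): treat the whole cache as available to one index."""
    return index_gb <= default_wiredtiger_cache_gb(ram_gb)


if __name__ == "__main__":
    ram_gb = 4.0
    print(f"default cache on {ram_gb:.0f} GB RAM: "
          f"{default_wiredtiger_cache_gb(ram_gb):.2f} GB")
    # _id index sizes as reported on this page for the two schema candidates.
    for name, index_gb in (("appV6R0", 3.13), ("appV6R1", 1.22)):
        verdict = "fits" if index_fits_in_cache(index_gb, ram_gb) else "overflows"
        print(f"{name}: {index_gb} GB _id index {verdict} the cache")
```

Under these assumptions the 3.13 GB index overflows the 1.5 GB cache while the 1.22 GB index fits, which matches the direction of the appV6R0 → appV6R1 pivot described on this page.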
Key patterns / concepts¶
- concepts/predictive-autoscaling — MongoDB Atlas's new scale-up-only predictive auto-scaler (November 2025 rollout) complementing the existing reactive scaler. Three-component Forecaster + Estimator + Planner pipeline; 2023 research prototype retrospective documented 2026-04-07.
- concepts/reactive-autoscaling — the pre-2025 Atlas auto-scaler baseline: reaction-latency + scaling-op-latency + one-tier-at-a-time constraint. Retained post-2025 as scale-down default + scale-up backstop.
- concepts/customer-driven-metrics — the exogenous-input class (QPS, client connections, scanned-objects rate) MongoDB forecasts to avoid circular forecasts. Named directly in the 2026-04-07 post.
- concepts/self-invalidating-forecast — the hazard MongoDB names as "circular dependency" of predicting CPU directly (forecast → scale → CPU flattens → forecast wrong). Remediation: forecast exogenous customer-driven metrics instead.
- concepts/self-censoring-forecast — the long-term forecaster's recent-accuracy gate; withholds prediction when recently inaccurate, deferring to short-term forecaster or reactive backstop.
- concepts/seasonality-daily-weekly — workload property MSTL + ARIMA long-term forecaster exploits; ~25% of Atlas replica sets have weekly seasonality, most have daily.
- concepts/tier-based-instance-sizing — Atlas's discrete (M10 → M60) catalog; reactive scaling is one-tier-at-a-time, predictive can skip tiers.
- patterns/forecast-then-size-planner — canonical three-component predictive-capacity-planner pattern (Forecaster → Estimator → Planner) MongoDB introduced in its 2023 prototype.
- patterns/short-plus-long-term-forecaster — two-forecaster refinement handling seasonal + non-seasonal workloads under one selector; MongoDB Atlas's prototype is the canonical wiki instance.
- patterns/prototype-before-production — the 2023 predictive-scaling prototype → 2025 production code path is the "godparent, not parent" (research-rewrite) variant of the pattern; the prototype's value was the learning, not the code.
- concepts/conformance-checking — the named gap: spec + implementation both exist, but how do you keep them in sync? Named in the 2020 paper and still live in 2025.
- concepts/temporal-logic-specification — the family TLA+ specs sit in; MongoDB's consensus safety/liveness properties are expressed as temporal-logic invariants.
- concepts/bisimulation — the "both directions" target of conformance checking: B_impl ⊂ B_spec AND B_spec ⊂ B_impl.
- concepts/operational-transform — the conflict-resolution algorithm the Mobile SDK used across client + server implementations.
- concepts/lightweight-formal-verification — MongoDB's work is in the same family as AWS ShardStore's; different realization (adjacent TLA+ spec vs. in-language executable spec).
- concepts/network-round-trip-cost — the force PL/SQL → Java + Atlas migrations pay on naïve application-layer loops; MongoDB's batch-optimization framework is built to collapse it.
- patterns/extreme-modelling — 2011 methodology MongoDB's 2020 paper applied in production: many small specs, written alongside the implementation, continuously conformance-checked.
- patterns/trace-checking — technique that failed at MongoDB in 2020 on the server (~10 weeks, couldn't snapshot multithreaded state cheaply; spec and impl were written years apart at mismatched abstraction).
- patterns/test-case-generation-from-spec — technique that succeeded at MongoDB in 2020 on the Mobile SDK's OT merge rules (4,913 tests, 100% branch coverage, one real infinite-recursion bug found).
- patterns/property-based-testing — peer technique that conformance checking competes with / complements.
- patterns/bulk-write-batch-optimization — MongoDB's named composite framework for recovering batch-job performance after PL/SQL → Java + Atlas migrations: bulk writes + intelligent prefetching + parallel processing + configurable batch sizes + index-strategy tuning.
- patterns/five-phase-managed-service-migration — MongoDB Professional Services' named playbook (Design → De-risk → Test → Migrate → Validate) for moving large self-managed Community Edition deployments onto Atlas without downtime; BharatPE 2025-09-21 is the canonical instance (45 TB, regulated fintech).
- concepts/shared-responsibility-model — the line that moves when a MongoDB customer leaves Community Edition for Atlas; managed service absorbs backups, HA/failover, audit logs, RBAC, VPC peering. Security-patching axis realised across the three MongoDB tiers (Atlas / Enterprise Advanced / Community Edition) as three distinct responsibility positions; canonical datapoint in CVE-2025-14847 where Atlas absorbs patching velocity entirely and the self-hosted tiers receive the patch at public CVE time or shortly after.
- concepts/fleet-patching — operational capability MongoDB exercises on the Atlas fleet; CVE-2025-14847 datapoint: "tens of thousands of customers / hundreds of thousands of instances" patched in ~6 days between internal detection and full-fleet completion, ahead of public CVE disclosure. Canonical industry reference datapoint (order-of-magnitude, not reproducible).
- concepts/maintenance-window — Atlas's per-cluster customer-configurable update window; for routine updates the contract binds vendor to the window, for urgent security patches the escape hatch is pre-notification, not silent override (MongoDB's "established policy", as seen on CVE-2025-14847: ~15 h pre-notification 2025-12-17 21:00 for 2025-12-18 forced patch).
- concepts/coordinated-disclosure — industry norm MongoDB implements in its vendor-first-patch variant: patch the managed-fleet first, disclose CVE + notify self-hosted tiers after. CVE-2025-14847's 7-day detection-to-disclosure window + 1-day post-patch CVE publication is the canonical wiki instance.
- concepts/nosql-database — the category MongoDB pioneered the document-store branch of; historically miscategorised as uniformly eventually-consistent (AP) when MongoDB was always CP. The 2025-09-25 "From Niche NoSQL To Enterprise Powerhouse" post is MongoDB's own narrative correction of the decade-long categorisation.
- concepts/tunable-consistency — per-operation read + write concerns that let one MongoDB database host both "page view count" (loose) and "order checkout value" (strict) workloads; the consistency-axis answer to the three-database problem.
- concepts/multi-document-acid-transactions — MongoDB 4.0 (2018) milestone later extended to sharded clusters; "arguably the single most important development in [MongoDB's] history" because it "shattered the biggest barrier to adoption for transactional applications." Canonical example: financial transfer between two accounts.
- concepts/compute-storage-separation — the architectural direction Search Nodes lead with in Atlas; stated vision is to extend to more of high-throughput query workloads.
- concepts/logless-reconfiguration — MongoDB's TLA+-verified membership-change protocol that decouples config changes from the data replication log; correctness distilled to four invariants (terms, monotonic versions, majority votes, oplog-commit rule); VLDB arXiv:2102.11960.
- concepts/defense-in-depth — MongoDB's layered-security stance articulated as five questions (authn / authz / encryption / network / audit); the anti-wall framing in the 2025-09-25 manifesto.
- concepts/architectural-isolation — dedicated-cluster multi-tenant isolation as a deliberate anti-shared-wall posture; structural containment of blast radius + noisy neighbors.
- concepts/cross-cloud-architecture — single replica set spanning AWS / GCP / Azure as a product feature, not a DIY integration; intelligent drivers handle cross-cloud failover transparently.
- concepts/three-database-problem — the anti-pattern MongoDB Vector Search addresses at the query-engine level.
- concepts/hybrid-retrieval-bm25-vectors — generic concept of BM25 + dense-vector retrieval fusion; MongoDB's native hybrid-search functions productize it as a single MQL aggregation-pipeline primitive.
- concepts/reciprocal-rank-fusion — one of the two standard fusion algorithms MongoDB's 2025-09-30 post identifies (rank-based, no normalization needed).
- concepts/relative-score-fusion — the other standard fusion algorithm (score-based, per-retriever normalization).
- concepts/sparse-vector — vector-first platforms' bridging primitive to lexical; contrasted in MongoDB's 2025-09-30 positioning against its own inverted-index-based lexical stance.
- concepts/cross-encoder-reranking — re-ranking layer above hybrid retrieval; implicit direction for MongoDB via the Voyage AI acquisition.
- patterns/formal-methods-before-shipping — MongoDB's stated discipline: model the protocol in TLA+, run the checker against every interleaving, distil to invariants, then implement. Logless reconfiguration + multi-shard transactions are the published instances.
- patterns/weekly-operational-review — MongoDB's weekly cross-team forum — engineers + on-calls + PMs + leaders — for reviewing failures and celebrating small wins so learnings compound across the fleet.
- patterns/native-hybrid-search-function — MongoDB's 2025-09-30 release of native hybrid-search functions is the canonical MongoDB-branded instance of this productization pattern; industry-wide convergence trend (Elasticsearch, OpenSearch, Weaviate, Qdrant, Pinecone all ship similar primitives).
- patterns/separate-vs-combined-index — the architectural trade-off MongoDB navigates with separate-indexes-unified-MQL-surface; contrasts with vector-first platforms' single-combined-index approach.
- concepts/bucket-pattern — MongoDB's named schema-design pattern for collapsing many fine-grained events into a single time-windowed bucket document; the baseline Part-2 / Part-3 of the 2025-10-09 Cost of Not Knowing series iterate on.
- concepts/computed-pattern — MongoDB's named pattern for pre-aggregating values at write time so reads become field lookups; load-bearing for appV5R3 (quarterly-bucketed + per-day pre-aggregated status totals) + the entire appV6RX family that builds on it.
- concepts/dynamic-schema-field-encoding — the novel concept surfaced in Cost of Not Knowing Part 3: field names encode information (day-of-month, day-of-quarter) so the outer _id + field-name pair reconstitutes the full date without repeating year/month in every element. 67.5 % document-size reduction in the canonical case.
- concepts/wiredtiger-cache — the specific MongoDB-server memory budget whose default ~1.5 GB ceiling on a 4 GB-RAM box drove the appV6R0 → appV6R1 pivot. Canonical illustration of "index > cache is a different bottleneck than document > disk".
- concepts/bson-document-overhead — why per-document fixed costs (length prefix, _id, trailing null) and per-field costs (type tag, name, null terminator) mean field-name shortening / bucketing / dynamic-schema encoding all move real gigabytes at 500 M-event scale.
- concepts/aggregation-pipeline — MongoDB's declarative server-side query framework (ordered $match / $addFields / $group / $project stages); the read-path surface where the dynamic-schema pattern pays its storage wins back as $objectToArray + $reduce CPU per matched document.
- concepts/document-storage-compression — WiredTiger's per-collection block compressor (snappy / zstd / zlib / none); explicitly named in Cost of Not Knowing Part 3's intro as a Part-3 lever alongside dynamic schema.
- concepts/disk-throughput-bottleneck — the bottleneck class Cost of Not Knowing Part 2's appV5R4 finished on and Part 3 attacks with dynamic-schema document shrinking.
- concepts/working-set-memory — the generalized concept above WiredTiger cache: the hot subset of data + indexes the workload needs in memory for steady-state performance. MongoDB's cache is the operational realization.
- patterns/dynamic-schema-field-name-encoding — MongoDB-canonical storage-compression pattern where a bounded-cardinality discriminator is promoted from a value position to a field-name position inside a sub-document, eliminating per-element BSON overhead. Canonical wiki instance: Cost of Not Knowing Part 3 appV6R1, 31.4 % smaller documents + 28.1 % smaller per-event total footprint vs appV5R3.
- patterns/schema-iteration-via-load-testing — methodology pattern the entire 3-part Cost of Not Knowing series embodies: candidate schema → fixed-envelope load test → observe saturated resource → next candidate targets the observed bottleneck. MongoDB case study's appV6R0 → appV6R1 pivot is the clearest illustration — the load test, not the architect, chose the move.
- concepts/synchronization-tax — the ongoing-cost class named in the 2025-10-12 Cars24 post ("Avoid 'synchronization tax'") for the pipeline / race-logic / operational-doubling overhead of a primary RDBMS + bolt-on search engine. MongoDB's framing positions Atlas + Atlas Search as the structural remediation; the wiki adopts the name while noting the vendor bias.
- patterns/consolidate-database-and-search — the pattern collapsing (primary DB + bolt-on search engine + sync pipeline) into (one substrate with embedded search on the same cluster). Cars24's Postgres + bolt-on-search → Atlas + Atlas Search (2025-10-12) is the canonical wiki instance. Extends to the ArangoDB-for-geospatial → Atlas consolidation in the same article (specialized DB → general-purpose primary, same shape).
- concepts/token-count-based-batching — scheduler-side primitive pioneered at Voyage AI by MongoDB: admit short embedding requests into a GPU forward pass up to a token-count budget (the model-hardware-engine-specific saturation point, ~600 tokens on voyage-3 / A100) rather than by request count or time window. 50 % GPU-inference latency / 3× fewer GPUs / up to 8× throughput on voyage-3-large.
- concepts/padding-removal-inference — inference-engine primitive in vLLM / SGLang that concatenates variable-length sequences into a super-sequence of length Σ token_count_i. Load-bearing prerequisite for token-count batching.
- concepts/memory-bound-vs-compute-bound — the regime taxonomy naming why query embedding inference wastes GPU at low batch size (memory-bound) and what batching moves it toward (compute-bound, near the saturation point where MFU and throughput peak).
- concepts/saturation-point-inference — the model-and-hardware-specific token-count threshold where inference latency transitions from flat (memory-bound fixed overheads) to linear (compute-bound).
- concepts/model-flops-utilization — efficiency metric Voyage AI profiles alongside throughput; rises near-linearly with token count until saturation. Short-request sequential serving leaves MFU unrecoverably low.
- concepts/query-vs-document-embedding — Voyage AI's explicit distinction between short-low-latency queries and long-batch-ingested documents as two different serving problems with different optimal batching regimes.
- concepts/spiky-traffic — the arrival pattern Voyage AI names as the design constraint that rules out autoscaling as the primary mitigation ("autoscaling is too slow") and makes batching the tail-stabilizer.
- patterns/atomic-conditional-batch-claim — queue-design pattern Voyage AI realises via Redis + Lua: pop items atomically until a running budget is reached, setting per-item TTLs in the same call. Canonical queue-substrate shape for token-count batching.
- patterns/lightweight-aggregator-in-front-of-broker — the named alternative path Voyage AI articulates for teams stuck on Kafka / RabbitMQ: application-batching logic in an aggregator tier, broker keeps durability / fan-out / delivery.
- patterns/rapid-fleet-patching-via-managed-service — MongoDB uses Atlas's vendor-operated deployment substrate as the primary security-remediation channel; CVE-2025-14847 was patched across the full Atlas fleet within ~6 days of internal detection, preserving maintenance-window courtesy contracts via pre-notification override. Three-tier rollout with Community Edition + Enterprise Advanced behind on the responsibility ladder.
- patterns/pre-disclosure-patch-rollout — MongoDB's vendor-first-patch variant of coordinated disclosure for internally-discovered vulnerabilities: detection → patch Atlas fleet → publish CVE + notify self-hosted tiers. 7-day detection-to-CVE window for CVE-2025-14847; CVE published 1 day after Atlas fleet patching completed.
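The token-count-based-batching and atomic-conditional-batch-claim entries above describe claiming queued requests until a running token budget is reached. The following is a minimal in-process sketch of that claim step, not Voyage AI's implementation: a `threading.Lock` stands in for the atomicity the source attributes to a Redis Lua script, per-item TTLs are not modelled, and the class name, strict FIFO order, and the rule that an oversized request is admitted alone are all assumptions made for illustration.

```python
import threading
from collections import deque
from typing import Deque, List, Tuple

Request = Tuple[str, int]  # (request_id, token_count)


class TokenBudgetQueue:
    """Hypothetical FIFO queue whose consumers claim batches by token budget."""

    def __init__(self) -> None:
        self._items: Deque[Request] = deque()
        self._lock = threading.Lock()  # stand-in for an atomic server-side script

    def push(self, request_id: str, token_count: int) -> None:
        with self._lock:
            self._items.append((request_id, token_count))

    def claim_batch(self, token_budget: int) -> List[Request]:
        """Pop requests from the head until the next one would exceed the budget.

        Assumption: a request larger than the budget is still admitted when it
        is first in line, so it cannot block the queue forever.
        """
        batch: List[Request] = []
        used = 0
        with self._lock:
            while self._items:
                _, tokens = self._items[0]  # peek at the head without removing it
                if batch and used + tokens > token_budget:
                    break
                batch.append(self._items.popleft())
                used += tokens
        return batch


if __name__ == "__main__":
    queue = TokenBudgetQueue()
    for request_id, tokens in (("a", 200), ("b", 250), ("c", 300), ("d", 100)):
        queue.push(request_id, tokens)
    # 600 mirrors the ~600-token saturation point quoted for voyage-3 on A100.
    print(queue.claim_batch(600))
    print(queue.claim_batch(600))
```

With a 600-token budget the first claim takes requests a and b (450 tokens) because adding c would reach 750, and the second claim takes c and d. Batch size here tracks the work submitted to the GPU rather than the number of requests, which is the distinction the entries above draw against request-count or time-window batching.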
Recent articles¶
-
2026-05-11 — sources/2026-05-11-mongodb-fighting-tool-sprawl-the-case-for-ai-tool-registries (MongoDB Engineering / Technical position post arguing that every enterprise running AI agents at non-trivial scale needs an organisation-internal AI tool registry. Frames tool sprawl as the predictable application-layer outcome of treating an infrastructure problem as a discipline problem — "fragmented tool development is not a consequence of poor engineering practice. Rather, it is the predictable outcome of asking teams to solve an infrastructure problem at the application layer." The post enumerates four registry functions: discovery (with hierarchical grouping by functional domain), versioning, certification metadata (surfaced, not enforced — see concepts/certification-as-metadata-not-enforcement), and access control (delegated to a separate policy layer). Two pillars of value: reducing coordination cost + enabling risk management. Architecturally positioned as "an Internal Developer Portal (IDP) built for the agent era, solving the same coordination problem that IDPs solved for service teams, but one layer up" — making Backstage the canonical IDP-shape ancestor for the patterns/idp-extended-to-ai-agent-tools pattern. Cites Kong's enterprise MCP Registry launch (February 2026) and Gravitee's State of AI Agent Security 2026 survey (14.4% full security approval, 88% had agent incidents, 22% treat agents as independent identities) as market + risk evidence. Articulates the registry-as-governance-precondition argument: "a registry does not make tools inherently secure; rather, it makes security work possible by ensuring tools exist not as transitory, ad hoc shims, but rather as inventoried artifacts that audits and policy can attach to." And the deny-by-default vs allow-by-default architectural posture: "allow-by-default, replicated across dozens of independent agent deployments, produces an attack surface that scales with adoption." Anti-prescription: explicitly rules out a public package-manager-style AI-tool registry as "premature standardization in a fast-moving, nascent space." Position-piece, not architecture-of-MongoDB's-own-registry. Canonical wiki source for patterns/enterprise-ai-tool-registry + patterns/idp-extended-to-ai-agent-tools + four new concepts. First MongoDB ingest on the AI-agent-platform / agentic-infrastructure axis on this wiki — sibling to the broader agent-governance corpus from Databricks Unity AI Gateway (concepts/coding-agent-sprawl at the client layer) and Pinterest's MCP ecosystem (concepts/mcp-registry at the protocol-scoped layer).)
-
2026-04-07 — sources/2026-04-07-mongodb-predictive-auto-scaling-an-experiment (Research retrospective introducing MongoDB Atlas's new predictive auto-scaling capability — rollout started November 2025, scale-up-only, complementing the existing reactive auto-scaler. Walks through the 2023 research prototype on 10,000 replica sets whose learnings shaped — not directly productised — the shipped algorithm. Three-component architecture: MSTL + ARIMA long-term Forecaster over customer-driven metrics (QPS, connection count, scanned-objects rate — exogenous inputs to avoid the circular forecast hazard of predicting CPU), boosted-decision-tree Estimator on 25 M samples, Planner selecting cheapest tier under 75 % CPU over next 15 minutes. Two-forecaster refinement with self-censoring gate on the long-term forecaster and short-term trend-interpolation fallback for non-seasonal replica sets. Headline numbers: ~25 % of replica sets weekly-seasonal, most daily; short-term beats naïve last-observation 68 % of the time with 29 % error reduction; Estimator <7 % CPU error on ~45 %, ~13 % excluded from predictive scaling; average savings ~9¢/hour/replica-set, extrapolating to millions/year at fleet-wide adoption. Canonical wiki instance of prototype-before-production's research-rewrite variant: "the experiment is more of a godparent to the product, rather than its parent.")
-
2026-02-27 — sources/2026-02-27-mongodb-towards-model-based-verification-of-a-key-value-storage-engine (Part 2 of the VLDB 2025 series on formal methods before shipping MongoDB's cross-shard transaction protocol. The compositional structure of the team's TLA+ spec has a clean interface boundary to WiredTiger extracted as its own Storage.tla module. A modified TLC enumerates its complete reachable state graph; path coverings over that graph emit one test per path as a WiredTiger-API call sequence (patterns/test-case-generation-from-spec). Reported scale: 2 keys × 2 transactions → 87,143 tests generated + executed against WiredTiger in ~40 minutes — second production MongoDB instance of this pattern after the 2020 Mobile-SDK OT experiment, and an order of magnitude larger. Specs + generator open-sourced at mongodb-labs/vldb25-dist-txns; full paper p5045-schultz.pdf. Points at LLM autoformalization as speculative future direction. Companion to the 2025-09-25 Carrying complexity manifesto — provides the depth behind its multi-shard-transactions bullet.)
-
2025-12-30 — sources/2025-12-30-mongodb-server-security-update-december-2025 (Trust-layer retrospective from MongoDB CTO Jim Scharf on CVE-2025-14847 ("Mongobleed") in the MongoDB Server. Internally discovered by MongoDB Security Engineering on 2025-12-12 19:00 ET, patched across the Atlas fleet of "tens of thousands of customers / hundreds of thousands of instances" within ~6 days — majority done 2025-12-17, remainder (including maintenance-window customers, given ~15 h of pre-notification) 2025-12-18. Public CVE published 2025-12-19, one day after Atlas remediation complete — canonical vendor-first-patch variant of coordinated disclosure. Three-tier rollout maps to three shared-responsibility line positions: Atlas (MongoDB-driven), Enterprise Advanced (customer-driven via patched builds), Community Edition (customer-driven via community forum post 2025-12-23). Post is explicit trust-layer artefact — publishes the timeline + decisions; technical vulnerability class / severity / exploit complexity are in the CVE record, not the post. Introduces the fleet-patching concept to the wiki as a first-class operational capability with MongoDB's ~6-day Atlas figure as the canonical datapoint, and the rapid fleet-patching via managed service pattern that depends on it. Caveats: no technical vulnerability detail in the post; order-of-magnitude fleet numbers only (no per-region / per-cluster-size breakdown); no rollback / failure statistics; no independent severity / exploit analysis referenced.)
-
2025-12-18 — sources/2025-12-18-mongodb-token-count-based-batching-faster-cheaper-embedding-inference (Voyage AI by MongoDB engineering post documenting the production query-embedding-inference stack: vLLM + padding removal + token-count-based batching to a model-and-hardware-specific saturation point (~600 tokens for voyage-3 on A100), with Redis-Lua-atomic-script as the queue substrate (patterns/atomic-conditional-batch-claim). Replaces the old no-batching + HF Inference pipeline. Headline production result on voyage-3-large: 50 % GPU-inference-latency reduction, 3× fewer GPUs; across 7+ onboarded models up to ~20 ms GPU-inference-time drop via vLLM alone, up to 8× throughput via token-count batching, P90 end-to-end latency drops by 60+ ms on some model servers, P90 more stable during traffic spikes even with fewer GPUs. Names the queue-design primitive (peek + atomic claim up to budget) and why general-purpose brokers — RabbitMQ prefetch-as-request-count + push-delivery, Kafka byte-based partition batching — don't satisfy it, articulating the patterns/lightweight-aggregator-in-front-of-broker fallback for teams that need the broker semantics. First MongoDB-side source on the wiki with infrastructure-level embedding-inference detail; first wiki source on padding removal, token-count batching, saturation point, and vLLM / SGLang / Voyage AI.)
-
2025-10-12 — sources/2025-10-12-mongodb-cars24-improves-search-for-300-million-users-with-atlas (MongoDB-Blog case study of Cars24 — Indian online car marketplace, 300 M users globally. Two migrations: (1) Postgres + bolt-on Elasticsearch-class search index (sync pipelines maintained across multiple engineering teams, "race logic" for data-sync consistency, operational overhead of two systems) → MongoDB Atlas on Google Cloud + Atlas Search (BM25 on Lucene) on the same cluster, eliminating the synchronization tax as a structural property. Canonical wiki instance of patterns/consolidate-database-and-search. (2) ArangoDB-based geospatial search → Atlas, reporting ~50% cost savings. MongoDB-named ArangoDB limitations: performance bottlenecks, weak transactions, limited ecosystem; Cars24's business-unit headcount scaled from <10 → triple-digit in ~1 year, making developer-ecosystem depth an explicit architectural constraint. Marketing-framed case study; no published architecture diagrams, no sync-pipeline-throughput numbers, no ArangoDB-side performance numbers, no Atlas cluster topology. The 50% cost is an aggregate customer-reported figure, not decomposed into compute/storage/license/headcount. Value: the named cost class ("synchronization tax") and the named pattern family (consolidate-database-and-search).)
- 2025-10-09 — sources/2025-10-09-mongodb-cost-of-not-knowing-mongodb-part-3-appv6r0-to-appv6r4 (MongoDB-authored third installment of The Cost of Not Knowing MongoDB — a senior-developer-written schema-tuning case study on a 500 M-event counter application serving five date-range report aggregations on a deliberately-constrained 4 GB-RAM / 1.5 GB-WiredTiger-cache rig. Core Part-3 technique: dynamic-schema field-name encoding — promote the day-of-bucket discriminator from a value position (items[{date: 2022-06-05, ...}]) to a field-name position (items: {"0605": {...}}) inside the sub-document, eliminating per-element BSON overhead for the "date" field name. Starting from Part 2's winner appV5R3 (quarterly-bucketed + per-day pre-aggregated, 385 B/doc, 33.4 M docs, 11.96 GB data, 1.11 GB index), Part 3 iterates through appV6R0 (monthly-bucketed + DD-keyed dynamic items; 125 B/doc, 67.5 % smaller documents; but 3.13 GB _id index overflows the 1.5 GB WiredTiger cache, collapsing the expected disk-throughput win) → appV6R1 (quarterly-bucketed + MMDD-keyed dynamic items; 264 B/doc, 1.22 GB index fits cache; 17.6 B data/event, 20.2 B total/event — 28.1 % total-size-per-event reduction vs appV5R3). Reads pay $objectToArray + $reduce per matched document — a compute-for-storage trade-off. The appV6R0 → appV6R1 pivot is the canonical wiki instance of patterns/schema-iteration-via-load-testing: shrinking documents didn't automatically improve throughput because the bottleneck shape migrated from disk throughput to index-in-cache; next iteration targets the new constraint. Caveats: the captured raw truncates mid-appV6R2 — appV6R3 / R4 and the intro-promised compression-algorithm lever are absent; rig-dependent conclusions (quarter-bucketing wins on 4 GB RAM) don't transfer to production hardware without re-testing.)
- 2025-09-30 — sources/2025-09-30-mongodb-top-considerations-when-choosing-a-hybrid-search-solution (MongoDB Blog technical post, 2025-09-30. Industry-evolution survey + buyer's guide for hybrid search (lexical + vector). Names the 2022–2023 inflection when pure-vector retrieval proved insufficient — "full-text search will always remain useful for identifying tokens outside the training corpus of an embedding model" — driving the industry to combine both. Identifies two standard fusion techniques: RRF (rank-position-based, no normalization needed) and RSF (raw-score-based with per-retriever normalization). Taxonomizes vendors as lexical-first (MongoDB, Elasticsearch, OpenSearch, Solr — separate indexes, BM25 on inverted index + vectors in a second index) vs vector-first (Pinecone, Weaviate, Milvus, Qdrant — combined index, lexical via sparse vectors). MongoDB's competitive framing: "lexical-first systems tend to offer stronger keyword capabilities", especially when lexical requirements are advanced. Identifies the 2025 industry convergence toward native hybrid-search functions — vendors shipping out-of-box hybrid-search primitives instead of leaving score combination to application code. MongoDB's own productization: native hybrid search functions GA on Atlas + public preview on Community Edition + Enterprise Server (systems/atlas-hybrid-search), composing Atlas Search (BM25 on Lucene) + Atlas Vector Search in one MQL aggregation-pipeline stage with engine-side fusion. Names cross-encoders, learning-to-rank models, and dynamic scoring profiles as the emerging re-ranking layer above hybrid retrieval — implicit Voyage AI integration direction. Buyer's-guide framing: "the decision [separate vs combined index] is mainly a trade-off between control and simplicity." Wiki treats the lexical-first-vs-vector-first positioning as MongoDB-side framing — neutral reading is that the boundary is blurring (ELSER on Elasticsearch, hybrid-native Pinecone). On-topic (tier-2 hybrid-search architectural content + named industry primitives).)
- 2025-09-25 — sources/2025-09-25-mongodb-from-niche-nosql-to-enterprise-powerhouse (MongoDB Engineering blog by Ashish Agrawal — historical-narrative companion to the same-day "Carrying Complexity, Delivering Agility" manifesto. Frames MongoDB's arc from niche NoSQL database to system-of-record through four enterprise-readiness pillars — HA replica sets + Raft-style consensus, horizontal + zone sharding, tunable consistency (per-operation read/write concerns), and multi-document ACID transactions in MongoDB 4.0 (2018) extended to sharded clusters ("arguably the single most important development in [MongoDB's] history"). Core thesis: MongoDB was designed CP but got lumped in with the AP-NoSQL default for a decade; adoption numbers (70%+ Fortune 500, 7/10 largest banks, 14/15 largest healthcare, 9/10 largest manufacturers) are pitched as the empirical refutation. Introduces two new Atlas-tier surfaces — Search Nodes (independent search / vector compute; leading edge of broader compute/storage decoupling in Atlas) and AMP (agentic tooling + AI frameworks for legacy-RDBMS → Atlas modernization; insurance case: 1:1 schema translation competitive with legacy, multi-hour stored procedures → <5 minutes on MongoDB via native parallelism). Customer instances named: Citizens Bank (fraud detection), Bosch IoT Insights, Coinbase. Adds prefix / suffix / substring queries to Queryable Encryption beyond original equality + range. Names Voyage AI (acquired 2025) embedding-generation-as-a-service as exploratory. Compliance: ISO/IEC 27001, SOC 2 Type II, PCI DSS, HIPAA. Value is the historical framing + two new named primitives.)
- 2025-09-25 — sources/2025-09-25-mongodb-carrying-complexity-delivering-agility (Engineering-leadership manifesto by Ashish Agrawal + Akshat Vig framing MongoDB's work as driven by three principles treated as non-negotiable design constraints: resilience, intelligence, simplicity. Gating rule: "if a change widens blast radius, breaks adaptive performance, or adds operator toil, it doesn't ship." Grounded in concrete mechanisms — dedicated-cluster isolation ("you get the whole building"), defense in depth (SCRAM / AWS IAM, RBAC, Client-Side FLE + Queryable Encryption, IP access lists + private endpoints, granular auditing), logless reconfiguration TLA+-verified down to four invariants + VLDB 2025 multi-shard transaction spec, single replica set across AWS/GCP/Azure + Global Clusters / zone sharding for GDPR, Vector Search integrated into MQL as the query-engine-level fix for the three-database problem. Companion formal-methods framing to the 2025-06-02 conformance-checking post.)
- 2025-09-21 — sources/2025-09-21-mongodb-community-edition-to-atlas-a-migration-masterclass-with-bharatpe (MongoDB-authored case study: BharatPE — Indian fintech, ~₹12,000 crore/month UPI — migrated 45 TB across 3 sharded clusters (each 1 primary + 2 secondary) from self-hosted Community Edition to Atlas using MongoDB Professional Services' five-phase migration methodology (Design → De-risk → Test → Migrate → Validate); data moved via mongosync with in-transit encryption; post-migration Atlas provides 99.995% uptime SLA + BharatPE's self-reported 40% query-response-time improvement + auto-failover + audit logs + RBAC + VPC peering as product features. Introduces the infrastructure-migration counterpart to the application-layer migration framework from 2025-09-18.)
- 2025-09-18 — sources/2025-09-18-mongodb-modernizing-core-insurance-systems-breaking-the-batch-bottleneck (Framework-level retrospective on post-migration batch-job regressions at insurance customers migrating PL/SQL cores to Java + MongoDB Atlas. Like-for-like migrations ran 25–30× slower; MongoDB's batch-optimization framework — bulk writes via native bulkWrite + MongoDB 8 multi-collection bulk transactions, intelligent prefetching of reference data, Disruptor-style parallel processing, configurable batch sizes — restored legacy parity and in several cases delivered 10–15× better performance than the legacy RDBMS. Framework-overview depth; no per-customer numbers.)
- 2025-06-02 — sources/2025-06-02-mongodb-conformance-checking-at-mongodb-testing-that-our-code-matches-our-tla-specs (A. Jesse Jiryu Davis's 2025 retrospective on the 2020 VLDB paper eXtreme Modelling in Practice: trace-checking the Raft-like server protocol failed at ~10 weeks of effort; test-case generation from the Mobile SDK's OT spec succeeded with 4,913 tests / 100% branch coverage / 1 real bug found. Five-year academic-progress survey covering Mocket, SandTable, Multi-Grained Zookeeper, Confidential Consortium Framework, Finn Hackett's TLA+-to-Go.)
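Illustration for the 2025-12-18 entry above: a minimal sketch of the "peek + atomic claim up to budget" queue primitive. It is not the Voyage AI implementation. An in-process lock stands in for the atomic Redis Lua script, and the class name, method names, and the rule that an oversized request is claimed alone are all assumptions made for illustration.

```python
from collections import deque
from threading import Lock
from typing import Deque, List, Tuple

Request = Tuple[str, int]  # (request_id, token_count); hypothetical shape


class TokenBudgetQueue:
    """In-memory stand-in for the queue substrate (hypothetical).

    The lock plays the role of the atomic script: peeking at the head
    and claiming items happen as one indivisible step.
    """

    def __init__(self) -> None:
        self._items: Deque[Request] = deque()
        self._lock = Lock()

    def push(self, request_id: str, token_count: int) -> None:
        with self._lock:
            self._items.append((request_id, token_count))

    def claim_batch(self, token_budget: int) -> List[Request]:
        """Claim requests in FIFO order until the next one would exceed the budget."""
        batch: List[Request] = []
        used = 0
        with self._lock:
            while self._items:
                request_id, tokens = self._items[0]  # peek without removing
                # Stop once the batch is non-empty and the next request would
                # overflow. An oversized request at the head is still claimed
                # alone so it cannot block the queue (assumed policy).
                if batch and used + tokens > token_budget:
                    break
                self._items.popleft()  # claim
                batch.append((request_id, tokens))
                used += tokens
        return batch
```

With a budget of 600 tokens and queued requests of 200, 300, 150, and 700 tokens, successive calls to claim_batch(600) return the first two requests, then the third alone, then the oversized fourth alone. The batch boundary is set by token count rather than request count, which is the property the entry says request-count prefetch and byte-based batching do not provide.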
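Illustration for the 2025-10-09 entry above: a minimal sketch of dynamic-schema field-name encoding using plain Python dicts rather than a live MongoDB collection. The counter field names ("approved", "rejected") and both function names are hypothetical; the source only shows the date moving from a value position to an MMDD field name.

```python
from datetime import date
from typing import Dict, List


def encode_items(items: List[dict]) -> Dict[str, dict]:
    """Move each element's date out of a value and into an MMDD field name."""
    encoded: Dict[str, dict] = {}
    for item in items:
        # Everything except the date stays as the per-day payload.
        counters = {k: v for k, v in item.items() if k != "date"}
        encoded[item["date"].strftime("%m%d")] = counters
    return encoded


def sum_range(encoded: Dict[str, dict], start_mmdd: str, end_mmdd: str, field: str) -> int:
    """Read path: walk the key/value pairs and fold the matching days.

    This mirrors, in application code, the per-document work that
    $objectToArray + $reduce perform in the aggregation pipeline.
    MMDD keys compare correctly as strings because a quarterly bucket
    never crosses a year boundary.
    """
    total = 0
    for key, counters in encoded.items():
        if start_mmdd <= key <= end_mmdd:
            total += counters.get(field, 0)
    return total


array_form = [
    {"date": date(2022, 6, 5), "approved": 3, "rejected": 1},
    {"date": date(2022, 6, 6), "approved": 2},
]
dynamic_form = encode_items(array_form)
```

The array form repeats the "date" field name and a date value in every element. The dynamic form stores a four-character key instead, at the cost of expanding the sub-document on every read, which is the compute-for-storage trade-off the entry describes.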
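Illustration for the 2025-09-30 entry above: a minimal sketch of the two fusion techniques it names, computed in application code. This is not MongoDB's engine-side implementation. The constant k = 60 is a commonly used RRF default and min-max scaling is one possible per-retriever normalization; neither choice comes from the source, and the function names are hypothetical.

```python
from typing import Dict, List


def rrf(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal rank fusion: only rank positions matter, so raw scores
    from different retrievers never need to be made comparable."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    return scores


def rsf(score_sets: List[Dict[str, float]], weights: List[float]) -> Dict[str, float]:
    """Relative score fusion: normalize each retriever's raw scores to [0, 1]
    (min-max here, an assumption), then take a weighted sum."""
    fused: Dict[str, float] = {}
    for scores, weight in zip(score_sets, weights):
        if not scores:
            continue  # a retriever that returned nothing contributes nothing
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for doc_id, raw in scores.items():
            norm = (raw - lo) / span if span else 1.0
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * norm
    return fused


# Hypothetical results from a lexical (BM25-like) and a vector retriever.
lexical_scores = {"d1": 12.0, "d2": 6.0, "d3": 0.0}
vector_scores = {"d1": 0.2, "d2": 0.9, "d3": 0.6}

rrf_scores = rrf([["d1", "d2", "d3"], ["d2", "d3", "d1"]])
rsf_scores = rsf([lexical_scores, vector_scores], weights=[0.5, 0.5])
```

In this toy input both methods rank d2 first, but they get there differently: RRF ignores the fact that the lexical scores span 0 to 12 while the vector scores span 0.2 to 0.9, whereas RSF has to rescale each retriever before the scores can be summed.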
Skipped (logged)¶
- 2025-11-03 — MongoDB Announces Leadership Transition (raw/mongodb/2025-11-03-mongodb-announces-leadership-transition-b696f8b1.md) — CEO succession press release (Dev Ittycheria → Chirantan "CJ" Desai, effective 2025-11-10). Pure executive / HR / investor-relations content with zero architectural signal. Classic corporate-blog press post; out-of-scope per AGENTS.md "Company PR / acquisition news" bucket.