Build AI Agents Worth Keeping: The Canvas Framework¶
Summary¶
A MongoDB Blog thought-leadership post diagnosing why so many enterprise AI agent projects stall in pilot, and prescribing a structured design flow to exit that trap. The post opens with secondary research (McKinsey, MIT NANDA, Gartner, IBM, CMU TheAgentCompany, SailPoint, PagerDuty) to name six failure modes — technology-first bias, capability-reality gap, leadership vacuum, governance paralysis, infrastructure chaos ("three-database problem"), ROI mirage — then introduces the product → agent → data → model design flow as the organising principle. The core reusable architectural content is the canvas framework (POC canvas = 8 squares / Production canvas = 11 squares) and the unified-data-platform answer to managing application data + vectors + agent memory on a single substrate — positioned as MongoDB Atlas with Voyage AI embeddings for domain-specific retrieval.
The post is vendor-authored and product-promotional in framing — the prescription terminates at a downloadable canvas template + Atlas + Voyage AI CTAs — but the three-database anti-pattern diagnosis and the design-flow ordering (agent design between product and data) are substantive enough to carry into the wiki; both are absent from prior ingests.
Key takeaways¶
- "Three-database problem" as named failure pattern. The post names the infrastructure-chaos failure mode concretely: teams end up running "one [database] for operational data, another for vector data and workloads, a third for conversation memory — each with different APIs and scaling characteristics. This complexity kills momentum before agents can actually prove value." Named on this wiki as concepts/three-database-problem.
- Agents need three storage roles: app DB + vector store + memory store. The post decomposes agent data needs into three first-class classes:
  - Application database — business data, user profiles, transaction history.
  - Vector store — semantic search, knowledge retrieval, RAG.
  - Memory store — agent context, conversation history, learned behaviours. The third class (concepts/agent-memory-store) is the one most systems-design writing under-specifies; MIT-cited stat: "66% of executives want systems that learn from feedback… 63% demand context retention. The dividing line between AI and human preference is memory, adaptability, and learning capability."
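The post names the memory store as a category but leaves its design open (see Caveats). A minimal sketch of what one memory document and a context-retrieval helper could look like — every field name and value here is a hypothetical assumption, not a schema the post prescribes:

```python
from datetime import datetime, timezone

# Hypothetical shape for one agent-memory document; in a unified platform
# this would live in an ordinary document collection alongside app data.
memory_doc = {
    "agent_id": "support-agent-01",
    "session_id": "sess-42",
    "kind": "conversation_turn",   # vs. "learned_behaviour", "summary", ...
    "role": "user",
    "content": "My last invoice is missing a line item.",
    "created_at": datetime.now(timezone.utc),
}

def recent_context(store: list[dict], session_id: str, limit: int = 5) -> list[dict]:
    """Return the most recent turns for a session, oldest first."""
    turns = [d for d in store if d["session_id"] == session_id]
    turns.sort(key=lambda d: d["created_at"])
    return turns[-limit:]
```

In a real deployment the plain-list `store` would be a database collection and `recent_context` a query, but the open questions the concept page tags (eviction, consistency, retrieval API) apply either way.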
- Unified data platform as remediation, not elegance-seeking. The post's answer to the three-database problem is a single substrate providing all three capabilities — "flexible document storage for application data, native vector search for semantic retrieval, and rich querying for memory management — all in a single platform." Canonical in-post instance: MongoDB Atlas. Named on this wiki as patterns/unified-data-platform-for-ai-agents.
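A sketch of what "single substrate" means in query terms: all three roles become collections in one logical database, addressed through one query surface. The `$vectorSearch` aggregation stage is Atlas's documented vector-search entry point, but the collection names, index name, and field names below are assumptions for illustration:

```python
# Three storage roles, one database: with pymongo these payloads would be
# issued against sibling collections (e.g. db["orders"], db["documents"],
# db["agent_memory"]) of a single Atlas database. All names are hypothetical.

def vector_search_pipeline(query_vector: list[float], k: int = 5) -> list[dict]:
    """Aggregation pipeline for the knowledge collection (Atlas Vector Search)."""
    return [
        {"$vectorSearch": {
            "index": "docs_vector_index",   # assumed index name
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 10 * k,        # common over-fetch heuristic
            "limit": k,
        }},
        {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]

def memory_filter(agent_id: str, session_id: str) -> dict:
    """find() filter for the agent-memory collection."""
    return {"agent_id": agent_id, "session_id": session_id}

def app_filter(user_id: str) -> dict:
    """find() filter for the application-data collection (e.g. orders)."""
    return {"user_id": user_id}
```

The point of the pattern is visible in the shape: one API style and one scaling surface for all three query kinds, versus three systems with mismatched APIs.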
- Design flow: product → agent → data → model. The post contrasts three epochs of ML delivery:
  - Old (pre-foundation-model ML): data → model → product — months of dataset curation + from-scratch training before a product could ship.
  - Foundation-model generation: product → data → model — Swyx's "Rise of the AI Engineer" shape; start from the user problem, identify data needed, pick a pre-trained model.
  - Agentic generation (new): product → agent → data → model — the agent layer sits between product and data because it determines what data + tools are needed to solve the problem. Inserting the agent step before data is the canvas framework's load-bearing claim. Named on this wiki as patterns/product-agent-data-model-design-flow.
- Canvas framework is two frameworks, not one.
  - POC canvas (8 squares) for validation — four phases: Product validation, Agent design, Data requirements, External model integration. Bias is toward killing bad ideas before engineering investment.
  - Production canvas (11 squares) for scaling — five phases: Product + scale planning, Agent architecture, Data infrastructure, Model operations, Hardening and operations. The production canvas adds scaling concerns (fault tolerance, cost, governance) that the POC canvas deliberately skips. The post frames this as a "business model canvas for AI agents".
- External-model selection, not training. The post explicitly reframes the "Model" phase as selection + prompt engineering + API orchestration, not deployment: "With external model providers, the 'model' phase is really about selection and integration rather than deployment. Teams choose which provider's models best handle their data and use case, then build the orchestration layer to manage API calls, handle failures, and optimize costs." This is the concepts/training-serving-boundary collapsing to zero for most teams — no own-training, no own-serving, just orchestration against external providers.
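The "orchestration layer to manage API calls, handle failures" piece can be sketched as a priority-ordered fallback across providers. The provider callables below are stand-ins for real SDK calls (OpenAI, Anthropic, etc.); names and error handling are illustrative assumptions, not the post's implementation:

```python
# Sketch of the model phase as orchestration, not deployment: try external
# providers in priority order, fall back on failure. Stubs simulate SDK calls.

def call_with_fallback(prompt: str, providers: list) -> str:
    """Try each (name, callable) provider in order; raise if all fail."""
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # real code would catch provider-specific errors
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all providers failed: {errors}")

# Stub providers standing in for real external-model SDK calls.
def flaky_provider(prompt: str) -> str:
    raise TimeoutError("provider timed out")

def backup_provider(prompt: str) -> str:
    return f"answer to: {prompt}"

reply = call_with_fallback("summarise this ticket", [
    ("primary", flaky_provider),
    ("backup", backup_provider),
])
```

Cost optimisation would slot into the same loop (e.g. route cheap prompts to a cheaper model first), which is why the post calls this layer the real engineering work of the "model" phase.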
- Specialized embeddings > general-purpose for domain recall. Named concrete recommendation: "For embeddings and search relevance, specialized models like Voyage AI can provide domain-specific understanding, particularly for technical documentation where general-purpose embeddings fall short." Puts Voyage AI in the embedding-model landscape as the "domain-specialised" alternative to general-purpose text embeddings (OpenAI, Titan, etc.).
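Whichever provider supplies the vectors, retrieval itself is just nearest-neighbour ranking — swapping in a domain-specialised model changes the embeddings, not the machinery. A toy sketch with made-up three-dimensional vectors (a real system would call the provider's embedding API):

```python
import math

# Toy cosine-similarity retrieval; the vectors are invented for illustration.
# Embedding quality determines whether the "right" document ranks first.

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

corpus = {
    "reset-password-doc": [0.9, 0.1, 0.0],
    "billing-faq":        [0.1, 0.9, 0.1],
}
query = [0.8, 0.2, 0.1]  # stand-in for embed("how do I reset my password?")

best = max(corpus, key=lambda doc_id: cosine(query, corpus[doc_id]))
```

The post's claim is that for technical documentation a domain-tuned model places query and relevant document closer in this space than a general-purpose one does, so the same ranking code recalls better.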
- Failure-rate secondary stats (context, not architecture).
  - "<10% of gen-AI use cases deployed ever make it past the pilot stage" (McKinsey).
  - MIT NANDA finds failure rates "as high as 95%" in its 52-org sample (pg. 3); only 5% of evaluated AI tools reach production (pg. 6).
  - 92% believe governance is essential, only 44% have policies; 80% experienced AI acting outside intended boundaries (SailPoint).
  - 62% expect >100% ROI; ~80% report no material earnings impact from gen-AI (PagerDuty / McKinsey).
  - CMU TheAgentCompany: Claude 3.5 Sonnet completes 24% of office tasks end-to-end (34.4% with partial credit); agents resort to deceptive shortcuts (renaming users to match the target) when stuck.
  - IBM: ad-hoc approaches 19% → 6%; ~25% of "AI-first" orgs report transformative results.

  These stats are motivating for the canvas framework but are not architectural claims on their own.
Systems introduced / referenced¶
- systems/mongodb-atlas — canonical unified-data-platform instance named in the post; extends to add the agent-memory-store role alongside its existing app-DB + vector-store + bulk-write roles.
- systems/voyage-ai — MongoDB-recommended domain-specialised embedding provider for technical documentation.
Concepts introduced / reinforced¶
- concepts/three-database-problem — the named infrastructure-chaos anti-pattern: juggling operational DB + vector DB + memory store with mismatched APIs / scaling characteristics.
- concepts/agent-memory-store — the agent-specific storage class for conversation history, context, and learned-behaviour state.
- concepts/vector-similarity-search — extended: the post positions vector search as one of three co-equal agent data-access surfaces, not a specialty workload.
- concepts/context-engineering — adjacent wiki concept reinforced: memory-store-as-first-class is the storage-layer counterpart to context-engineering's prompt-layer budget.
- concepts/training-serving-boundary — extended: in the agentic-AI shape this boundary collapses entirely for most teams (all external, no own training / serving).
Patterns introduced¶
- patterns/unified-data-platform-for-ai-agents — run application data + vectors + agent memory on a single substrate (document DB + native vector search + memory collections) rather than three separate systems.
- patterns/product-agent-data-model-design-flow — prescribed ordering for agent projects: product-need → agent capabilities → data requirements → model selection. Inserting "agent" between product and data is the canvas framework's load-bearing claim.
- patterns/prototype-before-production — adjacent pattern reinforced: the POC canvas is an explicit "validate the idea on paper before writing code" artefact in the prototype-before-production family.
Operational numbers¶
None — the post is a design-framework essay, not a production case study. All numbers cited are secondary-source industry statistics (McKinsey, MIT, IBM, SailPoint, PagerDuty, CMU) about the state of enterprise AI adoption in 2025, not Atlas / MongoDB production data.
Caveats¶
- Vendor-authored, product-promotional. Post is a MongoDB Blog Technical piece closing with three CTAs: download the canvas template, try Atlas, explore the AI Learning Hub. The canvas framework + unified-data-platform prescription both terminate at Atlas.
- Canvas squares not enumerated in the raw body. Post describes the canvases at phase-level (four phases for POC, five for Production) but the 8 / 11 individual squares are not listed — they live in the downloadable template, not in the post. Wiki pages reflect what the raw substantiates.
- No production implementation details. No customer case study, no Atlas configuration, no embedding-model benchmark, no memory-store schema, no latency numbers, no cost comparison vs. three-database architectures. Architecture discussion is prescriptive, not descriptive.
- Agent-memory-store design left under-specified. Post names the category but doesn't prescribe schema, eviction policy, retrieval API, or consistency model. The concept page captures the framing with explicit open-question tags.
- Swyx "Rise of the AI Engineer" link is the design-flow precedent. MongoDB explicitly extends Swyx's 2023 framing (product → data → model) to the agentic case (product → agent → data → model); readers can chase the original framing at latent.space/p/ai-engineer.
Source¶
- Original: https://www.mongodb.com/company/blog/technical/build-ai-agents-worth-keeping-canvas-framework
- Raw markdown: raw/mongodb/2025-09-23-build-ai-agents-worth-keeping-the-canvas-framework-306bc028.md
Related¶
- companies/mongodb
- systems/mongodb-atlas — named unified-platform instance.
- systems/voyage-ai — named domain-specialised embedding provider.
- concepts/three-database-problem — the diagnosis.
- concepts/agent-memory-store — the under-named storage class.
- patterns/unified-data-platform-for-ai-agents — the prescription.
- patterns/product-agent-data-model-design-flow — the canvas framework's ordering principle.
- concepts/context-engineering — the adjacent prompt-layer discipline; unified-data-platform is its storage-layer counterpart.
- concepts/vector-similarity-search — one of three co-equal agent data-access surfaces.