Databricks — How World Bank Group uses Databricks to eradicate poverty through shared knowledge

Databricks Blog customer-success post (2026-05-22) on the World Bank Group's unified data + AI knowledge platform built on Databricks. Tier-3 vendor-blog source. Borderline-include scope decision: about 50% of the post is typical Databricks customer-success bookend framing and workflow narration quoted from Suresh Kaudi, but roughly 30% of the body describes an agentic layer composed on top of multiple Genies. That portion names a specific three-classifier router shape (intent classifier → domain classifier → query decomposer) fronting multiple per-domain Genie instances; a metrics layer retrofitted when Genie's default LLM-only structured-data answers proved nondeterministic ("In the structured content, you need an answer. What is my bank balance? I don't want to see a different number every time."); and a decoupled visualisation agent that re-renders chart types without re-querying. Real production scale disclosed: 3 million document downloads/month through the AI-powered search-and-synthesis layer, half from low- and middle-income countries. No latency / concurrency / per-Genie cost numbers. No code. No diagrams.

One-paragraph summary

The World Bank Group holds structured operational data (legacy on-premises databases) and unstructured document repositories ("tens of millions of documents", 3 million publication downloads / month) that had never been integrated before this project. Pre-Databricks, "librarians, researchers would go in and pull tons and tons of documents, try to read through them, try to make sense out of it" to answer basic project-history questions. The team migrated operational data into Databricks behind Unity Catalog for governance, indexed unstructured documents into Volumes + Vector Search for RAG, and exposed structured-data questions through Genie. Two practical problems then surfaced and drove the architecture's distinguishing shape: (1) Genie's structured-data answers were nondeterministic, which the team fixed with a metrics layer ("What is my bank balance? I don't want to see a different number every time"; see patterns/metrics-layer-for-deterministic-genie-answers); (2) cross-domain queries broke the single-Genie setup, because each Genie instance is built against a specific metrics layer / data domain, so a question spanning two domains ("what is my commitment in India and what are my actions") required querying two separate Genies. The team's response was to put an agentic layer on top with three named classifiers: an intent classifier (what's being asked), a domain classifier (which Genie / agent to invoke), and a query decomposer (split multi-part questions into per-domain sub-queries). The layer also provides fan-out routing to per-domain Genies, a RAG agent for document retrieval, and a visualisation agent that handles chart-type changes without re-querying ("a bar chart and the user wants a pie chart instead, the visualization agent handles that without re-running the underlying query"). Results are assembled and returned as a single response. The post likens the shape to traditional multi-tier web design (front end / application / business logic / database) updated for an AI context.
Operational disclosures: 3 million document downloads / month through the AI-powered layer, half from low- and middle-income countries; user-feedback prototype spanning Africa + East Asia Pacific built and deployed in ~2.5 days; corporate scorecard delivered on the platform ("more outcomes-driven than output-driven... how many jobs we gave, how much connectivity was established"); long-term vision is the Knowledge 360 + Data 360 flagship projects unifying World Bank Group, IFC, IDA, and MIGA so knowledge is accessible regardless of which institution generated it.

Key takeaways

Tens-of-millions-of-documents-plus-legacy-OLTP is the pre-state to a unified data+AI platform — World Bank Group started with structured operational data on legacy on-prem databases ("difficult to keep pace with evolving reporting requirements") plus unstructured document repositories ("tens of millions of documents... 3 million publication downloads / month") that had never been integrated. The manual-librarian-search workflow ("Librarians, researchers would go in and pull tons and tons of documents") is the canonical pre-state for the RAG migration shape — see also the MapAid groundwater-archives ingest for a sibling-shape pre-state. (Source: this post.)
Unity Catalog as the unification primitive — Kaudi's characterisation: "Unity Catalog was a game changer for us. It was a single unified interface where we could govern our data." This is at name-drop altitude, but the substantive claim is that UC is what made structured + unstructured coexistence operationally tractable. UC + Volumes (the "scalable path for managing unstructured document content alongside structured data in the same platform") is the platform-substrate move; everything downstream (Genie, RAG, agentic router) sits on top of it. (Source: this post.)
Genie's default structured-data output is nondeterministic enough to be unfit for financial/operational reporting — the load-bearing failure mode that drove the metrics-layer retrofit: "When early Genie deployments returned inconsistent results for structured queries, the team implemented a metrics layer to ensure they got deterministic answers, critical for financial and operational reporting." Followed by Kaudi's verbatim diagnosis: "In the structured content, you need an answer. What is my bank balance? I don't want to see a different number every time." This is the canonical wiki disclosure of the metrics-layer-for-deterministic-Genie-answers pattern: the metrics layer pins business-measure semantics to a schema so that the same query produces the same number — independent of LLM trajectory variance. Composes with the prior wiki finding from Trinity Industries (Genie's effectiveness depended on the prior measure-consolidation work, 600 measure variants → one canonical layer). The two cases are saying different but complementary things: Trinity says "without measure consolidation Genie cannot answer correctly at all"; World Bank says "even after metric semantics are clean, the LLM-only output path is still nondeterministic — pin a metrics layer to the SQL path to fix it." (Source: this post; see also concepts/source-of-truth-disambiguation, concepts/measure-proliferation, patterns/transform-upstream-to-collapse-measures.)
Each Genie is per-metrics-layer / per-domain, so cross-domain queries break single-Genie — "Each Genie instance is built against a specific metrics layer, meaning a separate Genie is needed for each data domain. A question that spans two domains, for example 'what is my commitment in India and what are my actions,' would require querying two separate Genies." This is the architectural forcing function for the agentic-router layer: single-Genie cannot serve cross-domain questions; you need a routing tier above. (Source: this post.)
The agentic-router shape: intent → domain → decomposer → fan-out → assemble — "The solution was an agentic layer on top. The World Bank Group built a single interface backed by an intent classifier, a domain classifier and a query decomposer. When a question comes in, the intent classifier identifies what's being asked, the domain classifier determines which agent or agents need to be called, and the query decomposer breaks complex multi-part questions into components and routes each to the right place. Results are assembled and returned as a single response." This canonicalises as patterns/intent-domain-decomposer-agentic-router — distinguished from the prior wiki pattern patterns/multi-agent-supervisor-routing (Virtue Foundation's VF Agent: single supervisor → vector-search OR Genie, alternative-selection only) by adding query decomposition for cross-domain fan-out and separating intent classification from domain selection. Fan-out + result assembly is what makes the "what is my commitment in India and what are my actions" case work. (Source: this post.)
Visualisation agent decoupled from query agents — "If a query returns data as a bar chart and the user wants a pie chart instead, the visualization agent handles that without re-running the underlying query." This is a small but architecturally significant disclosure: the chart-rendering layer is independent of the query/retrieval layer, so re-renders don't pay the query cost a second time. Familiar pattern from BI tooling but worth canonicalising as a property of the agentic shape — the "visualisation agent" is named as a peer of the per-domain Genie agents and the RAG agent rather than buried inside any one of them. (Source: this post.)
Multi-tier-web-design analogy is the explicit framing — "It's not unlike traditional multi-tier web design, with front end, application layer, business logic and database, updated for an AI context. The user sees one interface, but behind it, any number of domain-specific Genie agents can be running, alongside the RAG agent for document retrieval and a visualization agent that controls how results are displayed." This is the post's self-description of the agentic-router shape and is genuinely useful framing — "agentic layer = the application/business-logic tier in a traditional N-tier web architecture" — for explaining the pattern to engineers familiar with web N-tier but not with multi-agent shapes. (Source: this post.)
Databricks AI Gateway named as the centralized control plane — "The Databricks AI Gateway provided centralized control over agent access, cost management and security as the system grew more complex." This is at name-drop altitude with no concrete disclosure (no per-call latency, no policy mechanism, no identity-flow detail) but is the first wiki source where the Unity AI Gateway is named as the gating substrate for an agentic-router-fronted multi-Genie deployment rather than a pure coding-agent deployment. Confirms the 2026-05-20 Governing AI agents at scale with Unity Catalog post's scope-generalisation thesis (every department's agents, not just coding agents). (Source: this post; see sources/2026-05-20-databricks-governing-ai-agents-at-scale-with-unity-catalog for the architectural details the World Bank post does not disclose.)
External user-feedback collection prototype: 2.5 days end-to-end — "Before expanding the system broadly, the team ran structured feedback sessions with external stakeholders including NGOs, civil servants and government representatives across Africa and East Asia Pacific regions. They used AI/BI to capture query inputs, routing decisions and outputs, then analyzed results to understand what questions users were actually asking and where gaps existed." The prototype was "built and deployed in approximately two and a half days" — Kaudi's verbatim contrast: "Two years back I would have imagined doing it in a two-year span." This is workflow rhetoric (no per-feature breakdown of the 2.5 days), but the logging-routing-decisions-and-outputs for downstream gap analysis primitive is a real architectural choice — the agentic router's intent + domain classifications are themselves observability data, used to discover what user-question shapes the system's domain decomposition doesn't yet cover. (Source: this post.)
Operational scale: 3M document downloads / month, half from low- and middle-income countries; corporate scorecard delivered on platform — the only quantified production-scale anchor in the post. Kaudi's characterisation of the scorecard's design bias: "It's more outcomes-driven than output-driven. Instead of saying how many miles of road we put in, it started measuring how many jobs we gave, how much connectivity was established." The 50% LMIC traffic share is a useful equity-of-access disclosure but not an engineering metric. Knowledge 360 + Data 360 are named as the long-term flagship initiatives that will unify World Bank Group + IFC (International Finance Corporation) + IDA (International Development Association) + MIGA (Multilateral Investment Guarantee Agency) so knowledge is "accessible to any stakeholder regardless of which institution generated it." (Source: this post.)
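
The post does not say how the metrics layer is implemented, so the following is only a minimal sketch of the determinism property described in the takeaways above, under stated assumptions. Each governed measure is pinned to one fixed, parameterised SQL definition, so any model in the loop may pick a metric name and a filter value but never writes SQL. The `Metric` class, the `METRICS` registry, the `commitments` table, and the sample rows are all hypothetical, and `sqlite3` stands in for the governed tables.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One governed business measure with a single, fixed SQL definition."""
    name: str
    sql: str  # parameterised SQL; only the bound filter values vary


# Hypothetical registry; metric, table, and column names are illustrative only.
METRICS = {
    "total_commitment": Metric(
        name="total_commitment",
        sql="SELECT SUM(amount_usd) FROM commitments WHERE country = :country",
    ),
}


def compile_metric(metric_name: str, country: str) -> tuple[str, dict]:
    """Resolve a metric name to its fixed SQL plus bound parameters.

    A model may choose the metric name and the filter value, but it never
    generates SQL, so the same question always compiles to the same query.
    """
    metric = METRICS[metric_name]  # KeyError for anything not registered
    return metric.sql, {"country": country}


def answer(conn: sqlite3.Connection, metric_name: str, country: str):
    """Run the pinned query and return the single aggregate value."""
    sql, params = compile_metric(metric_name, country)
    return conn.execute(sql, params).fetchone()[0]


def demo_connection() -> sqlite3.Connection:
    """In-memory stand-in for a governed table, filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE commitments (country TEXT, amount_usd INTEGER)")
    conn.executemany(
        "INSERT INTO commitments VALUES (?, ?)",
        [("India", 100), ("India", 250), ("Kenya", 75)],
    )
    return conn
```

Calling `answer(demo_connection(), "total_commitment", "India")` repeatedly runs the same SQL each time, which is the "same number every time" behaviour Kaudi asks for. How a free-text question gets mapped to a metric name is left out here and is not described in the post.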

Architecture: agentic router fronting multiple per-domain Genies + RAG + visualisation

Composing the verbatim disclosures:

        User free-text question
                  │
                  ▼
        ┌─────────────────────┐
        │ Intent Classifier   │  "what's being asked"
        └─────────────────────┘
                  │
                  ▼
        ┌─────────────────────┐
        │ Domain Classifier   │  "which agent(s) to invoke"
        └─────────────────────┘
                  │
                  ▼
        ┌─────────────────────┐
        │ Query Decomposer    │  cross-domain → per-domain sub-queries
        └─────────────────────┘
            │      │      │
            ▼      ▼      ▼
   ┌────────────┐ ┌─────────┐ ┌────────────┐
   │ Genie A    │ │ Genie B │ │ RAG Agent  │
   │ (Domain A  │ │ (Domain │ │ (Vector    │
   │  metrics   │ │  B met- │ │  Search    │
   │  layer)    │ │  rics)  │ │  over UC   │
   │            │ │         │ │  Volumes)  │
   └────────────┘ └─────────┘ └────────────┘
            │      │      │
            ▼      ▼      ▼
        ┌─────────────────────┐
        │ Result Assembly     │
        └─────────────────────┘
                  │
                  ▼
        ┌─────────────────────┐
        │ Visualisation Agent │  chart-type changes
        │                     │  without re-querying
        └─────────────────────┘
                  │
                  ▼
              Response
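
The flow in the diagram can be sketched in a few lines of Python. This is an illustrative toy, not the World Bank Group's implementation: the post discloses no classifier mechanics, so keyword matching stands in for all three classifiers, and the agent names (`genie_finance`, `genie_operations`, `rag_documents`), the keyword table, and the stub agents are hypothetical. The returned dictionary also keeps the routing decisions, in the spirit of the "query inputs, routing decisions and outputs" the team captured for feedback analysis.

```python
import re
from typing import Callable

# Hypothetical keyword -> agent mapping standing in for a real domain classifier.
DOMAIN_KEYWORDS = {
    "commitment": "genie_finance",
    "actions": "genie_operations",
    "document": "rag_documents",
}
DEFAULT_AGENT = "rag_documents"  # assumed fallback when nothing matches


def classify_intent(question: str) -> str:
    """Toy intent classifier: chart requests vs. plain lookups."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return "visualise" if words & {"chart", "plot"} else "lookup"


def classify_domains(question: str) -> list[str]:
    """Toy domain classifier: every agent whose keyword appears in the text."""
    q = question.lower()
    return [agent for kw, agent in DOMAIN_KEYWORDS.items() if kw in q]


def decompose(question: str, domains: list[str]) -> list[tuple[str, str]]:
    """Split a multi-part question on ' and ' and pair each part with an agent."""
    text = question.strip().rstrip("?")
    parts = [p.strip() for p in re.split(r"\s+and\s+", text) if p.strip()]
    fallback = domains[0] if domains else DEFAULT_AGENT
    plan = []
    for part in parts:
        matches = classify_domains(part)
        plan.append((matches[0] if matches else fallback, part))
    return plan


def route(question: str, agents: dict[str, Callable[[str], str]]) -> dict:
    """Intent -> domain -> decomposer -> fan-out -> assemble."""
    intent = classify_intent(question)
    domains = classify_domains(question)
    plan = decompose(question, domains)
    answers = [agents[agent](sub_query) for agent, sub_query in plan]  # fan-out
    return {
        "intent": intent,      # routing decisions are kept so they can be logged
        "domains": domains,
        "plan": plan,
        "response": "\n".join(answers),  # naive result assembly
    }


def stub_agents() -> dict[str, Callable[[str], str]]:
    """Stand-ins for per-domain Genies and the RAG agent; they echo their input."""
    names = set(DOMAIN_KEYWORDS.values()) | {DEFAULT_AGENT}
    return {name: (lambda q, n=name: f"[{n}] {q}") for name in names}
```

For the post's example, `route("what is my commitment in India and what are my actions", stub_agents())` yields a two-step plan, one sub-query per domain agent. A real decomposer would need something sturdier than splitting on "and", and the fan-out here is sequential for simplicity.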

Substrate (named in the post):

Per-domain Genie agents — each Genie instance is built against a specific metrics layer; that pinning is the source of deterministic structured-data answers and the reason a separate Genie exists per data domain.
RAG agent over UC Volumes + Vector Search — "they indexed project documents to create a retrieval-augmented generation capability that could respond to natural language queries and thus save manual search." Indexed corpus is the scanned project-documents archive that pre-Databricks required manual librarian search.
Databricks AI Gateway as gating control plane — "centralized control over agent access, cost management and security as the system grew more complex." Confirms the 2026-05-20-disclosed scope-generalisation: AI Gateway gates non-coding-agent populations too.
Unity Catalog as the governance primitive — "a single unified interface where we could govern our data"; substrate for both the structured-data tables that feed each Genie and the unstructured documents indexed into the RAG corpus.
Visualisation agent — peer of the query agents in the routing graph, not buried inside one. Re-renders happen at the visualisation tier, not the query tier.
AI/BI as observability harness for the agentic router — "used AI/BI to capture query inputs, routing decisions and outputs" — the router's own classifications are logged for feedback analysis.
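
One way to get the re-render-without-re-query property of the visualisation agent is to cache the query result and build chart specifications from the cached rows. The post does not disclose the actual mechanism, so this is a sketch of that one assumed design; the `VisualisationAgent` and `CountingBackend` classes, the chart-spec dictionary, and the sample rows are hypothetical.

```python
from typing import Callable

Rows = list[tuple[str, float]]


class VisualisationAgent:
    """Renders chart specs from cached rows; queries the backend at most once per query."""

    def __init__(self, run_query: Callable[[str], Rows]):
        self._run_query = run_query
        self._results: dict[str, Rows] = {}  # query text -> cached rows

    def render(self, query: str, chart_type: str) -> dict:
        # Only the first render of a given query reaches the query tier.
        if query not in self._results:
            self._results[query] = self._run_query(query)
        rows = self._results[query]
        return {
            "chart_type": chart_type,
            "labels": [label for label, _ in rows],
            "values": [value for _, value in rows],
        }


class CountingBackend:
    """Stand-in for a Genie or RAG call; counts invocations and returns made-up rows."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, query: str) -> Rows:
        self.calls += 1
        return [("India", 350.0), ("Kenya", 75.0)]
```

Rendering the same query first as `"bar"` and then as `"pie"` leaves `CountingBackend.calls` at 1, because the second render reads the cached rows. Cache invalidation and result freshness are deliberately ignored in this sketch.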

What's not disclosed

Per-Genie / per-agent latency — no p50 / p99 query latency for Genie or RAG; no end-to-end-question-to-response budget; no visualisation-agent re-render latency.
Concurrency — no per-tier QPS, no agentic-router throughput.
Per-call cost — no per-LLM-call cost, no metrics-layer evaluation cost, no fan-out aggregate cost for cross-domain questions.
Number of distinct per-domain Genies in production — the text implies "any number of domain-specific Genie agents" but doesn't quantify; no list of which World-Bank-data domains have dedicated Genies.
Intent / domain / decomposer classifier mechanics — no model choice (frontier-LLM vs small-model classifier vs trained classifier), no prompt template, no domain-taxonomy schema, no decomposition-strategy detail (rule-based vs LLM-based).
Result-assembly mechanism — the post states that "results are assembled and returned as a single response" but does not say how. How are multi-Genie answers with conflicting numbers reconciled? What happens when one sub-query fails? What are the partial-response semantics? None of this is disclosed.
Metrics-layer mechanism — what is a "metrics layer" concretely? UC governed-tags + UDF? A metric-view layer? A custom registry? The post names the layer's purpose but not its implementation.
Visualisation-agent mechanism — chart-render-without-query is the property; how the chart-state-vs-query-state separation is implemented (cached query result + render-on-demand, vs separate visualisation-tier service, vs other) is not disclosed.
Document-indexing pipeline shape — Volumes + Vector Search named, but no embedding model, no chunking strategy, no re-indexing cadence, no per-document metadata schema.
AI Gateway integration shape — named as the gating layer; no policy mechanism, no per-agent identity flow, no fail-closed vs fail-open posture disclosed for this deployment.
2.5-day prototype breakdown — no per-feature time budget, no team size, no scope (which agentic-router pieces existed before the 2.5-day window).
Knowledge-360 / Data-360 timeline + scope — named as the long-term flagship; no current state, no per-institution integration progress.

Caveats

Tier-3 customer-success co-marketing post — the bookend framing is heavy customer-impact rhetoric (poverty eradication, shared prosperity); the architectural disclosure is in the middle ~30% of body. AGENTS.md borderline-include applies.
Mechanism-light throughout — every named primitive (intent classifier, domain classifier, query decomposer, metrics layer, visualisation agent, AI Gateway) is at name-drop or capability-prose altitude. No code, no diagrams, no per-component latency, no production incidents.
Single-customer disclosure — the agentic-router shape is attributed to World Bank Group's specific build; whether other Databricks customers building multi-Genie deployments converge on the same intent / domain / decomposer split is not disclosed.
Quotes are workflow-rhetoric — Suresh Kaudi's load-bearing quotes ("Unity Catalog was a game changer", "What is my bank balance?", "how do I go and look for a project that was executed in India in 1960?") are useful for framing but not engineering metrics.
Anonymous Databricks-marketing byline + customer-co-marketing framing — characteristic of the Databricks-Engineering customer-success-story genre. The substantive architectural primitives are real and worth canonicalising; the surrounding rhetoric is not.

Source

systems/databricks-genie — the per-domain query substrate; this post adds the per-Genie metrics-layer / per-domain pinning disclosure and the multi-Genie-fronted-by-agentic-router composition shape to the wiki's Genie page.
systems/unity-catalog — the governance + structured-data substrate; named as "a single unified interface where we could govern our data".
systems/unity-catalog-volumes — the substrate for the unstructured-document RAG corpus.
systems/mosaic-ai-vector-search — the indexed-document similarity-search layer of the RAG agent.
systems/unity-ai-gateway — the centralised control plane for the agentic deployment.
companies/world-bank-group — the named customer.
companies/databricks — the platform vendor.
patterns/intent-domain-decomposer-agentic-router — canonical wiki pattern for the three-classifier router shape.
patterns/metrics-layer-for-deterministic-genie-answers — the deterministic-structured-data-answer retrofit.
patterns/multi-agent-supervisor-routing — sibling pattern (single-supervisor alternative-selection routing); this post's shape is the fan-out-and-decompose generalisation.
concepts/multi-llm-sub-agent-routing — broader concept family.
concepts/source-of-truth-disambiguation — companion concept; metrics-layer is one mechanism for it on the structured-data path.
concepts/measure-proliferation — the related upstream forcing function from the Trinity Industries case.
patterns/transform-upstream-to-collapse-measures — Trinity's measure-consolidation move; complements (not duplicates) the metrics-layer move described here.