
Storex (Databricks internal AI debugging platform)

Storex is Databricks' internal AI-agent platform for database fleet investigation and debugging. It unifies metrics, logs, dashboards, and CLI-equivalent tooling behind one conversational interface: it retrieves signals across the fleet, correlates them, and guides engineers toward the next mitigation step. Storex covers thousands of database instances across every major cloud, hundreds of regions, and eight regulatory domains.

Scale

  • Thousands of database instances.
  • Every major cloud (3 cloud providers).
  • Hundreds of regions.
  • Eight regulatory domains.
  • Claimed reduction of up to 90% in engineer investigation time.
  • <5 minutes for a zero-context new hire to jump-start a DB investigation.

Architecture

Foundation layer — central-first sharded:

  • Global Storex coordinator presents one API to engineers and agents.
  • Regional shards hold sensitive/regulated data locally.
  • Integrated with existing infra services (metrics, logs, dashboards, cloud APIs) so the agent sees consistent abstractions across clouds.
  • Fine-grained access control enforced at team, resource, and RPC levels — one permission model for both humans and agents.
  • See concepts/central-first-sharded-architecture.
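The central-first sharded pattern can be sketched as a global coordinator that routes reads to regional shards and enforces one ACL for humans and agents alike. This is a minimal illustrative sketch in Python (the production system is JVM/Scala infra); `Coordinator`, `Shard`, and the ACL shape are invented names, not Storex's API.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """A regional shard; sensitive/regulated data never leaves it."""
    region: str
    records: dict = field(default_factory=dict)

    def query(self, instance_id: str) -> dict:
        return self.records.get(instance_id, {})


class Coordinator:
    """Global entry point: one API, fan-out to regional shards."""

    def __init__(self, shards: dict, acl: dict):
        self.shards = shards  # region -> Shard
        self.acl = acl        # principal -> set of allowed regions

    def query(self, principal: str, region: str, instance_id: str) -> dict:
        # One permission model for both humans and agents.
        if region not in self.acl.get(principal, set()):
            raise PermissionError(f"{principal} cannot read {region}")
        # The coordinator only routes; the payload is served in-region.
        return self.shards[region].query(instance_id)
```

The point of the sketch: the coordinator holds no regulated data itself, so adding a ninth regulatory domain is a new shard plus ACL entries, not a new API.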

Agent framework layer — DSPy + MLflow-inspired:

  • Tools defined as normal Scala classes + function signatures + a short docstring.
  • The LLM reads the docstring to infer the input schema, output shape, and how to interpret results.
  • Prompts are decoupled from tool implementation — swap either independently.
  • The framework owns the LLM connection, parsing, and conversation state — not reinvented per tool.
  • See patterns/tool-decoupled-agent-framework and systems/dspy.

Validation harness:

  • Capture production-state snapshots (inputs, tool responses, final state).
  • Replay them through candidate agent configs (prompt variations, tool swaps).
  • A separate judge LLM scores responses on accuracy + helpfulness.
  • Referenced against Databricks' MLflow 3 LLM-judges primitive.
  • See patterns/snapshot-replay-agent-evaluation and concepts/llm-as-judge.
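The capture/replay/judge loop can be sketched as three small pieces. This is an assumed shape, not the MLflow 3 judge API: `Snapshot`, `replay`, and the toy scoring rubric in `judge` (in production a separate judge LLM scores accuracy and helpfulness) are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Frozen production state: what the agent saw and what it answered."""
    inputs: str           # the engineer's question
    tool_responses: dict  # captured tool outputs, keyed by signal name
    final_state: str      # the answer shipped in production


def replay(snapshot, candidate_agent):
    """Run a candidate config (prompt variation, tool swap) on frozen state."""
    return candidate_agent(snapshot.inputs, snapshot.tool_responses)


def judge(snapshot, candidate_answer: str) -> float:
    """Toy stand-in for the judge LLM: reward answers citing captured signals."""
    cited = sum(key in candidate_answer for key in snapshot.tool_responses)
    return cited / max(len(snapshot.tool_responses), 1)
```

Replaying against snapshots means candidate configs are compared on identical inputs, so a score delta reflects the prompt or tool change rather than fleet noise.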

Multi-agent composition:

  • Specialized agents per domain (system/DB issues, client-side traffic patterns, …).
  • Their findings compose into a single root-cause analysis.
  • Extensible beyond databases as infra teams adopt the framework.
  • See patterns/specialized-agent-decomposition.
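Composition here is a fan-out/merge: each domain agent reports findings for the same incident, and a root-cause step merges them. A hedged sketch under that assumption; the agent names and report shape are invented for illustration:

```python
def db_agent(incident: str) -> dict:
    """Domain agent for system/DB-side issues (illustrative stub)."""
    return {"db_findings": f"replication lag on {incident}"}


def traffic_agent(incident: str) -> dict:
    """Domain agent for client-side traffic patterns (illustrative stub)."""
    return {"traffic_findings": f"client retry storm hitting {incident}"}


def root_cause(incident: str, agents: list) -> dict:
    """Fan out to domain agents, merge findings into one analysis."""
    report = {}
    for agent in agents:
        report.update(agent(incident))
    return report
```

The extensibility claim falls out of this shape: a new infra team contributes another `agents` entry without touching the existing ones.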

What it replaced

Before Storex, a MySQL incident investigation meant jumping between:

  • systems/grafana for metrics.
  • Internal Databricks dashboards for client workload shape.
  • The CLI (SHOW ENGINE INNODB STATUS) for transaction / I/O / deadlock detail.
  • A cloud-console log-in to download slow-query logs.

Each tool worked; none of them composed. A seasoned MySQL engineer could stitch a hypothesis out of the four; a new engineer often couldn't start. Storex is the unification layer above these, plus an agent that knows how to walk between them.

Evolution (per the article)

  1. Hackathon (2 days) — prototype that unifies a few core DB metrics + dashboards into one view. Unpolished but immediately improves basic investigation. See patterns/hackathon-to-platform.
  2. v1 static SOP agent — codified the debugging runbook as a deterministic workflow. Engineers rejected it: they wanted a diagnostic report, not an automated checklist.
  3. v2 anomaly detection — surfaced right signals, still no clear next step.
  4. v3 chat assistant (breakthrough) — codifies debugging knowledge, handles follow-ups, makes investigation interactive. This is the shape of production Storex.

Deferred / future work

  • Mutating ops (restores, production queries, config updates) are explicitly deferred. At article time Storex is read-heavy (diagnostic retrieval + reasoning); the write-path safety story isn't described yet.

Seen in

  • sources/2025-12-03-databricks-ai-agent-debug-databases — the origin article; covers the tooling-fragmentation diagnosis, the central-first foundation, the DSPy-inspired tool/prompt decoupling, the snapshot-replay + judge-LLM validation, and the multi-agent extensibility argument.