
Storex (Databricks internal AI debugging platform)

Storex is Databricks' internal AI-agent platform for database fleet investigation and debugging. It unifies metrics, logs, dashboards, and CLI-equivalent tooling behind one conversational interface: it retrieves signals across the fleet, correlates them, and guides engineers toward the next mitigation step. Storex covers thousands of database instances across every major cloud, hundreds of regions, and eight regulatory domains.

Scale

  • Thousands of database instances.
  • Every major cloud (3 cloud providers).
  • Hundreds of regions.
  • Eight regulatory domains.
  • Claimed reduction of up to 90% in engineer investigation time.
  • <5 minutes for a zero-context new hire to jump-start a DB investigation.

Architecture

Foundation layer — central-first sharded:

  • Global Storex coordinator presents one API to engineers and agents.
  • Regional shards hold sensitive/regulated data locally.
  • Integrated with existing infra services (metrics, logs, dashboards, cloud APIs) so the agent sees consistent abstractions across clouds.
  • Fine-grained access control enforced at team, resource, and RPC levels — one permission model for both humans and agents.
  • See concepts/central-first-sharded-architecture.
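The central-first sharded pattern can be sketched as a global coordinator that routes reads to regional shards and enforces one ACL for humans and agents alike. This is a minimal illustrative sketch in Python (the production system is JVM/Scala infra); `Coordinator`, `Shard`, and the ACL shape are invented names, not Storex's API.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """A regional shard; sensitive/regulated data never leaves it."""
    region: str
    records: dict = field(default_factory=dict)

    def query(self, instance_id: str) -> dict:
        return self.records.get(instance_id, {})


class Coordinator:
    """Global entry point: one API, fan-out to regional shards."""

    def __init__(self, shards: dict, acl: dict):
        self.shards = shards  # region -> Shard
        self.acl = acl        # principal -> set of allowed regions

    def query(self, principal: str, region: str, instance_id: str) -> dict:
        # One permission model for both humans and agents.
        if region not in self.acl.get(principal, set()):
            raise PermissionError(f"{principal} cannot read {region}")
        # The coordinator only routes; the payload is served in-region.
        return self.shards[region].query(instance_id)
```

The point of the sketch: the coordinator holds no regulated data itself, so adding a ninth regulatory domain is a new shard plus ACL entries, not a new API.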

Agent framework layer — DSPy + MLflow-inspired:

  • Tools defined as normal Scala classes + function signatures + a short docstring.
  • The LLM reads the docstring to infer the input schema, output shape, and how to interpret results.
  • Prompts are decoupled from tool implementation — swap either independently.
  • The framework owns the LLM connection, parsing, and conversation state — not reinvented per tool.
  • See patterns/tool-decoupled-agent-framework and systems/dspy.

Validation harness:

  • Capture production-state snapshots (inputs, tool responses, final state).
  • Replay them through candidate agent configs (prompt variations, tool swaps).
  • A separate judge LLM scores responses on accuracy + helpfulness.
  • Referenced against Databricks' MLflow 3 LLM-judges primitive.
  • See patterns/snapshot-replay-agent-evaluation and concepts/llm-as-judge.
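The capture/replay/judge loop can be sketched as three small pieces. This is an assumed shape, not the MLflow 3 judge API: `Snapshot`, `replay`, and the toy scoring rubric in `judge` (in production a separate judge LLM scores accuracy and helpfulness) are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Frozen production state: what the agent saw and what it answered."""
    inputs: str           # the engineer's question
    tool_responses: dict  # captured tool outputs, keyed by signal name
    final_state: str      # the answer shipped in production


def replay(snapshot, candidate_agent):
    """Run a candidate config (prompt variation, tool swap) on frozen state."""
    return candidate_agent(snapshot.inputs, snapshot.tool_responses)


def judge(snapshot, candidate_answer: str) -> float:
    """Toy stand-in for the judge LLM: reward answers citing captured signals."""
    cited = sum(key in candidate_answer for key in snapshot.tool_responses)
    return cited / max(len(snapshot.tool_responses), 1)
```

Replaying against snapshots means candidate configs are compared on identical inputs, so a score delta reflects the prompt or tool change rather than fleet noise.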

Multi-agent composition:

  • Specialized agents per domain (system/DB issues, client-side traffic patterns, …).
  • Their findings compose into a single root-cause analysis.
  • Extensible beyond databases as infra teams adopt the framework.
  • See patterns/specialized-agent-decomposition.
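Composition here is a fan-out/merge: each domain agent reports findings for the same incident, and a root-cause step merges them. A hedged sketch under that assumption; the agent names and report shape are invented for illustration:

```python
def db_agent(incident: str) -> dict:
    """Domain agent for system/DB-side issues (illustrative stub)."""
    return {"db_findings": f"replication lag on {incident}"}


def traffic_agent(incident: str) -> dict:
    """Domain agent for client-side traffic patterns (illustrative stub)."""
    return {"traffic_findings": f"client retry storm hitting {incident}"}


def root_cause(incident: str, agents: list) -> dict:
    """Fan out to domain agents, merge findings into one analysis."""
    report = {}
    for agent in agents:
        report.update(agent(incident))
    return report
```

The extensibility claim falls out of this shape: a new infra team contributes another `agents` entry without touching the existing ones.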

What it replaced

Before Storex, a MySQL incident investigation meant jumping between:

  • systems/grafana for metrics.
  • Internal Databricks dashboards for client workload shape.
  • The CLI (SHOW ENGINE INNODB STATUS) for transaction / I/O / deadlock detail.
  • A cloud-console log-in to download slow-query logs.

Each tool worked; none of them composed. A seasoned MySQL engineer could stitch a hypothesis out of the four; a new engineer often couldn't start. Storex is the unification layer above these, plus an agent that knows how to walk between them.

Evolution (per the article)

  1. Hackathon (2 days) — prototype that unifies a few core DB metrics + dashboards into one view. Unpolished but immediately improves basic investigation. See patterns/hackathon-to-platform.
  2. v1 static SOP agent — codified the debugging runbook as a deterministic workflow. Engineers rejected it: they wanted a diagnostic report, not an automated checklist.
  3. v2 anomaly detection — surfaced right signals, still no clear next step.
  4. v3 chat assistant (breakthrough) — codifies debugging knowledge, handles follow-ups, makes investigation interactive. This is the shape of production Storex.

Deferred / future work

  • Mutating ops (restores, production queries, config updates) are explicitly deferred. At article time Storex is read-heavy (diagnostic retrieval + reasoning); the write-path safety story isn't described yet.

Seen in

  • sources/2025-12-03-databricks-ai-agent-debug-databases — the origin article; covers the tooling-fragmentation diagnosis, the central-first foundation, the DSPy-inspired tool/prompt decoupling, the snapshot-replay + judge-LLM validation, and the multi-agent extensibility argument.