Storex (Databricks internal AI debugging platform)¶
Storex is Databricks' internal AI-agent platform for investigating and debugging its database fleet. It unifies metrics, logs, dashboards, and CLI-equivalent tooling behind one conversational interface: the agent retrieves signals across the fleet, correlates them, and guides engineers toward the next mitigation step. Storex covers thousands of database instances across every major cloud, hundreds of regions, and eight regulatory domains.
Scale¶
- Thousands of database instances.
- Every major cloud (3 cloud providers).
- Hundreds of regions.
- Eight regulatory domains.
- Up to 90% claimed reduction in engineer investigation time.
- <5 minutes for a zero-context new hire to jump-start a DB investigation.
Architecture¶
Foundation layer — central-first sharded:
- Global Storex coordinator presents one API to engineers and agents.
- Regional shards hold sensitive/regulated data locally.
- Integrated with existing infra services (metrics, logs, dashboards, cloud APIs) so the agent sees consistent abstractions across clouds.
- Fine-grained access control enforced at team, resource, and RPC levels — one permission model for both humans and agents.
- See concepts/central-first-sharded-architecture.
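The coordinator/shard split can be sketched roughly as below. This is a hypothetical Python illustration of the idea only; the class names, `acl` structure, and region keys are invented, not Storex's API.

```python
# Illustrative sketch of central-first sharding: one global coordinator API,
# regulated data served from the regional shard that owns it, and a single
# permission model checked before any shard RPC. All names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Shard:
    """A regional shard holding sensitive/regulated data locally."""
    region: str
    data: dict = field(default_factory=dict)

    def query(self, instance_id: str) -> dict:
        return self.data.get(instance_id, {})


@dataclass
class Coordinator:
    """Global entry point: routes by region and enforces one permission
    model for both humans and agents at the RPC boundary."""
    shards: dict  # region -> Shard
    acl: dict     # principal -> set of regions it may read

    def query(self, principal: str, region: str, instance_id: str) -> dict:
        if region not in self.acl.get(principal, set()):
            raise PermissionError(f"{principal} may not read {region}")
        return self.shards[region].query(instance_id)


coord = Coordinator(
    shards={"eu": Shard("eu", {"db-7": {"qps": 120}})},
    acl={"agent-storex": {"eu"}},
)
print(coord.query("agent-storex", "eu", "db-7"))  # {'qps': 120}
```

The point of the shape is that callers, human or agent, only ever see the coordinator's one API; data residency and access control live behind it.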
Agent framework layer — DsPy + MLflow-inspired:
- Tools defined as normal Scala classes + function signatures + a short docstring.
- The LLM reads the docstring to infer input schema, output shape, and how to interpret results.
- Prompts are decoupled from tool implementation — swap either independently.
- The framework owns the LLM connection, parsing, and conversation state — not reinvented per tool.
- See patterns/tool-decoupled-agent-framework and systems/dspy.
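A minimal sketch of the tool/prompt decoupling, in Python (the article describes Scala tools, but the mechanics translate): the framework, not the tool author, derives the LLM-facing schema from the function signature and docstring. The tool name and helper here are invented for illustration.

```python
# Hypothetical sketch: a tool is a plain function with a signature and a
# docstring; the framework introspects both to build the LLM-visible schema,
# so prompts and tool implementations can evolve independently.
import inspect


def innodb_status(instance_id: str, section: str = "TRANSACTIONS") -> str:
    """Fetch one section of SHOW ENGINE INNODB STATUS for a database instance.

    Returns the raw section text; the model interprets lock waits and
    deadlock detail from it.
    """
    return f"stub status for {instance_id}/{section}"  # real impl would call the DB


def tool_schema(fn) -> dict:
    """Build the tool description purely from introspection."""
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": {
            name: p.annotation.__name__ for name, p in sig.parameters.items()
        },
    }


schema = tool_schema(innodb_status)
print(schema["name"], schema["parameters"])
# innodb_status {'instance_id': 'str', 'section': 'str'}
```

Because the schema is derived, swapping the prompt or rewriting the tool body never breaks the other side, which is the decoupling the article credits to the DsPy-inspired design.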
Validation harness:
- Capture production-state snapshots (inputs, tool responses, final state).
- Replay them through candidate agent configs (prompt variations, tool swaps).
- A separate judge LLM scores responses on accuracy + helpfulness.
- Referenced against Databricks MLflow 3 LLM-judges primitive.
- See patterns/snapshot-replay-agent-evaluation and concepts/llm-as-judge.
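The replay loop can be sketched as follows. This is an assumed shape, not Storex's harness: the agent and judge below are stubs standing in for LLM calls (the article uses a separate judge LLM, cf. MLflow 3 LLM judges), and all names are illustrative.

```python
# Sketch of snapshot-replay evaluation: recorded production snapshots are
# replayed through candidate agent configs, and a separate judge scores each
# answer. Agent and judge are stubs for what would be LLM calls.
from dataclasses import dataclass


@dataclass
class Snapshot:
    question: str
    tool_responses: dict   # recorded tool outputs, replayed instead of live calls
    reference_answer: str  # final state captured in production


def candidate_agent(snapshot: Snapshot, prompt: str) -> str:
    # A real agent would run an LLM with `prompt` over the replayed tool
    # responses; this stub just echoes the recorded evidence.
    return f"{prompt}: {snapshot.tool_responses.get('metrics', 'n/a')}"


def judge(answer: str, snapshot: Snapshot) -> float:
    """Stand-in for the judge LLM: score accuracy + helpfulness in [0, 1]."""
    return 1.0 if snapshot.reference_answer in answer else 0.0


snap = Snapshot("why is db-7 slow?", {"metrics": "qps spike at 14:02"},
                "qps spike at 14:02")
for prompt in ("diagnose", "summarize"):  # candidate prompt variations
    print(prompt, judge(candidate_agent(snap, prompt), snap))
```

The value of the pattern is that prompt or tool changes can be scored offline against real incident state before anything ships.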
Multi-agent composition:
- Specialized agents per domain (system/DB issues, client-side traffic patterns, …).
- Their findings compose into a single root-cause analysis.
- Extensible beyond databases as infra teams adopt the framework.
- See patterns/specialized-agent-decomposition.
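The composition idea, sketched under loud assumptions: the two domain agents and the report shape below are invented stand-ins for what would be LLM-backed agents, kept deterministic for illustration.

```python
# Hypothetical sketch of specialized-agent decomposition: one agent per
# domain (DB internals, client traffic), with a root function composing
# their findings into a single root-cause analysis.
def db_agent(incident: dict) -> str:
    return ("lock contention on hot index"
            if incident.get("deadlocks") else "db healthy")


def traffic_agent(incident: dict) -> str:
    return ("client retry storm doubling QPS"
            if incident.get("qps_spike") else "traffic normal")


def root_cause(incident: dict) -> str:
    """Compose domain findings; new domain agents can be appended as other
    infra teams adopt the framework."""
    findings = [agent(incident) for agent in (db_agent, traffic_agent)]
    return "; ".join(findings)


print(root_cause({"deadlocks": 3, "qps_spike": True}))
# lock contention on hot index; client retry storm doubling QPS
```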
What it replaced¶
Before Storex, a MySQL incident investigation meant jumping between:
- systems/grafana for metrics.
- Internal Databricks dashboards for client workload shape.
- CLI `SHOW ENGINE INNODB STATUS` for transaction / I/O / deadlock detail.
- Cloud-console log-in to download slow-query logs.
Each tool worked; none of them composed. A seasoned MySQL engineer could stitch a hypothesis out of the four; a new engineer often couldn't start. Storex is the unification layer above these, plus an agent that knows how to walk between them.
Evolution (per the article)¶
- Hackathon (2 days) — prototype that unifies a few core DB metrics + dashboards into one view. Unpolished but immediately improves basic investigation. See patterns/hackathon-to-platform.
- v1 static SOP agent — codified the debugging runbook as a deterministic workflow. Engineers rejected it: they wanted a diagnostic report, not an automated checklist.
- v2 anomaly detection — surfaced the right signals, but still offered no clear next step.
- v3 chat assistant (breakthrough) — codifies debugging knowledge, handles follow-ups, makes investigation interactive. This is the shape of production Storex.
Deferred / future work¶
- Mutating ops (restores, production queries, config updates) are named as future work. At article time Storex is read-heavy (diagnostic retrieval + reasoning); the write-path safety story isn't described yet.
Seen in¶
- sources/2025-12-03-databricks-ai-agent-debug-databases — the origin article; covers the tooling-fragmentation diagnosis, the central-first foundation, the DsPy-inspired tool/prompt decoupling, the snapshot-replay + judge-LLM validation, and the multi-agent extensibility argument.
Related¶
- systems/dspy — prompt/tool decoupling inspiration.
- systems/mlflow — prompt-optimization + LLM-judge primitives.
- concepts/llm-as-judge
- concepts/central-first-sharded-architecture
- patterns/tool-decoupled-agent-framework
- patterns/snapshot-replay-agent-evaluation
- patterns/specialized-agent-decomposition
- patterns/hackathon-to-platform
- concepts/observability