MLflow OTel Tracing¶
MLflow OTel Tracing is the agent-instrumentation surface within MLflow (3.x) that emits OpenTelemetry-format traces and routes them — via Zerobus Ingest — into UC OTel Trace Tables in the customer's Unity Catalog. It is the framework-side companion to the lakehouse-resident storage layer; together they implement the "observability for any agent, anywhere" promise of the 2026-05-22 launch.
Two instrumentation modes¶
From the source:
"You can do automatic and/or manual tracing… In our example, we rely on `mlflow.langchain.autolog()` to capture the detailed LangGraph execution (model calls and tool calls). We also wrap the entrypoint with `@mlflow.trace` to establish a request-level root span, allowing each invocation to be observed as a single end-to-end execution." (Source: sources/2026-05-22-databricks-observability-any-agent-anywhere-otel-unity-catalog)
| Mode | Mechanism | What it captures |
|---|---|---|
| Automatic | Library-specific autolog (`mlflow.langchain.autolog()`, etc.) | Detailed per-call execution: model calls, tool calls, retrieval, intermediate steps |
| Manual | `@mlflow.trace` decorator on the entrypoint function | Request-level root span; binds the whole invocation as a single end-to-end execution |
The composition (autolog + manual root) is the recommended shape: autolog gives you the inner spans for free; the manual root gives every trace a stable, queryable boundary.
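The effect of the manual root can be pictured with a toy tracer. This is a stdlib-only sketch, not MLflow's implementation: `span`, `trace`, `SPANS`, `agent`, and `handle_request` are hypothetical names, and the in-memory list stands in for a real exporter. It only illustrates why a span opened at the entrypoint gives the inner spans a shared trace ID.

```python
import contextlib
import contextvars
import functools
import itertools

SPANS = []  # hypothetical in-memory sink; a real setup exports spans over OTLP
_ids = itertools.count(1)
# (trace_id, span_id) of the active span, or None when nothing is active.
_current = contextvars.ContextVar("current_span", default=None)


@contextlib.contextmanager
def span(name):
    """Open a span; it joins the active trace or, with no parent, starts a new one."""
    parent = _current.get()
    span_id = next(_ids)
    trace_id = parent[0] if parent else f"trace-{span_id}"
    SPANS.append({"trace_id": trace_id, "span_id": span_id,
                  "parent_id": parent[1] if parent else None, "name": name})
    token = _current.set((trace_id, span_id))
    try:
        yield
    finally:
        _current.reset(token)


def trace(fn):
    """Stand-in for a manual root-span decorator on the entrypoint."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        with span(fn.__name__):
            return fn(*args, **kwargs)
    return wrapper


def agent(question):
    # Stand-ins for the spans an autolog integration would emit per call.
    for _ in range(3):
        with span("tool_call"):
            pass
    with span("llm_call"):
        return f"summary for {question}"


@trace
def handle_request(question):
    return agent(question)


handle_request("who should be promoted?")
with_root = {s["trace_id"] for s in SPANS}

SPANS.clear()
agent("who should be promoted?")  # same work, but no root span
without_root = {s["trace_id"] for s in SPANS}

print(len(with_root), len(without_root))
```

In this toy model, the decorated entrypoint yields one trace ID covering all five spans, while the undecorated call leaves each of the four inner spans in its own trace.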
Trace-table provisioning¶
MLflow is also the table-creation surface for the schema in systems/uc-otel-trace-tables:
"In this example, we use MLflow to create the underlying OpenTelemetry tables in Unity Catalog and link them to an MLflow experiment so traces can be searched, analyzed, and annotated from the UI. Start by identifying (or creating) a SQL warehouse and an MLflow experiment, then use the MLflow Python library to provision the Unity Catalog tables and associate the schema with the experiment."
Setup chain:
- Identify or create a SQL warehouse.
- Identify or create an MLflow experiment.
- Use the MLflow Python library to provision the six UC tables/views.
- Associate the schema with the experiment.
- Point any OTLP client at the resulting endpoint via Zerobus REST or gRPC.
After this one-time setup, "agent instrumentation remains the same. Any OTel-compatible instrumentation library can export traces to the configured endpoint."
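The last step of the chain (pointing an OTLP client at the endpoint) can be sketched with the standard OpenTelemetry exporter environment variables. The variable names come from the OpenTelemetry exporter specification; the endpoint URL, the bearer-token scheme, and the helper name `otlp_trace_env` are placeholders and assumptions, since the source does not give the Zerobus endpoint format or any additional headers the service may require.

```python
from urllib.parse import quote


def otlp_trace_env(endpoint, token, protocol="http/protobuf"):
    """Build standard OTel exporter settings for a traces endpoint.

    `endpoint` and `token` are placeholders supplied by the caller.
    """
    if protocol not in {"grpc", "http/protobuf", "http/json"}:
        raise ValueError(f"unsupported OTLP protocol: {protocol}")
    return {
        "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT": endpoint,
        "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL": protocol,
        # Headers are key=value pairs with percent-encoded values.
        # The bearer-token auth scheme is an assumption, not from the source.
        "OTEL_EXPORTER_OTLP_TRACES_HEADERS": "authorization=" + quote(f"Bearer {token}"),
    }


# Placeholder endpoint; substitute the one produced by the setup chain.
env = otlp_trace_env("https://example.invalid/v1/traces", "example-token")
for key, value in env.items():
    print(f"{key}={value}")
```

Passing `protocol="grpc"` would select the gRPC transport instead of REST; the rest of the agent's instrumentation would not change.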
Where it sits in the stack¶
agent code (LangGraph / OpenAI SDK / Anthropic SDK / framework-agnostic)
│
│ @mlflow.trace decorator (manual root span)
│ mlflow.<library>.autolog() (automatic per-call spans)
│
▼
OTel SDK (per-language)
│
│ OTLP/gRPC or REST
│
▼
Zerobus Ingest (managed)
│
▼
UC OTel Trace Tables (Delta-backed)
│
▼
MLflow Experiment UI ── search / drill / annotate / judge-score
SQL / Genie / dashboards / ETL ── the broader lakehouse consumer set
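To make the OTLP hop in the diagram concrete, the sketch below builds a minimal trace payload in the OTLP/JSON envelope (`resourceSpans` → `scopeSpans` → `spans`). The envelope and field names follow the OTLP specification; the service name, scope name, span names, and the `make_span` helper are hypothetical, and the source does not state whether Zerobus accepts the JSON encoding or only protobuf.

```python
import json
import secrets
import time


def make_span(name, trace_id, parent_span_id=None, duration_ms=5):
    """Build one span dict in OTLP/JSON shape (hex IDs, nanosecond timestamps as strings)."""
    end = time.time_ns()
    span = {
        "traceId": trace_id,               # 16 bytes, hex-encoded
        "spanId": secrets.token_hex(8),    # 8 bytes, hex-encoded
        "name": name,
        "kind": 1,                         # SPAN_KIND_INTERNAL
        "startTimeUnixNano": str(end - duration_ms * 1_000_000),
        "endTimeUnixNano": str(end),
    }
    if parent_span_id:
        span["parentSpanId"] = parent_span_id
    return span


trace_id = secrets.token_hex(16)
root = make_span("handle_request", trace_id)  # manual root span
tool = make_span("genie_tool_call", trace_id, parent_span_id=root["spanId"])

payload = {
    "resourceSpans": [{
        "resource": {"attributes": [
            # Hypothetical service name for the reference agent.
            {"key": "service.name", "value": {"stringValue": "support-assistant"}},
        ]},
        "scopeSpans": [{
            "scope": {"name": "example-instrumentation"},
            "spans": [root, tool],
        }],
    }]
}

body = json.dumps(payload)
print(len(body) > 0)
```

The child span points at the root through `parentSpanId`, and both carry the same `traceId`; that pair of fields is what lets a downstream consumer reassemble one end-to-end execution.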
Decoupling property: agents can run anywhere¶
The structural payoff:
"the agent can be running anywhere. In fact the support assistant agent example that was used for this blog is deployed locally." (FAQ)
The instrumentation library + OTel + Zerobus's REST endpoint together constitute a portable observability boundary — agents in customer VPCs, on developer laptops, in third-party clouds, or inside Databricks Apps all emit to the same UC tables. The agent runtime is not coupled to Databricks. This is the canonical instance of concepts/instrumentation-storage-decoupling applied to MLflow.
Closing the loop: production traces → evaluation¶
MLflow OTel Tracing is also the substrate for MLflow's evaluation flow:
"MLflow allows us to run evaluations against an evaluation dataset, applying built-in or custom judges to score response quality. One effective approach is to bootstrap this dataset from real traces. Because these prompts originate from actual user interactions, they better represent the scenarios your agent must handle compared to purely synthetic test cases."
"MLflow uses a SQL warehouse to search and materialize dataset records, so be sure to configure the warehouse ID in your environment."
And in production:
"MLflow can automatically evaluate live traces using the same judges, helping us quickly detect regressions, drift, and emerging failure patterns. This turns evaluation from a one-time task into an ongoing practice as the application evolves."
These are the canonical instances of patterns/bootstrap-eval-dataset-from-production-traces and concepts/production-traces-as-evaluation-substrate.
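The bootstrap step can be sketched independently of MLflow's dataset API, which the source does not spell out. The record shape below (`trace_id`, `request`) and the helper `bootstrap_eval_dataset` are hypothetical; the sketch only shows the idea of turning real user prompts into deduplicated evaluation inputs while keeping a pointer back to the originating trace.

```python
def bootstrap_eval_dataset(traces, max_records=100):
    """Turn trace records into deduplicated evaluation inputs.

    `traces` is an iterable of dicts with hypothetical keys
    `trace_id` and `request` (the user's prompt at the root span).
    """
    seen = set()
    dataset = []
    for t in traces:
        prompt = " ".join(t["request"].split())  # normalise whitespace
        key = prompt.lower()
        if not prompt or key in seen:
            continue  # skip empty prompts and near-verbatim repeats
        seen.add(key)
        dataset.append({
            "inputs": {"question": prompt},
            "source_trace_id": t["trace_id"],  # provenance for later drill-down
        })
        if len(dataset) >= max_records:
            break
    return dataset


sample_traces = [
    {"trace_id": "t1", "request": "Which support engineer should I put up for promotion?"},
    {"trace_id": "t2", "request": "which support engineer should I  put up for promotion?"},
    {"trace_id": "t3", "request": "How many tickets were escalated last week?"},
]
dataset = bootstrap_eval_dataset(sample_traces)
print(len(dataset))
```

Keeping `source_trace_id` on every record is a design choice rather than a requirement: it lets a failing evaluation case be traced back to the production interaction it came from.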
Reference instrumentation (article's example)¶
The 2026-05-22 post's reference agent — "Support Manager Assistant":
- Framework: LangGraph (deployed locally, outside Databricks).
- Model: Databricks-hosted Claude Sonnet 4.6 (via Foundation Model APIs).
- Tool: Genie Space over the MCP tool API for SQL-driven Q&A.
- Instrumentation: `mlflow.langchain.autolog()` + `@mlflow.trace` on the entrypoint.
- Sample query: "Which support engineer should I put up for promotion?" The agent makes 3 Genie tool calls plus a final summarisation; the trace surfaces 3 tool spans + 1 root span + LLM-call spans.
Native dashboards (MLflow Experiment UI)¶
"The MLflow Experiment UI now ships with native observability dashboards for traces in Unity Catalog, including views for trace volume, errors, latency, token usage, and cost. For most teams, that's enough to monitor day-to-day agent health."
Five default dashboard views:
| View | Native granularity |
|---|---|
| Trace volume | Per experiment / time window |
| Errors | Per error type / time |
| Latency | Trace-level P50 / P99 (extend to span-level via patterns/component-level-latency-from-otel-spans) |
| Token usage | Per model / time |
| Cost | List-price; extend with custom SQL for contract pricing |
When the native views aren't enough, "the trace tables are still just Delta tables" — custom AI/BI dashboards on top of the same UC tables are the escape hatch.
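As an illustration of the span-level extension mentioned in the Latency row, the sketch below computes per-component P50 / P99 from span records. The record shape (`name`, `start_ns`, `end_ns`) is a hypothetical stand-in, not the actual UC trace-table schema, and in practice this aggregation would more likely be written as SQL over the Delta tables.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_by_span_name(spans):
    """Group span durations (in ms) by span name and summarise P50 / P99."""
    durations = defaultdict(list)
    for s in spans:
        durations[s["name"]].append((s["end_ns"] - s["start_ns"]) / 1e6)
    return {
        name: {"p50": percentile(d, 50), "p99": percentile(d, 99)}
        for name, d in durations.items()
    }


# Hypothetical span records: four tool calls and one LLM call.
spans = [
    {"name": "tool_call", "start_ns": 0, "end_ns": 10_000_000},
    {"name": "tool_call", "start_ns": 0, "end_ns": 20_000_000},
    {"name": "tool_call", "start_ns": 0, "end_ns": 30_000_000},
    {"name": "tool_call", "start_ns": 0, "end_ns": 40_000_000},
    {"name": "llm_call", "start_ns": 0, "end_ns": 100_000_000},
]
stats = latency_by_span_name(spans)
print(stats["tool_call"], stats["llm_call"])
```

Grouping by span name rather than by trace is what separates slow tool calls from slow model calls, which the trace-level view alone cannot show.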
Caveats¶
- Tied to the MLflow experiment model. Customers who don't want experiment-scoped trace organisation must work around it; the post focuses on experiment-attached flows.
- Autolog quality is library-specific. `mlflow.langchain.autolog()` is mature; coverage for less-common frameworks is unstated.
- Manual root-span hygiene is on the developer. Without `@mlflow.trace` on the entrypoint, traces fragment: autolog-only setups will surface inner spans without a clean trace boundary.
- Judge integration assumes high-quality LLM judges. The post names "built-in or custom guidelines" but does not benchmark judge accuracy. Compare with the 2026-05-13 Claroty CSAF ingest's deliberately-conservative pass/fail/unknown ternary.
- No throughput / latency SLO for the instrumentation path itself. Agent-side overhead of `@mlflow.trace` + autolog is not characterised.
- Setup requires SQL warehouse provisioning in addition to an experiment, which is non-trivial for teams new to Databricks.
Seen in¶
- sources/2026-05-22-databricks-observability-any-agent-anywhere-otel-unity-catalog (first-class disclosure): the dual `autolog` + `@mlflow.trace` pattern, table-provisioning-via-MLflow, the "agent runs anywhere" property, the prod-traces-as-eval-substrate flow, and the native experiment-UI dashboards are all in this source.
Related¶
- systems/mlflow — parent platform.
- systems/zerobus-ingest — managed receiver downstream.
- systems/uc-otel-trace-tables — storage schema this system writes into.
- systems/opentelemetry — wire protocol.
- systems/unity-catalog — governance + storage substrate.
- systems/langgraph — instrumentation companion (article's reference agent).
- systems/databricks-genie — tool surface invoked in the reference agent.
- concepts/instrumentation-storage-decoupling — structural property.
- concepts/production-traces-as-evaluation-substrate — the eval-loop substrate.
- concepts/llm-as-judge — scoring primitive used in evaluation.
- patterns/managed-otel-ingestion-direct-to-lakehouse — sink-side pattern.
- patterns/bootstrap-eval-dataset-from-production-traces — dataset-provisioning pattern.
- patterns/component-level-latency-from-otel-spans — dashboard pattern over the spans this system emits.
- companies/databricks