PATTERN Cited by 1 source

Component-level latency from OTel spans¶

Component-level latency from OTel spans is the pattern of computing per-span / per-tool / per-component latency percentiles (P50 / P99) directly over OTel-spans tables to attribute end-user latency to the specific component in the agent execution path that's slow — instead of stopping at the trace-level (whole-request) latency that native dashboards default to.

Mechanics¶

trace-level latency (native dashboards)  ── tells you "P99 is high"
        │
        │  drill in
        ▼
span-level latency (this pattern)        ── tells you "the retrieve_docs tool is the bottleneck"
        │
        │  custom SQL on <prefix>_otel_spans
        ▼
SELECT span_name,
       PERCENTILE(duration, 0.5) AS p50,
       PERCENTILE(duration, 0.99) AS p99,
       AVG(error) AS error_rate
FROM <prefix>_otel_spans
WHERE trace_time > <window>
GROUP BY span_name
ORDER BY p99 DESC

The structural insight: trace-level latency folds every component's time into a single number per request, hiding which component is the cause. Per-span aggregation surfaces it.
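The aggregation in the query above can be sketched in plain Python to make the grouping explicit. This is a minimal illustration, not the Databricks implementation: the row fields (`span_name`, `duration_ms`, `error`) mirror the columns in the SQL but their exact names and units are assumptions, the sample rows are invented, and the percentile here is a simple nearest-rank version, whereas a SQL engine's `PERCENTILE` may interpolate between values.

```python
from collections import defaultdict
from math import ceil


def percentile(values, q):
    """Nearest-rank percentile: the smallest value with at least q of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, ceil(q * len(ordered)))
    return ordered[rank - 1]


def span_latency_report(spans):
    """Group spans by name; return one row per name, slowest P99 first."""
    durations = defaultdict(list)
    errors = defaultdict(int)
    for span in spans:
        durations[span["span_name"]].append(span["duration_ms"])
        errors[span["span_name"]] += 1 if span["error"] else 0

    report = []
    for name, values in durations.items():
        report.append({
            "span_name": name,
            "p50": percentile(values, 0.50),
            "p99": percentile(values, 0.99),
            "error_rate": errors[name] / len(values),
        })
    return sorted(report, key=lambda row: row["p99"], reverse=True)


# Invented sample rows (assumed schema, durations in milliseconds).
spans = [
    {"span_name": "retrieve_docs", "duration_ms": 900, "error": False},
    {"span_name": "retrieve_docs", "duration_ms": 8000, "error": True},
    {"span_name": "retrieve_docs", "duration_ms": 1100, "error": False},
    {"span_name": "generate_response", "duration_ms": 1200, "error": False},
    {"span_name": "generate_response", "duration_ms": 1000, "error": False},
]

report = span_latency_report(spans)
for row in report:
    print(row)
```

Sorting by P99 descending puts the slowest component first, which is the same ordering the SQL uses to surface the bottleneck.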

Canonical instance: Databricks AI Operations Center (2026-05-22)¶

"Native latency views show P50/P99 at the trace level. To go a layer deeper and see which tool is slow, we built a Tool Performance widget that breaks down latency (P50, P99) and error rates per individual tool in the agent (for example, retrieve_docs vs. generate_response). That tells us whether the LLM, a Genie tool call, or another step is the bottleneck, so we can pinpoint exactly where the user experience is degrading."

— Source: sources/2026-05-22-databricks-observability-any-agent-anywhere-otel-unity-catalog

The Databricks AI Operations Center custom-dashboard example builds two custom widgets on the same UC OTel substrate, one of which is this pattern:

Custom Cost Analysis with Contract Pricing — token-usage by model + contract pricing → Estimated Cost per Trace (catches outliers like "a single complex query that costs $0.50 because of a retrieval loop").
Component-Level Performance — this pattern.

Both exploit the same property: "the trace tables are still just Delta tables in Unity Catalog. You can build a custom AI/BI Dashboard against them and write standard SQL (with help from AI) to model whatever your team cares about."
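The cost widget's arithmetic (token usage by model multiplied by contract pricing, summed per trace) can be sketched as follows. Everything concrete here is hypothetical: the model names, the per-1,000-token prices, the row fields, and the outlier threshold are illustrative assumptions, not values from the Databricks post.

```python
from collections import defaultdict

# Hypothetical contract prices in USD per 1,000 tokens (not real rates).
CONTRACT_PRICE_PER_1K = {
    "planner-model": {"input": 0.002, "output": 0.006},
    "writer-model": {"input": 0.010, "output": 0.030},
}


def estimated_cost_per_trace(usage_rows, prices):
    """Sum token cost per trace_id using per-model contract prices."""
    costs = defaultdict(float)
    for row in usage_rows:
        price = prices[row["model"]]
        costs[row["trace_id"]] += (
            row["input_tokens"] / 1000 * price["input"]
            + row["output_tokens"] / 1000 * price["output"]
        )
    return dict(costs)


def cost_outliers(costs, threshold_usd):
    """Return the trace ids whose estimated cost exceeds the threshold."""
    return sorted(trace_id for trace_id, cost in costs.items() if cost > threshold_usd)


# Invented usage rows: t2 stands in for a trace caught in a retrieval loop.
usage_rows = [
    {"trace_id": "t1", "model": "planner-model", "input_tokens": 1000, "output_tokens": 500},
    {"trace_id": "t1", "model": "writer-model", "input_tokens": 2000, "output_tokens": 1000},
    {"trace_id": "t2", "model": "writer-model", "input_tokens": 20000, "output_tokens": 10000},
]

costs = estimated_cost_per_trace(usage_rows, CONTRACT_PRICE_PER_1K)
outliers = cost_outliers(costs, threshold_usd=0.25)
print(costs, outliers)
```

Keeping the price table separate from the usage rows is the point of the design: list prices can be swapped for negotiated ones without touching the trace data.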

Why span-level matters¶

A multi-tool agent's trace decomposes into:

root span (request)
├── LLM call 1 (planner)              ── 500 ms
├── tool: retrieve_docs                ── 8000 ms  ◄── bottleneck
├── tool: generate_response (LLM 2)    ── 1200 ms
└── tool: format_output                ── 50 ms

Trace-level latency reports "P99 = ~9.8 s", the sum of the four child spans (500 + 8000 + 1200 + 50 = 9750 ms). That's correct but uninformative: it doesn't tell you that retrieval is the cause. Span-level latency reports retrieve_docs P99 = 8 s, which is immediately actionable.
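The decomposition above reduces to a few lines of arithmetic. The sketch below uses the durations from the span tree and assumes the child spans run sequentially and that the root span adds no time of its own; both are simplifying assumptions.

```python
# Durations in milliseconds, taken from the span tree above.
child_spans_ms = {
    "LLM call 1 (planner)": 500,
    "retrieve_docs": 8000,
    "generate_response": 1200,
    "format_output": 50,
}

# Assumption: spans are sequential, so the trace duration is their sum.
trace_ms = sum(child_spans_ms.values())

# Fraction of the trace's wall-clock time spent in each child span.
shares = {name: ms / trace_ms for name, ms in child_spans_ms.items()}
bottleneck = max(shares, key=shares.get)

print(f"trace total: {trace_ms} ms")
print(f"bottleneck: {bottleneck} ({shares[bottleneck]:.0%} of the trace)")
```

Under these assumptions retrieve_docs accounts for roughly four fifths of the request, which is the information the trace-level number alone cannot give.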

When this pattern matters most¶

Multi-tool agents with many components per request — the more components, the more trace-level latency obscures which one matters.
Long-tail latency investigations — "why is P99 spiking?" is unanswerable at the trace level for any agent with ≥3 components.
Cost-attribution at the component level — the Databricks custom-pricing dashboard joins span identity (which tool / which model) with token usage to attribute cost per component.
Tool-replacement decisions — "if we swap retrieval implementations, what does P99 become?" requires a baseline with latency attributed per component.
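The tool-replacement question can be approximated once per-component durations are available for each trace. The sketch below is a rough what-if under strong assumptions: spans are sequential, the sample traces are invented, and the candidate retrieval implementation is modelled as a hypothetical 1500 ms cap rather than as measured data.

```python
from math import ceil


def percentile(values, q):
    """Nearest-rank percentile (a simplification; SQL engines may interpolate)."""
    ordered = sorted(values)
    return ordered[max(1, ceil(q * len(ordered))) - 1]


def trace_latency(trace, overrides=None):
    """Sum component durations, substituting any overridden components."""
    overrides = overrides or {}
    return sum(overrides.get(name, ms) for name, ms in trace.items())


def with_capped_retrieval(trace, cap_ms=1500):
    """Hypothetical candidate: retrieval never takes longer than cap_ms."""
    capped = min(trace["retrieve_docs"], cap_ms)
    return trace_latency(trace, {"retrieve_docs": capped})


# Invented baseline: per-component durations (ms) for three traces.
baseline_traces = [
    {"planner": 500, "retrieve_docs": 900, "generate_response": 1100},
    {"planner": 450, "retrieve_docs": 8000, "generate_response": 1200},
    {"planner": 520, "retrieve_docs": 1000, "generate_response": 1000},
]

baseline_p99 = percentile([trace_latency(t) for t in baseline_traces], 0.99)
projected_p99 = percentile([with_capped_retrieval(t) for t in baseline_traces], 0.99)

print(f"baseline P99: {baseline_p99} ms, projected P99: {projected_p99} ms")
```

Without the per-component baseline there is nothing to substitute into, which is why the question cannot be answered from trace-level latency alone.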

Composition with other patterns¶

Substrate from patterns/managed-otel-ingestion-direct-to-lakehouse — span-level data must be queryable; lakehouse-resident OTel spans deliver this.
Sibling to patterns/bootstrap-eval-dataset-from-production-traces — same UC OTel tables; different consumer (latency dashboard vs eval dataset).
Generalises beyond AI agents — any service with OTel-instrumented sub-components (microservices, request handlers with multiple downstream calls, etc.) can use the same shape.

Caveats¶

Span-name discipline matters. If retrieve_docs is sometimes named retrieve and sometimes named docs_retrieval, the aggregation fragments across those names. Decorating entrypoints manually with @mlflow.trace(name="retrieve_docs") is good practice.
Wall-clock vs CPU vs IO. Span duration is wall-clock; the bottleneck class (CPU-bound, IO-bound, network-bound) requires additional context.
Concurrency obscures causality. Parallel spans complicate the "which step is slow" question; the pattern works best when spans are mostly sequential.
Cardinality cost. For agents with many distinct tool names, the GROUP BY can be expensive on huge windows. Pre-aggregation (materialized views, hourly rollups) is the typical scale-out path.
Native dashboards may be enough. The Databricks post explicitly says "For most teams, that's enough to monitor day-to-day agent health." This pattern is for teams whose use cases exceed the native views.
Span-name leakage. Span names sometimes embed user-supplied identifiers (URLs, IDs); if so, cardinality balloons and aggregation breaks. Span-name normalisation discipline is a prerequisite.
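The two span-name caveats above (inconsistent naming and leaked identifiers) can both be handled by normalising names before aggregation. The following is a minimal sketch; the alias table, the placeholder tokens, and the regular expressions are illustrative assumptions and would need tuning against real span names.

```python
import re

# Patterns for identifiers that commonly leak into span names (assumed, not exhaustive).
_URL = re.compile(r"https?://\S+")
_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
# Standalone numbers only, so digits inside identifiers such as "gpt4_call" survive.
_NUMBER = re.compile(r"\b\d+\b")

# Hypothetical alias table mapping known variants to one canonical name.
ALIASES = {
    "retrieve": "retrieve_docs",
    "docs_retrieval": "retrieve_docs",
}


def normalise_span_name(name):
    """Replace leaked identifiers with placeholders, then apply aliases."""
    name = _URL.sub("<url>", name)
    name = _UUID.sub("<uuid>", name)
    name = _NUMBER.sub("<n>", name)
    name = name.strip()
    return ALIASES.get(name, name)


for raw in ["docs_retrieval", "fetch https://example.com/a?id=7", "load_user 12345"]:
    print(raw, "->", normalise_span_name(raw))
```

URLs are replaced first so that digits inside them are not rewritten separately. Identifiers joined with underscores (for example `user_12345`) are not caught by the standalone-number pattern; fixing the name at the instrumentation site is more reliable than patching it afterwards.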

Seen in¶

sources/2026-05-22-databricks-observability-any-agent-anywhere-otel-unity-catalog — canonical instance: the Tool Performance widget in Databricks' AI Operations Center example, breaking down latency P50 / P99 + error rate per tool over <prefix>_otel_spans. Concrete examples named: retrieve_docs vs generate_response.

patterns/managed-otel-ingestion-direct-to-lakehouse — substrate pattern.
patterns/bootstrap-eval-dataset-from-production-traces — sibling consumer of the same UC tables.
patterns/telemetry-to-lakehouse — generalisation.
concepts/lakehouse-native-observability — storage-substrate concept.
concepts/observability — parent concept.
systems/mlflow — host platform.
systems/mlflow-otel-tracing — span-emission companion.
systems/uc-otel-trace-tables — span storage.
systems/zerobus-ingest — span ingest.
systems/opentelemetry — wire format.
companies/databricks