PATTERN Cited by 1 source
Component-level latency from OTel spans¶
Component-level latency from OTel spans is the pattern of computing per-span / per-tool / per-component latency percentiles (P50 / P99) directly over OTel-spans tables to attribute end-user latency to the specific component in the agent execution path that's slow — instead of stopping at the trace-level (whole-request) latency that native dashboards default to.
Mechanics¶
trace-level latency (native dashboards) ── tells you "P99 is high"
│
│ drill in
▼
span-level latency (this pattern) ── tells you "the retrieve_docs tool is the bottleneck"
│
│ custom SQL on <prefix>_otel_spans
▼
SELECT span_name,
       PERCENTILE(duration, 0.5)  AS p50,
       PERCENTILE(duration, 0.99) AS p99,
       AVG(error)                 AS error_rate
FROM <prefix>_otel_spans
WHERE trace_time > <window>
GROUP BY span_name
ORDER BY p99 DESC
The structural insight: trace-level latency is the total across every component in the request, so it hides which component is the cause. Per-span aggregation surfaces the slow one.
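For readers who want to prototype the aggregation outside SQL, the query above can be mirrored in a few lines of Python. This is a minimal sketch, not the Databricks implementation: the row shape (`span_name`, `duration_ms`, `error`) and the sample values are assumptions made for illustration, and the nearest-rank percentile used here can differ slightly from an interpolating SQL `PERCENTILE`.

```python
from collections import defaultdict
from math import ceil


def percentile(values, q):
    """Nearest-rank percentile: smallest value with at least a fraction q of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, ceil(q * len(ordered)))
    return ordered[rank - 1]


def span_latency_report(spans):
    """Group span rows by name; return per-name p50, p99 and error rate, slowest p99 first."""
    by_name = defaultdict(list)
    for span in spans:
        by_name[span["span_name"]].append(span)

    report = []
    for name, rows in by_name.items():
        durations = [r["duration_ms"] for r in rows]
        errors = sum(1 for r in rows if r["error"])
        report.append({
            "span_name": name,
            "p50": percentile(durations, 0.5),
            "p99": percentile(durations, 0.99),
            "error_rate": errors / len(rows),
        })
    return sorted(report, key=lambda r: r["p99"], reverse=True)


# Hypothetical span rows; the column names are assumptions, not the real table schema.
spans = [
    {"span_name": "retrieve_docs", "duration_ms": 8000, "error": False},
    {"span_name": "retrieve_docs", "duration_ms": 900, "error": False},
    {"span_name": "retrieve_docs", "duration_ms": 1100, "error": True},
    {"span_name": "generate_response", "duration_ms": 1200, "error": False},
    {"span_name": "generate_response", "duration_ms": 1000, "error": False},
    {"span_name": "format_output", "duration_ms": 50, "error": False},
]

report = span_latency_report(spans)
for row in report:
    print(row)
```

The first row of `report` is the component with the worst tail latency, which is the same ordering the `ORDER BY p99 DESC` clause produces.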
Canonical instance: Databricks AI Operations Center (2026-05-22)¶
"Native latency views show P50/P99 at the trace level. To go a layer deeper and see which tool is slow, we built a Tool Performance widget that breaks down latency (P50, P99) and error rates per individual tool in the agent (for example, retrieve_docs vs. generate_response). That tells us whether the LLM, a Genie tool call, or another step is the bottleneck, so we can pinpoint exactly where the user experience is degrading."
— Source: sources/2026-05-22-databricks-observability-any-agent-anywhere-otel-unity-catalog
The Databricks AI Operations Center custom-dashboard example shows two dashboards built on the same UC OTel substrate:
- Custom Cost Analysis with Contract Pricing — token-usage by model + contract pricing → Estimated Cost per Trace (catches outliers like "a single complex query that costs $0.50 because of a retrieval loop").
- Component-Level Performance — this pattern.
Both exploit the same property: "the trace tables are still just Delta tables in Unity Catalog. You can build a custom AI/BI Dashboard against them and write standard SQL (with help from AI) to model whatever your team cares about."
Why span-level matters¶
A multi-tool agent's trace decomposes into:
root span (request)
├── LLM call 1 (planner) ── 500 ms
├── tool: retrieve_docs ── 8000 ms ◄── bottleneck
├── tool: generate_response (LLM 2) ── 1200 ms
└── tool: format_output ── 50 ms
Trace-level latency reports "P99 = ~9.8 s". That's correct but uninformative — it doesn't tell you the retrieval is the cause. Span-level reports retrieve_docs P99 = 8 s, immediately actionable.
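The arithmetic behind that example can be checked directly. The sketch below copies the four durations from the trace above and assumes the spans run sequentially, so the trace total is their sum; the function name `latency_shares` is made up for illustration.

```python
# Durations in milliseconds, copied from the example trace above.
trace = {
    "LLM call 1 (planner)": 500,
    "retrieve_docs": 8000,
    "generate_response": 1200,
    "format_output": 50,
}


def latency_shares(span_durations):
    """Return the trace total and each span's fraction of it (assumes sequential spans)."""
    total = sum(span_durations.values())
    shares = {name: ms / total for name, ms in span_durations.items()}
    return total, shares


total_ms, shares = latency_shares(trace)
bottleneck = max(shares, key=shares.get)
print(total_ms, bottleneck, round(shares[bottleneck], 2))  # prints: 9750 retrieve_docs 0.82
```

The total of 9750 ms is the "~9.8 s" the trace-level view reports; the per-span share shows that roughly 82% of it is retrieval.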
When this pattern matters most¶
- Multi-tool agents with many components per request — the more components, the more trace-level latency obscures which one matters.
- Long-tail latency investigations — "why is P99 spiking?" is unanswerable at the trace level for any agent with ≥3 components.
- Cost-attribution at the component level — the Databricks custom-pricing dashboard joins span identity (which tool / which model) with token usage to attribute cost per component.
- Tool-replacement decisions: answering "if we swap retrieval implementations, what does P99 become?" requires a per-component baseline to compare against.
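The tool-replacement question can be roughed out once per-component durations are available. The following is a hedged sketch only: it assumes sequential spans, models the candidate implementation as a fixed latency (a real candidate would have its own distribution), and uses invented trace data and helper names.

```python
from math import ceil


def p99(values):
    """Nearest-rank 99th percentile."""
    ordered = sorted(values)
    return ordered[max(1, ceil(0.99 * len(ordered))) - 1]


def what_if_trace_p99(traces, component, candidate_ms):
    """Recompute trace-level P99 with one component's duration replaced by a fixed candidate value."""
    totals = [
        sum(candidate_ms if name == component else ms for name, ms in spans.items())
        for spans in traces
    ]
    return p99(totals)


# Hypothetical per-trace span durations in milliseconds.
traces = [
    {"planner": 500, "retrieve_docs": 8000, "generate_response": 1200, "format_output": 50},
    {"planner": 450, "retrieve_docs": 900, "generate_response": 1100, "format_output": 40},
    {"planner": 520, "retrieve_docs": 1000, "generate_response": 1300, "format_output": 60},
]

baseline = p99([sum(t.values()) for t in traces])
projected = what_if_trace_p99(traces, "retrieve_docs", 600)
print(baseline, projected)
```

Without the span-level breakdown there is nothing to substitute into, which is why the trace-level number alone cannot support this kind of estimate.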
Composition with other patterns¶
- Substrate from patterns/managed-otel-ingestion-direct-to-lakehouse — span-level data must be queryable; lakehouse-resident OTel spans deliver this.
- Sibling to patterns/bootstrap-eval-dataset-from-production-traces — same UC OTel tables; different consumer (latency dashboard vs eval dataset).
- Generalises beyond AI agents — any service with OTel-instrumented sub-components (microservices, request handlers with multiple downstream calls, etc.) can use the same shape.
Caveats¶
- Span-name discipline matters. If retrieve_docs is sometimes named retrieve and sometimes named docs_retrieval, aggregation fragments. Manually decorating entrypoints with @mlflow.trace(name="retrieve_docs") is good practice.
- Wall-clock vs CPU vs IO. Span duration is wall-clock; identifying the bottleneck class (CPU-bound, IO-bound, network-bound) requires additional context.
- Concurrency obscures causality. Parallel spans complicate the "which step is slow" question; the pattern works best when spans are mostly sequential.
- Cardinality cost. For agents with many distinct tool names, the GROUP BY can be expensive on huge windows. Pre-aggregation (materialized views, hourly rollups) is the typical scale-out path.
- Native dashboards may be enough. The Databricks post explicitly says "For most teams, that's enough to monitor day-to-day agent health." This pattern is for teams whose use cases exceed the native views.
- Span-name leakage. Span names sometimes embed user-supplied identifiers (URLs, IDs); if so, cardinality balloons and aggregation breaks. Span-name normalisation discipline is a prerequisite.
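The two span-name caveats above (inconsistent naming and embedded identifiers) can both be handled by normalising names before grouping. This is one possible sketch, not a prescribed approach: the alias table, the placeholder tokens, and the regular expressions are all assumptions chosen for illustration and would need tuning to a real naming scheme.

```python
import re

# Hypothetical alias table mapping observed spellings to one canonical name.
ALIASES = {"retrieve": "retrieve_docs", "docs_retrieval": "retrieve_docs"}

_URL = re.compile(r"https?://\S+")
_UUID = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.IGNORECASE
)
# Standalone numbers only; digits attached to a word (e.g. "llm_call_2") are left alone.
_NUM = re.compile(r"\b\d+\b")


def normalise_span_name(raw):
    """Collapse embedded identifiers to placeholders, then map known aliases to a canonical name."""
    name = raw.strip()
    name = _URL.sub("<url>", name)    # URLs first, since they may contain IDs and numbers
    name = _UUID.sub("<uuid>", name)
    name = _NUM.sub("<n>", name)
    return ALIASES.get(name, name)


print(normalise_span_name("docs_retrieval"))                  # retrieve_docs
print(normalise_span_name("fetch https://example.com/a/1"))   # fetch <url>
print(normalise_span_name("get_user 12345"))                  # get_user <n>
```

Applying the function as a derived column before the GROUP BY keeps cardinality bounded; fixing names at emission time is still preferable where the instrumentation can be changed.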
Seen in¶
- sources/2026-05-22-databricks-observability-any-agent-anywhere-otel-unity-catalog: the canonical instance, the Tool Performance widget in Databricks' AI Operations Center example, which breaks down latency (P50 / P99) and error rate per tool over <prefix>_otel_spans. Concrete examples named: retrieve_docs vs generate_response.
Related¶
- patterns/managed-otel-ingestion-direct-to-lakehouse — substrate pattern.
- patterns/bootstrap-eval-dataset-from-production-traces — sibling consumer of the same UC tables.
- patterns/telemetry-to-lakehouse — generalisation.
- concepts/lakehouse-native-observability — storage-substrate concept.
- concepts/observability — parent concept.
- systems/mlflow — host platform.
- systems/mlflow-otel-tracing — span-emission companion.
- systems/uc-otel-trace-tables — span storage.
- systems/zerobus-ingest — span ingest.
- systems/opentelemetry — wire format.
- companies/databricks