SYSTEM Cited by 2 sources

Inference Tables (Databricks)

Inference Tables are Databricks' Unity Catalog-managed tables that capture the full payload of every model call routed through Unity AI Gateway — the exact prompt sent, the exact response returned, plus token counts and latency. The substrate is the lakehouse; retention is customer-controlled.

Definition (from the source)

"AI Gateway writes the full payload of every model call to inference tables: the exact prompt sent, the exact response returned, token counts and latency. Unity Catalog captures every access operation in audit logs, including which principal called what, from which agent and at what time. Both land in your lakehouse as tables, retainable on your terms." — Source: sources/2026-05-20-databricks-governing-ai-agents-at-scale-with-unity-catalog

Why "full payload" is load-bearing (not just metrics + traces)

Most logging architectures sample, summarise, and retain data only for short windows. The post explicitly names this as a structural failure mode for AI workloads:

"Most logging architectures force a trade-off between completeness and cost, requiring you to sample, filter, and set short retention windows. Because Unity AI Gateway captures observability data in your lakehouse, you don't have to."

The argument:

  • Sampling loses the rare events most worth investigating.
  • Filtering strips the contextual fields needed for forensics.
  • Short retention prevents post-hoc audit when an incident is discovered weeks later.

Lakehouse storage breaks the cost half of the tradeoff: object storage is cheap enough that full request/response capture, retained on customer-defined schedules, is economically tractable.
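The sampling point can be made concrete with a little arithmetic. The sketch below is not from the post; it assumes independent uniform sampling and a hypothetical incident involving three suspicious calls, and computes the chance that a sampled log retains none of them.

```python
def prob_all_missed(rare_events: int, sample_rate: float) -> float:
    """Probability that independent uniform sampling retains none of the rare events."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    # Each event is dropped with probability (1 - sample_rate), independently.
    return (1.0 - sample_rate) ** rare_events


# Hypothetical scenario: 3 suspicious calls hidden in a much larger stream.
for rate in (0.01, 0.10, 1.0):
    missed = prob_all_missed(rare_events=3, sample_rate=rate)
    print(f"sample rate {rate:>4.0%}: P(no suspicious call retained) = {missed:.3f}")
```

Under these assumptions a 1% sample loses all three calls about 97% of the time, while full capture (a rate of 1.0) never does.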

Regulatory forcing function (named in the post)

"Emerging AI regulations require organizations to demonstrate what their AI systems did, what they were given, and what they produced."

The structural insight: summarised metrics are not regulatory evidence. "What was the agent given?" requires the verbatim prompt; "what did it produce?" requires the verbatim response. Inference Tables operationalise that requirement — the rows are replayable evidence, not aggregates.

Where Inference Tables fit in Pillar 2 (Data-centric AI governance)

Two parallel write paths land in the same lakehouse:

agent traffic
    ▼
Unity AI Gateway
  ├── inference-table writer ── exact prompt + response + tokens + latency
  │                              ▼
  │                           Unity Catalog
  │                              ▼
  │                           Delta tables in customer's lakehouse
  └── UC audit-log writer ── principal + agent + table accessed + operation + time
                                 ▼
                              UC audit logs (also in lakehouse)

Both feeds are first-class lakehouse datasets queryable with the same SQL the customer uses for business data — the property that makes concepts/data-centric-ai-governance tractable.

Joinable-with-business-data property

"Because the audit data lives next to your business data, you can go further, joining agent behavior against business outcomes to understand not just what agents did, but whether it worked."

Concrete examples named:

  • "which agents accessed a specific service last week"
  • "how much each team is spending on inference"
  • "whether any agent touched credentials or PII"
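Questions like these reduce to ordinary filters over the audit table. The sketch below uses an in-memory SQLite database as a stand-in for the lakehouse; the table name `audit_log`, its columns, the resource names, and the sample rows are all hypothetical, since the post does not disclose the audit-log schema.

```python
import sqlite3

# In-memory stand-in for a lakehouse audit table (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_log (principal TEXT, agent TEXT, resource TEXT, "
    "operation TEXT, event_time TEXT)"
)
conn.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?)",
    [
        ("alice", "support-bot", "crm.tickets", "SELECT", "2026-05-12T09:00:00"),
        ("bob", "sales-bot", "crm.accounts", "SELECT", "2026-05-13T10:30:00"),
        ("alice", "support-bot", "secrets.api_keys", "SELECT", "2026-05-14T11:15:00"),
        ("carol", "report-bot", "crm.tickets", "SELECT", "2026-05-02T08:00:00"),
    ],
)


def agents_accessing(resource_prefix: str, start: str, end: str) -> list[str]:
    """Distinct agents that touched resources under a prefix in [start, end)."""
    rows = conn.execute(
        "SELECT DISTINCT agent FROM audit_log "
        "WHERE resource LIKE ? AND event_time >= ? AND event_time < ? "
        "ORDER BY agent",
        (resource_prefix + "%", start, end),
    ).fetchall()
    return [agent for (agent,) in rows]


# "Which agents accessed a specific service last week" (week fixed for repeatability).
crm_agents = agents_accessing("crm.", "2026-05-11", "2026-05-18")
# "Whether any agent touched credentials" in the same window.
secret_agents = agents_accessing("secrets.", "2026-05-11", "2026-05-18")
print(crm_agents, secret_agents)
```

ISO-8601 timestamps are stored as text here so that plain string comparison orders them correctly; a real lakehouse table would more likely use a native timestamp type.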

The cost-vs-outcome example from Pillar 3 also derives from this property: an agent that costs $200 and generates $50K in qualified pipeline vs an agent that costs $200 querying stale data in a loop is a join between the inference-table cost and the business-pipeline outcome — only possible because both live in the same governed substrate.
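A minimal sketch of that join, again using in-memory SQLite as a stand-in for the lakehouse. The table names, the `cost_usd` column (the post names token counts, not a cost column), the agent names, and the per-call figures are assumptions chosen to reproduce the $200 vs $50K example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE inference_calls (agent TEXT, cost_usd REAL);
    CREATE TABLE pipeline_outcomes (agent TEXT, qualified_pipeline_usd REAL);
    """
)
# Hypothetical per-call costs: each agent spends $200 in total.
conn.executemany(
    "INSERT INTO inference_calls VALUES (?, ?)",
    [
        ("lead-qualifier", 120.0),
        ("lead-qualifier", 80.0),
        ("stale-looper", 150.0),
        ("stale-looper", 50.0),
    ],
)
# Only one agent produced any business outcome.
conn.execute("INSERT INTO pipeline_outcomes VALUES (?, ?)", ("lead-qualifier", 50000.0))

# Aggregate each side first, then LEFT JOIN so agents with no outcome still appear.
report = conn.execute(
    """
    SELECT c.agent, c.total_cost, COALESCE(o.total_pipeline, 0.0)
    FROM (SELECT agent, SUM(cost_usd) AS total_cost
          FROM inference_calls GROUP BY agent) AS c
    LEFT JOIN (SELECT agent, SUM(qualified_pipeline_usd) AS total_pipeline
               FROM pipeline_outcomes GROUP BY agent) AS o
      ON o.agent = c.agent
    ORDER BY c.agent
    """
).fetchall()

for agent, cost, pipeline in report:
    print(f"{agent}: cost ${cost:,.0f}, qualified pipeline ${pipeline:,.0f}")
```

The LEFT JOIN matters: an inner join would silently drop the agent that spent money and produced nothing, which is exactly the row the comparison is meant to surface.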

Substrate for downstream agentic security (Lakewatch)

Per the source, the same Inference-Table + UC-audit-log substrate is the input to Lakewatch (Databricks' agentic SIEM):

"Lakewatch, Databricks' agentic SIEM built on the security lakehouse, takes this further still, turning the same audit trail into active security intelligence: AI-driven threat detection and response built on the lakehouse."

The structural property: inference tables double as input to security-intelligence agents. The same data that proves to auditors what agents did also powers detection of what attackers' agents are doing. This is the "defenders should also use agents" asymmetry the post argues for.

Schema (partially disclosed)

Column       | Source description
Prompt       | "the exact prompt sent"
Response     | "the exact response returned"
Token counts | named
Latency      | named
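For orientation, the four disclosed fields could be modelled as a record like the one below. This is a sketch only: the class name, field names, types, units, and the input/output split of token counts are all assumptions, since the post names the fields without giving column names or types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceRow:
    """Hypothetical shape of one inference-table row (one row per model call)."""

    prompt: str          # "the exact prompt sent"
    response: str        # "the exact response returned"
    input_tokens: int    # assumption: token counts split by direction
    output_tokens: int
    latency_ms: float    # assumption: latency recorded in milliseconds

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


row = InferenceRow(
    prompt="Summarise ticket 42",
    response="Customer reports a login failure.",
    input_tokens=12,
    output_tokens=8,
    latency_ms=850.0,
)
print(row.total_tokens)
```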

Not disclosed in this post:

  • Caller identity / agent identity columns (likely present given the Pillar 1 dual-identity logging claim).
  • Multi-modal payload representation (image / audio / structured data).
  • Partition strategy.
  • Streaming-response handling (intermediate tokens vs final aggregate).
  • Whether blocked-by-Service-Policy or blocked-by-Guardrails requests still write to Inference Tables (audit completeness vs blast-containment trade-off).

Relation to existing wiki primitives

  • Extends patterns/telemetry-to-lakehouse — the pre-existing wiki canonicalisation of OTel-style metrics/traces landing in Delta tables. Inference Tables specialise the pattern to full request/response payloads — moving from metrics-and-traces to replayable I/O.
  • Companion to UC audit logs. Per the source, both feeds land "in your lakehouse as tables" — Inference Tables for the model-layer payload, UC audit logs for the data-access events.
  • Distinct from MLflow tracing. MLflow tracing covers the agent-framework layer (LangChain, LlamaIndex, AutoGen, OpenAI SDK, Anthropic SDK auto-instrumented); Inference Tables cover the gateway-payload layer (everything that flows through Unity AI Gateway's inference path). The two substrates are complementary, both landing in UC.
  • Canonicalises patterns/inference-payload-table-for-audit — the pattern of writing full model I/O to a governed lakehouse table.

Sibling substrate: UC OTel Trace Tables (2026-05-22)

The 2026-05-22 OTel-tracing launch ships a parallel UC-Delta lakehouse-resident audit substrate at a different granularity. Both feed the "governed lakehouse audit trail" posture but capture distinct facets:

  • Inference Tables — one row per model call; verbatim prompt + response + tokens + latency captured at the Unity AI Gateway proxy choke point.
  • UC OTel Trace Tables — one row per span within a trace; per-step execution path (tool calls, LLM calls, retrieval, etc.) captured at the agent OTel SDK (in-process instrumentation) via Zerobus Ingest.

The two compose: Inference Tables answer "what was sent to and received from the model" (full-payload audit at the gateway); OTel Trace Tables answer "what path did the agent take, and which step was the bottleneck" (execution-shape analysis at the agent). Both are first-class UC datasets queryable with the same SQL. Customers can join across them on identifiers (e.g. trace_id if propagated through the gateway) to align "what the agent did" (spans) with "what the model received and returned" (payloads). Source: sources/2026-05-22-databricks-observability-any-agent-anywhere-otel-unity-catalog.
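The cross-table alignment described above might look like the following in miniature. Plain Python dictionaries stand in for the two tables; the field names are hypothetical, and the sketch assumes `trace_id` is in fact propagated through the gateway, which the source leaves conditional.

```python
from collections import defaultdict

# Hypothetical rows: one per model call (payloads) and one per span (traces).
payload_rows = [
    {"trace_id": "t1", "prompt": "Summarise ticket 42",
     "response": "Customer reports a login failure."},
]
span_rows = [
    {"trace_id": "t1", "span_name": "retrieve_ticket", "duration_ms": 120},
    {"trace_id": "t1", "span_name": "llm_call", "duration_ms": 950},
    {"trace_id": "t2", "span_name": "tool_call", "duration_ms": 40},
]


def align(payloads, spans):
    """Attach each payload to the slowest span of the trace it belongs to."""
    spans_by_trace = defaultdict(list)
    for span in spans:
        spans_by_trace[span["trace_id"]].append(span)

    aligned = []
    for payload in payloads:
        trace_spans = spans_by_trace.get(payload["trace_id"], [])
        if not trace_spans:
            continue  # no matching trace, e.g. trace_id was not propagated
        slowest = max(trace_spans, key=lambda s: s["duration_ms"])
        aligned.append({
            "trace_id": payload["trace_id"],
            "prompt": payload["prompt"],
            "response": payload["response"],
            "slowest_span": slowest["span_name"],
            "slowest_span_ms": slowest["duration_ms"],
        })
    return aligned


result = align(payload_rows, span_rows)
print(result)
```

Each aligned record pairs "what the model received and returned" with the bottleneck step of the same trace; traces with no gateway payload (such as `t2` here) simply do not appear.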
