Skip to content

PATTERN Cited by 1 source

Inference payload table for audit

Pattern

Capture the full request and response payload of every model call flowing through an AI gateway as rows in a governed lakehouse-resident table — keyed and partitioned for SQL queryability, retained on customer-defined schedules. The table doubles as regulatory evidence (what the agent was given, what it produced) and as forensic / detection input (joinable with business outcomes, accessible to detection systems).

Why "full payload" instead of metrics + traces

Conventional observability stores summarised signals: token counts, latency histograms, error rates, traces. Inference payload tables store the verbatim prompt and response. Two structural reasons:

  1. Regulatory evidence is verbatim, not summarised. "Emerging AI regulations require organizations to demonstrate what their AI systems did, what they were given, and what they produced. AI Gateway writes the full payload of every model call to inference tables: the exact prompt sent, the exact response returned, token counts and latency." — Source: sources/2026-05-20-databricks-governing-ai-agents-at-scale-with-unity-catalog. Aggregates can't answer "show me the exact prompt that produced this output."
  2. Forensic value is in the content. When an agent gives a wrong answer, the question is "what did it see?" — not "how long did it take?". Without the verbatim prompt, the trail dead-ends.

Why a lakehouse table (not an APM or log warehouse)

The post's argument:

"Most logging architectures force a trade-off between completeness and cost, requiring you to sample, filter, and set short retention windows. Because Unity AI Gateway captures observability data in your lakehouse, you don't have to." — Source.

Object-storage-backed table formats (Delta Lake, Iceberg) make full request/response capture, retained for regulatory-relevant durations, economically tractable in a way row-stored APM systems aren't. The lakehouse property buys:

  • Cheap durable retention. Object storage is the cheapest tier of cloud storage by an order of magnitude.
  • Standard SQL access. Investigators query with the same dialect they use for business data — no separate query DSL.
  • Joinable with business data. Inference rows can join with HR / pipeline / order tables to answer cross-cutting questions ("is this agent worth what it costs?").
  • Same governance posture as everything else. UC ABAC, row-filters, column-masks apply to inference tables themselves — protecting customer data captured in prompts.

Architectural shape

agent traffic → AI Gateway
                   │ inline write (every request, every response)
              ┌───────────────────────┐
              │ Inference Table        │
              │ (UC-managed Delta)     │
              │ ─ exact_prompt         │
              │ ─ exact_response       │
              │ ─ token_count          │
              │ ─ latency              │
              │ ─ caller_identity      │
              │ ─ agent_identity       │
              │ ─ timestamp            │
              └───────────────────────┘
                   │ retain on customer schedule
       consumed by:
       ─ compliance / audit queries (SQL)
       ─ business-outcome joins (cost-vs-value)
       ─ Lakewatch agentic SIEM (threat detection)
       ─ data-quality-driven debugging (join freshness)

Why "every request, every response" is load-bearing

Sampling breaks the regulatory-evidence property: "show me the exact prompt" fails if the prompt was sampled out. The inference payload table pattern is all-or-nothing — it loses its core value if it's reduced to "a sample of inference traffic".

The economic argument (lakehouse-backed retention is cheap) is what makes every request economically tractable.

What's stored vs not

The post discloses what is stored (prompt, response, tokens, latency) but not the full schema or how the table handles:

  • Streaming responses — buffer-then-write or token-by-token append.
  • Multi-modal payloads — image / audio / structured-data prompts and responses.
  • Blocked requests — does a request blocked by Service Policy or Guardrails still write? (The audit-completeness vs blast-containment trade-off is undisclosed.)
  • PII in payload — prompts may contain PII; how is the inference table itself protected? Likely by the same UC ABAC + classification machinery that protects business data, but unconfirmed in this source.

Where this pattern fits in the broader telemetry hierarchy

Pre-existing wiki canonicalisation:

  • patterns/telemetry-to-lakehouse — the parent pattern: OTel-style metrics + traces landing in Delta tables for joinable observability.
  • Inference payload table for audit specialises that to full request/response payloads, not just metrics + traces.

The two patterns compose: a Databricks customer running Unity AI Gateway gets both — OTel telemetry (metrics + traces) lands in UC-managed Delta via OpenTelemetry ingestion, and full inference payloads land in Inference Tables.

Sibling primitives

  • MLflow tracing (systems/mlflow) — auto-instruments agent frameworks, captures agent-framework-layer spans (LangChain, LlamaIndex, AutoGen, OpenAI SDK, Anthropic SDK). Lands in UC tables. Different layer of the same audit story: agent-side spans vs gateway-side payloads.
  • CloudTrail-style audit logs (systems/aws-cloudtrail) — captures API-call metadata + caller identity, but not request/response bodies for most APIs. Inference payload tables are a categorical extension to the bodies, motivated by the regulatory-evidence requirement.
  • Sampling-and-replay observability — most APM products. Loses regulatory-evidence value but has lower cost. The inference payload table pattern accepts the higher storage cost for the regulatory benefit.

Two-purpose substrate

The same table serves:

  1. Compliance. "Retainable on your terms" — customer-controlled retention satisfies regulatory requirements.
  2. Active security intelligence. Lakewatch (Databricks' agentic SIEM) consumes the same audit trail: "turning the same audit trail into active security intelligence: AI-driven threat detection and response built on the lakehouse." The substrate-reuse property is load-bearing — you don't run two pipelines, you run one substrate with two consumers.

Seen in

Source

Last updated · 542 distilled / 1,571 read