PATTERN Cited by 3 sources

Durable event log as agent audit envelope

Pattern

Every agent interaction — prompt, input, context retrieval, tool call, output, and action — is captured as a first-class durable event on a streaming log. The log becomes the agent's audit envelope: the single durable, queryable artefact from which audit trail, data lineage, behavioural replay, SLO enforcement, and end-to-end decision tracing are all derived.

The pattern applies log-as-truth at the agent-interaction altitude: instead of the log being the truth for microservice state changes, it is the truth for agent decisions, and every governance surface (compliance, debugging, SLO, tracing) is a view over the log.

Canonical statement

From the 2025-10-28 Redpanda Governed autonomy post (Source: Redpanda 2025-10-28):

"The ADP treats every agent interaction as a first-class durable event: prompts, inputs, context retrieval, tool calls, outputs, and actions are captured for analysis, compliance, and replay. These events allow platform teams to reproduce behavior, diagnose drift, and prove outcomes."

"All powered by a durable, queryable event log to capture every agent decision, enable replay, enforce backpressure, and uphold exactly-once processing across tool chains. Streaming turns opaque agent behavior into governed, provable workflows."

What gets captured

Six event classes named in the canonical source:

  1. Prompts — the user-to-agent input (natural-language request).
  2. Inputs — structured parameters passed to the agent.
  3. Context retrieval — retrievals the agent performs (RAG queries, vector-DB lookups, documents pulled into context).
  4. Tool calls — every MCP tool invocation with parameters.
  5. Outputs — LLM-generated responses (including intermediate chain-of-thought if surfaced).
  6. Actions — externally-visible side effects (API writes, messages sent, resources provisioned).

The completeness of these six classes is what separates the pattern from "we log agent invocations to a file" — the audit envelope is only load-bearing if every state-changing moment is captured, not just the outer request/response.
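
As a concrete sketch of the envelope record (a Python representation assumed for illustration — field names and shapes are not from the ADP spec), the six event classes can be modelled as one immutable record type, so every state-changing moment in a run becomes one appended event:

```python
from dataclasses import dataclass, field
from enum import Enum
import time
import uuid

class EventClass(Enum):
    # The six event classes named in the canonical source.
    PROMPT = "prompt"
    INPUT = "input"
    CONTEXT_RETRIEVAL = "context_retrieval"
    TOOL_CALL = "tool_call"
    OUTPUT = "output"
    ACTION = "action"

@dataclass(frozen=True)
class AuditEvent:
    # One durable record on the log; immutable by construction (frozen=True).
    event_class: EventClass
    agent_id: str
    run_id: str    # groups all events of one agent run, so replay is a filter
    payload: dict  # prompt text, tool parameters, response body, etc.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    ts: float = field(default_factory=time.time)
```

Keying every event by `run_id` is what makes "the outer request/response plus everything between" a single queryable unit rather than scattered log lines.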

Platform-team capabilities the envelope enables

  • Rewind and replay agent runs to debug or validate behaviors.
  • Enforce SLOs for latency, accuracy, and cost.
  • Trace agent decisions end-to-end — from input to action to outcome.
  • Reproduce behavior post-hoc.
  • Diagnose drift — compare current agent behavior against historical baseline captured in the log.
  • Prove outcomes — compliance audit against a queryable substrate.

These are all views over the same log, not separate systems. This is the load-bearing architectural claim of the pattern — consolidating audit / lineage / replay / tracing onto one substrate rather than maintaining N parallel systems.
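
A minimal sketch of the one-log / N-views claim, using hypothetical in-memory events (the shapes are illustrative, not the ADP schema): audit trail, SLO check, and lineage are all projections over the same event list, not separate stores.

```python
# Hypothetical events from one agent run; field names are illustrative.
log = [
    {"class": "prompt",    "run": "r1", "ts": 0.00, "payload": "summarize Q3"},
    {"class": "tool_call", "run": "r1", "ts": 0.12, "payload": {"tool": "search"}},
    {"class": "output",    "run": "r1", "ts": 0.90, "payload": "Q3 summary..."},
    {"class": "action",    "run": "r1", "ts": 0.95, "payload": {"email": "sent"}},
]

# View 1: audit trail — every event of the run, in order.
audit_trail = [e for e in log if e["run"] == "r1"]

# View 2: latency SLO — wall time from first event to final action.
latency = max(e["ts"] for e in log) - min(e["ts"] for e in log)
slo_ok = latency < 2.0  # hypothetical SLO threshold in seconds

# View 3: lineage — which tool calls fed the externally visible action.
tools_used = [e["payload"]["tool"] for e in log if e["class"] == "tool_call"]
```

Adding a new governance surface means adding a new projection, not a new capture pipeline — which is the consolidation argument in miniature.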

Why streaming log as the substrate

Four properties of a streaming log make it the natural substrate for the audit envelope:

  • Append-only — audit invariant: events can be read / enriched / projected but not edited, so audit trail integrity is a structural property, not a discipline.
  • Durable + queryable — pairs well with Iceberg (topic → table) so audit queries run on warehouse substrate.
  • Exactly-once processing semantics ("uphold exactly-once processing across tool chains" in the Redpanda framing) — critical for not double-counting tool calls or double-emitting actions during replay.
  • Backpressure — "enforce backpressure" means the audit substrate can push back when the downstream analytical surface is saturated, rather than dropping events.
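
An in-memory sketch of the first and third properties (append-only plus idempotent appends keyed by event id — real systems would use the broker's transactional/idempotent producer, not this toy class):

```python
class AppendOnlyLog:
    """Toy sketch: append-only storage plus idempotent appends keyed by
    event id, so a replayed or retried producer cannot double-emit the
    same tool-call event."""

    def __init__(self):
        self._events = []        # append-only; no update/delete API exists
        self._seen_ids = set()   # idempotency keys already accepted

    def append(self, event_id: str, event: dict) -> bool:
        # Duplicate deliveries (e.g. a retried tool call) are dropped,
        # approximating exactly-once at the log boundary.
        if event_id in self._seen_ids:
            return False
        self._seen_ids.add(event_id)
        self._events.append({"id": event_id, **event})
        return True

    def read(self):
        # Reads return copies: events can be projected but not edited,
        # so audit-trail integrity is structural, not a discipline.
        return [dict(e) for e in self._events]
```

The absence of any mutate/delete method is the point: immutability is enforced by the interface, not by convention.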

Compare with the weak-audit-substrate alternatives:

  • Grep over application logs — destroys schema, breaks on structured replay, no retention guarantee.
  • Per-agent-framework audit store — fragments by framework, no cross-agent replay, schema drifts.
  • APM / tracing system — optimised for latency diagnosis, not compliance retention or replay fidelity.

Governance surfaces derived from the envelope

One log, N views: the pattern's value scales with the number of views the envelope supports — a log that only powers audit is a narrow win; a log that powers audit + lineage + replay + SLO + tracing is the load-bearing case.

Caveats

  • Determinism problem for replay. LLM tool chains are non-deterministic at two levels: (a) the model's output varies with temperature and sampling; (b) downstream API responses vary with time. The canonical source names "replay" without engaging this — in practice, replay-for-compliance requires either deterministic models (temperature=0, fixed seed) or acceptance of behavioural-equivalence-class replay rather than byte-equivalent replay. See patterns/snapshot-replay-agent-evaluation for the mechanism axis.
  • Exactly-once across tool chains is a strong claim. Tool chains typically involve non-idempotent external APIs (Salesforce writes, GitHub commits, emails). Exactly-once processing requires idempotency keys, sagas, or compensations — the canonical source asserts the property without walking the mechanism. See concepts/exactly-once-semantics for the general concept.
  • Retention cost. Capturing every prompt + context retrieval + tool call + output at enterprise agent volume produces large event streams. Tiered storage (patterns/tiered-storage-to-object-store) is the likely compression mechanism; not walked in the canonical source.
  • PII in prompts / context / outputs. The audit envelope captures user prompts and context retrievals, which may contain PII or sensitive data. Retention + access-control on the envelope itself is a second-order governance problem the canonical source does not engage.
  • Schema evolution. Six event classes at launch; what happens when new classes emerge (reasoning traces, tool-call retries, multi-agent handoffs)? No schema-evolution discipline named.
  • Query-performance envelope. "Queryable event log" without query-planner disclosure — large-volume audit queries over streaming logs typically pivot to warehouse substrate (Iceberg / OLAP engine like Oxla); the canonical source positions Oxla as the query-side of ADP but doesn't bind Oxla-over-log-as-audit-substrate explicitly.
  • No substitute for in-model governance. Audit + replay tell you what the agent did; they don't prevent the agent from doing it in the first place. The pattern complements AAC's pre-I/O policy check, rather than substituting for it.
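
The determinism caveat above admits a concrete mechanism: instead of byte-equal replay, compare runs under a behavioural equivalence class. A minimal sketch (the `normalize` function is illustrative — a real check would compare extracted tool calls, cited documents, or structured fields rather than raw text):

```python
import re

def normalize(output: str) -> str:
    """Map an LLM output to its behavioural equivalence class.
    Illustrative only: here we collapse case and whitespace; a real
    normalizer would be task-specific."""
    text = output.lower().strip()
    return re.sub(r"\s+", " ", text)

def replay_matches(baseline: str, replayed: str) -> bool:
    # Byte-equal replay fails on any sampling jitter; equivalence-class
    # replay asks whether the two runs made the same decision.
    return normalize(baseline) == normalize(replayed)
```

Under this framing, "replay-for-compliance" proves the decision was reproduced, not that the token stream was.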

Contrasts and siblings

  • concepts/audit-trail is the canonical concept this pattern instantiates at the agent altitude. Flagship's 2026-04-17 field-level-diff audit trail for feature flags is the peer pattern at the config-change altitude; Fly.io's Macaroon OpenSearch audit trail is the peer at the token-operation altitude.
  • concepts/durable-execution is the sibling concept at workflow altitude — Temporal-style durable workflows keep state durable so execution survives crashes. The audit envelope keeps interactions durable so replay / audit / lineage are possible. Overlap but not identity.
  • patterns/mcp-as-centralized-integration-proxy is the connectivity-side peer pattern in the ADP substrate — the proxy gives one choke-point for auditing, and the durable event log is the substrate the choke-point writes to. The two patterns compose: the proxy is the producer of audit events, the log is the substrate of the envelope.

Seen in

  • sources/2026-02-10-redpanda-how-to-safely-deploy-agentic-ai-in-the-enterprise — Akidau talk-recap extends the pattern with a metadata-only-audit-insufficient framing: verbatim "with agents you need to be able to audit what the request was, and what the agent did in response to the request. You can't make inferences without having the full dataset". Canonicalises the distinction between classical systems audit (request-metadata, byte counts, timestamps) and agent audit (full inputs + outputs + tool calls). Reinforces streaming log as the substrate on "high throughput, low latency, and durable logs" grounds.
  • sources/2025-10-28-redpanda-governed-autonomy-the-path-to-enterprise-agentic-ai — canonical wiki introduction. Names six event classes (prompt + input + context retrieval + tool call + output + action) + six platform-team capabilities (rewind-replay + SLO + trace + reproduce + diagnose-drift + prove-outcomes) + streaming-log as the substrate.
  • sources/2026-04-14-redpanda-openclaw-is-not-for-enterprise-scale — Transcripts + A/B agent evaluation as a new use case for the audit envelope. Redpanda 2026-04-14 Openclaw is not for enterprise scale post positions the audit envelope as component #2 of the four-component agent production stack (Gateway + Audit log + Token vault + Sandboxed compute). Names "transcripts" as the content shape alongside "every tool call and LLM invocation" and adds the agentic-performance-review use case: "You can run different versions of agents, for example, giving similar agents different sets of tools to accomplish a job, then monitor and compare their performance." This extends the pattern's value beyond compliance / replay / lineage to include iterative agent development as a first-class consumer of the audit envelope. Canonicalises the architectural role of the envelope at the product-minimum altitude.