
CONCEPT Cited by 7 sources

Audit trail

An audit trail is a durable, queryable record of every state-changing operation performed against a system — who changed what, when, and to what new value — independent of the system's primary storage. A good audit trail is first-class, not a logs-grep afterthought: it has a defined schema, retention policy, and query API.

Field-level diff as the right granularity

Flagship's audit-trail framing, from the 2026-04-17 launch post:

"A full audit trail. Every flag change is recorded with field-level diffs, so you know who changed what and when."

Field-level diff is the load-bearing shape: not "Alice updated flag X at 14:02" (useless for debugging) but "Alice changed rules[1].conditions[0].value from enterprise to pro at 14:02 UTC". The diff is the answer to the forensic question.
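A field-level diff of this shape can be computed by walking the old and new configuration side by side. The following is a minimal sketch, not Flagship's implementation: the function names, the record layout, and the sample flag config are all hypothetical, and it assumes configs are plain JSON-like dicts and lists.

```python
from datetime import datetime, timezone
from typing import Any

_ABSENT = object()  # marks a field that exists on only one side


def field_diff(before: Any, after: Any, path: str = "") -> list[dict]:
    """Return one entry per leaf field that differs between two JSON-like values."""
    if isinstance(before, dict) and isinstance(after, dict):
        changes: list[dict] = []
        for key in sorted(before.keys() | after.keys()):
            child = f"{path}.{key}" if path else str(key)
            changes += field_diff(before.get(key, _ABSENT), after.get(key, _ABSENT), child)
        return changes
    if isinstance(before, list) and isinstance(after, list):
        changes = []
        for i in range(max(len(before), len(after))):
            b = before[i] if i < len(before) else _ABSENT
            a = after[i] if i < len(after) else _ABSENT
            changes += field_diff(b, a, f"{path}[{i}]")
        return changes
    if before == after:
        return []
    # Absent fields are reported as None; a fuller schema might distinguish them from null.
    return [{
        "path": path,
        "from": None if before is _ABSENT else before,
        "to": None if after is _ABSENT else after,
    }]


def audit_record(actor: str, before: dict, after: dict) -> dict:
    """Wrap the diff with who and when (UTC): the other two parts of the forensic question."""
    return {
        "actor": actor,
        "at": datetime.now(timezone.utc).isoformat(),
        "changes": field_diff(before, after),
    }


# Hypothetical flag config, before and after one edit.
before = {"enabled": True, "rules": [
    {"conditions": [{"attribute": "plan", "value": "free"}]},
    {"conditions": [{"attribute": "plan", "value": "enterprise"}]},
]}
after = {"enabled": True, "rules": [
    {"conditions": [{"attribute": "plan", "value": "free"}]},
    {"conditions": [{"attribute": "plan", "value": "pro"}]},
]}
record = audit_record("alice", before, after)
print(record["changes"])
# [{'path': 'rules[1].conditions[0].value', 'from': 'enterprise', 'to': 'pro'}]
```

Lists are compared by position here, so inserting a rule in the middle shows up as a run of changed indices; a production differ would probably match list items by a stable id instead.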

The pattern is positioned against the hardcoded-flag anti-pattern:

"There's no audit trail — when something breaks, you're searching git blame to figure out who toggled what."

git blame is a weak audit trail because it only captures code-embedded toggles — config edits on a dashboard disappear entirely.

What an audit trail is for

Three distinct audiences, each with different query shapes:

  • Incident forensics: "what was the last change to the bot-management feature file before the 2025-11-18 outage?" (see [[sources/2025-11-18-cloudflare-outage-on-november-18-2025]], where the absence of fine-grained change attribution lengthened the debugging loop.)
  • Compliance: "show every time a user's access was modified in the last 12 months, with actor + timestamp + before/after."
  • Operational review: "who flipped the flag, and what did they change it from/to, and is the new value the one the change ticket asked for?"

Each audience implies different retention policies (months vs. years vs. permanent), different query surfaces (search, SQL, time-range filter), and different PII posture.
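The three query shapes can be sketched against an in-memory list of records. This is illustrative only: the `AuditRecord` fields, the `flag:` / `access:` target naming, the helper names, and the sample data are assumptions, and a real trail would serve these through its own query API (search, SQL, or time-range filters).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class AuditRecord:
    actor: str
    target: str   # what was changed, e.g. a flag or a user's access
    path: str     # field within the target
    old: Any
    new: Any
    at: datetime  # timezone-aware, UTC


def last_change_before(records, target, cutoff) -> Optional[AuditRecord]:
    """Incident forensics: most recent change to `target` strictly before `cutoff`."""
    hits = [r for r in records if r.target == target and r.at < cutoff]
    return max(hits, key=lambda r: r.at, default=None)


def changes_in_window(records, target_prefix, since, until):
    """Compliance: every change to matching targets in [since, until), oldest first."""
    hits = [r for r in records
            if r.target.startswith(target_prefix) and since <= r.at < until]
    return sorted(hits, key=lambda r: r.at)


def matches_ticket(record, expected_new) -> bool:
    """Operational review: did the change land on the value the ticket asked for?"""
    return record.new == expected_new


def utc(y, m, d, hh=0, mm=0):
    return datetime(y, m, d, hh, mm, tzinfo=timezone.utc)


# Hypothetical sample trail.
LOG = [
    AuditRecord("alice", "flag:checkout-v2", "rules[1].conditions[0].value",
                "enterprise", "pro", utc(2026, 4, 17, 14, 2)),
    AuditRecord("bob", "access:user-42", "role", "viewer", "admin", utc(2026, 3, 1, 9, 0)),
    AuditRecord("carol", "flag:checkout-v2", "enabled", False, True, utc(2026, 4, 10, 8, 30)),
    AuditRecord("bob", "access:user-7", "role", "admin", "viewer", utc(2025, 1, 5, 12, 0)),
]

last = last_change_before(LOG, "flag:checkout-v2", utc(2026, 4, 17, 15, 0))
access = changes_in_window(LOG, "access:", utc(2025, 6, 1), utc(2026, 6, 1))
print(last.actor, [r.target for r in access], matches_ticket(last, "pro"))
```

All three functions read the same records; what differs per audience is the filter, the time horizon, and therefore how long the records have to be kept.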

Architectural placement

Common implementations, in order of increasing rigour:

  • Logs-only (X updated Y). Cheapest, weakest query surface, lossy.
  • Change-log table in the primary DB. Co-located; risks being corrupted by the same bug that corrupts the primary data.
  • Separate append-only store. Durable, independent of primary-DB corruption; examples:
      • Durable Object-embedded changelog alongside the per-app config (see Flagship, where the DO "serves as the source of truth for that app's flag configuration and changelog").
      • Permanently-retained OpenSearch index for token operations (see concepts/audit-trail-in-opensearch, Fly.io's Macaroon auth trail).
  • Separate service with its own auth tier. Highest bar: read paths are auditor-only, and primary-DB operators have no write access.
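As a rough illustration of the append-only idea, the sketch below uses SQLite triggers to reject UPDATE and DELETE on an audit table. It is an assumption-laden toy, not how Flagship or Fly.io store their trails: the table name, columns, and helper functions are hypothetical, and triggers only guard against accidental mutation through the normal write path. Anyone who can drop the trigger or edit the file can still rewrite history, which is the gap the separate-service tier closes.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    actor     TEXT NOT NULL,
    target    TEXT NOT NULL,
    path      TEXT NOT NULL,
    old_value TEXT,
    new_value TEXT,
    at        TEXT NOT NULL
);
CREATE TRIGGER IF NOT EXISTS audit_log_no_update
BEFORE UPDATE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
CREATE TRIGGER IF NOT EXISTS audit_log_no_delete
BEFORE DELETE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
"""


def open_audit_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the audit database, kept apart from the primary DB file."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def append(conn, actor, target, path, old, new, at) -> None:
    """The only supported write: insert one field-level change."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO audit_log (actor, target, path, old_value, new_value, at) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (actor, target, path, json.dumps(old), json.dumps(new), at),
        )


def is_rejected(conn, sql: str) -> bool:
    """Return True if the statement is refused by the store."""
    try:
        with conn:
            conn.execute(sql)
    except sqlite3.DatabaseError:
        return True
    return False


store = open_audit_store()
append(store, "alice", "flag:checkout-v2", "enabled", False, True, "2026-04-17T14:02:00+00:00")
print(is_rejected(store, "UPDATE audit_log SET actor = 'mallory' WHERE id = 1"))
print(is_rejected(store, "DELETE FROM audit_log WHERE id = 1"))
```

Values are stored as JSON text so that any field type can be recorded without changing the schema.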

Caveats not named in the Flagship post

  • Retention window: "field-level diffs" is the write shape; the post doesn't name how long they're kept.
  • Query API — dashboard-viewable and diff-renderable, but no disclosed export / API for downstream SIEM integration.
  • PII in context attributes — flag evaluation context can contain user attributes; whether those land in the audit trail (and how) isn't walked through.
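One possible answer to the PII question, sketched here purely as an assumption (the Flagship post does not describe any such mechanism), is to redact evaluation-context attributes before they reach the audit trail: an allowlist of attributes is stored as-is, and everything else is replaced by a keyed hash so records can still be correlated without exposing the raw value. `SAFE_ATTRIBUTES`, `redact_context`, and the sample context are hypothetical.

```python
import hashlib
import hmac

# Hypothetical allowlist: attributes considered safe to store verbatim.
SAFE_ATTRIBUTES = frozenset({"plan", "region"})


def redact_context(context: dict, key: bytes) -> dict:
    """Return a copy of `context` that is safe to write to the audit trail."""
    redacted = {}
    for name, value in context.items():
        if name in SAFE_ATTRIBUTES:
            redacted[name] = value
        else:
            # Keyed hash (HMAC-SHA256), truncated for readability: the same input
            # and key always give the same token, so records remain joinable.
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
            redacted[name] = f"hmac-sha256:{digest[:16]}"
    return redacted


ctx = {"plan": "pro", "region": "eu", "email": "user@example.com"}
print(redact_context(ctx, key=b"example-key-rotate-me"))
```

Low-entropy values (short ids, booleans) can be guessed by brute force if the key leaks, so the key would need the same protection as the trail itself.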

Seen in

  • sources/2026-05-20-databricks-governing-ai-agents-at-scale-with-unity-catalog: Canonical full-payload-audit-as-regulatory-evidence framing for AI agents. Databricks' four-pillars post canonicalises Inference Tables ("the exact prompt sent, the exact response returned, token counts and latency"), written to UC-managed Delta tables in the lakehouse, "retainable on your terms". It frames the conventional logging-architecture "trade-off between completeness and cost" (sample / filter / short-retention) as structurally insufficient for AI: "Emerging AI regulations require organizations to demonstrate what their AI systems did, what they were given, and what they produced." The substrate doubles as input to Lakewatch (agentic SIEM): same audit data, two consumers (compliance + active threat detection). First wiki canonicalisation of patterns/inference-payload-table-for-audit. Also names dual-identity audit logging as a load-bearing OBO requirement: "every action is logged against both identities: the real user who triggered the request and the agent that acted on their behalf."
  • sources/2026-04-17-cloudflare-introducing-flagship-feature-flags-built-for-the-age-of-ai — canonical wiki instance for feature-flag field-level audit trails; Flagship positions the changelog as a first-class DO-resident artefact alongside the flag config.
  • sources/2025-03-27-flyio-operationalizing-macaroons — permanently-retained OpenSearch audit trail for all token operations; because virtually all Fly.io platform operations are token-mediated, this is effectively a platform-wide audit trail (see concepts/audit-trail-in-opensearch).
  • sources/2025-10-28-redpanda-governed-autonomy-the-path-to-enterprise-agentic-ai — canonical wiki instance for streaming-log-backed agent-interaction audit envelope. Every agent interaction (prompt + input + context retrieval + tool call + output + action) captured as first-class durable events on a streaming log; enables replay, lineage, SLO-enforcement, and end-to-end decision tracing. See patterns/durable-event-log-as-agent-audit-envelope for the pattern.
  • sources/2026-02-10-redpanda-how-to-safely-deploy-agentic-ai-in-the-enterprise: Akidau talk-recap canonicalises the metadata-only-audit-insufficient framing for agents. Verbatim: "Historically, we've chosen the more cost-effective option for auditing: logging metadata requests rather than entire bytes of data (i.e., User Y read Z number of bytes on such-and-such day). But with agents you need to be able to audit what the request was, and what the agent did in response to the request. You can't make inferences without having the full dataset." The shift from byte-count-and-timestamp audit to full-input-and-output audit is structurally new at the agent altitude.
  • sources/2026-04-14-redpanda-openclaw-is-not-for-enterprise-scale: Transcripts + reasoning-chain as the why-and-how audit shape. Redpanda's 2026-04-14 post "Openclaw is not for enterprise scale" canonicalises why-and-how, not just what, as the load-bearing audit invariant for agents. Verbatim: "You want to know why and how the agent did a thing, not just what it did. Transcripts give you the ability to not only govern the actions and tools your agents have, but also enable agentic performance reviews." Full transcripts capture "inputs, outputs, tool calls, token usage, and the agent's reasoning chain." The post adds agentic performance review as a new first-class audit use case: "run different versions of agents [...] giving similar agents different sets of tools to accomplish a job, then monitor and compare their performance." Canonicalises audit as component #2 of the four-component agent production stack and the substrate for A/B agent evaluation via patterns/snapshot-replay-agent-evaluation.
  • sources/2026-04-23-aws-modernizing-kyc-with-aws-serverless-solutions-and-agentic-ai: canonical wiki instance of audit-trail-as-regulatory-compliance-substrate for an agentic KYC system. Audit trail is split across layers: CloudTrail for API-level events, CloudWatch for operational telemetry, and application-level records for business-semantic events (KYC decisions with confidence scores + attestation trails). The Compliance & Risk sub-agent "generates compliance attestations with audit trails for regulatory examinations", i.e. the audit trail is paired with explainable AI decisions as a compound output of every sub-agent, not bolted on at the end.
Last updated · 542 distilled / 1,571 read