
SYSTEM Cited by 1 source

GRAIL data fabric

GRAIL (Governance AI Run-time Links) is the patent-pending governance engine that powers LangGuard's runtime enforcement of agentic workflows. GRAIL captures every agent action as multidimensional trace data and constructs a live knowledge graph of workflow behavior and context. Every allow/deny/modify decision LangGuard returns is the result of a policy evaluation against this live graph, not a static rule table.

Purpose

In a conventional software system, policy evaluation can reference per-request attributes (user, role, resource) because the logic executing on those attributes is static and pre-registered. In an agentic workflow the logic is generated by the agent on the fly — the same request can fan out to different tool invocations and dataset accesses depending on what the LLM decides to do. Runtime governance therefore needs a substrate that captures the workflow-so-far — what this agent has already touched, across every system of record — so that the next action can be scored against that accumulated context.

GRAIL is the substrate that holds this workflow-so-far for all concurrently-running agentic workflows in an enterprise, in a form the LangGuard engine can query fast enough to stay on the critical path of every agent action.

(Source: sources/2026-04-27-databricks-inside-one-of-the-first-production-deployments-of-lakebase-langguard)
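To make the "score the next action against the workflow-so-far" idea concrete, here is a minimal sketch under loud assumptions: `WorkflowContext`, `evaluate`, and the example rule (deny outbound email once the workflow has read from an HR system) are hypothetical illustrations, not LangGuard's or GRAIL's actual API or policy language.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowContext:
    """Hypothetical accumulator for what one workflow has touched so far."""
    systems_touched: set[str] = field(default_factory=set)

    def record(self, system: str) -> None:
        self.systems_touched.add(system)


def evaluate(ctx: WorkflowContext, tool: str) -> str:
    """Return 'allow' or 'deny' for the next action, given accumulated context.

    Illustrative rule (an assumption, not a real LangGuard policy):
    once a workflow has read from 'hr_db', it may not call 'send_email'.
    """
    if tool == "send_email" and "hr_db" in ctx.systems_touched:
        return "deny"
    return "allow"


ctx = WorkflowContext()
before = evaluate(ctx, "send_email")  # nothing touched yet
ctx.record("hr_db")                   # the agent reads HR data
after = evaluate(ctx, "send_email")   # same request, different context
```

The two calls carry identical per-request attributes; only the accumulated context differs, which is the distinction the section draws against conventional per-request policy evaluation.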

What it holds

  • Per-agent-action trace records — multidimensional capture of each tool invocation, data access, and model call an agent performs.
  • Workflow behavior graph — the linked structure over those trace records that represents "what has happened in this workflow so far" (agents involved, tools invoked, data touched, model calls made, credentials used).
  • Policy context — the relevant governance rules per system of record, joined/evaluated against workflow behavior at decision time.
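The first two items can be pictured with a toy structure. This is a sketch only: GRAIL's real schema is undisclosed (see "Mechanism gaps"), so the `TraceRecord` fields, the `WorkflowGraph` class, and the edge representation below are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    """Hypothetical trace record; every field name is an assumption."""
    workflow_id: str
    agent: str
    kind: str    # "tool_invocation", "data_access", or "model_call"
    target: str  # tool name, dataset, or model touched


class WorkflowGraph:
    """Toy adjacency structure linking each workflow to what it has touched."""

    def __init__(self) -> None:
        self._edges: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add(self, rec: TraceRecord) -> None:
        # Two edges per record: workflow -> agent, workflow -> (kind, target).
        self._edges[rec.workflow_id].add(("agent", rec.agent))
        self._edges[rec.workflow_id].add((rec.kind, rec.target))

    def touched(self, workflow_id: str, kind: str) -> set[str]:
        """Everything of a given kind this workflow has touched so far."""
        return {t for k, t in self._edges.get(workflow_id, set()) if k == kind}


graph = WorkflowGraph()
graph.add(TraceRecord("wf-1", "agent-a", "data_access", "crm.accounts"))
graph.add(TraceRecord("wf-1", "agent-a", "tool_invocation", "send_email"))
graph.add(TraceRecord("wf-2", "agent-b", "data_access", "hr.salaries"))
data_wf1 = graph.touched("wf-1", "data_access")
```

A policy check would then query `touched(...)` for the current workflow at decision time, which corresponds to the third item: rules joined against workflow behavior.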

Substrate

GRAIL runs on Lakebase (Databricks' serverless Postgres). Three Lakebase properties are load-bearing for this workload:

  1. Bursty trace-write + enforcement-read shape matches Lakebase's serverless autoscaling + scale-to-zero model.
  2. Millisecond read latency on hot indexed lookups via the compute-local caching layer keeps policy evaluation within the critical-path latency budget.
  3. Instant copy-on-write database branching lets the team clone production GRAIL data in seconds and test new governance policies against real trace data in isolation.
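The policy-testing workflow in item 3 can be sketched as a replay over an isolated copy of trace data. Everything here is hypothetical: the `replay` helper, the dict-shaped actions, and both policies are invented for illustration, and `copy.deepcopy` is only an in-memory stand-in for isolation, not copy-on-write branching and not a Lakebase API.

```python
import copy
from typing import Callable

Action = dict[str, str]
Policy = Callable[[Action], str]


def replay(traces: list[Action], current: Policy, candidate: Policy) -> list[Action]:
    """Return the actions whose decision would change under the candidate policy."""
    branch = copy.deepcopy(traces)  # stand-in for an isolated branch of the data
    return [a for a in branch if current(a) != candidate(a)]


def current_policy(action: Action) -> str:
    # Assumed existing policy: allow everything.
    return "allow"


def candidate_policy(action: Action) -> str:
    # Assumed new policy under test: block access to HR datasets.
    return "deny" if action["target"].startswith("hr.") else "allow"


production_traces = [
    {"tool": "sql_query", "target": "crm.accounts"},
    {"tool": "sql_query", "target": "hr.salaries"},
]
changed = replay(production_traces, current_policy, candidate_policy)
```

Reviewing `changed` before rollout shows which historical actions a new policy would have blocked, without touching the production records.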

See systems/langguard for the full product + architecture framing; this page is the dedicated entity for the data-fabric component.

Mechanism gaps

The knowledge-graph claim is the only mechanism detail disclosed so far. None of the following is known from public material:

  • Graph schema — node types, edge types, temporal properties
  • Query patterns — traversal depth, index design, materialisation
  • Eviction / retention — how long traces live in hot storage
  • Materialised views — whether policy-relevant subgraphs are pre-computed
  • Consistency model — whether enforcement reads ever observe stale graph state
  • Cross-workflow correlation — whether GRAIL spans workflows or is partitioned per-workflow

Future LangGuard technical posts would be needed to canonicalise these.

Why "fabric"

The term data fabric (vs data store or knowledge graph service) frames GRAIL as the substrate over which agents, tools, and policies interact — not a product one queries directly. In practice GRAIL sits between the agent/workflow layer above and the Lakebase storage layer below. The LangGuard engine is the consumer; customers interact with policies + decisions, not with GRAIL directly.

Role in predictive governance

Because GRAIL's trace data lives natively in Lakebase and is directly readable by the rest of the Databricks Data Intelligence Platform, MLflow + Databricks AI + Model Serving can train anomaly-detection models directly on historical GRAIL data to produce behavioral baselines for each agent. Those baselines feed back into the LangGuard engine to score runtime behavior against historical norms, moving governance from reactive rule enforcement to proactive behavior-based control. This capability is stated roadmap.
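As a rough illustration of scoring runtime behavior against a historical baseline, the sketch below uses a plain z-score. This is a swapped-in technique chosen for brevity, not the anomaly-detection models the roadmap describes; the metric (tool calls per workflow), the function names, and the sample numbers are all assumptions.

```python
from statistics import mean, stdev


def baseline(history: list[float]) -> tuple[float, float]:
    """Summarise an agent's history as (mean, sample standard deviation).

    Requires at least two observations, because stdev() does.
    """
    return mean(history), stdev(history)


def anomaly_score(value: float, base: tuple[float, float]) -> float:
    """Number of standard deviations between value and the baseline mean."""
    mu, sigma = base
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma


# Hypothetical history: tool calls per workflow for one agent.
history = [4, 5, 6, 5, 4, 6, 5]
base = baseline(history)
normal = anomaly_score(5, base)    # matches the historical mean
unusual = anomaly_score(40, base)  # far outside the historical range
```

A runtime check could compare the score to a threshold and feed the result into the allow/deny/modify decision; where that threshold sits is a policy choice this sketch does not address.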

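Seen in

  • sources/2026-04-27-databricks-inside-one-of-the-first-production-deployments-of-lakebase-langguard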
