
LangGuard

LangGuard is a runtime enforcement layer for enterprise agentic AI workflows. It intercepts every agent tool invocation, dataset access, and model call and, using its patent-pending GRAIL (Governance AI Run-time Links) data fabric, evaluates each action against policy before it executes, returning allow / deny / modify synchronously on the critical path of the agent's operation. It runs on Databricks Lakebase as its operational system of record and is profiled by Databricks as one of the first startups building production infrastructure on Lakebase.

Problem framing

The wiki-canonical statement of the problem is concepts/agentic-workflow-governance. In short: conventional software has predictable logic that conventional security monitors can audit; autonomous agents generate their own logic on the fly, invoke tools + access data in ways that are difficult to audit after the fact, and operate across multi-agent workflows where one misconfigured permission can cascade into a significant incident. A single enterprise workflow may involve:

  • Tens of coordinated agents
  • Hundreds of tool invocations
  • Multiple foundation models
  • 15+ enterprise Systems of Record — ServiceNow, IAM/IDP platforms, Salesforce, Workday, Wiz, CrowdStrike, TalkDesk, MCP Gateways, API Gateways

LangGuard's product contract is to govern all of this at workflow speed — without introducing meaningful latency to agent execution.

(Source: sources/2026-04-27-databricks-inside-one-of-the-first-production-deployments-of-lakebase-langguard)

Architecture

GRAIL data fabric

GRAIL captures every agent action as multidimensional trace data and constructs a live knowledge graph of workflow behavior + context. Policy evaluation happens against this live graph, not against a static rule table. When an agent attempts an action, the engine walks the graph to answer: given everything this workflow has touched so far, across every system it has access to, is this next action within policy? The result — allow / deny / modify — returns before the action fires.
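A minimal sketch of the idea, with the heavy caveat that GRAIL's actual schema and edge types are undisclosed (all names below are invented): the verdict for the next action depends on accumulated workflow context, not on a static per-action rule table.

```python
# Hypothetical sketch only: models evaluating the NEXT action against
# everything the workflow has touched so far, rather than a fixed rule.
class WorkflowGraph:
    def __init__(self):
        self.touched: set[str] = set()            # resources accessed so far
        self.edges: list[tuple[str, str]] = []    # (agent, resource) pairs

    def record(self, agent: str, resource: str):
        self.touched.add(resource)
        self.edges.append((agent, resource))

def evaluate(graph: WorkflowGraph, action: str) -> str:
    # Context-dependent rule: once any agent in the workflow has read a
    # PII-tagged dataset, outbound transfers to external systems are denied,
    # even though the same action was within policy earlier in the workflow.
    if action == "send_external" and any(r.startswith("pii:") for r in graph.touched):
        return "deny"
    return "allow"
```

The same `send_external` action flips from allow to deny once the workflow's history includes a PII read, which is the behavior a static rule table cannot express.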

The mechanism beyond this framing is undisclosed (schema, edge types, eviction, materialization all hidden). See systems/grail-data-fabric for the dedicated entity page; this LangGuard page treats GRAIL as the engine-internal substrate.

Lakebase as operational system of record

LangGuard writes trace events to Lakebase continuously and performs low-latency reads for governance policy lookups + contextual queries. Three Lakebase properties drive the choice:

1. Serverless autoscaling + scale-to-zero between bursts. Agent behavior is "notoriously bursty" — dormant for hours, then hundreds of trace writes + enforcement reads in seconds. Lakebase dynamically provisions compute on burst arrival and shuts down completely when activity stops. Because durable state lives in the storage layer (Pageserver + Safekeeper) rather than the compute node, spinning up a new compute instance requires no data movement — it attaches to existing database history and serves queries immediately. This is the operational payoff of concepts/compute-storage-separation specifically for burst workloads: no cold-start penalty on the data side, only on the compute-VM side.

2. Millisecond reads via compute-local cache. The natural concern with any disaggregated database is read latency. Lakebase keeps hot data close to compute via a caching layer between the two tiers. LangGuard's expected working set — "tight indexed lookups against GRAIL context and policy tables" — fits comfortably in compute-local memory, giving enforcement decisions at workflow speed. This matters because LangGuard's evaluation is on the critical path of every agent action.
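The access pattern "tight indexed lookups against GRAIL context and policy tables" has a simple shape. As an illustration only (in-memory SQLite standing in for the compute-local cache; table and column names invented):

```python
import sqlite3

# Stand-in sketch: an in-memory SQLite table plays the role of the
# compute-local cache; the schema here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy (principal TEXT, resource TEXT, action TEXT, effect TEXT)")
conn.execute("CREATE INDEX idx_policy ON policy (principal, resource, action)")
conn.execute("INSERT INTO policy VALUES ('agent-7', 'servicenow:ticket', 'update', 'allow')")

def lookup(principal: str, resource: str, action: str) -> str:
    # The enforcement path is a single point read over the composite index:
    # this is the working-set shape that fits in compute-local memory.
    row = conn.execute(
        "SELECT effect FROM policy WHERE principal=? AND resource=? AND action=?",
        (principal, resource, action),
    ).fetchone()
    return row[0] if row else "deny"   # default-deny when no rule matches
```

Point reads over a composite index are exactly the queries a cache layer serves well, which is why the working set staying cache-resident translates directly into enforcement latency.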

3. Instant database branching via copy-on-write. When a branch is created, no data is physically copied. The branch diverges from current database state using copy-on-write semantics, consuming storage only for new or modified data. This lets developers create an isolated, exact replica of production trace data in seconds and test new governance policies against real agent behavior without risking the live environment. See patterns/policy-testing-via-database-branching.
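The copy-on-write semantics can be illustrated conceptually (this is not the Lakebase API, just the storage idea) with an overlay map: branch creation copies nothing, and storage is consumed only by what the branch modifies.

```python
from collections import ChainMap

# Conceptual sketch of copy-on-write branching: the "branch" is an empty
# overlay over production state, so creation is O(1) and only modified
# entries occupy new storage. Keys here are invented examples.
production = {"policy:egress": "allow", "policy:pii_read": "allow"}
branch = ChainMap({}, production)   # instant: nothing is physically copied

branch["policy:egress"] = "deny"    # the write lands in the overlay only
```

Reads of unmodified keys fall through to production, while the experimental policy change is invisible to the live environment, which is the property that makes branch-based policy testing safe.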

Relationship to Databricks governance substrate

LangGuard is positioned as the workflow-level runtime enforcement layer that extends platform-level controls into every step of agent execution. The platform-level substrate is Unity Catalog + Databricks AI Gateway (system of record for data, models, access policies). Because LangGuard's operational trace data lives natively in Lakebase, it is immediately available to the broader Databricks Data Intelligence Platform for analytics + AI without additional ETL: Databricks AI, Model Serving, and MLflow can train + deploy anomaly detection models directly on GRAIL trace data. See patterns/runtime-governance-enforcement-layer for the shape.

QRadar lineage

The LangGuard team previously built IBM QRadar — a multi-time Gartner Magic Quadrant leader and one of the world's most widely deployed enterprise SIEM platforms. QRadar ingests + correlates petabytes of security telemetry per day under strict latency + reliability requirements. Verbatim lesson from that experience:

"Database architecture is destiny. … Operational security data that arrives in unpredictable, high-intensity bursts, where every millisecond of decision latency matters and idle infrastructure spend is unacceptable. Traditional databases that couple compute and storage force you to provision for peak load and pay for that capacity around the clock. Lakebase's serverless model, which fully decouples compute from storage and scales to zero between bursts, was the answer we had always needed but didn't have access to when we were building QRadar."

This is, so far, the wiki's canonical articulation of the bursty-security-telemetry → serverless-OLTP fit. The QRadar experience is also what makes LangGuard a natural fit for agentic-workflow governance specifically: agentic traces have the same operational shape as security telemetry (unpredictable bursts, latency-sensitive decisions, correlation-heavy queries).

Predictive governance (stated next step)

Today LangGuard enforces established policies at runtime. The announced next evolution is predictive: training behavioral models on historical GRAIL trace data (concepts/agent-behavioral-baseline) to detect anomalous agent behavior before it manifests as a policy violation. Because trace data already lives in the Databricks ecosystem, the move from enforcement to prediction "without building separate ETL pipelines or standing up a second analytical platform" is a stated design payoff. Roadmap, not shipped.
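Since this is roadmap, any concrete model is speculative; but the concepts/agent-behavioral-baseline idea can be sketched with an invented feature (per-agent actions per minute) and a simple k-sigma flag over historical trace windows:

```python
from statistics import mean, stdev

# Hypothetical sketch: a per-agent behavioral baseline learned from
# historical trace windows. The feature, data, and threshold are invented;
# LangGuard's actual models are announced roadmap, not shipped.
history = {"agent-7": [4, 5, 6, 5, 4, 6, 5]}   # observed actions/min

def is_anomalous(agent: str, current_rate: float, k: float = 3.0) -> bool:
    samples = history[agent]
    mu, sigma = mean(samples), stdev(samples)
    return abs(current_rate - mu) > k * sigma   # flag k-sigma departures
```

The stated design payoff is that the `history` side of this already lives in the Databricks ecosystem as GRAIL trace data, so no separate ETL pipeline is needed to feed model training.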

Caveats

  • GRAIL mechanism undisclosed — schema, edge types, query patterns, eviction policy all hidden behind the "knowledge graph" brand.
  • No concrete numbers — no enforcement decision latency p50/p99, no burst-scale QPS, no cold-start wall-clock, no branch-creation wall-clock.
  • Vendor-authored joint case study — co-signed narrative, not third-party technical retrospective.
  • Predictive governance is roadmap, not shipped.
