CONCEPT Cited by 1 source
Inline LLM content guardrail¶
Definition¶
An inline LLM content guardrail is a runtime control surface that scans the content of every LLM request and response on the inference critical path (not async post-processing) for policy-defined classes of unsafe content, and fails closed on detection. The five load-bearing properties are:
- Bidirectional — both inputs (the prompt) and outputs (the response) are scanned.
- Inline — on the inference critical path, not a sampled audit pipeline running offline.
- Per-request — every request, not sampled.
- Fail-closed — when the classifier is uncertain, ambiguous, or crashes, the request is blocked rather than served.
- Content-pattern — the unit of analysis is the request/response content, not the caller identity, the tool name, or the resource.
Canonical statement on the wiki¶
"At the model layer, guardrails inspect what flows through inference in real time, scanning inputs for PII and jailbreak attempts, checking outputs for hallucinations and sensitive content before they reach the user. They run inline on every request and fail closed." — Source: sources/2026-05-20-databricks-governing-ai-agents-at-scale-with-unity-catalog
Threat-class taxonomy (from the source)¶
| Direction | Threat class | What it catches |
|---|---|---|
| Input | PII | User prompt embedding sensitive personal data that shouldn't reach upstream model providers |
| Input | Jailbreak attempts | Adversarial prompts engineered to bypass model alignment |
| Output | Hallucinations | Fabricated facts the model generated without grounding |
| Output | Sensitive content | Restricted content (e.g. PII regurgitated from training data, internal data) leaking out |
The taxonomy is post-coding-agent. Earlier wiki canonicalisations of concepts/ai-agent-guardrails focused on CI/quality gates for AI-generated code (test coverage, type checking, lint) — that's a static review control. Inline LLM content guardrails are the dynamic runtime control. Both are necessary; neither subsumes the other.
Why fail-closed (and why it has to be explicit)¶
The 2026-04-22 wiki canonicalisation of concepts/fail-open-vs-fail-closed frames the asymmetry: security wants fail-closed, availability wants fail-open. For an LLM content guardrail, the security context dominates — a guardrail that fails-open silently is a guardrail that has been bypassed by every request the classifier couldn't process.
Databricks names the choice explicitly ("and fail closed"), avoiding the implicit-fail-closed-by-accident trap that the 2025-11-18 Cloudflare outage canonicalised — where a .unwrap() panic happened to be fail-closed but the architecture never explicitly chose it. Explicit fail-closed in the guardrail layer is a deliberate availability trade-off accepting more user-facing errors as the cost of stronger leak prevention.
Why "inline" instead of "audit"¶
Inline + per-request is structurally different from a sampling-and-replay audit:
- Sampling-and-replay: a pipeline reads N% of inference logs after the fact, runs classifiers, and flags violations. Damage already done; the unsafe content already reached the user.
- Inline: the classifier sits on the request path; the request does not return to the user until the output classifier passes.
The trade-off is latency. Inline classifiers add per-request overhead; sampling-and-replay adds none but provides only forensic value. Inline is the choice when "before they reach the user" is the load-bearing requirement.
Two-place controls in the three-layer composition¶
Inline content guardrails are the third layer of the patterns/three-layer-agent-control composition. The other two layers (permissions + Service Policies) are identity- and tool-keyed; the guardrail layer is content-keyed:
| Layer | Key | What it sees |
|---|---|---|
| Permissions (OBO) | Identity | User/agent ID, target resource |
| Service Policies | Tool name + args | The synthesised tool call, its arguments |
| Inline content guardrail | Request/response payload | The prompt content, the response content |
Each key class catches a different threat. An OBO check can't see the prompt content. A Service Policy can't see what the model is about to generate. Only the content guardrail closes the leak path that opens when an authorised agent makes an authorised tool call but the content of the request is dangerous.
Distinction from related primitives¶
- vs Bedrock Guardrails — Automated Reasoning Checks — AWS adds formal-verification-of-output-against-spec as a category of guardrail. Inline content guardrails as defined here are content-pattern-based (regex / classifier / model). Both are subtypes of output-checking guardrail, but only the AR-checks variant produces formal correctness verdicts.
- vs systems/langguard — both Databricks systems. LangGuard intercepts agentic workflow actions (tool calls, state transitions); inline content guardrails intercept model-layer content I/O. Different points of enforcement, complementary.
- vs concepts/runtime-policy-enforcement — the parent concept. Inline content guardrails are content-keyed runtime policy enforcement; the parent concept covers any synchronous-allow/deny gate before execution regardless of key.
- vs CI guardrails for AI-generated code (concepts/ai-agent-guardrails) — different lifecycle stage. CI guardrails are static review of code an agent produced (run once, at PR time); inline content guardrails are dynamic runtime checks of every prompt/response (run continuously, on every request).
What's not yet covered in the wiki¶
- Customisation surface — how customers add their own threat classes beyond PII / jailbreak / hallucination / sensitive content.
- False-positive handling — when a guardrail blocks a legitimate request, is there a customer override? An audit trail of overrides?
- Performance characteristics — latency overhead per request, classifier model size, whether streaming responses are buffered or scanned token-by-token.
Seen in¶
- sources/2026-05-20-databricks-governing-ai-agents-at-scale-with-unity-catalog — first wiki canonicalisation as Pillar 1 layer 3 (2026-05-20). Concrete instance: systems/unity-ai-gateway-guardrails.
Source¶
- Originating post: https://www.databricks.com/blog/governing-ai-agents-scale-unity-catalog
Related¶
- systems/unity-ai-gateway-guardrails — canonical concrete instance.
- systems/bedrock-guardrails-automated-reasoning-checks — sibling formal-verification variant.
- concepts/fail-open-vs-fail-closed — the design choice the principle makes explicit.
- concepts/ai-agent-guardrails — sibling but different (CI/quality-gate) discipline.
- concepts/runtime-policy-enforcement — the parent concept.
- patterns/three-layer-agent-control — the composition this layer fits into.