Unity AI Gateway — Guardrails

Guardrails are the model-layer inline content filter in Unity AI Gateway. Every model call is scanned on the inference critical path in both directions: inputs for PII and jailbreak attempts before they reach the model, and outputs for hallucinations and sensitive content before the response reaches the user. The layer runs on every request and fails closed.

Definition (from the source)

"At the model layer, guardrails inspect what flows through inference in real time, scanning inputs for PII and jailbreak attempts, checking outputs for hallucinations and sensitive content before they reach the user. They run inline on every request and fail closed." — Source: sources/2026-05-20-databricks-governing-ai-agents-at-scale-with-unity-catalog

Four load-bearing properties

  1. Bidirectional. Inputs are scanned for PII + jailbreak attempts; outputs for hallucinations + sensitive content. Both directions are first-class — guardrails are not just an input filter or just an output filter.
  2. Inline. On the inference critical path, not async post-processing. The latency of guardrails is borne by every request; they are not a sampled audit.
  3. Per-request, every request. "They run inline on every request" — no sampling, no opt-in, no blast-radius-limited rollout.
  4. Fail-closed. When a guardrail's input is corrupt or ambiguous, or the classifier crashes, the request fails. The choice is explicit: guardrails are a security layer, where fail-closed is the safer default, even though in availability terms fail-closed means more user-facing errors. Contrast with availability-first modules that fail open.
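
A minimal sketch of how the four properties combine, assuming a gateway that wraps each model call with caller-supplied scanner functions. `guarded_call`, `GuardrailBlocked`, `Scanner`, and the toy scanners are hypothetical names for illustration, not Unity AI Gateway APIs; the post does not describe the real interface.

```python
from typing import Callable, List, Optional

# Hypothetical scanner contract (not a Databricks API): return a reason string
# to block the text, or None to allow it.
Scanner = Callable[[str], Optional[str]]


class GuardrailBlocked(Exception):
    """Raised when a prompt or response is rejected (hypothetical name)."""


def _scan(scanners: List[Scanner], text: str, direction: str) -> None:
    for scanner in scanners:
        try:
            reason = scanner(text)
        except Exception as exc:
            # Fail closed: a crashing classifier rejects the request instead
            # of letting the payload through unscanned.
            raise GuardrailBlocked(f"{direction}: scanner error ({exc!r})") from exc
        if reason is not None:
            raise GuardrailBlocked(f"{direction}: {reason}")


def guarded_call(
    prompt: str,
    model: Callable[[str], str],
    input_scanners: List[Scanner],
    output_scanners: List[Scanner],
) -> str:
    # Inline and unsampled: every call pays for both scans on the critical path.
    _scan(input_scanners, prompt, "input")      # direction 1: before the model sees it
    response = model(prompt)
    _scan(output_scanners, response, "output")  # direction 2: before the user sees it
    return response


# Toy scanners and model, for illustration only.
def no_ssn(text: str) -> Optional[str]:
    return "possible SSN" if "123-45-6789" in text else None


def crashing_check(text: str) -> Optional[str]:
    raise RuntimeError("classifier unavailable")


def echo_model(prompt: str) -> str:
    return f"echo: {prompt}"


print(guarded_call("hello", echo_model, [no_ssn], []))  # echo: hello
```

Turning any scanner exception into a rejection is what makes the sketch fail closed; the cost is that a flaky classifier surfaces as user-facing errors, which is the trade-off property 4 accepts.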

Position in Pillar 1 (Delegated access)

Guardrails are the third layer of the three-layer composition the post canonicalises (patterns/three-layer-agent-control):

| Layer | Granularity | What it controls | Why it's not enough alone |
|---|---|---|---|
| Permissions (OBO) | Per-user-per-resource | Who can call what | Coarse: ignores runtime context |
| Service Policies | Per-tool-call | Whether this specific tool call should proceed | Late: tool already chosen |
| Guardrails | Per-request payload | What content flows in and out | Generic: knows nothing about identity |

"In practice, these three layers work together: permissions control who can call what. Service Policies control whether a specific tool call should proceed in the context of a given request. Guardrails control what content flows in and out." — Source.
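
To make the division of labour concrete, here is a hypothetical sketch in which each layer is a function that returns a denial reason or `None`. `ToolRequest`, `GRANTS`, the `region` policy, and the jailbreak phrase are invented for illustration, and the evaluation order is an assumption: the post says the layers work together but does not specify how the gateway sequences them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class ToolRequest:
    user: str
    tool: str
    prompt: str
    args: Dict[str, str] = field(default_factory=dict)


# Layer 1 (hypothetical OBO-style grants): coarse, per user per resource.
GRANTS: Set[Tuple[str, str]] = {("alice", "query_sales")}


def permissions(req: ToolRequest) -> Optional[str]:
    if (req.user, req.tool) not in GRANTS:
        return f"{req.user} has no grant on {req.tool}"
    return None


# Layer 2 (hypothetical service policy): judges this specific call in context.
def service_policy(req: ToolRequest) -> Optional[str]:
    if req.tool == "query_sales" and req.args.get("region") == "restricted":
        return "restricted region not allowed for this request"
    return None


# Layer 3 (hypothetical content guardrail): inspects the payload only and
# knows nothing about who the caller is.
def guardrail(req: ToolRequest) -> Optional[str]:
    if "ignore previous instructions" in req.prompt.lower():
        return "possible jailbreak"
    return None


LAYERS: List[Tuple[str, Callable[[ToolRequest], Optional[str]]]] = [
    ("permissions", permissions),
    ("service_policy", service_policy),
    ("guardrails", guardrail),
]


def evaluate(req: ToolRequest) -> Tuple[bool, str]:
    """Run every layer; the first denial wins (ordering is an assumption)."""
    for name, layer in LAYERS:
        reason = layer(req)
        if reason is not None:
            return False, f"{name}: {reason}"
    return True, "allowed"
```

With these toy rules, `mallory` is stopped by permissions, a `restricted` region by the service policy, and a prompt containing the jailbreak phrase by the guardrail, even though `alice` holds the grant in both latter cases. That mirrors the table's last column: no single layer catches all three.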

Threat coverage (named in the post)

| Direction | Threat class | Mitigation goal |
|---|---|---|
| Input | PII in user prompts | Block sensitive data from reaching upstream model providers |
| Input | Jailbreak attempts | Detect adversarial prompts trying to bypass model alignment |
| Output | Hallucinations | Catch fabricated facts before they reach the user |
| Output | Sensitive content | Prevent restricted content (e.g. PII regurgitated from training data, internal data) from leaking out |
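
The post names the threat classes but not the detectors. As an illustration only, the table can be read as a registry keyed by direction. The regexes, keyword list, and number-grounding heuristic below are naive stand-ins chosen for this sketch (all names are hypothetical), not how Unity AI Gateway detects these threats.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical toy detectors; the post does not disclose the real classifiers.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
JAILBREAK_MARKERS = ("ignore previous instructions", "pretend you have no rules")


def detect_pii(text: str, context: str = "") -> bool:
    return bool(EMAIL.search(text) or US_SSN.search(text))


def detect_jailbreak(text: str, context: str = "") -> bool:
    lowered = text.lower()
    return any(marker in lowered for marker in JAILBREAK_MARKERS)


def detect_hallucination(text: str, context: str = "") -> bool:
    # Naive grounding heuristic (an assumption, not the product's method):
    # flag any number in the answer that does not appear in the source context.
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    return any(n not in context for n in numbers)


def detect_sensitive(text: str, context: str = "") -> bool:
    # Reuses the PII check for regurgitated PII, plus a marker for internal data.
    return detect_pii(text) or "internal only" in text.lower()


# Registry mirroring the table: direction -> [(threat class, detector)].
THREATS: Dict[str, List[Tuple[str, Callable[[str, str], bool]]]] = {
    "input": [("pii", detect_pii), ("jailbreak", detect_jailbreak)],
    "output": [("hallucination", detect_hallucination),
               ("sensitive_content", detect_sensitive)],
}


def flagged(direction: str, text: str, context: str = "") -> List[str]:
    """Return the threat classes whose detector fires for this direction."""
    return [name for name, detect in THREATS[direction] if detect(text, context)]
```

Keying the registry by direction keeps the bidirectional split explicit: a prompt is only ever checked against input threats and a response against output threats, and adding a threat class is a one-line change to `THREATS`.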

The post does not disclose:

  • Classifier internals (transformer-based, rule-based, hybrid).
  • Precision / recall / false-positive rates.
  • Latency overhead per request.
  • Whether outputs are streamed or buffered before scanning.
  • How customers customise the threat taxonomy.
  • Whether the guardrails layer is shared across all upstream providers or specialised per-provider.
  • How guardrails interact with Inference Tables writes (do blocked requests still log?).

Why fail-closed is the explicit choice

The 2026-04-22 wiki canonicalisation of fail-open vs fail-closed frames the asymmetry: fail-closed is safer in security contexts (default-deny) but dangerous in availability contexts (a single bad input takes out every request). For an LLM content guardrail, the security context dominates: a guardrail that silently fails open is a guardrail that has been bypassed. Databricks names the choice explicitly ("and fail closed"), avoiding the implicit-fail-closed-by-accident trap that the 2025-11-18 Cloudflare outage canonicalised.
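
The asymmetry reduces to what a check returns when it cannot produce a verdict. In this minimal sketch, `FailureMode`, `check_allows`, and `timed_out_classifier` are hypothetical names; the only difference between the two modes is the exception branch, and that branch is exactly where a fail-open guardrail becomes a silent bypass.

```python
from enum import Enum
from typing import Callable


class FailureMode(Enum):
    FAIL_OPEN = "fail_open"      # availability-first: allow when the check errors
    FAIL_CLOSED = "fail_closed"  # security-first (default-deny): block when it errors


def check_allows(is_safe: Callable[[str], bool], payload: str, mode: FailureMode) -> bool:
    """Return True if the payload may proceed.

    `mode` deliberately has no default, so every caller must choose it explicitly.
    """
    try:
        return is_safe(payload)
    except Exception:
        # No verdict was produced; the configured failure mode alone decides.
        return mode is FailureMode.FAIL_OPEN


def timed_out_classifier(payload: str) -> bool:
    raise TimeoutError("classifier did not respond")


print(check_allows(timed_out_classifier, "any prompt", FailureMode.FAIL_OPEN))    # True
print(check_allows(timed_out_classifier, "any prompt", FailureMode.FAIL_CLOSED))  # False
```

Leaving `mode` without a default is one way to keep the choice explicit, in the spirit of the post's "and fail closed". The fail-open call shows unscanned content passing with nothing to signal that the guardrail never ran.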

Relation to other wiki primitives

  • Sibling to Bedrock Guardrails (automated reasoning checks). Both AWS and Databricks have a "Guardrails" product layer; AWS adds formal verification of outputs against a specification as a category, while Databricks' guardrails (as disclosed in this post) are content-pattern-based (PII / jailbreak / hallucination / sensitive) rather than formally verified.
  • Sibling to systems/langguard (Databricks' own runtime-policy-enforcement layer for agent workflows) — different enforcement target. LangGuard intercepts agentic workflow actions (tool calls, state transitions); Guardrails intercepts model-layer content I/O. They are complementary, both inside the Databricks AI stack.
  • Sibling to concepts/ai-agent-guardrails (the CI/quality-gate discipline for AI-generated code) — different governance target. CI guardrails are static review of the code an agent writes; runtime content guardrails are dynamic scanning of the prompts and responses an agent processes. Both are necessary, neither subsumes the other.
  • Mediated by systems/unity-ai-gateway — Guardrails sit inside the gateway as the inline content filter for the model-layer half of the gateway's surface (Service Policies sit inside the gateway as the inline policy filter for the tool-layer half).
