
PATTERN

Structured output grammar for valid plans

Shape

When an LLM agent must produce a structured object with correctness constraints beyond well-formedness — e.g. a query plan, a scheduling decision, a config edit — express those constraints as a grammar passed to the model's structured-output decoder. The grammar admits only valid objects before generation, eliminating the failure mode where the agent produces syntactically valid but semantically invalid output.

This is distinct from, and stronger than, reliability-style JSON schema validation:

| Axis | Structured-output reliability | Grammar-for-validity |
| --- | --- | --- |
| Goal | "Parseable JSON" | "Parseable JSON that is also a valid plan" |
| Constraint type | Syntax + type | Syntax + type + semantic invariants |
| Enforcement | Decoder produces JSON; app validates | Decoder produces only valid objects by construction |
| Failure mode solved | Malformed JSON → dropped examples | Valid JSON but invalid plan → wasted rollout |
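
The distinction can be made concrete: a type-level check passes any well-shaped object, while a semantic check over the same object rejects an invalid plan. A minimal sketch — the plan encoding and `TABLES` set are illustrative assumptions, not from the source:

```python
# Sketch: a plan that is schema-valid JSON but semantically invalid.
# The plan shape (list of table names) and table set are illustrative.
import json

TABLES = {"orders", "customers", "items"}  # tables the query references

def type_check(plan) -> bool:
    """'Structured-output reliability': is it a list of strings?"""
    return isinstance(plan, list) and all(isinstance(t, str) for t in plan)

def semantic_check(plan) -> bool:
    """'Grammar-for-validity': every table exactly once, no strays."""
    return type_check(plan) and sorted(plan) == sorted(TABLES)

plan = json.loads('["orders", "orders", "items"]')  # duplicates a table
assert type_check(plan)          # parses and type-checks fine
assert not semantic_check(plan)  # but it is not a valid join ordering
```

Grammar-constrained decoding moves the `semantic_check` half into the decoder, so the invalid object is never emitted in the first place.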

Canonical instance

Databricks' join-order agent (Source: sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization):

"Each tool call generates a join ordering using structured model outputs, which forces the model's output to match a grammar we specify to only admit valid join reorderings."

A join order is semantically valid iff:

  • Every table in the query appears exactly once.
  • The binary-tree structure is well-formed.
  • Associativity/commutativity constraints are preserved.

Free-form generation would frequently produce orderings that miss a table, duplicate one, or reference a non-existent table — all valid JSON, all useless. The grammar pre-emptively eliminates these, so every rollout lands on a semantically legal plan the execution engine can actually run.
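
To see how a grammar can carry the "every table exactly once" invariant, here is a toy grammar builder — not Databricks' actual grammar, just a sketch of the standard encoding: one nonterminal `J_S` per nonempty subset `S` of tables, with a rule for every way to split `S` into a left and right join input. Only trees naming each table exactly once are derivable from the start symbol.

```python
# Sketch: generate a CFG (as rule strings) whose language is exactly the
# binary join trees over a fixed table set.  Illustrative, not a real
# production grammar format.
from itertools import combinations

def join_grammar(tables):
    tables = tuple(sorted(tables))
    rules = []
    def nt(subset):
        return "J_" + "_".join(subset)
    subsets = [c for r in range(1, len(tables) + 1)
                 for c in combinations(tables, r)]
    for s in subsets:
        if len(s) == 1:
            rules.append(f'{nt(s)} -> "{s[0]}"')
        else:
            # Every split of s into two nonempty halves; left/right order
            # matters, which is what preserves commutativity choices.
            for r in range(1, len(s)):
                for left in combinations(s, r):
                    right = tuple(t for t in s if t not in left)
                    rules.append(
                        f'{nt(s)} -> "(" {nt(left)} "JOIN" {nt(right)} ")"')
    return rules

rules = join_grammar(["a", "b", "c"])
# Start symbol J_a_b_c derives only trees naming a, b, c once each.
```

Note the cost: the nonterminal count grows with the subset lattice of the table set, which is why such grammars are auto-generated from the query rather than hand-written.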

When this fits

| Condition | Why |
| --- | --- |
| Output is a structured object (not prose) | Grammar-constrained decoding only makes sense for structured output |
| Semantic validity is grammar-expressible | Not every correctness property is context-free; e.g. "every table appears exactly once" can be encoded by a careful state-machine grammar |
| Validity failures are costly | Each invalid output wastes a rollout (or worse, corrupts a downstream system) |
| Grammar can be inferred from schema | You can auto-generate the grammar from a SQL schema + join graph rather than hand-maintain it |
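
The last row — deriving the constraints from the schema and join graph rather than hand-maintaining them — can be sketched as a validity check driven entirely by a join-graph data structure. The edge set, the nested-tuple plan encoding, and the table names below are all illustrative assumptions:

```python
# Sketch: validity constraints derived from a (hypothetical) join graph.
# Plans are nested tuples: ("customers", ("orders", "items")).
JOIN_GRAPH = {  # edges: table pairs that share a join predicate
    ("orders", "customers"), ("orders", "items"),
}

def tables_of(node):
    """Collect the leaf tables of a join tree."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return tables_of(left) + tables_of(right)

def valid(plan, graph=JOIN_GRAPH):
    expected = {t for edge in graph for t in edge}
    if sorted(tables_of(plan)) != sorted(expected):  # each table exactly once
        return False
    def joins_ok(node):  # every join backed by an edge across the split
        if isinstance(node, str):
            return True
        l, r = node
        crosses = any((a, b) in graph or (b, a) in graph
                      for a in tables_of(l) for b in tables_of(r))
        return crosses and joins_ok(l) and joins_ok(r)
    return joins_ok(plan)

assert valid(("customers", ("orders", "items")))
assert not valid((("customers", "items"), "orders"))  # no cross edge
```

The same graph that drives this checker is what a grammar generator would consume, so adding a table to the schema updates the constraints for free.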

When it doesn't fit

  • Outputs are prose with style constraints. Natural-language outputs rarely benefit from grammar constraints; style is better shaped by prompting.
  • Validity is defined by runtime behaviour. E.g. "the plan must not OOM" — not grammar-expressible; falls back to execute-and-check (which is what the outer pattern does anyway).
  • The grammar would be as complex as the full semantics. If the grammar approaches the complexity of a type checker, it's easier to post-hoc validate and retry.

Implementation notes

  • Modern frontier-model APIs (OpenAI, Anthropic, Gemini) expose grammar-constrained or schema-constrained decoding surfaces (JSON Schema, Pydantic, context-free grammars). Which surface you use depends on the validity property's complexity.
  • Token-level constraints can impact quality: too tight a grammar may force the model into paths it can't reason well about. Empirically test against free generation plus validation as a baseline.
  • Context-free grammars can encode table-uniqueness via explicit enumeration of the remaining tables at each step — but the grammar size blows up combinatorially (roughly one nonterminal per subset of tables). Alternative: generate a sequence, then reject-and-retry if invalid.
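
The reject-and-retry baseline mentioned above is a few lines; `sample_plan` is a hypothetical stub standing in for an unconstrained model call:

```python
# Sketch: free generation + post-hoc validation + retry, the baseline
# to compare grammar-constrained decoding against.
import random

def sample_plan(tables):
    """Hypothetical stand-in for a free-form model call: may duplicate
    or drop tables, as unconstrained generation does."""
    return [random.choice(tables) for _ in range(len(tables))]

def is_valid(plan, tables):
    return sorted(plan) == sorted(tables)  # every table exactly once

def generate_with_retry(tables, max_attempts=50):
    for attempt in range(1, max_attempts + 1):
        plan = sample_plan(list(tables))
        if is_valid(plan, tables):
            return plan, attempt
    raise RuntimeError("no valid plan within budget")

random.seed(0)  # deterministic for the sketch
plan, attempts = generate_with_retry(["orders", "customers", "items"])
```

Each retry costs a full rollout, which is exactly the waste the grammar approach removes; the trade-off is grammar complexity versus retry budget.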

Composition

| Pattern | Relationship |
| --- | --- |
| patterns/llm-agent-offline-query-plan-tuner | Outer pattern — this is the validity leg |
| concepts/structured-output-reliability | Sibling; same decoder feature used for parseability, not validity |
| patterns/tool-call-loop-minimal-agent | Natural fit — narrow-tool agents especially benefit since validity errors waste tool calls |
