
Pydantic structured LLM output

Definition

Pydantic structured LLM output is the Python-ecosystem convention of declaring a Pydantic BaseModel for every LLM output shape and treating the model's JSON response as either parseable into that typed object or fully incorrect — no tolerance for free-form text slipping through.

The concept is the Python-idiomatic concrete form of concepts/structured-output-reliability: the schema is the contract; the validator is the boundary; every downstream consumer sees typed objects, never raw model text.

The pattern in practice

from pydantic import BaseModel
from enum import Enum

class Grade(str, Enum):
    PASS = "pass"
    REVISE = "revise"

class TranslationCandidate(BaseModel):
    text: str

class CandidateEvaluation(BaseModel):
    candidate_index: int
    grade: Grade
    explanation: str

class EvaluatorOutput(BaseModel):
    evaluations: list[CandidateEvaluation]
    best_candidate_index: int | None = None

# At the boundary (llm_response_text is the raw string returned by the model):
result = EvaluatorOutput.model_validate_json(llm_response_text)
# -> either a typed EvaluatorOutput, or a ValidationError is raised.

Downstream code consumes result.evaluations[0].grade as an enum member, never a string match. Schema drift (the model adds or renames a field) is caught at parse time, not during production execution — with the caveat that Pydantic ignores unknown keys by default, so catching *extra* fields requires opting in with model_config = ConfigDict(extra="forbid").
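A minimal sketch of fail-fast drift detection. The extra `confidence` field below is a hypothetical example of a model adding a key; `extra="forbid"` makes the parse reject it instead of silently dropping it:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class CandidateEvaluation(BaseModel):
    model_config = ConfigDict(extra="forbid")  # reject unknown keys at parse time
    candidate_index: int
    grade: str
    explanation: str

# Hypothetical drifted response: the model added a "confidence" field.
drifted = '{"candidate_index": 0, "grade": "pass", "explanation": "ok", "confidence": 0.9}'

try:
    CandidateEvaluation.model_validate_json(drifted)
    drift_caught = False
except ValidationError as e:
    drift_caught = True  # surfaced here, at the boundary
    print("rejected:", e.error_count(), "validation error(s)")
```

Without `extra="forbid"`, the same response would parse cleanly and the new field would vanish — a quieter, harder-to-notice form of drift.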

Why this shape matters for agent pipelines

Multi-agent LLM pipelines pass intermediate outputs between agents — drafter → evaluator, planner → coder → verifier, etc. Each handoff is a potential format failure. Pydantic models:

  • Document the contract — the schema is the spec of what each agent emits.
  • Fail fast — malformed output surfaces at the boundary, not three agents downstream when a field is missing.
  • Enable structured-output APIs — OpenAI's response_format, Anthropic's tool use, and most other structured-output APIs accept JSON schemas that Pydantic generates natively (Model.model_json_schema()).
  • Compose with eval / logging — typed outputs serialize back to canonical JSON for snapshotting, replay, and regression harnesses.

Tradeoffs / gotchas

  • Schema changes are breaking. If the downstream consumer depends on a Pydantic shape and the schema changes, all cached LLM outputs + prompt-tuned few-shot examples need revisiting.
  • Small models break schemas more often than frontier models (concepts/structured-output-reliability). Pydantic validation catches the failure but doesn't fix it — mitigation still requires constrained decoding, few-shot examples, or model upgrade.
  • Pydantic is Python-specific. The same shape applies in TypeScript (zod), Rust (serde + schemars), Go (struct tags + encoding/json), etc — but the ergonomics and integration with LLM SDKs differ.
  • Enum values leak into the prompt. If Grade is an enum {PASS, REVISE}, the prompt must either include the enum values or rely on the LLM SDK's structured-output mode to constrain them. Silent mismatches ("Pass" vs "PASS") cause parse failures.
  • Free-form fields undermine the contract. A schema with explanation: str constrains shape but not content; the explanation string can still be hallucinated nonsense. Schema validation is about structure, not correctness.
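One mitigation for the enum-leak gotcha above: derive the allowed values from the enum itself when building the prompt, so the schema and the prompt text cannot drift apart (the prompt wording here is illustrative):

```python
from enum import Enum

class Grade(str, Enum):
    PASS = "pass"
    REVISE = "revise"

# Single source of truth: the enum's own values feed the prompt.
allowed = ", ".join(f'"{g.value}"' for g in Grade)
prompt = f"Grade each candidate. The grade field must be exactly one of: {allowed}."
print(prompt)
```

If `Grade` later gains a member, the prompt updates automatically — avoiding the "Pass" vs "pass" class of silent mismatch.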

Relationship to adjacent concepts

Seen in

  • sources/2026-02-19-lyft-scaling-localization-with-ai (canonical wiki instance). Lyft's AI localization pipeline uses Pydantic schemas as the contract between the Drafter and Evaluator agents. Documented shapes: DrafterOutput, TranslationCandidate, EvaluatorOutput, CandidateEvaluation, the Grade enum, and best_candidate_index: int. "This ensures type safety, reliable parsing, and clear contracts between Drafter and Evaluator."