Pydantic structured LLM output¶
Definition¶
Pydantic structured LLM output is the Python-ecosystem
convention of declaring a Pydantic BaseModel for every LLM
output shape and treating the model's JSON response as either
parseable into that typed object or fully incorrect — no
tolerance for free-form text slipping through.
The concept is the Python-idiomatic concrete form of concepts/structured-output-reliability: the schema is the contract; the validator is the boundary; every downstream consumer sees typed objects, never raw model text.
The pattern in practice¶
```python
from enum import Enum

from pydantic import BaseModel


class Grade(str, Enum):
    PASS = "pass"
    REVISE = "revise"


class TranslationCandidate(BaseModel):
    text: str


class CandidateEvaluation(BaseModel):
    candidate_index: int
    grade: Grade
    explanation: str


class EvaluatorOutput(BaseModel):
    evaluations: list[CandidateEvaluation]
    best_candidate_index: int | None = None


# At the boundary:
result = EvaluatorOutput.model_validate_json(llm_response_text)
# -> either a typed object, or ValidationError is raised.
```
Downstream code consumes `result.evaluations[0].grade` as an enum value, never via string matching. Schema drift (the model drops a field) is caught at parse time, not during production execution; note that catching an *added* field requires `extra="forbid"` in the model config, since Pydantic ignores unknown fields by default.
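A minimal sketch of the fail-fast behavior, using a hypothetical `StrictOutput` model with `extra="forbid"` set so that unknown fields are rejected (not Pydantic's default):

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StrictOutput(BaseModel):
    # extra="forbid" makes unknown fields a parse-time error;
    # Pydantic's default is to silently ignore them.
    model_config = ConfigDict(extra="forbid")
    grade: str


# A missing required field fails at the boundary.
try:
    StrictOutput.model_validate_json("{}")
except ValidationError as e:
    print(e.errors()[0]["type"])  # -> "missing"

# An unexpected extra field also fails, instead of flowing downstream.
try:
    StrictOutput.model_validate_json('{"grade": "pass", "score": 1}')
except ValidationError as e:
    print(e.errors()[0]["type"])  # -> "extra_forbidden"
```

Either way, the malformed response never reaches a consumer: the error carries the failing field and reason, which is what makes the boundary the right place to retry or escalate.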
Why this shape matters for agent pipelines¶
Multi-agent LLM pipelines pass intermediate outputs between agents — drafter → evaluator, planner → coder → verifier, etc. Each handoff is a potential format failure. Pydantic models:
- Document the contract — the schema is the spec of what each agent emits.
- Fail fast — malformed output surfaces at the boundary, not three agents downstream when a field is missing.
- Enable structured-output APIs — OpenAI's `response_format`, Anthropic's tool use, and most other structured-output APIs accept JSON schemas that Pydantic generates natively (`Model.model_json_schema()`).
- Compose with eval / logging — typed outputs serialize back to canonical JSON for snapshotting, replay, and regression harnesses.
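For instance, the schema those APIs consume can be generated directly from the models on this page; a sketch reusing the `CandidateEvaluation` / `Grade` shapes:

```python
from enum import Enum

from pydantic import BaseModel


class Grade(str, Enum):
    PASS = "pass"
    REVISE = "revise"


class CandidateEvaluation(BaseModel):
    candidate_index: int
    grade: Grade
    explanation: str


schema = CandidateEvaluation.model_json_schema()

# All three fields are marked required in the generated JSON schema...
print(schema["required"])  # -> ['candidate_index', 'grade', 'explanation']

# ...and the Grade enum's legal values are embedded, so a structured-output
# API can constrain decoding to exactly "pass" or "revise".
print(schema["$defs"]["Grade"]["enum"])  # -> ['pass', 'revise']
```

The same schema object can be passed to a provider's structured-output parameter, so the Pydantic model stays the single source of truth for both validation and generation.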
Tradeoffs / gotchas¶
- Schema changes are breaking. If the downstream consumer depends on a Pydantic shape and the schema changes, all cached LLM outputs + prompt-tuned few-shot examples need revisiting.
- Small models break schemas more often than frontier models (concepts/structured-output-reliability). Pydantic validation catches the failure but doesn't fix it — mitigation still requires constrained decoding, few-shot examples, or model upgrade.
- Pydantic is Python-specific. The same shape applies in TypeScript (`zod`), Rust (`serde` + `schemars`), Go (struct tags + `encoding/json`), etc. — but the ergonomics and integration with LLM SDKs differ.
- Enum values leak into the prompt. If `Grade` is an enum `{PASS, REVISE}`, the prompt must either include the enum values or rely on the LLM SDK's structured-output mode to constrain them. Silent mismatches (`"Pass"` vs `"pass"`) cause parse failures.
- Free-form fields undermine the contract. A schema with `explanation: str` constrains shape but not content; the explanation string can still be hallucinated nonsense. Schema validation is about structure, not correctness.
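The case-mismatch gotcha is easy to reproduce; a sketch using the `Grade` enum from this page inside a stripped-down, hypothetical `Evaluation` model:

```python
from enum import Enum

from pydantic import BaseModel, ValidationError


class Grade(str, Enum):
    PASS = "pass"
    REVISE = "revise"


class Evaluation(BaseModel):
    grade: Grade


# The exact enum value parses into the enum member...
ok = Evaluation.model_validate_json('{"grade": "pass"}')
assert ok.grade is Grade.PASS

# ...but a case mismatch is a hard parse failure, not a coercion.
try:
    Evaluation.model_validate_json('{"grade": "Pass"}')
except ValidationError as e:
    print(e.errors()[0]["msg"])  # e.g. "Input should be 'pass' or 'revise'"
```

Validation matches on the enum's *values* (`"pass"`, `"revise"`), not its member names, which is why the prompt must spell out the exact lowercase strings.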
Relationship to adjacent concepts¶
- concepts/structured-output-reliability — the quality axis this pattern addresses; Pydantic validation is the Python implementation of the "malformed = fully incorrect" discipline.
- systems/pydantic — the library itself.
- concepts/few-shot-prompt-template — few-shot valid-JSON examples are the prompt-side reinforcement that reduces validation failure rate.
Seen in¶
- sources/2026-02-19-lyft-scaling-localization-with-ai — canonical wiki instance. Lyft's AI localization pipeline uses Pydantic schemas as the contract between the Drafter and Evaluator agents. Documented shapes: `DrafterOutput`, `TranslationCandidate`, `EvaluatorOutput`, `CandidateEvaluation`, the `Grade` enum, and `best_candidate_index: int`. "This ensures type safety, reliable parsing, and clear contracts between Drafter and Evaluator."