CONCEPT Cited by 1 source

File-class routing¶

File-class routing is a two-pass document-extraction discipline in which the first pass classifies a document's file class (contract type, schedule type, exhibit type, invoice format, etc.) and the second pass dispatches to a per-class prompt / extractor / output schema rather than running a single one-size-fits-all extraction.

Canonicalised on the wiki via AArete's Doczy.ai disclosure (sources/2026-06-02-aws-automating-contract-intelligence-with-doczyai-on-aws):

"Before output is finalized, the system determines each document's file class and generates prompts tailored to the extracted text, cluster classification, and domain context."

Why route by class¶

Contract-class documents have structurally different fields per type:

A provider arrangement has reimbursement rates, term duration, in-network/out-of-network designations.
A vendor invoice has line items, billing periods, payment terms.
A legal agreement has indemnification clauses, dispute resolution, governing law.

A single prompt that tries to extract every possible field from any document does badly on all three: it carries too many irrelevant fields and leaves too little room for the field set that actually matters for the document at hand. Per-class prompts give each extractor a focused field set matched to the document's actual schema.
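
The contrast between a catch-all prompt and per-class prompts can be sketched as follows. This is a minimal illustration, not Doczy.ai's implementation: the class keys, field names, and the `build_prompt` helper are hypothetical, with the fields taken from the three examples above.

```python
# Hypothetical per-class field sets, taken from the three examples above.
FIELDS_BY_CLASS = {
    "provider_arrangement": ["reimbursement_rates", "term_duration", "network_designation"],
    "vendor_invoice": ["line_items", "billing_period", "payment_terms"],
    "legal_agreement": ["indemnification", "dispute_resolution", "governing_law"],
}


def build_prompt(fields, document_text):
    """Render an extraction prompt that asks only for the given fields."""
    field_list = "\n".join(f"- {name}" for name in fields)
    return (
        "Extract the following fields from the document and return JSON.\n"
        f"{field_list}\n\nDocument:\n{document_text}"
    )


def one_size_fits_all_prompt(document_text):
    """Single prompt that asks for every field of every class."""
    all_fields = [f for fields in FIELDS_BY_CLASS.values() for f in fields]
    return build_prompt(all_fields, document_text)


def per_class_prompt(file_class, document_text):
    """Prompt restricted to the field set of the detected class."""
    return build_prompt(FIELDS_BY_CLASS[file_class], document_text)


if __name__ == "__main__":
    text = "Invoice for March services. Payment terms: net 30."
    print(per_class_prompt("vendor_invoice", text))
```

With three classes the catch-all prompt already asks for nine fields, six of which are irrelevant to any given document; the gap widens as classes are added.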

Sibling concepts on the wiki¶

Concept	Selector	Dispatch target	Source
File-class routing	Document type	Per-class prompt / extractor	Doczy.ai (2026-06-02)
concepts/llm-workflow-router	Query intent	Per-workflow handler	Yelp CS Chatbot (2026-05-27)
patterns/two-pass-classify-then-deep-extract	Artefact type	Type-specific deep extraction	Databricks Unlocking the Archives (2026-05-11)

The selector axis differs (document type, query intent, artefact type), but the structural shape is the same: classify first, dispatch second.

Composition with prompt optimisation feedback loop¶

In Doczy.ai, file-class routing is the substrate that makes the prompt-optimisation feedback loop tractable: each per-class prompt can accumulate its own examples, drift on its own, and be evaluated on its own. Without per-class routing, a single global prompt would have to absorb examples for every class, which would exceed token budgets and leave one accuracy figure averaged across very different extraction targets.
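
A minimal sketch of what "evaluated independently" could look like in practice. The source does not describe how the feedback loop stores examples or scores prompts, so the `PerClassScoreboard` class, its method names, and the exact-match scoring rule are all assumptions made for illustration.

```python
from collections import defaultdict


class PerClassScoreboard:
    """Hypothetical tracker: accuracy and candidate examples kept per file class."""

    def __init__(self):
        self._correct = defaultdict(int)
        self._total = defaultdict(int)
        # Misses are stored per class, so they only ever feed that class's prompt.
        self.examples = defaultdict(list)

    def record(self, file_class, predicted, expected, document_text):
        """Score one extraction by exact match (an assumed, deliberately simple rule)."""
        self._total[file_class] += 1
        if predicted == expected:
            self._correct[file_class] += 1
        else:
            self.examples[file_class].append((document_text, expected))

    def accuracy(self, file_class):
        """Return accuracy for one class, or None if nothing was recorded for it."""
        total = self._total.get(file_class, 0)
        if total == 0:
            return None
        return self._correct.get(file_class, 0) / total


if __name__ == "__main__":
    board = PerClassScoreboard()
    board.record("vendor_invoice", {"payment_terms": "net 60"},
                 {"payment_terms": "net 30"}, "doc b")
    print(board.accuracy("vendor_invoice"), len(board.examples["vendor_invoice"]))
```

Keeping the counters and example pools keyed by class means a regression in one class shows up in that class's number instead of being diluted in a global average.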

Required substrate¶

File-class detection mechanism. Could be a separate small classifier model, a rules-based heuristic over OCR output, an LLM-based zero-shot classification step, or a combination. The Doczy.ai disclosure doesn't specify the mechanism.
Per-class prompt template library. One template per known file class.
Dispatch logic. Given the detected class, select the matching template.
Fallback. Behaviour when an out-of-distribution document arrives (no matching class detected).
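
The four pieces listed above can be wired together as in the sketch below. Because the Doczy.ai disclosure does not specify the detection mechanism, a keyword heuristic stands in for it here; the keyword lists, template wording, `unknown` fallback class, and the names `detect_class` and `route` are all hypothetical.

```python
from dataclasses import dataclass

# Detection mechanism (assumed): keyword rules over the extracted text.
KEYWORDS = {
    "provider_arrangement": ("reimbursement", "in-network", "out-of-network"),
    "vendor_invoice": ("invoice", "line item", "billing period"),
    "legal_agreement": ("indemnif", "governing law", "dispute resolution"),
}

# Per-class prompt template library: one template per known file class.
TEMPLATES = {
    "provider_arrangement": "Extract reimbursement rates, term duration, and network designations.\n\n{text}",
    "vendor_invoice": "Extract line items, billing periods, and payment terms.\n\n{text}",
    "legal_agreement": "Extract indemnification clauses, dispute resolution terms, and governing law.\n\n{text}",
}

# Fallback for out-of-distribution documents.
FALLBACK_CLASS = "unknown"
FALLBACK_TEMPLATE = "Summarise the document and flag it for manual review.\n\n{text}"


@dataclass
class Routed:
    file_class: str
    prompt: str


def detect_class(text, min_hits=1):
    """First pass: pick the class with the most keyword hits, else the fallback."""
    lowered = text.lower()
    hits = {cls: sum(kw in lowered for kw in kws) for cls, kws in KEYWORDS.items()}
    best = max(hits, key=hits.get)  # ties resolve to the first class listed
    return best if hits[best] >= min_hits else FALLBACK_CLASS


def route(text):
    """Second pass: dispatch to the template that matches the detected class."""
    file_class = detect_class(text)
    template = TEMPLATES.get(file_class, FALLBACK_TEMPLATE)
    return Routed(file_class, template.format(text=text))


if __name__ == "__main__":
    print(route("Invoice 17 covers the billing period of March.").file_class)
    print(route("Lunch menu for Tuesday.").file_class)
```

Swapping the heuristic for a classifier model or an LLM call only changes `detect_class`; the template library, dispatch, and fallback stay the same.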

When to apply¶

Pipelines processing multiple distinct document types where per-type fields differ significantly.
Extraction targets where per-class prompt tuning yields meaningful accuracy gain over one-size-fits-all.
Production scale where building and maintaining a class-template library is justified.

When not to apply¶

Single-document-class pipelines.
Pipelines where document classes are too fine-grained or too fast-evolving to maintain templates for.
Pipelines where documents are inherently mixed-class (a single file may contain multiple classes; a flat prompt that handles all classes simultaneously may work better).

Risks¶

Misclassification cascades. A wrong class detection in the first pass routes the document to the wrong extractor in the second; the second-pass output may look syntactically valid but be semantically wrong.
Long-tail class drift. Rare classes accumulate fewer examples and may underperform.
Template-library maintenance burden. As the document population evolves, templates need updates.
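
One possible partial mitigation for misclassification cascades is a cheap post-extraction check, sketched below. It is an assumption and not something the source describes: the `check_extraction` helper, the field names, and both thresholds are hypothetical. It also only catches the case where a wrong class leaves most class fields empty; it cannot detect values that are plausible but wrong.

```python
# Hypothetical required fields per class, mirroring the examples earlier on this page.
REQUIRED_FIELDS = {
    "provider_arrangement": ["reimbursement_rates", "term_duration", "network_designation"],
    "vendor_invoice": ["line_items", "billing_period", "payment_terms"],
    "legal_agreement": ["indemnification", "dispute_resolution", "governing_law"],
}


def check_extraction(file_class, extracted, confidence,
                     min_confidence=0.8, min_fill=0.5):
    """Return reasons to send a document to review; an empty list means it passes.

    `confidence` is whatever score the first-pass classifier reports (assumed
    to lie in [0, 1]); both thresholds are illustrative defaults.
    """
    reasons = []
    if confidence < min_confidence:
        reasons.append("low classification confidence")

    required = REQUIRED_FIELDS[file_class]
    # A field counts as filled when it is present and not empty.
    filled = [name for name in required if extracted.get(name) not in (None, "", [])]
    if len(filled) / len(required) < min_fill:
        reasons.append("too few class fields populated")
    return reasons


if __name__ == "__main__":
    empty = {"line_items": [], "billing_period": "", "payment_terms": None}
    print(check_extraction("vendor_invoice", empty, confidence=0.95))
```

Documents that return any reason could be re-routed through the fallback path or queued for manual review.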

Seen in¶

sources/2026-06-02-aws-automating-contract-intelligence-with-doczyai-on-aws — canonical wiki disclosure as part of Doczy.ai's per-document prompt-construction step. "the system determines each document's file class and generates prompts tailored to the extracted text, cluster classification, and domain context."

concepts/prompt-optimization-feedback-loop — downstream consumer of the per-class prompt structure
concepts/llm-workflow-router — sibling at query-intent altitude
patterns/two-pass-classify-then-deep-extract — canonical classify-then-extract pattern
patterns/multi-step-llm-extraction-pipeline — broader multi-step pipeline pattern
systems/doczy-ai