CONCEPT Cited by 1 source
File-class routing¶
File-class routing is a two-pass document-extraction discipline in which the first pass classifies a document's file class (contract type, schedule type, exhibit type, invoice format, etc.) and the second pass dispatches to a per-class prompt / extractor / output schema rather than running a single one-size-fits-all extraction.
Canonicalised on the wiki via AArete's Doczy.ai disclosure (sources/2026-06-02-aws-automating-contract-intelligence-with-doczyai-on-aws):
"Before output is finalized, the system determines each document's file class and generates prompts tailored to the extracted text, cluster classification, and domain context."
Why route by class¶
Contract-class documents have structurally different fields per type:
- A provider arrangement has reimbursement rates, term duration, in-network/out-of-network designations.
- A vendor invoice has line items, billing periods, payment terms.
- A legal agreement has indemnification clauses, dispute resolution, governing law.
A single prompt that tries to extract every possible field from any document performs poorly on all three: it carries too many irrelevant fields and leaves too little room for the field set that actually matters for the document at hand. Per-class prompts give each extractor a focused field set matched to the document's actual schema.
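A minimal sketch of the contrast, assuming the three example classes above. The class keys, field names, and prompt wording are hypothetical illustrations and are not taken from the Doczy.ai disclosure.

```python
# Hypothetical field sets, derived from the three example classes above.
CLASS_FIELDS = {
    "provider_arrangement": ["reimbursement_rates", "term_duration", "network_designation"],
    "vendor_invoice": ["line_items", "billing_period", "payment_terms"],
    "legal_agreement": ["indemnification_clauses", "dispute_resolution", "governing_law"],
}


def _render(fields, text):
    """Render a field list and the document text into a prompt string."""
    field_list = "\n".join(f"- {name}" for name in fields)
    return f"Extract these fields as JSON:\n{field_list}\n\nDocument:\n{text}"


def per_class_prompt(file_class, text):
    """Ask only for the fields that belong to the detected class."""
    return _render(CLASS_FIELDS[file_class], text)


def flat_prompt(text):
    """One-size-fits-all baseline: ask for every field of every class."""
    all_fields = [name for fields in CLASS_FIELDS.values() for name in fields]
    return _render(all_fields, text)


doc = "Invoice 0042, net 30, billing period March."
print(per_class_prompt("vendor_invoice", doc))
```

With only three classes the flat prompt already asks for three times as many fields as any one document can supply, and the gap widens with every class added to the library.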
Sibling concepts on the wiki¶
| Concept | Selector | Dispatch target | Source |
|---|---|---|---|
| File-class routing | Document type | Per-class prompt / extractor | Doczy.ai (2026-06-02) |
| concepts/llm-workflow-router | Query intent | Per-workflow handler | Yelp CS Chatbot (2026-05-27) |
| patterns/two-pass-classify-then-deep-extract | Artefact type | Type-specific deep extraction | Databricks Unlocking the Archives (2026-05-11) |
The selector axis differs (document-type vs query-intent vs artefact-type), but the structural shape is the same — classify first, dispatch second.
Composition with prompt optimisation feedback loop¶
In Doczy.ai, file-class routing is the substrate that makes the prompt-optimisation feedback loop tractable: each per-class prompt can accumulate its own examples, drift on its own, and be evaluated on its own. Without per-class routing, a single global prompt would have to absorb examples for every class, which risks exceeding token budgets and averages accuracy across very different extraction targets.
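One way to picture this independence is a per-class example store plus a per-class accuracy breakdown. The sketch below is an assumption about how such bookkeeping could look; `PerClassExamples` and `accuracy_by_class` are hypothetical names, and the disclosure does not describe Doczy.ai's actual data structures.

```python
from collections import defaultdict


class PerClassExamples:
    """Keeps a bounded list of worked examples for each file class."""

    def __init__(self, max_per_class=3):
        self.max_per_class = max_per_class
        self._examples = defaultdict(list)

    def add(self, file_class, document, expected_output):
        bucket = self._examples[file_class]
        bucket.append((document, expected_output))
        # Drop the oldest examples so no class grows past its own budget.
        while len(bucket) > self.max_per_class:
            bucket.pop(0)

    def few_shot_block(self, file_class):
        """Render only the examples that belong to the requested class."""
        return "\n\n".join(
            f"Document:\n{doc}\nExpected:\n{out}"
            for doc, out in self._examples.get(file_class, [])
        )


def accuracy_by_class(results):
    """Compute accuracy per class from (file_class, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for file_class, is_correct in results:
        totals[file_class] += 1
        correct[file_class] += int(is_correct)
    return {cls: correct[cls] / totals[cls] for cls in totals}
```

Because each class has its own bucket and its own accuracy figure, a regression in one class is visible on its own instead of being diluted in a global average.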
Required substrate¶
- File-class detection mechanism. Could be a separate small classifier model, a rules-based heuristic over OCR output, an LLM-based zero-shot classification step, or a combination. The Doczy.ai disclosure doesn't specify the mechanism.
- Per-class prompt template library. One template per known file class.
- Dispatch logic. Given the detected class, select the matching template.
- Fallback. Behaviour when an out-of-distribution document arrives (no matching class detected).
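The four pieces above can be sketched together as follows. This is an illustration under stated assumptions, not Doczy.ai's implementation: the detector is the rules-based option (keyword matching), and the keyword lists, template texts, `min_hits` threshold, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Detection mechanism (hypothetical keyword rules). A small classifier model
# or an LLM zero-shot call could replace this without changing the dispatch.
KEYWORD_RULES = {
    "vendor_invoice": ("invoice", "billing period", "amount due"),
    "provider_arrangement": ("reimbursement", "in-network", "provider"),
    "legal_agreement": ("indemnif", "governing law", "dispute"),
}

# Per-class prompt template library: one template per known file class.
TEMPLATES = {
    "vendor_invoice": "Extract line items, billing period, and payment terms.\n\n{text}",
    "provider_arrangement": "Extract reimbursement rates, term duration, and network designation.\n\n{text}",
    "legal_agreement": "Extract indemnification, dispute resolution, and governing law.\n\n{text}",
}

# Fallback for out-of-distribution documents (hypothetical generic prompt).
FALLBACK_TEMPLATE = "List the parties, dates, and amounts you can find.\n\n{text}"


@dataclass
class Route:
    file_class: Optional[str]  # None means no known class matched
    prompt: str


def detect_class(text: str, min_hits: int = 2) -> Optional[str]:
    """Return the class with the most keyword hits, or None below min_hits."""
    lowered = text.lower()
    scores = {
        cls: sum(keyword in lowered for keyword in keywords)
        for cls, keywords in KEYWORD_RULES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else None


def route(text: str) -> Route:
    """First pass: classify. Second pass: select the matching template."""
    file_class = detect_class(text)
    template = TEMPLATES.get(file_class, FALLBACK_TEMPLATE)
    return Route(file_class, template.format(text=text))
```

Returning `None` for the class, instead of forcing the nearest match, keeps the fallback explicit so that out-of-distribution documents can be counted and reviewed separately.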
When to apply¶
- Pipelines processing multiple distinct document types where per-type fields differ significantly.
- Extraction targets where per-class prompt tuning yields meaningful accuracy gain over one-size-fits-all.
- Production scale where building and maintaining a class-template library is justified.
When not to apply¶
- Single-document-class pipelines.
- Pipelines where document classes are too fine-grained or too fast-evolving to maintain templates for.
- Pipelines where documents are inherently mixed-class. When a single file may contain multiple classes, a flat prompt that handles all classes simultaneously may work better.
Risks¶
- Misclassification cascades. A wrong class detection in the first pass routes the document to the wrong extractor in the second pass; the resulting output may be syntactically valid yet semantically wrong.
- Long-tail class drift. Rare classes accumulate fewer examples and may underperform.
- Template-library maintenance burden. As the document population evolves, templates need updates.
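The first risk is hard to spot because a misrouted document still yields well-formed output. One possible mitigation, offered here as an assumption and not something the Doczy.ai disclosure describes, is a field-coverage check: if most of the routed class's expected fields come back empty, the document may have been sent to the wrong extractor. The field names, the `min_coverage` threshold, and both function names are hypothetical.

```python
# Hypothetical expected fields per class, used only for the sanity check.
REQUIRED_FIELDS = {
    "vendor_invoice": ["line_items", "billing_period", "payment_terms"],
    "legal_agreement": ["indemnification", "dispute_resolution", "governing_law"],
}

_EMPTY = (None, "", [], {})


def coverage(file_class, extracted):
    """Fraction of the class's expected fields that came back non-empty."""
    required = REQUIRED_FIELDS[file_class]
    filled = sum(1 for name in required if extracted.get(name) not in _EMPTY)
    return filled / len(required)


def needs_review(file_class, extracted, min_coverage=0.5):
    """Flag outputs whose coverage is low enough to suggest misrouting."""
    return coverage(file_class, extracted) < min_coverage


# A legal agreement misrouted to the invoice extractor: valid shape, no content.
misrouted = {"line_items": [], "billing_period": None, "payment_terms": ""}
print(needs_review("vendor_invoice", misrouted))
```

Low coverage is only a heuristic signal: a correctly routed but sparse document will also trip it, so a flagged output is a candidate for review, not proof of misclassification.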
Seen in¶
- sources/2026-06-02-aws-automating-contract-intelligence-with-doczyai-on-aws — canonical wiki disclosure as part of Doczy.ai's per-document prompt-construction step. "the system determines each document's file class and generates prompts tailored to the extracted text, cluster classification, and domain context."
Related¶
- concepts/prompt-optimization-feedback-loop — downstream consumer of the per-class prompt structure
- concepts/llm-workflow-router — sibling at query-intent altitude
- patterns/two-pass-classify-then-deep-extract — canonical classify-then-extract pattern
- patterns/multi-step-llm-extraction-pipeline — broader multi-step pipeline pattern
- systems/doczy-ai