CONCEPT Cited by 1 source

File-class routing

File-class routing is a two-pass document-extraction discipline in which the first pass classifies a document's file class (contract type, schedule type, exhibit type, invoice format, etc.) and the second pass dispatches to a per-class prompt / extractor / output schema rather than running a single one-size-fits-all extraction.
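
The two passes can be sketched as follows. This is a minimal illustration and not Doczy.ai's implementation: the keyword-based `classify` function, the class labels, and the stub extractors are all hypothetical stand-ins for whatever classifier and per-class extractors a real pipeline would use.

```python
from typing import Callable, Dict, Tuple

# Hypothetical class labels and keyword cues. A real first pass might be a
# trained classifier or an LLM call rather than keyword counting.
CLASS_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "provider_arrangement": ("reimbursement", "in-network"),
    "vendor_invoice": ("invoice", "billing period"),
    "legal_agreement": ("indemnification", "governing law"),
}


def classify(text: str) -> str:
    """Pass 1: pick the class whose keywords occur most often in the text.

    Deliberately naive: ties (including all-zero scores) go to the first
    class in CLASS_KEYWORDS, so there is no out-of-distribution handling.
    """
    lowered = text.lower()
    scores = {
        file_class: sum(lowered.count(kw) for kw in keywords)
        for file_class, keywords in CLASS_KEYWORDS.items()
    }
    return max(scores, key=scores.get)


# Stub extractors: each one only knows about the fields of its own class.
def extract_provider(text: str) -> dict:
    return {"class": "provider_arrangement", "fields": ["reimbursement_rates", "term_duration"]}


def extract_invoice(text: str) -> dict:
    return {"class": "vendor_invoice", "fields": ["line_items", "billing_period"]}


def extract_legal(text: str) -> dict:
    return {"class": "legal_agreement", "fields": ["indemnification_clauses", "governing_law"]}


EXTRACTORS: Dict[str, Callable[[str], dict]] = {
    "provider_arrangement": extract_provider,
    "vendor_invoice": extract_invoice,
    "legal_agreement": extract_legal,
}


def route(text: str) -> dict:
    """Pass 2: dispatch to the extractor registered for the detected class."""
    return EXTRACTORS[classify(text)](text)
```

The point of the shape is that `route` never inspects the document itself; everything class-specific lives behind the `EXTRACTORS` lookup, so adding a class means adding one classifier cue and one extractor.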

Canonicalised on the wiki via AArete's Doczy.ai disclosure (sources/2026-06-02-aws-automating-contract-intelligence-with-doczyai-on-aws):

"Before output is finalized, the system determines each document's file class and generates prompts tailored to the extracted text, cluster classification, and domain context."

Why route by class

Contract-class documents have structurally different fields per type:

  • A provider arrangement has reimbursement rates, term duration, in-network/out-of-network designations.
  • A vendor invoice has line items, billing periods, payment terms.
  • A legal agreement has indemnification clauses, dispute resolution, governing law.

A single prompt that tries to extract every possible field from any document performs poorly on all three: it spends prompt space on fields irrelevant to the document at hand and leaves too little for the field set that actually matters. Per-class prompts give each extractor a focused field set matched to the document's actual schema.
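
To make the contrast concrete, the sketch below builds a focused prompt and a one-size-fits-all prompt from the same field table. The field names come from the three examples above; the snake_case identifiers, the prompt wording, and both function names are illustrative assumptions.

```python
from typing import Dict, List

# Field sets per class, taken from the bullet list above. The identifiers
# themselves are hypothetical.
FIELDS_BY_CLASS: Dict[str, List[str]] = {
    "provider_arrangement": ["reimbursement_rates", "term_duration", "network_designation"],
    "vendor_invoice": ["line_items", "billing_period", "payment_terms"],
    "legal_agreement": ["indemnification_clauses", "dispute_resolution", "governing_law"],
}


def build_focused_prompt(file_class: str, text: str) -> str:
    """Ask only for the fields that belong to the detected class."""
    fields = ", ".join(FIELDS_BY_CLASS[file_class])
    return f"Extract the following fields as JSON: {fields}.\n\nDocument:\n{text}"


def build_global_prompt(text: str) -> str:
    """Baseline for comparison: asks for every field of every class."""
    all_fields = [name for fields in FIELDS_BY_CLASS.values() for name in fields]
    return f"Extract the following fields as JSON: {', '.join(all_fields)}.\n\nDocument:\n{text}"
```

With three classes of three fields each, the global prompt already asks for nine fields, six of which are irrelevant to any given document; the gap widens with every class added.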

Sibling concepts on the wiki

| Concept | Selector | Dispatch target | Source |
|---|---|---|---|
| File-class routing | Document type | Per-class prompt / extractor | Doczy.ai (2026-06-02) |
| concepts/llm-workflow-router | Query intent | Per-workflow handler | Yelp CS Chatbot (2026-05-27) |
| patterns/two-pass-classify-then-deep-extract | Classification first | Type-specific deep extraction | Databricks Unlocking the Archives (2026-05-11) |

The selector axis differs (document-type vs query-intent vs artefact-type), but the structural shape is the same — classify first, dispatch second.

Composition with the prompt-optimisation feedback loop

In Doczy.ai, file-class routing is the substrate that makes the prompt-optimisation feedback loop tractable: per-class prompts can independently accumulate examples, drift independently, and be evaluated independently. Without per-class routing, a single global prompt would have to absorb examples for every class, which risks exceeding token budgets and leaves only one accuracy figure averaged across very different extraction targets.
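
Independent evaluation can be as simple as grouping results by class before averaging. The sketch below assumes a labelled evaluation set reduced to hypothetical `(file_class, was_correct)` pairs; the source does not describe how Doczy.ai scores its extractions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_class(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute extraction accuracy separately for each file class.

    `results` holds (file_class, was_correct) pairs. Reducing each document
    to a single correct/incorrect flag is a simplifying assumption.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for file_class, was_correct in results:
        totals[file_class] += 1
        correct[file_class] += int(was_correct)
    return {file_class: correct[file_class] / totals[file_class] for file_class in totals}


# Made-up sample data, for illustration only.
sample = [
    ("vendor_invoice", True),
    ("vendor_invoice", True),
    ("legal_agreement", True),
    ("legal_agreement", False),
]
per_class = accuracy_by_class(sample)
overall = sum(ok for _, ok in sample) / len(sample)
```

On this made-up sample the overall accuracy is 0.75, which hides that `legal_agreement` sits at 0.5 while `vendor_invoice` is at 1.0. The per-class view shows which prompt needs attention.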

Required substrate

  • File-class detection mechanism. Could be a separate small classifier model, a rules-based heuristic over OCR output, an LLM-based zero-shot classification step, or a combination. The Doczy.ai disclosure doesn't specify the mechanism.
  • Per-class prompt template library. One template per known file class.
  • Dispatch logic. Given the detected class, select the matching template.
  • Fallback. Behaviour when an out-of-distribution document arrives (no matching class detected).
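
The template library, dispatch logic, and fallback listed above might fit together as in this sketch. The template wording, the class labels, and the choice of a generic summarisation prompt as the fallback are assumptions; the Doczy.ai disclosure does not say how out-of-distribution documents are handled.

```python
from string import Template
from typing import Dict, Optional

# Per-class prompt template library: one template per known file class.
TEMPLATES: Dict[str, Template] = {
    "vendor_invoice": Template(
        "Extract line items, billing period, and payment terms.\n\n$text"
    ),
    "legal_agreement": Template(
        "Extract indemnification clauses, dispute resolution, and governing law.\n\n$text"
    ),
}

# Hypothetical fallback for documents with no matching class.
FALLBACK_TEMPLATE = Template(
    "Summarise the document and list any key terms you find.\n\n$text"
)


def select_template(file_class: Optional[str]) -> Template:
    """Dispatch: return the class template, or the fallback when the
    detector returned None or a class with no registered template."""
    if file_class is None:
        return FALLBACK_TEMPLATE
    return TEMPLATES.get(file_class, FALLBACK_TEMPLATE)


def render_prompt(file_class: Optional[str], text: str) -> str:
    """Fill the selected template with the document text."""
    return select_template(file_class).substitute(text=text)
```

Other fallback policies are equally plausible, for example routing unmatched documents to a human review queue instead of a generic prompt.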

When to apply

  • Pipelines processing multiple distinct document types where per-type fields differ significantly.
  • Extraction targets where per-class prompt tuning yields meaningful accuracy gain over one-size-fits-all.
  • Production scale where building and maintaining a class-template library is justified.

When not to apply

  • Single-document-class pipelines.
  • Pipelines where document classes are too fine-grained or too fast-evolving to maintain templates for.
  • Pipelines where documents are inherently mixed-class (a single file may contain multiple classes; a flat prompt that handles all classes simultaneously may work better).

Risks

  • Misclassification cascades. A wrong class detection at step 1 routes to the wrong extractor at step 2; the second-pass output may look syntactically valid but be semantically wrong.
  • Long-tail class drift. Rare classes accumulate fewer examples and may underperform.
  • Template-library maintenance burden. As the document population evolves, templates need updates.
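
One possible guard against misclassification cascades, offered as an assumption rather than anything the source describes, is to check how many of a class's expected fields the second pass actually populated and flag low-fill outputs for review. The field lists, the `needs_review` helper, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, List, Optional

# Expected fields per class (hypothetical identifiers).
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "vendor_invoice": ["line_items", "billing_period", "payment_terms"],
    "legal_agreement": ["indemnification_clauses", "dispute_resolution", "governing_law"],
}


def fill_rate(file_class: str, extracted: Dict[str, Optional[object]]) -> float:
    """Fraction of the class's required fields that came back non-empty."""
    required = REQUIRED_FIELDS[file_class]
    filled = sum(1 for name in required if extracted.get(name) not in (None, "", []))
    return filled / len(required)


def needs_review(
    file_class: str,
    extracted: Dict[str, Optional[object]],
    min_fill_rate: float = 0.5,
) -> bool:
    """Flag outputs whose fill rate suggests the document was misrouted."""
    return fill_rate(file_class, extracted) < min_fill_rate
```

This is only a heuristic: an extractor that invents plausible values for the wrong class will pass the check, so it complements spot checks on classifier accuracy and does not replace them.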

Seen in
