PATTERN Cited by 1 source

Managed AI document-intelligence pipeline on AWS¶

A reusable AWS service-composition pattern for document-heavy enterprise workflows — contracts, claims, invoices, regulatory filings, vendor agreements — that converts unstructured documents into structured data with LLM-grounded extraction. Nine-service pipeline plus an analytical sink. Canonicalised via Doczy.ai's 22-month production deployment (sources/2026-06-02-aws-automating-contract-intelligence-with-doczyai-on-aws) processing 2.5 million contracts (50 million pages) at ~250 000 documents/week with 99% extraction accuracy.

Architecture¶

External user
       │
       ▼
[Amazon Cognito](<../systems/amazon-cognito.md>)      ← authentication, authorization
       │
       ▼
Frontend (e.g. Next.js)
       │  document upload
       ▼
[[Amazon S3]]            ← durable object storage
       │  S3 event
       ▼
[AWS Lambda](<../systems/aws-lambda.md>)           ← orchestrator
       │  invoke
       ▼
[Amazon Textract](<../systems/amazon-textract.md>)      ← OCR + metadata extraction
       │  raw text + structure
       ▼
┌─────────────────────────────────────┐
│  Custom processing layer            │
│  (typically [Amazon ECS](<../systems/amazon-ecs.md>))         │
│  - Chunking (e.g. smart chunking)   │
│  - Clustering (semantic / structural)│
│  - File-class routing               │
│  - Per-class prompt construction    │
└─────────────────────────────────────┘
       │  enriched prompt + grounded context
       ▼
[Amazon Bedrock](<../systems/amazon-bedrock.md>)       ← LLM, structured-output mode
       │  structured JSON
       ▼
[Snowflake](<../systems/snowflake.md>) (or other  ← centralised structured repository
analytical sink)
       │
       ▼
Intelligent dashboards
       │
       ▼
Downstream automation
  - Domain-specific systems (claims,
    payment processing, compliance)

Cross-cutting:
  [[Amazon CloudWatch]]    — real-time monitoring
  [AWS Secrets Manager](<../systems/aws-secrets-manager.md>)  — sensitive configuration

Three structural pieces¶

Ingestion + auth tier — Cognito + frontend + S3 + Lambda. Standard AWS "upload and trigger" shape; no AI yet. Lambda acts as the orchestrator that decides what runs next.
Processing + extraction tier — Textract for OCR/structure, then a custom processing layer (typically on ECS containers for the heavier / longer-running stages), then Bedrock for the LLM-grounded extraction step. The custom layer is where pipeline-specific intelligence lives:
chunking strategy (e.g. smart chunking)
clustering strategy (e.g. dual clustering)
file-class detection (concepts/file-class-routing)
per-class prompt template selection + few-shot/multi-shot example injection
Sink + serving tier — structured outputs flow into an analytical store (Snowflake, Redshift, or UC managed tables); dashboards and downstream automation systems consume from there.

Why the pieces are split this way¶

Lambda is right for the orchestrator — short-running, S3-event triggered, scales to zero.
Textract is right for OCR + structure extraction — managed per-page-priced; vastly more cost-effective than running OCR on self-managed compute.
ECS containers (rather than Lambda) for the chunking + clustering layer because that work is CPU/memory heavy and longer-running than Lambda's 15-minute ceiling and memory limits comfortably support, and because the embedding / pattern-recognition stack often has system-library dependencies awkward to package as Lambda layers.
Bedrock is right for the LLM — managed multi-model API, multi-region availability, no GPU procurement.
Snowflake (or analogous) for the analytical sink — multi-tenant analytical workloads on extracted structured data; decoupled from extraction-pipeline scaling.

Operational substrate¶

CloudWatch for real-time monitoring across all stages — Lambda invocations, Textract job completions, ECS task health, Bedrock latency / throttling, Snowflake ingest lag.
Secrets Manager for sensitive configuration — customer-specific API keys, Bedrock model identifiers, Snowflake credentials, third-party CLM-system integration credentials.

Outcomes from canonical instance¶

Doczy.ai over 22 months on this pattern:

2.5 M contract documents (50 M pages) processed
137 M Bedrock API calls
442 B tokens
~250 000 contracts/week peak production throughput
99% extraction accuracy (vs ~55% for the predecessor rules-based system on the same corpus)
97% reduction in manual processing time
~$330 M cumulative client savings (direct + indirect)

When to apply¶

Document-heavy enterprise workflows where structure carries meaning (contracts, regulatory filings, claims, vendor agreements, leases, insurance policies).
Production scale where the AWS-managed-service overhead is amortised across high document volume.
Pipelines that already have AWS as the primary cloud.
Customer requirements for enterprise features (tenant isolation, IAM, encryption-at-rest, regional residency, audit trails).

When not to apply¶

Low document volume — managed-service per-call pricing dominates fixed-cost self-hosted alternatives at small scale.
Pipelines targeting a different cloud (GCP / Azure) — the service-composition pattern is portable but the named services aren't.
Pure semantic-search use cases — RAG-pipeline patterns (retrieval + LLM, no extraction) are simpler and don't need Textract / chunking / clustering.
Multimodal-direct pipelines that feed images straight to a multimodal LLM (patterns/visual-first-document-extraction) — Textract + chunking is unnecessary in that pipeline shape.

Trade-offs¶

Trade-off	Discussion
Managed vs self-hosted	Managed wins on operations, fleet scaling, regional availability, SLA, IAM integration. Self-hosted wins on per-call cost at very high volume and on bespoke OCR / model choice.
Lambda vs ECS for processing	Lambda is right for orchestration + short tasks. ECS is right for chunking + clustering (longer-running, heavier compute, complex deps).
Bedrock model identity	Per-document cost vs accuracy trade-off; model class (Claude Opus / Sonnet / Haiku, Llama, Titan) is a per-pipeline choice driven by per-document budget × accuracy requirements.
Snowflake vs UC managed tables vs Redshift vs DuckDB	Sink choice is downstream of the pattern itself; pipeline doesn't constrain it.
Tenant isolation strength	IAM-based per-tenant isolation is the AWS default; PHI / regulated-data customers may require stronger boundaries (separate AWS accounts, encryption-key-per-tenant, network isolation).

Risks¶

Bedrock throttling / capacity exhaustion under traffic spikes — mitigation: PT + OD spillover from the Slack AI playbook.
Textract OCR quality on poor-scan PDFs — pipeline output quality is bounded by OCR quality on the input side.
LLM hallucination on out-of-distribution documents — the per-class prompt-routing is a partial mitigation; broader hallucination-mitigation patterns ( output-validation gates, structured-output schemas, LLM-judge evaluation) compose with this pattern.
Cost growth with token volume — at 442 B tokens / 22 months, Bedrock spend is the dominant line item. Cost-efficiency optimisations like prompt caching, model selection, batching, and partial extraction (only extract changed sections of amended documents) compose with this pattern.

Composition with other patterns¶

Inside the processing layer: concepts/smart-chunking, concepts/dual-clustering-document-intelligence, concepts/file-class-routing, concepts/prompt-optimization-feedback-loop, patterns/two-pass-classify-then-deep-extract.
Sibling pipeline shape: patterns/visual-first-document-extraction — alternative approach that skips Textract and feeds document images directly to a multimodal LLM.
Capacity tier: patterns/provisioned-throughput-with-on-demand-spillover for the Bedrock calls.
Quality gate: patterns/llm-judge-as-inline-pipeline-stage for output validation at the LLM-extraction step.
Data sink shape: patterns/managed-table-as-default-storage-layer (UC managed tables) or patterns/clickhouse-plus-snowflake-dual-storage-tier (mixed hot/cold storage tier) on the downstream side.

Seen in¶

sources/2026-06-02-aws-automating-contract-intelligence-with-doczyai-on-aws — canonical wiki disclosure via systems/doczy-ai's contract-intelligence production pipeline; full nine-service composition with Snowflake sink + the three load-bearing algorithmic primitives (concepts/smart-chunking + concepts/dual-clustering-document-intelligence + concepts/prompt-optimization-feedback-loop) inside the processing layer.