PATTERN Cited by 1 source
Managed AI document-intelligence pipeline on AWS
A reusable AWS service-composition pattern for document-heavy enterprise workflows (contracts, claims, invoices, regulatory filings, vendor agreements) that converts unstructured documents into structured data with LLM-grounded extraction. Nine-service pipeline plus an analytical sink. Canonicalised via Doczy.ai's 22-month production deployment (sources/2026-06-02-aws-automating-contract-intelligence-with-doczyai-on-aws), which processed 2.5 million contracts (50 million pages) at a peak of ~250 000 documents/week with 99% extraction accuracy.
Architecture
External user
│
▼
[Amazon Cognito](<../systems/amazon-cognito.md>) ← authentication, authorization
│
▼
Frontend (e.g. Next.js)
│ document upload
▼
[[Amazon S3]] ← durable object storage
│ S3 event
▼
[AWS Lambda](<../systems/aws-lambda.md>) ← orchestrator
│ invoke
▼
[Amazon Textract](<../systems/amazon-textract.md>) ← OCR + metadata extraction
│ raw text + structure
▼
┌─────────────────────────────────────┐
│ Custom processing layer │
│ (typically [Amazon ECS](<../systems/amazon-ecs.md>)) │
│ - Chunking (e.g. smart chunking) │
│ - Clustering (semantic / structural)│
│ - File-class routing │
│ - Per-class prompt construction │
└─────────────────────────────────────┘
│ enriched prompt + grounded context
▼
[Amazon Bedrock](<../systems/amazon-bedrock.md>) ← LLM, structured-output mode
│ structured JSON
▼
[Snowflake](<../systems/snowflake.md>) (or other analytical sink) ← centralised structured repository
│
▼
Intelligent dashboards
│
▼
Downstream automation
- Domain-specific systems (claims,
payment processing, compliance)
Cross-cutting:
[[Amazon CloudWatch]] — real-time monitoring
[AWS Secrets Manager](<../systems/aws-secrets-manager.md>) — sensitive configuration
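The upload-and-trigger hop in the diagram (S3 event, Lambda, Textract) can be sketched as follows. This is a minimal illustration, not Doczy.ai's implementation: the function names, the injected `textract` client, and the `TABLES` / `FORMS` feature choice are assumptions. `start_document_analysis` is the asynchronous Textract call exposed by boto3.

```python
from urllib.parse import unquote_plus


def documents_from_s3_event(event):
    """Yield (bucket, key) pairs from an S3 event notification payload."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 events (spaces become '+').
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def start_extraction(event, textract):
    """Start one asynchronous Textract analysis job per uploaded object.

    `textract` is expected to be a boto3 Textract client, for example
    boto3.client("textract"). It is injected so the function can be
    exercised without AWS access.
    """
    job_ids = []
    for bucket, key in documents_from_s3_event(event):
        response = textract.start_document_analysis(
            DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
            FeatureTypes=["TABLES", "FORMS"],  # assumed feature set
        )
        job_ids.append(response["JobId"])
    return job_ids
```

In a real Lambda handler the returned job IDs would typically be persisted or passed on so a later stage can collect the Textract results. That hand-off is left out here.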
Three structural pieces
- Ingestion + auth tier — Cognito + frontend + S3 + Lambda. Standard AWS "upload and trigger" shape; no AI yet. Lambda acts as the orchestrator that decides what runs next.
- Processing + extraction tier — Textract for OCR/structure, then a custom processing layer (typically on ECS containers for the heavier / longer-running stages), then Bedrock for the LLM-grounded extraction step. The custom layer is where pipeline-specific intelligence lives:
- chunking strategy (e.g. smart chunking)
- clustering strategy (e.g. dual clustering)
- file-class detection (concepts/file-class-routing)
- per-class prompt template selection + few-shot/multi-shot example injection
- Sink + serving tier — structured outputs flow into an analytical store (Snowflake, Redshift, or Unity Catalog (UC) managed tables); dashboards and downstream automation systems consume from there.
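The file-class detection and per-class prompt selection that live in the custom processing layer can be illustrated with a small sketch. Everything here is hypothetical: the file classes, the keyword-based classifier (a toy stand-in for real file-class detection), and the template wording are assumptions chosen only to show the shape of the routing step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptTemplate:
    instructions: str
    few_shot_examples: tuple = ()


# Hypothetical file classes and templates, for illustration only.
TEMPLATES = {
    "contract": PromptTemplate(
        instructions="Extract parties, effective date, and termination terms as JSON.",
        few_shot_examples=('{"parties": ["A", "B"], "effective_date": "2024-01-01"}',),
    ),
    "invoice": PromptTemplate(
        instructions="Extract vendor, invoice number, and total amount as JSON.",
    ),
}
DEFAULT_CLASS = "contract"

# Toy keyword classifier standing in for real file-class detection.
KEYWORDS = {
    "invoice": ("invoice", "amount due"),
    "contract": ("agreement", "hereinafter"),
}


def detect_file_class(text):
    """Pick the class whose keywords occur most often; fall back to the default."""
    lowered = text.lower()
    scores = {cls: sum(lowered.count(k) for k in kws) for cls, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else DEFAULT_CLASS


def build_prompt(chunks, file_class):
    """Assemble instructions, few-shot examples, then the grounded chunks."""
    template = TEMPLATES[file_class]
    parts = [template.instructions]
    parts += [f"Example output:\n{ex}" for ex in template.few_shot_examples]
    parts += [f"Document chunk {i}:\n{chunk}" for i, chunk in enumerate(chunks, start=1)]
    return "\n\n".join(parts)
```

Keeping the templates in a per-class table means a new document type needs only a new entry, not a change to the extraction code path.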
Why the pieces are split this way
- Lambda is right for the orchestrator — short-running, S3-event triggered, scales to zero.
- Textract is right for OCR + structure extraction — managed and priced per page, so there is no OCR fleet to operate; usually more cost-effective than OCR on self-managed compute until volume is very high (see Trade-offs).
- ECS containers (rather than Lambda) for the chunking + clustering layer because that work is CPU- and memory-heavy, can outrun Lambda's 15-minute execution ceiling and memory limits, and often depends on an embedding / pattern-recognition stack whose system libraries are awkward to package as Lambda layers.
- Bedrock is right for the LLM — managed multi-model API, multi-region availability, no GPU procurement.
- Snowflake (or analogous) for the analytical sink — multi-tenant analytical workloads on extracted structured data; decoupled from extraction-pipeline scaling.
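The Lambda-versus-ECS split above can be expressed as a simple placement check. This is a sketch under stated assumptions: the `headroom` safety factor and the function name are hypothetical. The two constants reflect Lambda's documented hard limits (900 seconds of execution time, 10,240 MB of memory).

```python
LAMBDA_MAX_TIMEOUT_S = 900      # Lambda's 15-minute execution ceiling
LAMBDA_MAX_MEMORY_MB = 10_240   # Lambda's maximum configurable memory


def choose_compute(est_runtime_s, est_memory_mb, needs_native_libs=False, headroom=0.5):
    """Return "lambda" or "ecs" for a pipeline stage.

    `headroom` is an assumed safety factor: a stage qualifies for Lambda only
    if it is expected to use at most that fraction of each hard limit.
    """
    fits_time = est_runtime_s <= LAMBDA_MAX_TIMEOUT_S * headroom
    fits_memory = est_memory_mb <= LAMBDA_MAX_MEMORY_MB * headroom
    if fits_time and fits_memory and not needs_native_libs:
        return "lambda"
    return "ecs"
```

With these defaults, a short S3-triggered orchestration step lands on Lambda, while a clustering job expected to run for 20 minutes lands on ECS.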
Operational substrate
- CloudWatch for real-time monitoring across all stages — Lambda invocations, Textract job completions, ECS task health, Bedrock latency / throttling, Snowflake ingest lag.
- Secrets Manager for sensitive configuration — customer-specific API keys, Bedrock model identifiers, Snowflake credentials, third-party CLM-system integration credentials.
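A minimal sketch of how a pipeline stage might use the two cross-cutting services. The namespace `DocPipeline`, the metric name `StageLatency`, and the assumption that secrets are stored as JSON strings are all hypothetical. `get_secret_value` and `put_metric_data` are the boto3 calls for Secrets Manager and CloudWatch, and the clients are injected so the helpers run without AWS access.

```python
import json


def load_secret(secrets_client, secret_id):
    """Fetch a secret via GetSecretValue and parse it as JSON (assumed format)."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


def record_stage_latency(cloudwatch_client, stage, millis, namespace="DocPipeline"):
    """Publish one latency datapoint for a pipeline stage via PutMetricData."""
    cloudwatch_client.put_metric_data(
        Namespace=namespace,  # hypothetical namespace
        MetricData=[{
            "MetricName": "StageLatency",  # hypothetical metric name
            "Dimensions": [{"Name": "Stage", "Value": stage}],
            "Value": float(millis),
            "Unit": "Milliseconds",
        }],
    )
```

Tagging each datapoint with a `Stage` dimension lets a single dashboard compare Lambda, Textract, ECS, and Bedrock latency side by side.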
Outcomes from canonical instance
Doczy.ai over 22 months on this pattern:
- 2.5 M contract documents (50 M pages) processed
- 137 M Bedrock API calls
- 442 B tokens
- ~250 000 contracts/week peak production throughput
- 99% extraction accuracy (vs ~55% for the predecessor rules-based system on the same corpus)
- 97% reduction in manual processing time
- ~$330 M cumulative client savings (direct + indirect)
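The reported totals imply some per-document averages that are useful for capacity and cost planning. The figures computed below are derived here by simple division; they are not stated in the source, and the months-to-weeks conversion is an approximation.

```python
documents = 2.5e6        # contract documents processed
pages = 50e6             # pages processed
bedrock_calls = 137e6    # Bedrock API calls
tokens = 442e9           # tokens processed
months = 22

pages_per_document = pages / documents
calls_per_document = bedrock_calls / documents
tokens_per_call = tokens / bedrock_calls
tokens_per_document = tokens / documents

# Approximate conversion: 52 weeks per 12 months.
weeks = months * 52 / 12
avg_documents_per_week = documents / weeks

for name, value in [
    ("pages per document", pages_per_document),
    ("Bedrock calls per document", calls_per_document),
    ("tokens per call", tokens_per_call),
    ("tokens per document", tokens_per_document),
    ("average documents per week", avg_documents_per_week),
]:
    print(f"{name}: {value:,.1f}")
```

The average weekly volume comes out roughly an order of magnitude below the ~250 000 contracts/week figure, which is consistent with that figure being a peak and not a sustained rate.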
When to apply
- Document-heavy enterprise workflows where structure carries meaning (contracts, regulatory filings, claims, vendor agreements, leases, insurance policies).
- Production scale where the AWS-managed-service overhead is amortised across high document volume.
- Pipelines that already have AWS as the primary cloud.
- Customer requirements for enterprise features (tenant isolation, IAM, encryption-at-rest, regional residency, audit trails).
When not to apply
- Low document volume — the fixed integration and operational overhead of composing this many managed services is not amortised at small scale; a simpler setup is usually sufficient.
- Pipelines targeting a different cloud (GCP / Azure) — the service-composition pattern is portable but the named services aren't.
- Pure semantic-search use cases — RAG-pipeline patterns (retrieval + LLM, no extraction) are simpler and don't need Textract / chunking / clustering.
- Multimodal-direct pipelines that feed images straight to a multimodal LLM (patterns/visual-first-document-extraction) — Textract + chunking is unnecessary in that pipeline shape.
Trade-offs
| Trade-off | Discussion |
|---|---|
| Managed vs self-hosted | Managed wins on operations, fleet scaling, regional availability, SLA, IAM integration. Self-hosted wins on per-call cost at very high volume and on bespoke OCR / model choice. |
| Lambda vs ECS for processing | Lambda is right for orchestration + short tasks. ECS is right for chunking + clustering (longer-running, heavier compute, complex deps). |
| Bedrock model identity | Per-document cost vs accuracy trade-off; model class (Claude Opus / Sonnet / Haiku, Llama, Titan) is a per-pipeline choice driven by per-document budget × accuracy requirements. |
| Snowflake vs UC managed tables vs Redshift vs DuckDB | Sink choice is downstream of the pattern itself; the pipeline does not constrain it. |
| Tenant isolation strength | IAM-based per-tenant isolation is the AWS default; PHI / regulated-data customers may require stronger boundaries (separate AWS accounts, encryption-key-per-tenant, network isolation). |
Risks
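- Bedrock throttling / capacity exhaustion under traffic spikes — mitigation: provisioned throughput (PT) with on-demand (OD) spillover (patterns/provisioned-throughput-with-on-demand-spillover), as in the Slack AI playbook.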
- Textract OCR quality on poor-scan PDFs — pipeline output quality is bounded by OCR quality on the input side.
- LLM hallucination on out-of-distribution documents — the per-class prompt-routing is a partial mitigation; broader hallucination-mitigation patterns (output-validation gates, structured-output schemas, LLM-judge evaluation) compose with this pattern.
- Cost growth with token volume — at 442 B tokens over 22 months (about 20 B tokens/month on average), Bedrock spend is the dominant line item. Cost-efficiency optimisations like prompt caching, model selection, batching, and partial extraction (only extract changed sections of amended documents) compose with this pattern.
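An output-validation gate of the kind mentioned under the hallucination risk can be sketched in a few lines. The schema, field names, and the simple substring grounding check are assumptions for illustration; a production gate would likely use a fuller schema validator and a more tolerant grounding test.

```python
import json

# Hypothetical schema: field name -> required Python type(s).
REQUIRED_FIELDS = {"parties": list, "effective_date": str, "total_value": (int, float)}


def validate_extraction(raw_output, source_text):
    """Return (record, problems); an empty problems list means the gate passes."""
    try:
        record = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return None, ["top-level value is not a JSON object"]

    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"wrong type for field: {field}")

    # Grounding check: every extracted party name must appear in the source text.
    parties = record.get("parties")
    if isinstance(parties, list):
        lowered_source = source_text.lower()
        for party in parties:
            if str(party).lower() not in lowered_source:
                problems.append(f"ungrounded value: {party}")
    return record, problems
```

Records that fail the gate can be retried with a different prompt, routed to a stronger model, or queued for human review, depending on the per-document budget.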
Composition with other patterns
- Inside the processing layer: concepts/smart-chunking, concepts/dual-clustering-document-intelligence, concepts/file-class-routing, concepts/prompt-optimization-feedback-loop, patterns/two-pass-classify-then-deep-extract.
- Sibling pipeline shape: patterns/visual-first-document-extraction — alternative approach that skips Textract and feeds document images directly to a multimodal LLM.
- Capacity tier: patterns/provisioned-throughput-with-on-demand-spillover for the Bedrock calls.
- Quality gate: patterns/llm-judge-as-inline-pipeline-stage for output validation at the LLM-extraction step.
- Data sink shape: patterns/managed-table-as-default-storage-layer (UC managed tables) or patterns/clickhouse-plus-snowflake-dual-storage-tier (mixed hot/cold storage tier) on the downstream side.
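The capacity-tier composition (provisioned throughput first, on-demand as spillover) reduces to a small control-flow pattern. The sketch below is hypothetical: `ThrottledError` and the two callables stand in for real Bedrock invocations. With boto3, throttling would typically surface as a `ClientError` with the error code `ThrottlingException`, which a real wrapper would translate.

```python
class ThrottledError(Exception):
    """Hypothetical stand-in for a provider throttling error."""


def invoke_with_spillover(prompt, provisioned, on_demand):
    """Try provisioned capacity first; spill to on-demand only when throttled.

    `provisioned` and `on_demand` are callables that take a prompt and return
    the model output, raising ThrottledError when capacity is exhausted.
    Returns (output, tier_used) so callers can track the spillover rate.
    """
    try:
        return provisioned(prompt), "provisioned"
    except ThrottledError:
        return on_demand(prompt), "on_demand"
```

Returning the tier alongside the output makes it straightforward to emit a spillover-rate metric, which indicates when provisioned capacity should be resized.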
Seen in
- sources/2026-06-02-aws-automating-contract-intelligence-with-doczyai-on-aws — canonical wiki disclosure via systems/doczy-ai's contract-intelligence production pipeline; full nine-service composition with Snowflake sink + the three load-bearing algorithmic primitives (concepts/smart-chunking + concepts/dual-clustering-document-intelligence + concepts/prompt-optimization-feedback-loop) inside the processing layer.
Related
- systems/doczy-ai
- concepts/smart-chunking
- concepts/dual-clustering-document-intelligence
- concepts/prompt-optimization-feedback-loop
- concepts/file-class-routing
- patterns/two-pass-classify-then-deep-extract
- patterns/multi-step-llm-extraction-pipeline
- patterns/visual-first-document-extraction
- systems/amazon-bedrock
- systems/amazon-textract
- systems/snowflake