
PATTERN Cited by 1 source

Managed AI document-intelligence pipeline on AWS

A reusable AWS service-composition pattern for document-heavy enterprise workflows (contracts, claims, invoices, regulatory filings, vendor agreements) that converts unstructured documents into structured data with LLM-grounded extraction. Nine-service pipeline plus an analytical sink. Canonicalised via Doczy.ai's 22-month production deployment (sources/2026-06-02-aws-automating-contract-intelligence-with-doczyai-on-aws), which processed 2.5 million contracts (50 million pages), peaking at ~250 000 documents/week, with 99% extraction accuracy.

Architecture

External user
       │
Amazon Cognito           ← authentication, authorization
       │
Frontend (e.g. Next.js)
       │  document upload
Amazon S3                ← durable object storage
       │  S3 event
AWS Lambda               ← orchestrator
       │  invoke
Amazon Textract          ← OCR + metadata extraction
       │  raw text + structure
┌───────────────────────────────────────┐
│  Custom processing layer              │
│  (typically Amazon ECS)               │
│  - Chunking (e.g. smart chunking)     │
│  - Clustering (semantic / structural) │
│  - File-class routing                 │
│  - Per-class prompt construction      │
└───────────────────────────────────────┘
       │  enriched prompt + grounded context
Amazon Bedrock           ← LLM, structured-output mode
       │  structured JSON
Snowflake (or other      ← centralised structured repository
analytical sink)
       │
Intelligent dashboards
Downstream automation
  - Domain-specific systems (claims,
    payment processing, compliance)

Cross-cutting:
  Amazon CloudWatch      ← real-time monitoring
  AWS Secrets Manager    ← sensitive configuration

Three structural pieces

  1. Ingestion + auth tier: Cognito + frontend + S3 + Lambda. Standard AWS "upload and trigger" shape; no AI yet. Lambda acts as the orchestrator that decides what runs next.
  2. Processing + extraction tier: Textract for OCR/structure, then a custom processing layer (typically on ECS containers for the heavier / longer-running stages), then Bedrock for the LLM-grounded extraction step. The custom layer is where pipeline-specific intelligence lives:
     • chunking strategy (e.g. smart chunking)
     • clustering strategy (e.g. dual clustering)
     • file-class detection (concepts/file-class-routing)
     • per-class prompt template selection + few-shot/multi-shot example injection
  3. Sink + serving tier: structured outputs flow into an analytical store (Snowflake, Redshift, or UC managed tables); dashboards and downstream automation systems consume from there.
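
The file-class routing and per-class prompt construction steps in the custom layer can be sketched roughly as follows. This is a minimal illustration, not Doczy.ai's implementation: the class names, keyword cues, and templates are all hypothetical, and a production detector would more likely be a trained classifier than a keyword count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptTemplate:
    instructions: str
    examples: tuple[str, ...]  # few-shot examples injected verbatim


# Hypothetical per-class templates (assumption, not from the source).
TEMPLATES = {
    "contract": PromptTemplate(
        instructions="Extract parties, effective date and termination clause as JSON.",
        examples=("'Agreement between A and B' -> {\"parties\": [\"A\", \"B\"]}",),
    ),
    "invoice": PromptTemplate(
        instructions="Extract vendor, invoice number and total amount as JSON.",
        examples=(),
    ),
}
DEFAULT_CLASS = "contract"

# Hypothetical keyword cues standing in for a real file-class detector.
CLASS_KEYWORDS = {
    "invoice": ("invoice", "amount due", "bill to"),
    "contract": ("agreement", "party", "termination"),
}


def detect_file_class(text: str) -> str:
    """Pick the class whose cues occur most often; fall back to the default."""
    lowered = text.lower()
    scores = {
        cls: sum(lowered.count(cue) for cue in cues)
        for cls, cues in CLASS_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else DEFAULT_CLASS


def build_prompt(file_class: str, chunks: list[str]) -> str:
    """Assemble instructions, few-shot examples and grounded context chunks."""
    template = TEMPLATES[file_class]
    parts = [template.instructions]
    parts += [f"Example:\n{example}" for example in template.examples]
    parts += [f"Context {i}:\n{chunk}" for i, chunk in enumerate(chunks, start=1)]
    return "\n\n".join(parts)
```

The resulting prompt string is what would be sent to Bedrock; keeping detection and template selection as separate functions lets each be replaced independently.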

Why the pieces are split this way

  • Lambda is right for the orchestrator — short-running, S3-event triggered, scales to zero.
  • Textract is right for OCR + structure extraction: managed and priced per page, which is typically more cost-effective than running OCR on self-managed compute.
  • ECS containers (rather than Lambda) for the chunking + clustering layer: that work is CPU- and memory-heavy and can outrun Lambda's 15-minute execution ceiling and memory limits, and the embedding / pattern-recognition stack often has system-library dependencies that are awkward to package as Lambda layers.
  • Bedrock is right for the LLM — managed multi-model API, multi-region availability, no GPU procurement.
  • Snowflake (or analogous) for the analytical sink — multi-tenant analytical workloads on extracted structured data; decoupled from extraction-pipeline scaling.
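
As a rough sketch of the orchestrator role, the Lambda handler below reads the S3 event and starts one asynchronous Textract job per uploaded object. It assumes `boto3` is available in the Lambda runtime and uses the Textract `start_document_text_detection` call; the function names and the choice to return job IDs are illustrative assumptions, and a real pipeline would also need to handle job-completion notifications.

```python
from urllib.parse import unquote_plus


def start_textract_jobs(event: dict, textract_client) -> list[str]:
    """Start one asynchronous Textract job per S3 object in the event."""
    job_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = textract_client.start_document_text_detection(
            DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
        )
        job_ids.append(response["JobId"])
    return job_ids


def handler(event, context):
    """Lambda entry point (hypothetical name); short-running by design."""
    import boto3  # imported lazily so the helper above is testable without AWS

    return {"job_ids": start_textract_jobs(event, boto3.client("textract"))}
```

Passing the client in as a parameter keeps the handler thin and lets the routing logic be exercised locally with a stub.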

Operational substrate

  • CloudWatch for real-time monitoring across all stages — Lambda invocations, Textract job completions, ECS task health, Bedrock latency / throttling, Snowflake ingest lag.
  • Secrets Manager for sensitive configuration — customer-specific API keys, Bedrock model identifiers, Snowflake credentials, third-party CLM-system integration credentials.
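
For the Secrets Manager side, a minimal sketch of how a pipeline stage might load JSON-formatted credentials once and reuse them is shown below. The class name and the secret ID in the usage comment are hypothetical; the only AWS call used is `get_secret_value`. The cache never expires, so secret rotation is not handled here.

```python
import json


class SecretCache:
    """Fetch JSON secrets once per process and reuse them (no expiry)."""

    def __init__(self, secrets_client):
        self._client = secrets_client
        self._cache: dict[str, dict] = {}

    def get(self, secret_id: str) -> dict:
        if secret_id not in self._cache:
            response = self._client.get_secret_value(SecretId=secret_id)
            # Assumes the secret is stored as a JSON string, not binary.
            self._cache[secret_id] = json.loads(response["SecretString"])
        return self._cache[secret_id]


# Usage (hypothetical secret name):
#   import boto3
#   secrets = SecretCache(boto3.client("secretsmanager"))
#   snowflake_creds = secrets.get("doc-pipeline/snowflake")
```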

Outcomes from canonical instance

Doczy.ai over 22 months on this pattern:

  • 2.5 M contract documents (50 M pages) processed
  • 137 M Bedrock API calls
  • 442 B tokens
  • ~250 000 contracts/week peak production throughput
  • 99% extraction accuracy (vs ~55% for the predecessor rules-based system on the same corpus)
  • 97% reduction in manual processing time
  • ~$330 M cumulative client savings (direct + indirect)
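
The headline figures above imply some per-document ratios that are useful for capacity and cost reasoning. The short calculation below derives them from the reported totals only; it assumes 52 weeks per year when converting 22 months to weeks, and the averages say nothing about how load was actually distributed over time.

```python
# Reported totals from the canonical instance.
DOCUMENTS = 2_500_000
PAGES = 50_000_000
BEDROCK_CALLS = 137_000_000
TOKENS = 442_000_000_000
MONTHS = 22
PEAK_DOCS_PER_WEEK = 250_000

pages_per_doc = PAGES / DOCUMENTS            # 20 pages per contract
calls_per_doc = BEDROCK_CALLS / DOCUMENTS    # about 55 Bedrock calls per contract
tokens_per_call = TOKENS / BEDROCK_CALLS     # about 3,226 tokens per call
tokens_per_doc = TOKENS / DOCUMENTS          # 176,800 tokens per contract

weeks = MONTHS * 52 / 12                     # about 95.3 weeks (assumes 52 weeks/year)
avg_docs_per_week = DOCUMENTS / weeks        # about 26,224 contracts per week
peak_to_avg = PEAK_DOCS_PER_WEEK / avg_docs_per_week  # peak is roughly 9.5x the average

print(f"{pages_per_doc:.0f} pages/doc, {calls_per_doc:.1f} calls/doc")
print(f"{tokens_per_call:,.0f} tokens/call, {tokens_per_doc:,.0f} tokens/doc")
print(f"{avg_docs_per_week:,.0f} docs/week average, peak/avg = {peak_to_avg:.1f}x")
```

The gap between the average (~26 000/week) and the ~250 000/week peak is why the throughput figure is best read as a peak rather than a sustained rate.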

When to apply

  • Document-heavy enterprise workflows where structure carries meaning (contracts, regulatory filings, claims, vendor agreements, leases, insurance policies).
  • Production scale where the AWS-managed-service overhead is amortised across high document volume.
  • Pipelines that already have AWS as the primary cloud.
  • Customer requirements for enterprise features (tenant isolation, IAM, encryption-at-rest, regional residency, audit trails).

When not to apply

  • Low document volume: the fixed integration and operational overhead of composing this many managed services is not amortised at small scale, so a simpler setup is usually cheaper overall.
  • Pipelines targeting a different cloud (GCP / Azure) — the service-composition pattern is portable but the named services aren't.
  • Pure semantic-search use cases — RAG-pipeline patterns (retrieval + LLM, no extraction) are simpler and don't need Textract / chunking / clustering.
  • Multimodal-direct pipelines that feed images straight to a multimodal LLM (patterns/visual-first-document-extraction) — Textract + chunking is unnecessary in that pipeline shape.

Trade-offs

| Trade-off | Discussion |
|---|---|
| Managed vs self-hosted | Managed wins on operations, fleet scaling, regional availability, SLA, IAM integration. Self-hosted wins on per-call cost at very high volume and on bespoke OCR / model choice. |
| Lambda vs ECS for processing | Lambda is right for orchestration + short tasks. ECS is right for chunking + clustering (longer-running, heavier compute, complex deps). |
| Bedrock model identity | Per-document cost vs accuracy trade-off; model class (Claude Opus / Sonnet / Haiku, Llama, Titan) is a per-pipeline choice driven by per-document budget × accuracy requirements. |
| Snowflake vs UC managed tables vs Redshift vs DuckDB | Sink choice is downstream of the pattern itself; pipeline doesn't constrain it. |
| Tenant isolation strength | IAM-based per-tenant isolation is the AWS default; PHI / regulated-data customers may require stronger boundaries (separate AWS accounts, encryption-key-per-tenant, network isolation). |

Risks

  • Bedrock throttling / capacity exhaustion under traffic spikes. Mitigation: Provisioned Throughput (PT) with on-demand (OD) spillover, as in the Slack AI playbook.
  • Textract OCR quality on poor-scan PDFs — pipeline output quality is bounded by OCR quality on the input side.
  • LLM hallucination on out-of-distribution documents: per-class prompt routing is a partial mitigation; broader hallucination-mitigation patterns (output-validation gates, structured-output schemas, LLM-judge evaluation) compose with this pattern.
  • Cost growth with token volume — at 442 B tokens / 22 months, Bedrock spend is the dominant line item. Cost-efficiency optimisations like prompt caching, model selection, batching, and partial extraction (only extract changed sections of amended documents) compose with this pattern.
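
One of the hallucination mitigations named above, an output-validation gate, can be sketched as a check that runs on the model's structured JSON before it reaches the analytical sink. The field names and types below are a hypothetical contract schema chosen for illustration; a real pipeline would likely use a fuller schema validator and per-class schemas.

```python
import json
from typing import Optional

# Hypothetical required fields for a "contract" extraction (assumption).
REQUIRED_FIELDS = {
    "parties": list,
    "effective_date": str,
    "total_value": (int, float),
}


def validate_extraction(raw: str) -> tuple[Optional[dict], list[str]]:
    """Return (data, []) if the output passes the gate, else (None, errors)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value is not an object"]

    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            errors.append(f"missing field: {field}")
            continue
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"wrong type for {field}")
    return (data if not errors else None), errors
```

Records that fail the gate can be retried, routed to manual review, or dropped, depending on the pipeline's tolerance; the gate itself only decides pass or fail.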

Composition with other patterns

Seen in
