
AWS Tier 1


AWS — Automating contract intelligence with Doczy.ai on AWS

Summary

AWS Architecture Blog post (2026-06-02) co-authored with AArete — a global management and technology consulting firm specialising in healthcare — describing the AWS architecture and proprietary algorithms behind Doczy.ai, AArete's intelligent contract interpretation SaaS solution. The post is a partner-solution architecture post rather than a deep architectural retrospective — depth is moderate, but the operational numbers are concrete and the named AWS service-composition pattern is reusable. Passes scope as a borderline-include on production operational numbers (2.5 M contract documents / 50 M pages / 137 M Amazon Bedrock API calls / 442 billion tokens processed over 22 months, ~250 000 contracts/week production throughput, ~$330 M cumulative client savings) and on the dual-clustering + smart-chunking algorithmic shape that is named at architecture-shape altitude (mechanism internals deferred to AArete's patent and SaaS implementation).

The architecture is a managed AI document-intelligence pipeline on AWS: Amazon Cognito for authn → Next.js frontend → Amazon S3 for document upload → AWS Lambda orchestration → Amazon Textract for OCR and metadata extraction → AArete's proprietary smart chunking + dual clustering pre-processing → Amazon Bedrock LLM with domain-tuned prompts → Snowflake structured repository → intelligent dashboards. Operational substrate: Amazon ECS for container compute, Amazon CloudWatch for monitoring, AWS Secrets Manager for sensitive configuration. Canonicalised as patterns/managed-ai-document-intelligence-pipeline-on-aws.
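The S3 → Lambda → Textract hop described above can be sketched as a small Lambda handler. This is a minimal illustration, not Doczy.ai's code: the article does not say whether the synchronous or asynchronous Textract API is used, so the asynchronous `start_document_analysis` call, the `FeatureTypes` choice, and the injectable `textract` parameter (added only so the function can be exercised without AWS credentials) are all assumptions.

```python
from urllib.parse import unquote_plus


def handler(event, context, textract=None):
    """Start one asynchronous Textract job per object in an S3 event."""
    if textract is None:
        # Imported lazily so the handler can be tested with a stub client.
        import boto3
        textract = boto3.client("textract")

    job_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 event notifications URL-encode object keys (a space arrives as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        response = textract.start_document_analysis(
            DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
            FeatureTypes=["TABLES", "FORMS"],  # assumed; the article names no features
        )
        job_ids.append(response["JobId"])
    return {"job_ids": job_ids}
```

The job IDs returned here would be picked up by whatever stage runs the proprietary pre-processing; how that hand-off works is not disclosed in the article.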

The article's central architectural argument is a three-era accuracy progression verbatim: "Prior to 2020, document processing required manual effort, with individuals processing approximately 100 documents per week. Between 2020–2023, the firm implemented rules-based contract processing, achieving approximately 55% accuracy. The breakthrough came in 2024 with an AI-based processing built on AWS achieved 99% accuracy — a dramatic improvement over the 55% accuracy of traditional rules-based systems." The 55%-to-99% jump is the structural payoff of replacing rules with LLM-grounded extraction, with the smart chunking + dual clustering as the grounding substrate that lets the LLM see the document's structure rather than a flat token sequence.

Key takeaways

  1. Three-era accuracy curve: 100 docs/week manual → ~55% rules-based (2020–2023) → 99% AI-based (2024+). The article frames this as the canonical progression for document-extraction problems — manual is too slow at any meaningful volume, rules-based hits a ~55% ceiling because contract terms are "nuanced and complex" and don't fit predefined fields, and LLM-based extraction breaks the ceiling because the model can reason about context. (Source: "AI-based processing built on AWS achieved 99% accuracy — a dramatic improvement over the 55% accuracy of traditional rules-based systems.") The wiki's first canonical disclosure of this rules-vs-LLM document-extraction accuracy step-function with concrete percentages and a named production system on the high side.

  2. 22-month production scale: 2.5 M documents, 50 M pages, 137 M Bedrock API calls, 442 B tokens, ~$330 M cumulative client savings, 250 K contracts/week peak throughput. This is the operational table — "This solution has processed 2.5 million contract documents (50 million pages) with 137 million API calls to Amazon Bedrock and 442 billion tokens — a level of automation and accuracy previously unattainable through manual or traditional document processing approaches. Over this same period, Doczy.ai™ has helped clients achieve approximately 330 million dollars in cumulative direct and indirect savings." And "processing up to 250,000 contract documents per week with 99% accuracy." These are the load-bearing numbers that take this from partner-PR territory into a real production-system disclosure. Tokens-per-call ratio: 442 B / 137 M ≈ 3 200 tokens/call average — consistent with multi-page contract chunks plus prompt overhead. Pages-per-document: 50 M / 2.5 M = 20 pages/document average.

  3. 97% reduction in manual processing time, freeing humans for strategic / judgment work. Verbatim: "The 97% reduction in manual processing time translates directly to cost savings and enables organizations to reallocate human resources to higher-value activities that require judgment and strategic thinking." Sibling framing to the rest of the wiki's KTLO / first-pass-investigator literature — the agent does the pattern-matched bulk; humans review and act on the high-value cases.

  4. Architecture is a nine-service composition: eight AWS services (Cognito + S3 + Lambda + Textract + Bedrock + ECS + CloudWatch + Secrets Manager) plus Snowflake as the analytical sink. The pattern is reusable: any document-heavy enterprise workflow (contracts, invoices, claims, regulatory filings, vendor agreements) maps onto this shape. Canonicalised as patterns/managed-ai-document-intelligence-pipeline-on-aws with full ASCII flow diagram. The Lambda → Textract → custom processing → Bedrock pipeline is the load-bearing inner loop; the surrounding services provide auth, storage, container compute (plausibly for the smart-chunking and dual-clustering engines), monitoring, and secrets handling.

  5. "Smart chunking" preserves document hierarchy and one-to-many relationships rather than treating documents as flat token sequences. Verbatim: "smart chunking, a proprietary approach that goes far beyond pulling words off a page. Rather than treating a document as a flat sequence of text, smart chunking preserves hierarchical structure and one-to-many relationships within documents. It uses a combination of semantic and keyword search to decompose text into meaningful, context-aware chunks, applying dynamic parameters to maintain logical relationships throughout. Sequential identifiers and metadata-driven grouping organize these chunks into field groups, detecting overlaps and removing duplications while keeping the document's natural flow intact." This is the document-structure-preserving flavour of chunking, sibling to but distinct from metadata-only embedding (Yelp 2026-05-27), which embeds title/summary/headers separately, and from chunk-too-large dilution avoidance via paragraph splits. Smart chunking sits at the input-side preprocessing altitude: it determines what the model gets to see, structured to retain hierarchy. Mechanism details (algorithm, dynamic-parameter tuning, sequential-identifier scheme) are AArete's IP and not disclosed.

  6. "Dual clustering" engine fuses semantic and structural views of every contract for grounded extraction. Verbatim: "the dual clustering engine of Doczy.ai. This two-lens methodology analyzes every contract simultaneously from both a semantic and a structural perspective. On the semantic side, extracted text is converted into embeddings, numerical representations of meaning, and similar ideas are grouped together even when they're expressed in different words. On the structural side, pattern-recognition algorithms identify clause types, formatting conventions, table layouts, and hierarchical organization, understanding. For example, that a three-nested-level exhibit carries fundamentally different implications than a straightforward attached schedule. These two analyses don't operate in isolation. Projection algorithms compare the semantic and structural clusters side by side, synthesizing them into a unified, enriched document model that captures both meaning and context. It's this convergence that drives the 99% accuracy rate of Doczy.ai. The system doesn't just read the words, it understands the contract." Canonicalised as concepts/dual-clustering-document-intelligence. Sibling to multi-source fusion (Netflix Service Topology, 2026-05-29) at a different altitude: the same architectural shape (multiple complementary views fused to overcome any one's limits), applied here to document understanding rather than service-topology mapping.

  7. Domain-tuned prompts via few-shot + multi-shot feedback loop: accuracy compounds over time. Verbatim: "Before output is finalized, the system determines each document's file class and generates prompts tailored to the extracted text, cluster classification, and domain context. Through few-shot and multi-shot prompting, the platform continuously edits the prompt on domain-specific examples and based on real outputs, creating a feedback loop that compounds accuracy improvements over time." Two structural primitives canonicalised: (a) file-class routing (concepts/file-class-routing), where the document type is detected before prompt selection, sibling to the classify-then-extract pattern in patterns/two-pass-classify-then-deep-extract; (b) prompt-optimisation feedback loop (concepts/prompt-optimization-feedback-loop), where production outputs feed back into prompt edits, sibling to but distinct from the agent self-correction loop (Genie / Databricks data agents) and from GEPA-style automated prompt optimisers.

  8. Output sink: structured data into Snowflake → intelligent dashboards. Verbatim: "The resulting structured data flows into Snowflake, forming a centralized repository that powers intelligent dashboards with actionable insights and visualizations." Canonical AWS-side ingestion + Snowflake-side serving split, sibling to the wiki's existing Snowflake usage patterns (Canva creator-payments aggregation, Instacart Pixel metadata, Redpanda Iceberg-topics + Snowflake streaming). The downstream consumers feed two specific health-plan automation capabilities: claims-system configuration (auto-translate reimbursement terms into claims-system rules) and vendor invoice verification (assess payments against contract terms, flag overpayments / underpayments before they occur).

  9. Operational substrate: ECS for compute, CloudWatch for monitoring, Secrets Manager for sensitive config. Verbatim: "Throughout the entire pipeline, Amazon CloudWatch monitors performance in real time and proactively surfaces issues before they escalate, while AWS Secrets Manager safeguards sensitive information, ensuring that security is not an afterthought, but a foundational layer woven into every stage of the system." The ECS attribution comes only from the conclusion's service list ("Amazon Cognito, Amazon S3, AWS Lambda, Amazon Textract, Amazon Elastic Container Service (Amazon ECS), Amazon Bedrock, Amazon CloudWatch, and AWS Secrets Manager"). ECS likely hosts the smart-chunking and dual-clustering engines, though the article doesn't attribute specific stages to ECS vs Lambda. The Lambda mentioned in the body is the orchestrator that triggers Textract; longer-running and stateful processing plausibly runs on ECS containers.

  10. Pre-2020 manual baseline: 100 documents/week per human; 2024+ AI baseline: 250 000 documents/week per system; 2 500× headline throughput improvement per processing unit. This is the headline economic argument. The comparison mixes units (one person versus an entire system), so the 2 500× multiple is not a like-for-like productivity ratio; even so, the volume gap is what makes the "2.5 M documents in 22 months" operational scale viable. Manual scale is bounded by labour economics; AI-based scale is bounded by AWS capacity allocation and Bedrock throughput envelope. (Source: "individuals processing approximately 100 documents per week" and "processing up to 250,000 contract documents per week".)

  11. Healthcare claims systems as canonical downstream consumer. Verbatim: "For health plans, Doczy.ai™ provides a powerful solution to automate and improve contract management across the entire lifecycle. It ingests existing contracts in both paper and digital formats, integrates with contract management systems such as Coupa and Icertis, and processes new contracts and amendments as they're executed. It then creates a centralized metadata repository that feeds directly into downstream systems, enabling end-to-end business process automation." The named CLM (Contract Lifecycle Management) integrations, Coupa and Icertis, establish that Doczy.ai isn't trying to replace existing CLM systems but rather sits upstream of them as an extraction substrate, similar to how Stripe Radar exposes signals as APIs rather than replacing payment processors.

  12. SaaS delivery model: no infrastructure procurement, AArete runs the AWS architecture for the customer. Verbatim: "AArete offers the platform as a Software as a Service (SaaS) solution, enabling rapid deployment without significant infrastructure investment. AArete's team of experts will configure this solution for your specific document types, domain terminology, and business processes." The configuration burden sits with AArete; the customer's burden is identifying the document classes and domain terminology. Sibling to the wiki's configuration-driven tenant onboarding pattern (AWS hybrid multi-tenant architectures).
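To make the disclosed properties of smart chunking concrete (hierarchy preservation, one-to-many parent/child relationships, sequential identifiers, duplicate removal), here is a toy sketch. It is not AArete's algorithm, which is undisclosed: the numbered-heading convention, the `Chunk` type, and the exact-match de-duplication are all assumptions chosen for illustration, and the semantic + keyword decomposition the article mentions is not modelled at all.

```python
import re
from dataclasses import dataclass

# Assumed convention: headings look like "2 Rates" or "2.1 Inpatient".
# A body line that happens to start with a number would be misread as a heading.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.*)$")


@dataclass
class Chunk:
    seq: int     # sequential identifier, preserves document order
    path: tuple  # heading path from the root down to this chunk's section
    text: str


def chunk_by_hierarchy(lines):
    """Split lines into chunks that each carry their full heading path."""
    stack = []   # (depth, heading line) for the currently open sections
    body = []    # body lines collected since the last heading
    chunks = []
    seen = set()  # exact-text de-duplication

    def flush():
        text = " ".join(body).strip()
        body.clear()
        if text and text not in seen:
            seen.add(text)
            chunks.append(Chunk(len(chunks) + 1, tuple(t for _, t in stack), text))

    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            flush()
            depth = match.group(1).count(".") + 1
            # Close every open section at the same or a deeper level.
            while stack and stack[-1][0] >= depth:
                stack.pop()
            stack.append((depth, line))
        else:
            body.append(line)
    flush()
    return chunks


contract = [
    "1 Definitions",
    "Provider means the contracting hospital.",
    "2 Rates",
    "2.1 Inpatient",
    "Per diem applies to inpatient stays.",
    "2.2 Outpatient",
    "Fee schedule applies to outpatient visits.",
    "3 Exhibit A",
    "Per diem applies to inpatient stays.",  # repeated text, dropped as a duplicate
]
chunks = chunk_by_hierarchy(contract)
```

In this example the section "2 Rates" owns two child chunks (the one-to-many relationship), and each chunk keeps the path needed to place it back in the document even after it is sent to a model on its own.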

Architecture and operational numbers

| Datum | Value |
|---|---|
| Production scale | 2.5 M contract documents over 22 months |
| Page volume | 50 M pages |
| Bedrock API calls | 137 M |
| Token volume | 442 B tokens |
| Avg tokens/call | ~3 200 |
| Avg pages/document | 20 |
| Peak weekly throughput | ~250 000 contracts/week |
| Cumulative client savings | ~$330 M (direct + indirect) |
| AI-based accuracy | 99% |
| Rules-based accuracy ceiling | ~55% (2020–2023 baseline) |
| Manual throughput | ~100 docs/week per individual |
| Manual processing time reduction | 97% |
| Services in pipeline | 9 (8 AWS: Cognito, S3, Lambda, Textract, Bedrock, ECS, CloudWatch, Secrets Manager; plus Snowflake sink) |
| Frontend | Next.js |
| CLM integrations | Coupa, Icertis |
| Delivery model | SaaS |
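The derived rows in the table (average tokens per call, average pages per document) and a few further ratios follow directly from the disclosed totals. The short calculation below reproduces them; the per-document and per-page figures are simple averages computed here, not numbers stated in the article.

```python
documents = 2_500_000
pages = 50_000_000
bedrock_calls = 137_000_000
tokens = 442_000_000_000
manual_docs_per_week = 100
peak_docs_per_week = 250_000

tokens_per_call = tokens / bedrock_calls          # about 3,226 (the table rounds to ~3 200)
pages_per_document = pages / documents            # 20.0
calls_per_document = bedrock_calls / documents    # 54.8
calls_per_page = bedrock_calls / pages            # 2.74
tokens_per_document = tokens / documents          # 176,800.0
throughput_multiple = peak_docs_per_week / manual_docs_per_week  # 2500.0

print(f"tokens/call      {tokens_per_call:,.0f}")
print(f"pages/document   {pages_per_document:,.0f}")
print(f"calls/document   {calls_per_document:,.1f}")
print(f"calls/page       {calls_per_page:,.2f}")
print(f"tokens/document  {tokens_per_document:,.0f}")
print(f"peak vs manual   {throughput_multiple:,.0f}x")
```

Roughly 55 Bedrock calls per 20-page document suggests several calls per page or per extracted field group, though the article does not describe how calls are distributed.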

Pipeline diagram

External user
       ▼
Amazon Cognito (authn / authz)
       ▼
Next.js frontend
       ▼ document upload
Amazon S3 (durable object storage)
       ▼ S3 event triggers
AWS Lambda (orchestrator)
       ▼ invokes
Amazon Textract (OCR + metadata extraction)
       ▼ raw text + structure
┌───────────────────────────────────────────┐
│  Doczy.ai proprietary processing layer    │
│  (likely on Amazon ECS containers)        │
│                                           │
│  1. Smart chunking                        │
│     - hierarchy-preserving                │
│     - one-to-many relationships           │
│     - semantic + keyword decomposition    │
│     - sequential IDs + metadata grouping  │
│                                           │
│  2. Dual clustering engine                │
│     ├── Semantic clusters (embeddings)    │
│     │   "similar ideas grouped"           │
│     ├── Structural clusters (patterns)    │
│     │   "clause types, table layouts,     │
│     │    hierarchy"                       │
│     └── Projection algorithm              │
│         "synthesise into unified model"   │
│                                           │
│  3. File-class detection + prompt build   │
│     - few-shot + multi-shot examples      │
│     - domain-tuned per document class     │
└───────────────────────────────────────────┘
       ▼ enriched prompt + dual-clustered context
Amazon Bedrock (LLM, structured-output mode)
       ▼ structured JSON
Snowflake (centralised structured repository)
       ▼ powers
Intelligent dashboards
       ▼ feeds
Downstream automation
  - Claims systems (reimbursement terms)
  - Vendor invoice verification
  - Contract performance dashboards

Cross-cutting:
  Amazon CloudWatch: real-time monitoring
  AWS Secrets Manager: sensitive config
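The dual clustering step in the diagram can be illustrated with a toy two-view grouping. Everything here is a stand-in: bag-of-words counts replace learned embeddings, a greedy threshold pass replaces whatever clustering Doczy.ai uses, three regex-style rules replace its structural pattern recognition, and the "fusion" is a plain cross-tabulation of the two labels rather than the undisclosed projection algorithms.

```python
import math
import re
from collections import Counter, defaultdict


def embed(text):
    """Stand-in embedding: word counts (a real system would use a learned model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_clusters(chunks, threshold=0.4):
    """Greedy pass: join the first cluster whose seed is similar enough, else start one."""
    seeds, labels = [], []
    for text in chunks:
        vec = embed(text)
        for idx, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                labels.append(idx)
                break
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels


def structural_label(text):
    """Hypothetical structural view based on surface layout cues only."""
    if "|" in text:
        return "table"
    if re.match(r"^(exhibit|schedule)\b", text, re.IGNORECASE):
        return "attachment"
    return "clause"


def fuse(chunks):
    """Key each chunk by (semantic cluster, structural label)."""
    model = defaultdict(list)
    for text, cluster in zip(chunks, semantic_clusters(chunks)):
        model[(cluster, structural_label(text))].append(text)
    return dict(model)


sample = [
    "Inpatient per diem rate is 1200 dollars",
    "Rate | Inpatient per diem | 1200 dollars",
    "Either party may terminate with 90 days notice",
    "Exhibit A termination notice of 90 days by either party",
]
document_model = fuse(sample)
```

The first two chunks land in the same semantic cluster but carry different structural labels (prose clause versus table row), which is the kind of distinction a single-view approach would lose.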

Systems extracted

Concepts extracted

Patterns extracted

Caveats

  1. Borderline scope on Tier-1 framing. This is an AWS Architecture Blog partner-solution post, distinct from architecture deep-dives like the BDT Spark-to-Ray retrospective or the cyber-resilience reference architecture. Architecture content is ~30% of body, mixed with PR-flavoured language about "transforming" / "unlocking value" / "strategic asset". Operational numbers are real and concrete; algorithmic internals are at architecture-shape altitude only.
  2. Smart chunking mechanism is a black box. "Patented" + "proprietary" — sequential identifier scheme, dynamic parameter tuning, and the specific algorithm for preserving hierarchy across chunk boundaries are not disclosed. The wiki's concepts/smart-chunking page captures the disclosed properties (hierarchy preservation, one-to-many relationships, semantic + keyword decomposition, metadata-driven grouping) without claiming to know the implementation.
  3. Dual clustering engine internals undisclosed. Embedding model not named (likely a Bedrock-hosted model but not confirmed). Clustering algorithm class (k-means / DBSCAN / hierarchical / density-based) not disclosed for the semantic side; pattern-recognition technique class (rule-based / classifier / structural-parsing) not disclosed for the structural side. The "projection algorithms" that fuse the two sides are named at concept altitude only — projection in what space, against what objective, with what loss / scoring function.
  4. No latency or throughput envelope per document. 250 K contracts/week is the headline weekly capacity number; the article doesn't disclose per-document latency (seconds vs minutes), batch-vs-streaming structure, or peak-vs-sustained throughput.
  5. Bedrock model identity not disclosed. Which foundation model(s) Doczy.ai uses (Claude Sonnet, Claude Opus, Titan, Llama, Mistral, or a multi-model fallback ladder) is not stated. The 442 B tokens / 137 M calls envelope works out to roughly 3 200 tokens per call on average, which is modest by current context-window standards and so does not by itself identify the model class; it is consistent with several SKUs.
  6. 97% manual-time reduction baseline. What the "manual" baseline is — pre-2020 100-doc/week individuals, 2020–2023 rules-based humans-in-the-loop, or a customer-specific pre-Doczy.ai workflow — is not disambiguated. The 97% figure is presented as cumulative across the AI-era deployment.
  7. $330 M cumulative savings methodology not disclosed. Direct savings (labour displacement, vendor-invoice clawbacks) and indirect savings (faster reimbursement-config rollout, fewer claims errors) are aggregated; per-customer attribution and verification methodology are not provided.
  8. No information about training-data provenance, fine-tuning, or model adaptation. The few-shot / multi-shot prompt-optimisation loop is described, but whether AArete fine-tunes on customer data, adapts retrieval indices per customer, or holds out an eval set is not stated.
  9. Multi-tenancy posture undisclosed. Multi-customer SaaS delivery is named — tenant-isolation mechanism (per-tenant Bedrock keys, per-tenant S3 buckets, per-tenant Snowflake warehouses, IAM-based separation) is not disclosed. Healthcare data (PHI) confidentiality requirements would normally drive strong per-tenant isolation, but the article doesn't address HIPAA / BAA posture.
  10. No comparison vs alternatives. The 99% claim is positioned against a 55% rules-based baseline that was AArete's own pre-2024 system; no comparison vs other contract-intelligence SaaS vendors (Icertis Discovery, Kira / Litera, Evisort, Spellbook, etc.) or vs general-purpose LLMs run with naive prompts on the same dataset. The architectural contribution is the dual-clustering grounding substrate; the 99% number alone doesn't isolate that contribution.
  11. No failure-mode analysis. What happens on out-of-distribution document classes, degraded Textract OCR quality (poor-scan PDFs, handwritten amendments), Bedrock throttling, or Snowflake sink unavailability is not addressed.
  12. No retention / privacy / compliance posture for processed document data. Where the structured outputs persist, for how long, who can query them, encryption-at-rest / encryption-in-transit specifics, and audit-trail availability: none of these is addressed at architectural altitude.
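On the peak-versus-sustained question raised in the caveats, the disclosed totals allow a rough bound. The calculation below assumes the 22 months are average-length calendar months and that the 2.5 M documents were spread over the whole period; neither is stated in the article.

```python
DOCUMENTS = 2_500_000
MONTHS = 22
PEAK_PER_WEEK = 250_000
WEEKS_PER_MONTH = 365.25 / 12 / 7  # about 4.35, an averaging assumption

weeks = MONTHS * WEEKS_PER_MONTH                    # about 95.7
average_per_week = DOCUMENTS / weeks                # about 26,100
peak_to_average = PEAK_PER_WEEK / average_per_week  # about 9.6
weeks_at_peak = DOCUMENTS / PEAK_PER_WEEK           # 10.0

print(f"average/week     {average_per_week:,.0f}")
print(f"peak / average   {peak_to_average:.1f}x")
print(f"weeks at peak    {weeks_at_peak:.0f}")
```

Under these assumptions the entire 22-month volume equals ten weeks at the stated peak rate, so 250 000 contracts/week reads as burst capacity rather than sustained load.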

Source
