SYSTEM Cited by 7 sources

Amazon Bedrock

Amazon Bedrock is AWS's managed foundation-model runtime — a single API surface for inference against hosted first-party models (Amazon Titan / Nova) and third-party models (Anthropic Claude, Meta Llama, AI21, Cohere, Mistral, Stability) with common auth, observability, and safety plumbing.

Stub page — expand as specific Bedrock-internals sources are ingested. Linked from higher-level Bedrock-family surfaces already on the wiki.

Capacity primitives: Provisioned Throughput + On-Demand

Bedrock exposes two capacity primitives, both denominated in Model Units (MUs) for the LLM serving SKUs:

  • Provisioned Throughput (PT) — customer reserves dedicated capacity for a 1–6 month contract term. Predictable performance; fixed cost regardless of utilisation. Best fit for high-volume, latency-sensitive workloads.
  • On-Demand (OD) — pay-per-token / pay-per-request against a shared regional pool. No commitment; usage-proportional cost. Best fit for bursty / async / scheduled workloads.

The hybrid posture PT-with-OD-spillover composes both: PT for the floor, OD as overflow. See concepts/provisioned-throughput-vs-on-demand-llm for the full trade-off and concepts/llm-over-provisioning-cycle + concepts/llm-provider-commitment-lock-in for the structural PT failure modes that drive the move to hybrid.
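The PT-with-OD-spillover posture can be sketched as a small admission policy: each request is charged against a fixed PT token budget for the current one-minute window, and anything that does not fit is sent to OD. This is a minimal illustration under stated assumptions, not Bedrock's or any customer's implementation; `SpilloverRouter`, `pt_tokens_per_minute`, and the fixed-window accounting are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SpilloverRouter:
    """Hypothetical PT-first router with OD overflow (fixed one-minute windows)."""

    pt_tokens_per_minute: int  # assumed reserved PT budget per window
    _window: int = field(default=-1, init=False)
    _used: int = field(default=0, init=False)

    def route(self, tokens: int, now_seconds: float) -> str:
        """Return "PT" if the request fits the current window's budget, else "OD"."""
        window = int(now_seconds // 60)
        if window != self._window:  # new minute: reset the PT budget
            self._window = window
            self._used = 0
        if self._used + tokens <= self.pt_tokens_per_minute:
            self._used += tokens
            return "PT"
        return "OD"  # overflow goes to the shared on-demand pool


router = SpilloverRouter(pt_tokens_per_minute=10_000)
decisions = [router.route(4_000, now_seconds=t) for t in (0, 10, 20, 61)]
print(decisions)  # ['PT', 'PT', 'OD', 'PT']
```

The third request overflows because the first two already consumed 8 000 of the 10 000-token budget; the fourth lands in a new window and returns to PT. A production router would more likely use a sliding window or token bucket and react to throttling signals, which this sketch leaves out.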

Bedrock as primary launchpad for Anthropic on AWS

The 2026-05-28 Slack AI multi-cloud retrospective canonicalises Bedrock's role as AWS's primary launchpad surface for new LLMs — particularly Anthropic models. Slack's Phase 1 → Phase 2 migration (SageMaker → Bedrock) was driven primarily by model feature lag: model iterations and optimisations debuted on Bedrock "weeks or months before SageMaker availability", even when SageMaker hosted the same model family via escrow VPC. The structural cause: AWS prioritised Bedrock as "its purpose-built managed LLM service".

Wiki canonical face: enterprise multi-cloud LLM endpoint

The 2026-05-28 source canonicalises Bedrock as the AWS endpoint in Slack's multi-cloud LLM serving stack alongside GCP Vertex AI. Slack's Intelligent Routing Layer routes between Bedrock and Vertex AI based on metric-driven model selection per feature, with Bedrock carrying both PT (latency-sensitive) and OD (bursty) load. Production substrate scale: "millions of users" across the Slack AI feature suite.
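The source does not describe how the routing layer turns metrics into a choice, so the following is only one plausible shape for "metric-driven model selection per feature": pick the highest-quality candidate that fits a feature's latency budget. `Candidate`, `pick_endpoint`, the metric fields, and every number below are hypothetical and chosen purely for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    provider: str          # e.g. "bedrock" or "vertex"
    quality: float         # illustrative offline eval score, higher is better
    p95_latency_ms: float  # illustrative measured latency


def pick_endpoint(candidates: list[Candidate], latency_budget_ms: float) -> Candidate:
    """Pick the highest-quality candidate within the latency budget.

    Falls back to the lowest-latency candidate if none meets the budget.
    """
    eligible = [c for c in candidates if c.p95_latency_ms <= latency_budget_ms]
    if eligible:
        return max(eligible, key=lambda c: c.quality)
    return min(candidates, key=lambda c: c.p95_latency_ms)


# Made-up numbers; they are not measurements from the Slack source.
candidates = [
    Candidate("bedrock", quality=0.82, p95_latency_ms=900.0),
    Candidate("vertex", quality=0.88, p95_latency_ms=1500.0),
]
tight = pick_endpoint(candidates, latency_budget_ms=1000.0).provider
loose = pick_endpoint(candidates, latency_budget_ms=2000.0).provider
print(tight, loose)  # bedrock vertex
```

With a per-feature budget, the same candidate table can yield different endpoints for a latency-sensitive feature and a quality-sensitive one, which is the behaviour the paragraph above attributes to the routing layer.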

Seen in

  • sources/2026-03-18-aws-ai-powered-event-response-for-amazon-eks — named as the substrate for AWS DevOps Agent's reasoning: "Built on Amazon Bedrock, the agent can analyze complex operational scenarios and correlate data from multiple sources." Which specific model the DevOps Agent runs on is not disclosed.
  • sources/2026-04-01-aws-automate-safety-monitoring-with-computer-vision-and-generative-ai — Claude multi-modal LLMs on Bedrock analyse misclassified samples + detect underrepresented object classes in the training distribution, directly informing data-collection + synthetic-data priorities (patterns/data-driven-annotation-curation). Amazon Nova on Bedrock generates tape-labelling UI illustrations for the product documentation.
  • sources/2025-12-11-aws-architecting-conversational-observability-for-cloud-applications — Bedrock hosts both the embedding model (amazon.titan-embed-text-v2:0) in the RAG ingest pipeline and the LLM for reasoning + kubectl-command generation + final resolution in the troubleshooting chatbot. The specific reasoning model is not disclosed; both RAG and Strands deployment options route through Bedrock.
  • sources/2025-09-24-zalando-dead-ends-or-data-goldmines-ai-powered-postmortem-analysis — Bedrock as the compliance-cleared frontier-LLM substrate for regulated text. Zalando's datastore SRE team runs the postmortem analysis pipeline's current generation on Claude Sonnet 4 on Bedrock after transitioning from on-prem LM Studio. Canonical wiki datum that the driver of the on-prem → cloud LLM transition was legal / compliance clearance, not capability: postmortems "contain PII data of on-call responders, companies business metrics, GMV losses, etc. The legal alignment was a pre-condition before using cloud hosted LLMs." Per-postmortem processing on Bedrock: ~30 s. Residual surface-attribution error rate ~10%, consistent with the frontier-tier-across-hosting-substrates observation.
  • sources/2026-04-23-aws-modernizing-kyc-with-aws-serverless-solutions-and-agentic-ai — Bedrock hosts the foundation models inside each of the five KYC sub-agents: OCR + multi-language extraction (Document Analysis), natural-language processing + name-variation handling (Identity Verification), behavioural analysis + semantic-similarity fraud search (Fraud Detection), regulatory-framework interpretation (Compliance & Risk), journey-optimisation + friction-point detection (Customer Experience). Bedrock also powers the embedding model feeding the Knowledge Base over regulations + compliance rules + vendor docs. Specific model identities not disclosed.
  • sources/2026-05-28-slack-slack-ai-the-path-to-multi-cloud — Bedrock as enterprise multi-cloud LLM endpoint. Slack's three-year evolution moved their Slack AI feature suite to Bedrock in mid-2024 (Phase 2, PT-only, zero-incident migration) → added On-Demand for bursty workloads (Phase 3, Hybrid PT+OD with spillover) → added GCP Vertex AI alongside in early 2026 (Phase 4, multi-cloud). Verbatim disclosures: Bedrock achieved "FedRamp Moderate compliance" gating the Phase 2 migration; Model Units (MUs) disclosed as the capacity primitive ("Each MU provides a deterministic amount of throughput, measured in tokens per minute. Shifting from GPU instances to MUs allowed us to abstract away the hardware and focus entirely on raw throughput"); 1–6 month PT commitment terms surfaced as the commitment lock-in failure mode; OD shared-pool concentration risk surfaced as the Phase 3 → Phase 4 driver. First wiki disclosure of PT+OD-with-spillover as a Bedrock-internal pattern, and of Slack's customer-side adoption of the MU primitive (sibling to Databricks' platform-side coining of the same primitive in sources/2026-05-27-databricks-reliable-llm-inference-at-scale). Phase 4 canonicalised outcomes (vs single-cloud Bedrock): ~10% quality lift on complex reasoning, ~67% latency reduction on high-velocity workloads.
  • sources/2026-06-02-aws-automating-contract-intelligence-with-doczyai-on-aws — Bedrock as the LLM-grounded extraction tier for a production document-intelligence pipeline. AArete's Doczy.ai runs ~250 000 contracts/week through Bedrock, with 137 million Bedrock API calls and 442 billion tokens processed over 22 months — one of the largest single-pipeline Bedrock token-volume disclosures on the wiki. Avg ~3 200 tokens/call, consistent with multi-page contract chunks plus structured-output prompt overhead. Bedrock sits at the LLM-extraction stage of the managed AI document-intelligence pipeline — receives prompts grounded in smart-chunked + dual-clustered document representations, plus per-class-routed few-shot/multi-shot examples, and produces structured-output JSON consumed downstream by Snowflake and CLM-system integrations. Reported 99% extraction accuracy on contracts (vs ~55% rules-based baseline) — Bedrock's contribution within the broader pipeline isn't separately attributed, but the pipeline's headline accuracy depends on Bedrock's extraction step. Specific Bedrock model not disclosed (the 3 200-tokens-per-call envelope is consistent with Claude Opus / Sonnet / similar SKUs).
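
The ~3 200 tokens/call average quoted for Doczy.ai follows directly from the two disclosed totals. As a quick check, using only the figures in the entry above (the per-month figure is a derived average that assumes even volume across the 22 months, which the source does not state):

```python
calls = 137_000_000        # Bedrock API calls over 22 months (from the source)
tokens = 442_000_000_000   # tokens processed over the same period (from the source)
months = 22

avg_tokens_per_call = tokens / calls
avg_calls_per_month = calls / months  # assumes even volume; illustrative only

print(round(avg_tokens_per_call))          # 3226
print(round(avg_calls_per_month / 1e6, 2)) # 6.23 (million calls per month)
```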