AWS (Amazon Web Services)

The AWS blog family — the AWS News Blog, AWS Open Source Blog, AWS Architecture Blog, AWS Compute Blog, AWS Storage Blog, and others at aws.amazon.com/blogs/* — collectively form one of the canonical Tier-1 system-design sources. AWS blog posts vary widely in signal: at one end, substantive architecture retrospectives with quantified production numbers (Amazon Retail BDT's Spark-to-Ray migration is the canonical recent example); at the other, product PR / feature announcements filtered out per the AGENTS.md scope rules.

For the complementary (and often higher-signal) source, see companies/allthingsdistributed — Werner Vogels' blog republishes primary-source AWS / Amazon architecture content from a CTO perspective.

Scope and what we ingest from AWS blogs

Ingest eagerly (Tier 1 treatment):

  • Production architecture retrospectives with concrete scaling numbers (e.g. BDT's exabyte-scale Ray migration).
  • Team postmortems or incident writeups with named systems.
  • AWS service-design posts that explain trade-offs, not just features (often cross-posted with companies/allthingsdistributed).
  • Open-source contribution narratives that expose internal design (DeltaCAT, Firecracker, Aurora DSQL, etc.).

Skip:

  • Service-GA announcements / feature launches without architectural depth (PR/FAQ posts belong on companies/allthingsdistributed if they have architectural content).
  • Industry / vertical marketing posts ("AI for X industry").
  • Pricing announcements, account-opening posts, region-launch announcements.
  • Customer-case-study puff pieces.
  • Conference-session recaps without architectural specifics.

Key systems (as surfaced in ingested sources)

Data platform / Amazon Retail BDT:

ML / computer-vision — SageMaker AI subsystems + adjacent:

Web application / analytics / BI:

Relational DB (beyond the Aurora sovereignty/consistency lineage below):

  • systems/amazon-aurora — cloud-native Postgres- / MySQL-compatible relational engine; the parent line of Aurora DSQL + Aurora Limitless. Common application-state backbone for AWS customer architectures; downstream of ML-inference gatekeeping in CV-safety pipelines.

Compute / storage / integration primitives:

Other AWS / Amazon systems referenced across sources:

  • Most AWS service lineage lives on companies/allthingsdistributed — S3, EBS, Nitro, Lambda, Firecracker, Aurora DSQL, SageMaker, Bedrock Guardrails, Kiro. Cross-reference there.

Relational databases / Postgres family:

  • systems/aws-rds — managed relational (MySQL / Postgres / MariaDB / SQL Server / Oracle); Multi-AZ cluster Postgres inherits community Postgres's Long Fork visibility anomaly (Jepsen 2025-04-29; AWS response 2025-05-03).
  • systems/postgresql — the upstream substrate; visibility model (ProcArray scan, asynchronous with WAL commit) is the root cause. AWS's PostgreSQL Contributors Team (formed 2022) is co-developing the proposed CSN upstream fix.
  • systems/aurora-dsql — ground-up distributed SQL; replaces ProcArray visibility with time-based MVCC, sidestepping Long Fork. Wire-compatible Postgres via public extension API.
  • systems/aurora-limitless — horizontally-scaled Aurora Postgres; also replaces ProcArray with time-based MVCC.

Service mesh / container networking:

  • systems/aws-app-mesh — AWS's first-gen Envoy-based sidecar service mesh for ECS/EKS/Fargate. Discontinued 2026-09-30, closed to new customers 2024-09-24. Four-tier abstraction (Mesh / Virtual Service / Virtual Router / Virtual Node) + customer-managed Envoy sidecar per Task.
  • systems/aws-ecs-service-connect — current managed replacement for ECS. Flat Client/Server role model, AWS-managed Service Connect Proxy (Envoy under the hood), free CloudWatch app-level metrics. Not yet mTLS-capable (2025-01-18).
  • systems/aws-vpc-lattice — current replacement for EKS. Not a sidecar mesh — VPC-level service-networking managed control + data plane across EKS / EC2 / Lambda / on-prem.
  • systems/amazon-ecs — compute substrate under both meshes; Service ↔ Task ↔ Task Definition abstraction; exclusive mesh-membership constraint is load-bearing for migration.
  • systems/aws-cloud-map — shared service-discovery substrate. Cross-account namespace sharing is not supported, forcing single-account deployments in Service Connect.
  • systems/aws-private-ca — TLS certificate authority under both meshes; App Mesh uses general-purpose certs, Service Connect uses short-lived certs (cheaper).
  • systems/amazon-route53 — DNS weighted-routing primitive for blue/green mesh migration edge traffic shifting.

Partitions / cross-partition sovereign-failover architecture:

PKI:

Disaster recovery / resilience (within a partition, cross-Region + cross-account):

  • systems/aws-backup — unified data-protection control plane tying together per-service backup mechanisms (RDS / EBS / S3 / Aurora etc.) behind vaults + policies + schedules; added first-party backup coverage for services that lacked it (EFS, FSx) and cross-Region backup for DynamoDB; canonical backup-and-restore tier primitive on the DR ladder.
  • systems/aws-elastic-disaster-recovery (AWS DRS) — continuous block-level replication, recovery orchestration, automated server conversion; seconds RPO, 5–20 min RTO typical; target VPC configuration on recovery; the canonical pilot-light / warm-standby enabling primitive.
  • systems/arpio — AWS Resilience Competency Partner SaaS; full-workload discovery + backup + cross-Region cross-account recovery on top of AWS Backup + AWS DRS + service-native primitives; 140 AWS resource types covered; the named DR config-translation layer via Route 53 private-hosted-zone CNAMEs.

Event-driven architecture / org-scale pub/sub:

  • systems/amazon-eventbridge — managed serverless event bus; content-based routing rules + schema registry / discovery + cross-account targets via resource policies; the canonical AWS substrate for event-driven architecture at organisation scale. Load-bearing gap vs a strict-validation requirement: no native schema validation.
  • systems/amazon-key — physical-access-management product family (In-Garage Delivery, apartment-building access); production instance of patterns/single-bus-multi-account on EventBridge plus a custom schema repository + client library + CDK subscriber constructs library. Reported 2,000 events/s / 99.99% success / 80ms p90 / 14M subscriber calls post-migration; integration time for new use cases 5d → 1d.
  • systems/aws-cdk — IaC substrate for the reusable subscriber constructs pattern — per-subscriber event bus + cross-account IAM + monitoring + alerting packaged behind a ~5-line new Subscription(...) construct.

Multi-account SaaS platform (account-per-tenant):

  • systems/aws-stacksets — AWS's fan-out deployment primitive: one CloudFormation template, many target accounts/Regions from a central admin account. Load-bearing for account-per-tenant CI/CD at ProGlove's ~6,000-account scale. Named failure modes: partial rollouts, pipeline duration, tooling maturity edge cases.
  • systems/aws-codepipeline — central orchestration point for fan-out deployment; single execution triggers a single StackSet update that fans out in parallel.
  • systems/aws-cloudformation — the underlying declarative IaC engine under both StackSets and CDK.
  • systems/aws-step-functions — account-creation orchestrator in ProGlove's lifecycle; account-retirement deliberately kept as scripts (architectural asymmetry is the signal).
  • systems/aws-cost-explorer — transparent per-tenant cost attribution by virtue of the account boundary being the billing boundary; key benefit of account-per-tenant for consumption-priced SaaS.
  • systems/aws-observability-access-manager — AWS-native cross-account CloudWatch observability primitive; ProGlove built its own third-party aggregation before OAM shipped, now the recommended starting point for new platforms.
  • systems/proglove-insight — ProGlove's SaaS platform; the canonical wiki production reference for concepts/account-per-tenant-isolation on AWS (~6,000 tenant accounts, 3-person platform team, ~1M Lambda functions).

Internal developer platform / platform engineering on EKS:

  • systems/santander-catalyst — Santander's in-house IDP on AWS EKS — canonical wiki production reference for platform engineering at large-enterprise regulated-industry scale (160M+ customers, 200+ critical systems, billions of daily transactions). Co-built with AWS ProServe via the Platform Strategy Program (PSP). Provisioning cycle 90 days → hours / minutes; PoC prep 90 days → 1 hour; 100+ pipelines consolidated; GenAI agent stack 105 days → 24 hours; ~3,000 monthly data-experimentation tickets eliminated.
  • systems/crossplane — CNCF universal resource provisioner; every cloud / SaaS resource modeled as a K8s CR reconciled by a controller; XRDs + Compositions as the composability primitive. Catalyst's stacks catalog.
  • systems/argocd — CNCF GitOps continuous-delivery controller for Kubernetes; Git as the source of truth; continuous-reconcile loop. Catalyst's data-plane claims component.
  • systems/open-policy-agent — CNCF policy engine (Rego) + Gatekeeper K8s admission controller; enforces compliance + security at admission time. Catalyst's policies catalog; the regulated-bank analogue of SCPs in ProGlove.
  • systems/aws-eks — also serves as the infrastructure control plane cluster hosting Crossplane + ArgoCD + OPA; a fundamentally different role from app-compute EKS (Figma, Convera).
  • systems/databricks — named integration target in Catalyst's modern data platform workload (built-in integration).

AI-for-ops / AI-powered incident response:

  • systems/aws-devops-agent — AWS's fully managed autonomous AI agent for EKS incident investigation and preventive recommendations. Built on Amazon Bedrock; accessed through a purpose-built web UI behind an Agent Space (tenant configuration unit — IAM + IdP + data-source endpoints + scope). AWS vendor peer to Datadog's Bits AI SRE on the same category axis (hosted agent for live-telemetry incident investigation), with a different vendor relationship (AWS managed service scoped to AWS cloud resources). Canonical wiki reference for telemetry-based Kubernetes resource discovery — agent combines a Kubernetes API scan (graph nodes) with OpenTelemetry-derived runtime relationships (graph edges) into a fused dependency graph used for root-cause analysis.
  • systems/strands-agents-sdk — AWS's open-source Python SDK for agentic systems (multi-agent orchestration, MCP tool calling, session management); used in the self-build alternative to the DevOps Agent — the Strands variant of the 2025-12-11 conversational-observability blueprint — hosting three specialized agents (Orchestrator / Memory / K8s Specialist).
  • systems/eks-mcp-server — AWS-Labs-published MCP server exposing Kubernetes / EKS operations as standardized MCP tools; the agent-native interface to a cluster in the Strands variant of the 2025-12-11 blueprint.
  • systems/fluent-bit — CNCF telemetry forwarder running as a cluster DaemonSet; ingest tier of the telemetry-to-RAG pipeline in the RAG variant of the 2025-12-11 blueprint (Fluent Bit → Kinesis → Lambda + Bedrock embeddings → OpenSearch Serverless).
  • systems/amazon-kinesis-data-streams — AWS's managed durable streaming substrate; ingest-buffer tier of the same telemetry-to-RAG pipeline. Enables Lambda batching as the primary cost lever at the embedding-generation layer.
  • systems/amazon-bedrock — managed foundation-model runtime underlying the DevOps Agent.
  • systems/amazon-managed-prometheus — metrics data source (one of four canonical Agent-Space data sources).
  • systems/aws-x-ray — traces data source (one of four canonical Agent-Space data sources).

Containers — EKS + Auto Mode + peer AWS services:

  • systems/eks-auto-mode — managed-data-plane variant of EKS; AWS operates Bottlerocket nodes, default add-ons, cluster upgrades; customer retains node-pool policy + disruption-budget-guarded upgrade contract. Canonical Kubernetes-layer instance of concepts/managed-data-plane.
  • systems/bottlerocket — container-optimised Linux distro; default AMI under EKS Auto Mode; immutable root + A/B transactional updates.
  • systems/amazon-guardduty — managed threat-detection with EKS protection + runtime monitoring + CloudTrail + malware detection → MITRE ATT&CK-annotated multistage attack findings.
  • systems/amazon-inspector — managed vulnerability scanner; ECR-image-to-running-container mapping enables runtime vulnerability prioritisation by actual production exposure.
  • systems/aws-network-firewall — managed stateful firewall; SNI-based egress allow-listing at per-VPC scale is the canonical concepts/egress-sni-filtering pattern; 2025-11-26 EVS post surfaces the centralised-inspection shape (native TGW attachment, Appliance Mode auto-enabled, Domain-list FQDN rule groups) for hub-and-spoke deployment across many VPCs + on-prem via DXGW.
  • systems/amazon-evs — managed VMware Cloud Foundation (VCF) stack running on EC2 bare-metal inside a customer VPC; target for lift-and-shift VMware migrations; NSX overlay + vSAN + vMotion all integrated with AWS-native networking.
  • systems/aws-vpc-route-server — BGP-speaking VPC primitive; bridges overlay networks (NSX inside EVS) to AWS-native VPC route tables so TGW / Network Firewall can route to overlay CIDRs.
  • systems/external-secrets-operator — CNCF K8s operator that syncs from Secrets Manager to native K8s Secret objects (env-var consumption path; no volume mounts or daemonsets).
  • systems/amazon-managed-grafana — managed Grafana; Generali uses with CloudWatch data source for per-namespace tenant dashboards.
  • systems/generali-malaysia-eks — Generali Malaysia's EKS platform as a synthesized case study (Malaysian insurance customer): six-peer-AWS-service integration surface + stateless-only + immutable-pods + Helm + HPA discipline.
  • systems/karpenter — CNCF open-source Kubernetes node autoscaler, AWS-originated; canonical wiki production reference is Salesforce's 1,000-cluster / 1,180-node-pool migration (2026-01-12). Solves multi-minute scaling latency, subnet-pinned provisioning, poor AZ balance, and rigid node-group boundaries of the predecessor CA / ASG stack.
  • systems/cluster-autoscaler — CNCF predecessor autoscaler that Karpenter is displacing on AWS; indirection through ASGs produces minutes-scale latency, thousands of rigid node groups, poor AZ balance.
  • systems/aws-auto-scaling-groups — AWS EC2 capacity primitive underneath Cluster Autoscaler; Karpenter bypasses.
  • systems/salesforce — customer with the largest known EKS fleet (1,000+ clusters / 1,180+ node pools); canonical wiki Karpenter-at-extreme-scale production reference.

Key patterns / concepts introduced via AWS blog sources

Computer vision + GenAI at scale:

  • patterns/serverless-driver-worker — canonical instance in the AWS safety-monitoring solution; driver orchestrates, per-use-case workers scale + fail independently; each worker chain is SNS → SQS → SageMaker endpoint with its own DLQ. Inference acts as gatekeeper filtering image volume so Aurora isn't overwhelmed.
  • patterns/multilayered-alarm-validation — four-stage composition (object detection → zone overlap → loiter-time persistence → confidence + RLE-mask validation) that turns per-frame detections into auditable alarms.
  • patterns/alarm-aggregation-per-entity — per-(entity, use-case) rollup; append new occurrences to open records; scheduled auto-close on resolution; SLA escalation through per-zone preferred channels.
  • patterns/data-driven-annotation-curation — Athena-driven FP-rate aggregation + below-threshold-confidence sampling + Claude multi-modal analysis of misclassified samples for class imbalance; replaces blanket per-site daily annotation.
  • patterns/synthetic-data-generation — GLIGEN + SageMaker Batch Transform producing auto-annotated training data at 75K-image scale per use case; YOLOv8 hits 99.5% mAP@50 for PPE without any manually-annotated real images.
  • patterns/multi-account-isolation — workload-purpose-axis separation (training / ingest / web-app / analytics each in distinct AWS accounts); distinct from [[concepts/account-per-tenant-isolation]] which is tenant-axis. PII containment + blast-radius + compliance alignment.
  • concepts/alert-fatigue — named failure mode the alarm-aggregation + multilayered-validation stack is designed around.
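
The four-stage alarm-validation composition can be sketched as a pure-Python pipeline. This is an illustrative reduction, not the AWS solution's code: `Detection`, `in_zone`, and the thresholds are hypothetical names, zones are simplified to axis-aligned boxes, and stage 4's RLE-mask validation is reduced to a confidence gate.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    cls: str           # detected object class (stage 1 output)
    confidence: float  # model confidence
    center: tuple      # (x, y) in frame coordinates
    frame_ts: float    # frame timestamp, seconds

def in_zone(point, zone):
    """Stage 2: zone-overlap check (real zones are polygons, not boxes)."""
    (x, y), (x0, y0, x1, y1) = point, zone
    return x0 <= x <= x1 and y0 <= y <= y1

def validate_alarm(track, zone, min_loiter_s=5.0, min_conf=0.6):
    """Run stages 2-4 over one tracked entity's per-frame detections.

    Stage 1 (object detection) is assumed to have produced `track`.
    An auditable alarm fires only if every stage passes.
    """
    hits = [d for d in track if in_zone(d.center, zone)]        # stage 2
    if not hits:
        return False
    loiter = max(d.frame_ts for d in hits) - min(d.frame_ts for d in hits)
    if loiter < min_loiter_s:                                   # stage 3
        return False
    return all(d.confidence >= min_conf for d in hits)          # stage 4
```

The point of the composition is that each stage discards a large fraction of per-frame detections, so only persistent, confident, in-zone activity reaches the alarm-aggregation layer.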

  • concepts/copy-on-write-merge — the compaction strategy that Amazon BDT ran at exabyte scale in-house before the open table formats canonicalised the name.

  • concepts/change-data-capture — the upstream workload shape driving all of this.
  • concepts/task-and-actor-model — Ray's programming model, the specialist-enabling lower layer vs Spark's dataflow abstraction.
  • concepts/locality-aware-scheduling, concepts/zero-copy-sharing, concepts/memory-aware-scheduling — the Ray-mechanism concepts that make specialist hand-crafted distributed algorithms beat generalists on specialist workloads.
  • concepts/managed-data-plane — the operational-ownership-on-the-data-plane primitive that distinguishes Service Connect / VPC Lattice from App Mesh; canonical AWS instance of the control-plane-vs-data-plane orthogonal axis.
  • concepts/mutual-tls — notable feature gap in Service Connect vs App Mesh at EOL-transition time; blocks regulated workloads from simple lift-and-shift.
  • patterns/managed-sidecar — AWS-managed Service Connect Proxy vs customer-managed App Mesh Envoy sidecar; narrowed configurability (timeouts only) for full vendor-operated lifecycle.
  • patterns/blue-green-service-mesh-migration — forced pattern for App Mesh → Service Connect because an ECS Service can't be in both meshes; edge traffic shifting via Route 53 / CloudFront continuous deployment / ALB multi-target-group.
  • patterns/shadow-migration — the canonical dual-run reconciliation pattern, instantiated across Amazon BDT's multi-year Spark → Ray migration.
  • patterns/subscriber-switchover — the per-consumer cutover pattern that earns rollback granularity after shadow migration.
  • patterns/heterogeneous-cluster-provisioning — Amazon BDT's EC2 capacity pattern: discover an instance-type set, provision whichever is most available, keep workloads arch/hardware-agnostic.
  • patterns/reference-based-copy-optimization — the "don't rewrite files the compaction didn't touch" optimisation that is a named contributor to Amazon BDT's 82% cost-efficiency gain.
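
The shadow-migration shape is mechanical enough to sketch: feed the same inputs to both pipelines, serve the legacy result, and track candidate divergence until cutover is earned. A minimal sketch with hypothetical names (`legacy`, `candidate`, `tolerate` are not from the post):

```python
def shadow_run(inputs, legacy, candidate, tolerate=0):
    """Dual-run reconciliation: evaluate both pipelines on identical
    inputs; the legacy output remains authoritative while mismatches
    are collected. Returns (mismatches, cutover_ready)."""
    mismatches = []
    for item in inputs:
        old, new = legacy(item), candidate(item)
        if old != new:
            mismatches.append((item, old, new))
    return mismatches, len(mismatches) <= tolerate
```

Paired with per-consumer switchover, this gives rollback granularity: each subscriber cuts over to the candidate only after its own inputs reconcile cleanly.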

Multi-tenant configuration services (tagged-storage pattern):

  • patterns/tagged-storage-routing — Strategy-Pattern factory dispatches storage requests to the best-fit backend based on the request key's prefix; adding a new backend is one new class + one map entry; canonical AWS pair is DynamoDB (high-frequency per-tenant) + Parameter Store (shared hierarchical).
  • patterns/event-driven-config-refresh — EventBridge + Lambda + Cloud Map + gRPC pipeline pushes config updates into live service instances' in-memory caches within seconds, without polling or restarts; escape valve from the TTL-vs-staleness dilemma for shared-config workloads.
  • patterns/jwt-tenant-claim-extraction — tenant context sourced exclusively from the validated Cognito JWT's immutable custom:tenantId claim; tenantId in request bodies / paths / headers is never read; cross-tenant access via body manipulation structurally impossible.
  • concepts/cache-ttl-staleness-dilemma — the forcing function the tagged-storage + event-driven-refresh composite resolves; TTL-based caches for rapidly-changing tenant metadata force an unacceptable stale-vs-amplified-load trade-off at multi-tenant scale.
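
The tagged-storage-routing factory is simple enough to sketch directly. In-memory dicts stand in for the real DynamoDB / Parameter Store backends; the class and prefix names are illustrative, not from the post:

```python
class DynamoBackend:
    """Stand-in for the high-frequency per-tenant store (DynamoDB)."""
    def __init__(self): self.items = {}
    def get(self, key): return self.items.get(key)
    def put(self, key, value): self.items[key] = value

class ParameterStoreBackend:
    """Stand-in for the shared hierarchical store (Parameter Store)."""
    def __init__(self): self.items = {}
    def get(self, key): return self.items.get(key)
    def put(self, key, value): self.items[key] = value

class TaggedStorageRouter:
    """Strategy-pattern factory: dispatch on the request key's prefix.
    Adding a backend is one new class + one entry in _routes."""
    def __init__(self):
        self._routes = {
            "tenant/": DynamoBackend(),          # per-tenant config
            "shared/": ParameterStoreBackend(),  # shared hierarchical config
        }
    def _backend(self, key):
        for prefix, backend in self._routes.items():
            if key.startswith(prefix):
                return backend
        raise KeyError(f"no backend registered for key {key!r}")
    def get(self, key): return self._backend(key).get(key)
    def put(self, key, value): self._backend(key).put(key, value)
```

Callers see one storage API; which backend absorbs the traffic is decided entirely by the key's tag, which is what keeps backend choice a one-line routing decision rather than a call-site concern.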

Postgres consistency-model work:

  • concepts/snapshot-isolation — the formal model Postgres's clustered implementation does not guarantee (surfaced by Jepsen 2025-04-29, acknowledged by AWS 2025-05-03).
  • concepts/long-fork-anomaly — the specific SI violation Postgres exhibits; concurrent non-conflicting transactions observed in different orders by primary + replica.
  • concepts/visibility-order-vs-commit-order — the mechanism: Postgres's commit path writes the WAL record, then asynchronously removes the xid from ProcArray.
  • concepts/commit-sequence-number — the proposed upstream fix; multi-patch effort, PGConf.EU 2024 talk, AWS PostgreSQL Contributors Team participating.

AI trust / automated-reasoning productization:

  • systems/bedrock-guardrails-automated-reasoning-checks — Bedrock safeguard that formally verifies LLM outputs against a customer-authored policy; preview-launched 2024-12-04 in US West (Oregon).
  • concepts/autoformalization — natural-language → formal-spec translation pipeline; first public disclosure in the 2024-12-04 preview-launch post (document → concepts → units → logic → logical model); variable descriptions as the load-bearing accuracy-tuning surface.
  • patterns/post-inference-verification — the canonical pattern Bedrock Guardrails AR checks productizes; three-verdict output (Valid / Invalid / Mixed) with structured suggestions; regenerate-with-feedback loop feeds the reasoner's natural-language rule descriptions back to the LLM as corrective prompts.

Digital sovereignty / cross-partition failover architecture:

  • concepts/aws-partition — logically isolated group of AWS Regions with its own IAM; hard boundary for credentials, cross-region primitives, and service availability. The central primitive in sovereign-failover design.
  • concepts/digital-sovereignty — demand-side framing: "managing digital dependencies — deciding how data, technologies, and infrastructure are used, and reducing the risk of loss of access, control, or connectivity." The human-driven-disaster class that pushes you across the partition boundary.
  • concepts/disaster-recovery-tiers — backup / pilot light / warm standby / active-active canonical AWS DR ladder; same ladder applied across the partition axis, with pilot-light the cross-partition default.
  • concepts/cross-partition-authentication — because IAM credentials don't cross, auth is explicit: IAM roles with trust + external IDs, STS regional endpoints, resource-based policies, cross-account roles via Organizations, federation from a centralized IdP (best practice).
  • concepts/cross-signed-certificate-trust — "double-signed certificates" — per-partition root CAs cross-sign each other to enable authenticated cross-partition mTLS without violating partition isolation.
  • patterns/cross-partition-failover — the overarching pattern: duplicate infrastructure across partitions + one of the DR tiers + per-partition IAM / PKI / Organizations / networking.
  • patterns/pilot-light-deployment, patterns/warm-standby-deployment — two specific DR tiers endorsed for cross-partition.
  • patterns/centralized-identity-federation — federate from a single IdP to all partitions; modern best practice for cross-partition auth; avoids per-partition IAM users.

Disaster recovery / resilience (within-partition):

  • concepts/rpo-rto — the two DR budget dimensions; AWS DRS quantified at seconds RPO / 5–20 min RTO, AWS Backup at hours RPO / RTO; tier choice derived from the business-set RPO/RTO targets.
  • concepts/crash-consistent-replication — block-level replica equivalent to a crash+reboot of the source; strictly weaker than app-consistent but achievable continuously — the consistency model AWS DRS uses for its seconds-RPO guarantee.
  • concepts/cross-region-backup — fault-isolation axis (natural/technical disasters); the baseline multi-Region backup-copy primitive unified under AWS Backup.
  • concepts/cross-account-backup — compromise-isolation axis (ransomware / malware / malicious insider); AWS Backup cross-account copy is the unified primitive; clean-room recovery account is the canonical target.
  • concepts/clean-room-recovery-account — separate AWS account with distinct credentials as a ransomware/malware isolation boundary; sibling use of the AWS account boundary alongside concepts/account-per-tenant-isolation.
  • concepts/dr-config-translation — restored resources have new identifiers (endpoints, ARNs); canonical mechanism is Route 53 private-hosted-zone CNAME indirection so applications keep resolving the old name to the new endpoint without config rewrites.
  • patterns/block-level-continuous-replication — the continuous seconds-scale replication pattern AWS DRS implements; enables pilot-light + warm-standby tiers at seconds RPO.
  • patterns/backup-and-restore-tier — the lowest DR tier on the ladder; AWS Backup + EventBridge + Lambda automation; hours-scale RPO/RTO, near-zero steady-state cost.
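
The dr-config-translation mechanism reduces to name indirection: applications keep resolving a stable private name, and recovery repoints the CNAME at the restored endpoint. A minimal sketch of that resolution, with hypothetical record names (the real mechanism is Route 53, not application code):

```python
def resolve(name, cnames, endpoints):
    """Follow CNAME indirection until a concrete endpoint is reached.
    `cnames` models the private-hosted-zone CNAME records; `endpoints`
    maps terminal names to live addresses."""
    seen = set()
    while name in cnames:
        if name in seen:
            raise ValueError("CNAME loop")
        seen.add(name)
        name = cnames[name]
    return endpoints[name]
```

Because restored resources get new identifiers (endpoints, ARNs), failover becomes a single record update in the hosted zone instead of a config rewrite across every consuming application.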

Event-driven architecture / schema governance:

  • concepts/event-driven-architecture — architectural style where services communicate via asynchronous events on a shared bus; supersedes ad-hoc SNS / SQS pairs at org scale. The canonical AWS substrate is EventBridge.
  • concepts/service-coupling — framing for the failure mode EDA addresses: tight-coupling cascade deadlocks. Amazon Key pre-migration exhibited exactly this — Service-A issues → timeouts + retries amplifying load → cross-service deadlock; single-device-vendor issues causing fleet-wide degradation.
  • concepts/schema-registry — versioned contract store for event definitions; single source of truth for publishers and subscribers. EventBridge has a schema registry but no native validation; strict-validation customers build on top.
  • patterns/single-bus-multi-account — one shared event bus in a central account + per-service-team accounts; DevOps owns bus + rules + targets, service teams own application stacks; logical separation via rules, not buses. AWS reference pattern.
  • patterns/client-side-schema-validation — validate events in a shared client library rather than a centralized validation service; immediate developer feedback + no runtime network hop; addresses EventBridge's missing native-validation gap.
  • patterns/reusable-subscriber-constructs — package subscriber infra as a versioned IaC construct library (CDK) — dedicated event bus + cross-account IAM + monitoring + alerting from ~5 lines. Amazon Key reports publisher/subscriber integration time 40h → 8h.
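
Client-side schema validation is the shape of a shared-library check run in the publisher's process before `put_events`. A toy sketch under stated assumptions: real implementations typically validate against JSON Schema from the registry; the `ORDER_SCHEMA` dict and field names here are invented for illustration.

```python
ORDER_SCHEMA = {
    "required": ["orderId", "tenantId"],
    "types": {"orderId": str, "tenantId": str, "amountCents": int},
}

def validate_event(event, schema=ORDER_SCHEMA):
    """Fail fast in the publisher's process: no runtime network hop,
    immediate developer feedback, and bad events never reach the bus."""
    detail = event["detail"]
    missing = [f for f in schema["required"] if f not in detail]
    wrong = [f for f, t in schema["types"].items()
             if f in detail and not isinstance(detail[f], t)]
    if missing or wrong:
        raise ValueError(f"schema violation: missing={missing} wrong_type={wrong}")
    return event
```

Shipping the check in a versioned client library is what closes EventBridge's native-validation gap without introducing a centralized validation service on the hot path.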

Fine-grained application authorization:

  • systems/amazon-verified-permissions — managed Cedar policy engine for application authorization; the application-authz counterpart to IAM. IsAuthorized synchronous evaluation at "millisecond-level"; submillisecond end-to-end when fronted by API Gateway's authorizer-decision cache. Per-tenant policy stores are the idiomatic SaaS isolation primitive.
  • systems/cedar — the policy language, public extraction of AWS's decade of internal policy-semantics work (see systems/aws-policy-interpreter). Analyzable by design. Combines RBAC + ABAC + ReBAC in one language.
  • systems/amazon-cognito — identity substrate paired with AVP across Convera's four authorization flows; user pool for customers, machine-to-machine user pool for service-to-service, per-tenant pool for multi-tenant. Pre-token-generation Lambda hook enriches JWTs at issue time.
  • systems/amazon-api-gateway — ingress tier hosting the patterns/lambda-authorizer; built-in authorizer-decision cache delivers submillisecond repeat-request latency.
  • systems/okta — external enterprise IdP; federated-to by Cognito in Convera's internal-user flow (patterns/centralized-identity-federation).

Fine-grained application authorization — concepts / patterns:

  • concepts/fine-grained-authorization — per-resource, per-action, context-aware authorization (vs coarse-grained role-to-endpoint); the evaluation model Cedar + AVP deliver.
  • concepts/attribute-based-access-control — ABAC as the idiomatic fine-grained authz realization; Cedar combines ABAC with RBAC and ReBAC in one language.
  • concepts/policy-as-data — Cedar policies in a DynamoDB source of truth + DynamoDB Streams continuously sync into AVP policy stores; authorship gated by a regulated IAM role owned by infosec.
  • concepts/tenant-isolation — five-layer enforcement chain for Convera's multi-tenant SaaS (identity → token → authorization → routing → data); a bug in any one layer can't leak across tenants.
  • concepts/zero-trust-authorization — every tier that handles a privileged request independently re-verifies; production instance in Convera's backend pods that re-call AVP before hitting RDS.
  • concepts/authorization-decision-caching — two-level cache (API Gateway authorizer-decision + app-level Cognito token) delivers submillisecond repeat-request latency.
  • concepts/token-enrichment — push per-user attribute lookup off the hot path by injecting attributes into the JWT at issue time (via the pre-token hook).
  • patterns/lambda-authorizer — Lambda in front of API Gateway evaluating Cedar via AVP; the hot-path authz compute across all four Convera flows.
  • patterns/per-tenant-policy-store — AVP idiom for multi-tenant SaaS: one policy store per tenant, tenant_id → policy-store-id lookup from DynamoDB. Chosen for isolation, per-tenant schema/template customization, easy onboarding/offboarding, and per-tenant resource quotas.
  • patterns/pre-token-generation-hook — Cognito Lambda trigger that enriches the JWT with authorization-relevant attributes from RDS / DynamoDB at login time.
  • patterns/zero-trust-re-verification — backend re-runs AVP against the tenant's policy store before data access; data layer (RDS) is further configured to accept only tenant-scoped requests.
  • patterns/machine-to-machine-authz — same Lambda-authorizer shape reused for service-to-service via Cognito's OAuth client-credentials flow; per-service policy stores.
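
The per-tenant-policy-store + Lambda-authorizer composite can be sketched as pure control flow. The mapping dict stands in for the DynamoDB `tenant_id → policy-store-id` lookup, and `decide` stands in for the AVP IsAuthorized call; all names here are illustrative, not Convera's code.

```python
# tenant_id -> policy-store-id (stand-in for the DynamoDB lookup table)
POLICY_STORES = {"acme": "ps-acme-123", "globex": "ps-globex-456"}

def extract_tenant(claims):
    """Tenant context comes only from the validated JWT's immutable
    custom claim; tenantId in bodies / paths / headers is never read."""
    return claims["custom:tenantId"]

def authorize(claims, action, resource, decide):
    """Lambda-authorizer shape: resolve the caller's tenant to its own
    policy store, then ask the policy engine for an Allow/Deny."""
    store = POLICY_STORES[extract_tenant(claims)]
    return decide(store, claims["sub"], action, resource)
```

Because the policy-store id is derived from the token rather than the request, a forged `tenantId` in a request body can never select another tenant's policies, which is the structural guarantee the pattern is after.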

Internal developer platform / platform engineering at enterprise scale:

  • patterns/platform-engineering-investment — second canonical production instance on the AWS blog (after ProGlove) via Santander Catalyst; large-enterprise regulated-industry counterpart to ProGlove's small-team SaaS instance. Kubernetes-native substrate on EKS instead of AWS-Organizations-native.
  • patterns/developer-portal-as-interface — Santander's in-house developer portal as the unified self-service surface hiding EKS / Crossplane / ArgoCD / OPA behind one interface; "Platform APIs become the internal product" in concrete form.
  • patterns/crossplane-composition — XRDs + Compositions as the unit of reuse for the stacks catalog; Kubernetes-native realization of patterns/golden-path-with-escapes at multi-cloud-infrastructure level.
  • patterns/policy-gate-on-provisioning — OPA Gatekeeper as a K8s admission controller enforcing compliance + security on every Crossplane claim at manifest-submission time; shift-left compliance; the regulated-industry counterpart to SCPs in ProGlove's AWS-Organizations-based shape.
  • concepts/universal-resource-provisioning — Crossplane's abstraction: every cloud / SaaS resource as a K8s CR reconciled by a controller; uniform API + RBAC + GitOps across clouds.
  • concepts/gitops — Git as declarative source of truth, continuous-reconcile controllers; ArgoCD the canonical K8s-native realization; Catalyst's application-delivery contract.
  • concepts/control-plane-data-plane-separation — Catalyst's first wiki instance of the split at infrastructure-provisioning tier: EKS cluster decides, provisioned AWS (and multi-cloud) resources are the data plane.

AI-for-ops / AI-powered incident response:

  • concepts/telemetry-based-resource-discovery — AWS DevOps Agent's core methodology: combine a Kubernetes API scan (the graph nodes: Pods / Deployments / Services / ConfigMaps / Ingress / NetworkPolicies with their metadata) with OpenTelemetry-derived runtime relationships (the weighted edges: service-mesh traffic, distributed traces, metric attribution) into a fused dependency graph used for investigation. Neither path alone is sufficient — the static API gives you the graph, telemetry tells you which edges are alive and misbehaving.
  • concepts/agentic-troubleshooting-loop — iterative LLM ↔ tool-assistant investigation cycle; LLM proposes diagnostic queries, tool assistant executes them against live system state, output re-enters LLM context, repeats until the LLM judges enough context for resolution. Canonical wiki reference is the 2025-12-11 AWS conversational-observability blueprint; the 2026-03-18 AWS DevOps Agent post is the managed-service realization of the same primitive with a structured discovery step added on top.
  • patterns/telemetry-to-rag-pipeline — streaming operational telemetry into a vector store for LLM augmentation; canonical AWS shape is Fluent Bit → Kinesis Data Streams → Lambda (batched) + Bedrock Titan Embeddings v2 → OpenSearch Serverless (hot) or S3 Vectors (cold). Sanitize-before-embedding is named as the vector-store governance boundary.
  • patterns/allowlisted-read-only-agent-actions — constrain an LLM-driven agent's side effects to a static allowlist of read-only verbs (kubectl get / describe / logs / events) + platform-layer RBAC. Canonical AWS realization is the in-cluster troubleshooting assistant pod in the 2025-12-11 blueprint; defense-in-depth via two-layer enforcement (app allowlist + K8s RBAC).
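The troubleshooting loop and the allowlist guard compose naturally: the LLM proposes a diagnostic command, the app-layer allowlist vets it, the tool assistant executes it, and the output re-enters the LLM's context. A minimal sketch, assuming a caller-supplied `propose_next_command` (the LLM call) and `execute` (the tool assistant); both names are illustrative, not from the AWS blueprint:

```python
READ_ONLY_VERBS = {"get", "describe", "logs", "events"}  # app-layer allowlist


def is_allowed(command: list[str]) -> bool:
    # First enforcement layer: only read-only kubectl verbs pass.
    # (K8s RBAC on the assistant pod would be the second layer.)
    return len(command) >= 2 and command[0] == "kubectl" and command[1] in READ_ONLY_VERBS


def troubleshoot(propose_next_command, execute, max_steps: int = 10) -> list[str]:
    """Iterative LLM <-> tool-assistant cycle: propose, vet, execute,
    feed the output back; stop when the LLM returns None (enough context)."""
    context: list[str] = []
    for _ in range(max_steps):
        command = propose_next_command(context)
        if command is None:  # LLM judges it has enough context to resolve
            break
        if not is_allowed(command):
            context.append(f"BLOCKED: {' '.join(command)}")
            continue
        context.append(execute(command))
    return context
```

The `max_steps` bound is a cheap safeguard against a non-terminating investigation; the blocked-command breadcrumb keeps the refusal visible to the LLM on the next iteration instead of silently dropping it.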

Agentic AI development (developer-side feedback loops):

  • concepts/agentic-development — development model where the AI agent "writes, tests, deploys, and refines code through rapid feedback cycles", not just suggests snippets. Inner-loop driver, not outer-loop. The 2026-03-26 AWS post's central reframing: agentic coding is gated on architecture, not prompt quality.
  • concepts/fast-feedback-loops — the primary architectural constraint of agentic development; each unvalidated change should use the cheapest tier that can falsify it. Five tiers named: local emulation → offline data/ML dev → hybrid cloud → preview env → production deploy.
  • concepts/local-emulation — umbrella concept over SAM's sam local start-api, same-image container runs, DynamoDB Local, and Glue Docker images. Cheapest feedback tier; API-shape parity with real services.
  • concepts/contract-first-design — OpenAPI specifications authored upfront so agents validate integrations before sibling services are implemented; pairs with preview environments.
  • concepts/hexagonal-architecture — codebase layer discipline (/domain no Amazon deps, /application orchestration, /infrastructure adapters). The precondition that makes domain-layer unit tests run without cloud credentials.
  • concepts/project-rules-steering — architectural constraints / coding conventions as Markdown the agent consults automatically. First AWS source pinning .kiro/steering/*.md + Markdown format — Kiro's concrete surface.
  • concepts/machine-readable-documentation — AGENT.md / RUNBOOK.md / CONTRIBUTING.md + YAML-over-prose as the broader design principle; project rules as one realization.
  • patterns/local-emulation-first — prefer local emulator over cloud deployment as the default feedback path; canonical four realizations (SAM / containers / DynamoDB Local / Glue Docker).
  • patterns/hybrid-cloud-testing — for services without local emulators (SNS / SQS named), define minimal CFN / CDK stacks and invoke via SDK. Cloud is "another test dependency — used sparingly and predictably".
  • patterns/ephemeral-preview-environments — short-lived IaC-defined stacks, deployed on demand, torn down after E2E validation. The feedback tier above hybrid-cloud testing.
  • patterns/layered-testing-strategy — unit (domain, fast) / contract (interfaces) / smoke (deployed env). Each tier catches a distinct failure class.
  • patterns/tests-as-executable-specifications — tests do more than catch regressions — they define acceptable behavior; a failing test teaches the agent what's expected. Sibling of patterns/executable-specification at the test-suite tier.
  • patterns/ci-cd-agent-guardrails — required tests + automated reviews + branch protections + preview-env validation + human gates for high-impact changes; expand agent autonomy as confidence compounds.
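The hexagonal-architecture precondition named above (a /domain layer with no Amazon deps, so its unit tests run without cloud credentials) pairs with tests-as-executable-specifications in a few lines of code. A minimal sketch under hypothetical names (`OrderRepository`, `place_order` are illustrative, not from the AWS post); the domain function depends only on a port, and an in-memory adapter stands in for the /infrastructure DynamoDB adapter at the fast unit tier:

```python
from typing import Protocol


class OrderRepository(Protocol):
    """Port, defined by the domain layer; a DynamoDB adapter in
    /infrastructure would satisfy this same Protocol."""
    def save(self, order_id: str, total: int) -> None: ...


def place_order(repo: OrderRepository, order_id: str, items: list[int]) -> int:
    """Domain logic: no SDK imports, no credentials, one port call."""
    if not items:
        raise ValueError("empty order")  # a failing test here *is* the spec
    total = sum(items)
    repo.save(order_id, total)
    return total


class InMemoryRepository:
    """Test adapter: keeps domain-layer unit tests local and instant."""
    def __init__(self) -> None:
        self.saved: dict[str, int] = {}

    def save(self, order_id: str, total: int) -> None:
        self.saved[order_id] = total
```

A test asserting that an empty order raises is an executable specification: it tells the agent what "acceptable behavior" means before the agent touches the code, and it runs in the cheapest feedback tier with no cloud involved.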

Recent articles

Most recent first; ingested AWS blog posts, not republications from companies/allthingsdistributed.

Last updated · 200 distilled / 1,178 read