AWS 2026-03-26 Tier 1

Architecting for agentic AI development on AWS

Summary

AWS Architecture Blog prescriptive essay on how to architect AWS systems so AI coding agents can operate effectively. Thesis: most cloud architectures were designed for human-driven development (long-lived environments, manual testing, infrequent deployments); those assumptions break when an AI agent is the primary author, because the agent's effectiveness is gated on feedback loop speed. The post prescribes two co-equal axes of architectural change: (1) system architecture for fast agentic feedback (local emulation as the default feedback path; offline development for data workloads; hybrid testing with lightweight cloud resources; preview environments + contract-first design), and (2) codebase architecture for AI-friendly development (domain-driven structure with explicit boundaries; project rules / steering files encoding architectural intent; tests as executable specifications; monorepos + machine-readable documentation; CI/CD guardrails that scale agent autonomy over time). Names systems/kiro as the concrete surface for the "project rules" / steering-files idea. Prescriptive — no production numbers, no customer case, no retrospective.

Key takeaways

  1. Agentic development is gated on feedback-loop speed, not prompt quality. "AI agents must validate changes continuously. When every test requires provisioning cloud resources, waiting for pipelines, or debugging deployment-only failures, feedback loops become too slow... Without architectural support, agentic AI produces more risk than value. The solution is not better prompts, it's an architecture that treats fast feedback and clear boundaries as first-class concerns." The post explicitly reframes "better AI for coding" as an architectural problem, not a model problem.
  2. Local emulation is the default feedback path. AWS serverless apps (Lambda + API Gateway) emulated locally via AWS SAM (sam local start-api); containers (ECS / Fargate) validated by running the same image locally; DynamoDB tested against DynamoDB Local which mirrors the DynamoDB API; AWS Glue jobs run locally via Docker images shipping the Glue ETL libraries. "An AI agent can invoke Lambda functions through a locally emulated API Gateway, observe responses immediately, and iterate in seconds rather than minutes." (patterns/local-emulation-first)
  3. Offline development for data/ML workloads shortens the same loop. Where the workload doesn't fit request-response testing (data pipelines, ML training), the pattern still holds: "isolate logic, test locally with reduced data, and promote validated code to managed services later." Glue's Docker images are the named example.
  4. Hybrid testing keeps cloud feedback lightweight when emulation isn't possible. For services with no local emulator (the named examples are SNS and SQS), the post advises: "define minimal development stacks using IaC tools such as AWS CloudFormation or the AWS CDK. An AI agent can deploy small, isolated resources, invoke them through the AWS SDK, and validate behavior without provisioning full environments." The cloud is framed as "another test dependency — used sparingly and predictably" (patterns/hybrid-cloud-testing).
  5. Preview environments + contract-first design. End-to-end validation uses short-lived stacks deployed on demand, defined through IaC, created by the agent, torn down after validation (patterns/ephemeral-preview-environments). Paired with contract-first design where APIs are defined upfront as OpenAPI specifications so "agents can validate integrations even before all services are implemented."
  6. Domain-driven structure is the codebase precondition for agents. Organize into predictable layers — /domain, /application, /infrastructure. "The domain layer contains business rules with no Amazon dependencies. Infrastructure code handles integrations with services such as Amazon DynamoDB or Amazon SNS. This separation allows AI agents to modify business logic and validate it locally without touching cloud-specific code." Hexagonal-architecture framing: "treats external systems as adapters rather than dependencies." (concepts/hexagonal-architecture, DDD layering).
  7. Project rules / steering files encode architectural intent for the agent. "Kiro supports steering files — Markdown files stored under .kiro/steering/ — that describe architectural constraints and coding conventions. For example, a rule might state that database access must go through repository classes in the infrastructure layer. The agent consults these rules automatically, reducing the need to restate constraints in every prompt and helping to keep generated code aligned with your architecture." (concepts/project-rules-steering). First AWS source to concretise the .kiro/steering/ path + Markdown format.
  8. Tests-as-executable-specifications is a three-layer stack. Unit (domain logic, fast, frequent AI iterations) + contract (interfaces between services, catch breaking changes early) + smoke (against deployed env, surface IAM/config issues that only appear at runtime). "When a test fails, the agent can infer what behavior is expected and refine its changes accordingly." (patterns/layered-testing-strategy, patterns/tests-as-executable-specifications).
  9. Monorepos + machine-readable docs = broader agent context. "A monorepo allows the agent to navigate across services, understand shared patterns, and evaluate the impact of changes system-wide. Within that repository, concise and structured documentation is essential. Files such as AGENT.md can explain architectural principles and constraints, while RUNBOOK.md and CONTRIBUTING.md describe operational and development workflows. Machine-readable formats, such as YAML or configuration files, are more straightforward for agents to interpret than lengthy prose." Kiro's foundational steering documents are offered as the concrete surface (concepts/machine-readable-documentation).
  10. CI/CD guardrails scale agent autonomy over time. "CI/CD pipelines should include guardrails such as required test execution, automated reviews, and branch protections. Over time, as confidence grows, you can expand the agent's autonomy while keeping humans in the loop for high-impact decisions." (patterns/ci-cd-agent-guardrails).
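Takeaways 6 and 8 reinforce each other: when the domain layer has no cloud dependencies, unit tests double as the specification an agent iterates against. A minimal Python sketch of that ports-and-adapters shape (all names are hypothetical, not from the post):

```python
from typing import Protocol

# --- domain layer: business rules, no AWS imports ---
class OrderRepository(Protocol):
    """Port: the domain defines the interface it needs."""
    def save(self, order_id: str, total: float) -> None: ...
    def get_total(self, order_id: str) -> float: ...

def apply_discount(total: float, percent: float) -> float:
    """Pure business rule; an agent can validate this in milliseconds."""
    if not 0 <= percent <= 100:
        raise ValueError("discount must be between 0 and 100")
    return round(total * (1 - percent / 100), 2)

# --- infrastructure layer: adapters implement the port ---
class InMemoryOrderRepository:
    """Test adapter; a DynamoDB-backed adapter would live beside it."""
    def __init__(self) -> None:
        self._rows: dict[str, float] = {}
    def save(self, order_id: str, total: float) -> None:
        self._rows[order_id] = total
    def get_total(self, order_id: str) -> float:
        return self._rows[order_id]

# --- unit test as executable specification ---
def test_discounted_order_is_persisted() -> None:
    repo: OrderRepository = InMemoryOrderRepository()
    repo.save("order-1", apply_discount(200.0, 25))
    assert repo.get_total("order-1") == 150.0
```

Because the domain function and the test adapter import nothing cloud-specific, this whole loop runs in the fast local tier; only the DynamoDB-backed adapter ever needs the hybrid or preview tiers.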

The two architectural axes

System architecture (for fast feedback)

| Path | Latency | Scope | Tool |
|---|---|---|---|
| Local emulation (default) | seconds | single service or handler | SAM sam local start-api; local container run; systems/dynamodb-local; Glue Docker images |
| Offline dev (data/ML) | seconds | single transformation | systems/aws-glue ETL libs in Docker |
| Hybrid cloud (no emulator) | minutes | minimal dev stacks | CloudFormation / CDK small stacks; SDK invocation |
| Preview environment | minutes | whole app, short-lived | IaC ephemeral stacks, torn down after validation |

The point of the table is not that any one of these replaces the others — it's that each unvalidated change should use the cheapest tier that can falsify it, so the agent's iteration budget isn't spent waiting on environments it didn't need.
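As a concrete instance of the cheapest tier: pointing the AWS SDK at DynamoDB Local is a single endpoint override, so the same repository code runs against the emulator and the real service. A hedged sketch (the helper name is illustrative; DynamoDB Local's default port is 8000):

```python
def dynamodb_local_config(port: int = 8000) -> dict:
    """Client kwargs that redirect the SDK to a local emulator.

    DynamoDB Local mirrors the DynamoDB API, so only the endpoint
    changes; region and credentials are required by the SDK but
    ignored by the emulator.
    """
    return {
        "endpoint_url": f"http://localhost:{port}",
        "region_name": "us-east-1",          # required by the SDK, ignored locally
        "aws_access_key_id": "local",        # dummy credentials
        "aws_secret_access_key": "local",
    }

# usage (requires boto3 and a running DynamoDB Local container):
#   table = boto3.resource("dynamodb", **dynamodb_local_config()).Table("orders")
```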

Codebase architecture (for AI-friendly change)

  1. Explicit layer boundaries — /domain (no Amazon deps), /application (orchestration), /infrastructure (integrations). Hexagonal-architecture framing: external systems are adapters, not dependencies.
  2. Project rules — .kiro/steering/ Markdown files describe constraints (e.g. "database access must go through repository classes in the infrastructure layer"); agent consults them automatically.
  3. Layered tests — unit / contract / smoke, each at a different cost-latency tier; failing tests double as behavioural spec.
  4. Monorepo + machine-readable docs — AGENT.md / RUNBOOK.md / CONTRIBUTING.md + YAML/config files; "more straightforward for agents to interpret than lengthy prose."
  5. CI/CD guardrails — required tests, automated reviews, branch protections; expand agent autonomy as confidence compounds.
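The post pins only the location (.kiro/steering/, Markdown) and one example rule; a hypothetical steering file in that shape might look like:

```markdown
<!-- .kiro/steering/architecture.md (hypothetical file name) -->
# Architectural constraints

- The /domain layer must not import AWS SDKs or any cloud-specific code.
- Database access must go through repository classes in the /infrastructure layer.
- New APIs are defined contract-first as OpenAPI specifications before implementation.
```

The schema, rule precedence, and conflict-resolution behavior are not specified in the post (see Caveats).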

Systems introduced / extended

  • systems/aws-sam (new) — AWS Serverless Application Model; local-emulation surface for Lambda + API Gateway; sam local start-api is the canonical AI-agent feedback-loop entry point.
  • systems/dynamodb-local (new) — downloadable DynamoDB emulator mirroring the DynamoDB API for local CRUD testing.
  • systems/aws-fargate (new) — serverless container runtime named as container-workload peer to ECS for local-emulation-first iteration (same container image runs locally and in Fargate).
  • systems/aws-lambda — named as the archetypal local-emulation target via SAM.
  • systems/amazon-api-gateway — the front of the sam local start-api loop.
  • systems/amazon-ecs — container substrate for the "build and run the same image locally" discipline.
  • systems/aws-glue — named as the data-workload instance of local-emulation-first (Docker images ship the Glue ETL libraries).
  • systems/aws-sns, systems/aws-sqs — named as the canonical "no local emulator" services that drive the hybrid-cloud-testing pattern.
  • systems/aws-cloudformation, systems/aws-cdk — the IaC substrate under both hybrid dev stacks and preview environments.
  • systems/aws-iam — named as an example of runtime-only configuration (missing IAM permissions) that smoke tests exist to surface.
  • systems/kiro — the concrete product surface for project rules / steering files + foundational steering documents. First wiki reference that pins the .kiro/steering/ directory + Markdown format.
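For the hybrid-testing tier these services drive, the "minimal development stack" can be very small indeed. A hedged CloudFormation sketch (logical names and retention value are illustrative) of an SQS-only stack an agent could deploy, exercise through the SDK, and tear down:

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Minimal dev stack for agent-driven SQS testing (hypothetical example)
Resources:
  DevQueue:
    Type: AWS::SQS::Queue
    Properties:
      MessageRetentionPeriod: 300   # seconds; test messages need not linger
Outputs:
  QueueUrl:
    Value: !Ref DevQueue            # Ref on an SQS queue returns its URL
```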

Concepts introduced / extended

  • concepts/agentic-development (new) — development model where an AI agent "does more than suggest snippets — it writes, tests, deploys, and refines code through rapid feedback cycles." Distinguished from AI-assisted code completion.
  • concepts/fast-feedback-loops (new) — re-framed as the primary architectural constraint of agentic development.
  • concepts/local-emulation (new) — umbrella concept covering SAM local, container local run, DynamoDB Local, Glue Docker images.
  • concepts/contract-first-design (new) — APIs defined upfront via OpenAPI specs; agents can validate integrations before all services are implemented.
  • concepts/hexagonal-architecture (new) — the ports-and-adapters separation where external systems are adapters, not dependencies; the codebase shape that makes the domain layer testable without touching cloud services.
  • concepts/project-rules-steering (new) — architectural constraints / coding conventions written down in Markdown files the agent reads automatically (.kiro/steering/).
  • concepts/machine-readable-documentation (new) — design principle: "YAML or configuration files are more straightforward for agents to interpret than lengthy prose"; AGENT.md / RUNBOOK.md / CONTRIBUTING.md framing.
  • concepts/monorepo — extended: the post names monorepos as the enabler of broad agent context and system-wide impact evaluation.
  • concepts/specification-driven-development — extended: project rules / steering files are the codebase-hosted corner of the spec-driven workflow; same substrate as the Kiro / Bedrock automated reasoning stack but on the developer-process side.

Patterns introduced / extended

  • patterns/local-emulation-first (new) — prefer local emulation over cloud deployment as the default feedback path for AI agents.
  • patterns/ephemeral-preview-environments (new) — short-lived IaC-defined stacks, deployed on demand, torn down after validation; the end-to-end tier above hybrid cloud testing.
  • patterns/hybrid-cloud-testing (new) — for services without local emulators (SNS, SQS named), define minimal dev stacks via CloudFormation/CDK, invoke through SDK, tear down. Cloud as "another test dependency — used sparingly and predictably."
  • patterns/layered-testing-strategy (new) — unit (domain, fast) / contract (interfaces, early break detection) / smoke (deployed env, config + IAM surface). Each tier a different cost-latency trade-off.
  • patterns/tests-as-executable-specifications (new) — tests double as behavioural spec the agent can read; "when a test fails, the agent can infer what behavior is expected and refine its changes accordingly." Sibling of patterns/executable-specification at the test-suite-not-formal-spec tier.
  • patterns/ci-cd-agent-guardrails (new) — scale agent autonomy over time via required tests + automated reviews + branch protections, keeping humans in the loop for high-impact decisions.
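The contract tier can be sketched without any framework: a contract is just a machine-checkable statement about an interface, which is also what lets agents "validate integrations even before all services are implemented." A minimal Python sketch (field names hypothetical):

```python
# Contract: the shape producer and consumer agree on, checked against
# any payload -- a stub before the service exists, the real response later.
ORDER_CONTRACT = {"order_id": str, "total": float, "status": str}

def satisfies_contract(payload: dict, contract: dict) -> bool:
    """True iff every contracted field is present with the expected type."""
    return all(
        field in payload and isinstance(payload[field], expected)
        for field, expected in contract.items()
    )

# A failing check tells the agent which expectation broke, which is what
# makes the contract an executable specification rather than documentation.
stub_response = {"order_id": "o-1", "total": 99.5, "status": "pending"}
assert satisfies_contract(stub_response, ORDER_CONTRACT)
assert not satisfies_contract({"order_id": "o-1"}, ORDER_CONTRACT)
```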

Caveats

  • Prescriptive essay, not a retrospective. No production numbers, no customer case study, no quantified productivity / defect / cycle time impact. Claims are shape-of-architecture claims, not measured outcomes.
  • No agent named. The post doesn't name which AI agent(s) it's architecting for — Kiro is mentioned as the surface for steering files, but the prescriptions are agent-agnostic.
  • No scaling story. The post addresses the single-developer, single-agent loop; the fleet-scale version (many agents operating on a monorepo concurrently, coordinating their ephemeral preview environments, competing for shared dev stacks) is out of scope. The implicit assumption is one-agent-at-a-time per developer.
  • Cost not discussed. Ephemeral preview environments + hybrid dev stacks are both cloud spend; the post doesn't address how the agent's iteration-hungry loop interacts with the cloud bill.
  • Steering-file syntax underspecified. "Markdown files stored under .kiro/steering/" is as concrete as the post gets. No schema, no rule precedence, no conflict resolution, no documented interaction with base prompts.
  • No failure-mode taxonomy. How the agent recovers from local emulation drift (DynamoDB Local vs real DynamoDB behavioural differences; SAM local vs real Lambda runtime; Glue Docker vs Glue runtime) is not addressed. Real systems have these gaps — see systems/dropbox-nucleus's Rust backend mock → Heirloom suite at ~100× slowdown for the canonical articulation.
  • No positioning vs existing AWS guidance. The post doesn't reference AWS Well-Architected, the prescriptive guidance library, or prior Lambda / serverless best-practices posts, nor does it explain how this essay lands next to them.
  • No explicit agentic-specific risk framing. Mentions "governance" and "humans in the loop for high-impact decisions" but doesn't enumerate agent-specific failure modes (hallucinated API calls, runaway iteration, unintended cross-service blast radius) the way a retrospective would.
