
AWS 2025-12-11


Architecting conversational observability for cloud applications

AWS Architecture Blog reference-architecture post (2025-12-11) for a generative-AI-powered Kubernetes troubleshooting assistant. Companion piece to the later 2026-03-18 AWS DevOps Agent product launch — this one is a build-it-yourself blueprint with two alternate deployment topologies and a reference implementation in aws-samples/sample-eks-troubleshooting-rag-chatbot, where the 2026-03-18 post is the AWS-managed-service shape of the same idea. The two posts together pin the canonical wiki shape of AI-augmented observability on EKS: same problem, same signals, two vendor relationships.

Summary

Modern microservice applications on EKS / ECS / Lambda scatter telemetry across layers; engineers stitch logs, events, metrics, and live cluster state together manually during incidents, which drives up MTTR. The post builds a chatbot assistant that combines historical telemetry retrieved via vector search with real-time cluster state via allowlisted read-only kubectl, and iterates in an LLM ↔ cluster loop until it has enough context to propose a resolution. Two architectures are presented: (1) a RAG-based chatbot (Fluent Bit → Kinesis Data Streams → Lambda calling Titan Embeddings v2 → vectors in OpenSearch Serverless → Gradio web chatbot → troubleshooting assistant in the cluster), and (2) a Strands Agents SDK multi-agent system (Agent Orchestrator + Memory Agent + K8s Specialist; 1024-dim embeddings in S3 Vectors; EKS MCP Server exposing K8s operations as MCP tools; Slack bot as the UI; Pod Identity for AWS service access).
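The Kinesis → Lambda stage of architecture (1) can be sketched as a handler that decodes Kinesis records and embeds them in batches rather than per line. This is a hypothetical shape, not the sample repo's code; `embed_fn` stands in for the Bedrock Titan Embeddings v2 call plus the OpenSearch bulk index that the post describes:

```python
import base64
import json

def batch(items, size=25):
    """Split decoded records into batches so one model call covers many log lines."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def handler(event, embed_fn):
    """Hypothetical Lambda handler: decode Kinesis records, embed in batches.
    embed_fn(texts) -> list of vectors; in the real pipeline this would call
    Bedrock (amazon.titan-embed-text-v2:0) and index into OpenSearch Serverless."""
    logs = [
        json.loads(base64.b64decode(record["kinesis"]["data"]))["log"]
        for record in event["Records"]
    ]
    vectors = []
    for chunk in batch(logs):
        vectors.extend(embed_fn(chunk))  # one model call per batch, not per line
    return {"embedded": len(vectors)}
```

Batching at the Lambda layer is the post's explicit cost lever; the batch size of 25 here is illustrative.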

Key takeaways

  1. Two observability-layer stitching problems are collapsed into one interface. A Kubernetes operator traditionally runs kubectl describe, kubectl logs, kubectl get events, and cross-references Grafana dashboards + application logs in parallel terminals. The chatbot collapses both multi-layer telemetry correlation and real-time state queries behind one natural-language surface, bounded by what the allowlist permits. MTTR framing is the core business case: per the 2024 Observability Pulse Report cited in the post, 48% of organizations say lack of team knowledge is their biggest observability challenge and 82% say production-issue resolution takes >1h. (Source: sources/2025-12-11-aws-architecting-conversational-observability-for-cloud-applications)
  2. Telemetry-to-RAG is a distinct pipeline shape, not a generic ingestion pipeline. The diagrammed pipeline is Fluent Bit → Kinesis Data Streams → Lambda → Bedrock (Titan Embeddings v2) → OpenSearch Serverless. Every layer is deliberate: Fluent Bit streams telemetry from pods with low overhead; Kinesis batches + smooths spikes; Lambda does normalization and embedding in batches for cost (explicit "Pro tip" in the post); OpenSearch Serverless removes cluster sizing from the critical path. The RAG retrieval step at query time is embed(user_query) → k-NN lookup in OpenSearch → prompt augmentation — the same shape any retrieval-augmented chatbot uses, now over operational telemetry instead of product docs.
  3. Real-time cluster state is the second context input, not a replacement for stored telemetry. The architecture deliberately combines historical telemetry (embedded in OpenSearch from the ingest pipeline) with live kubectl output from an in-cluster troubleshooting assistant. "This cycle gradually builds a richer picture of the issue by combining historical telemetry with real-time cluster state to speed up root cause analysis." Either signal alone is insufficient — the value is the fusion. Same insight as AWS DevOps Agent's K8s-API + OTel fusion but achieved with live kubectl as the real-time path.
  4. Allowlisted read-only kubectl is the agent-safety primitive. The troubleshooting assistant in the cluster "executes [kubectl commands] with a service account that has read-only permissions, following the principle of least privilege". Only kubectl operations on a static allowlist are permitted; the assistant cannot apply, patch, delete, or exec. This is the canonical allowlisted-read-only-agent-actions pattern — constraining the side-effect surface of an LLM-driven agent to a vetted set of verbs, with RBAC enforcement at the Kubernetes API server as a second line of defense (concepts/defense-in-depth). Note this is static allowlisting — the list is code, not LLM-chosen.
  5. RAG vs agentic-MCP is the current-state design fork. The post ships two deployments controlled by a single Terraform deployment_type variable: (1) classic RAG-based chatbot (default) where retrieval and kubectl execution are orchestrated by the chatbot app itself, and (2) Strands Agents SDK multi-agent system where Agent Orchestrator / Memory Agent / K8s Specialist each own a narrow scope (patterns/specialized-agent-decomposition), vectors live in S3 Vectors as 1024-dim embeddings for cost, and K8s operations are exposed via EKS MCP Server using the MCP protocol. The agentic shape replaces custom orchestration code with a standardized tool-call protocol, and replaces OpenSearch Serverless's RAM-heavy cost profile with S3's cold-tier vector storage. No quantitative comparison is offered.
  6. Iterative investigation, not single-shot prompting. The illustrated end-to-end flow is explicitly a loop: query → retrieve telemetry → LLM proposes kubectl commands → assistant runs them → output re-fed to LLM → LLM decides continue or conclude → (optional more rounds) → final resolution. This is the agentic troubleshooting loop primitive — LLM is the planner, the allowlist-constrained assistant is the hands, OpenSearch + live cluster are the eyes, and the stopping criterion is "enough context" (LLM-judged).
  7. Security discipline carried through the reference architecture. Named tactics: (a) strict kubectl allowlist + RBAC (read-only for pods/services/events/logs in specific namespaces); (b) sanitize application logs before embedding to prevent PII / secrets leaking into vectors; (c) KMS encryption for Kinesis in-transit and OpenSearch at-rest; (d) private subnets + VPC endpoints per the Well-Architected Security Pillar; (e) validate user inputs against prompt injection. The log-sanitization rule is notable: "sanitizing application logs before embedding generation to help prevent sensitive information exposure" — embeddings are derived artifacts and inherit the governance posture of their source data.
  8. Per-service compute-layer generality is asserted but not shown. The post explicitly claims the approach extends to ECS and Lambda: "a similar approach can be extended to other compute services like Amazon ECS or AWS Lambda". The telemetry shape (logs + events + metrics) is universal; the in-cluster troubleshooting assistant would be replaced by a service-specific executor (aws ecs describe-*, aws logs filter-log-events, CloudWatch Logs Insights). Only EKS is demonstrated.
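The agentic troubleshooting loop described above reduces to a small control structure. A minimal sketch with injected dependencies; `retrieve`, `llm`, and `execute` are stand-ins for the vector search, the reasoning model, and the in-cluster assistant, and `max_iters` is an assumed guard since the post does not specify an iteration cap:

```python
def troubleshoot(query, retrieve, llm, execute, max_iters=5):
    """LLM is the planner, the allowlist-constrained assistant the hands:
    retrieve(query)  -> historical telemetry snippets from vector search
    llm(context)     -> {"action": "run", "command": ...}
                        or {"action": "conclude", "answer": ...}
    execute(command) -> stdout of an allowlisted read-only kubectl command"""
    context = [f"user query: {query}"] + list(retrieve(query))
    for _ in range(max_iters):
        step = llm(context)
        if step["action"] == "conclude":
            return step["answer"]
        # Re-feed live cluster state into the LLM's context and loop again.
        context.append(f"$ {step['command']}\n{execute(step['command'])}")
    return "max iterations reached; escalate to a human operator"
```

The stopping criterion is LLM-judged ("enough context"); the fallback return is one way to bound the loop the post leaves unbounded.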
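The static kubectl allowlist described above is an application-layer gate in front of RBAC. A sketch of that gate; the post does not disclose the actual allowlist contents, so the verbs and resources here are assumptions:

```python
import shlex

# Hypothetical allowlist; the post does not publish the real one.
READ_ONLY_VERBS = {"get", "describe", "logs", "top"}
READABLE_RESOURCES = {"pods", "services", "events", "deployments", "nodes"}

def is_allowed(command: str) -> bool:
    """Gate an LLM-proposed kubectl command before execution. RBAC on the
    read-only service account remains the second line of defense."""
    parts = shlex.split(command)
    if len(parts) < 2 or parts[0] != "kubectl":
        return False
    verb = parts[1]
    if verb not in READ_ONLY_VERBS:
        return False  # rejects apply, patch, delete, exec, edit, ...
    if verb in {"get", "describe"}:
        return len(parts) > 2 and parts[2] in READABLE_RESOURCES
    return True  # logs / top take a pod name rather than a resource kind
```

Note the allowlist is code, not LLM-chosen: the model proposes commands, but only this vetted verb/resource surface ever executes.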

Systems introduced

  • systems/strands-agents-sdk — open-source Python SDK for building agentic systems on AWS; multi-agent orchestration, tool calling, session management. Used in the post to build a three-agent system (Orchestrator / Memory / K8s Specialist).
  • systems/eks-mcp-server — AWS-Labs-published Model Context Protocol server exposing Kubernetes / EKS operations as MCP tools. The agent-native interface to a cluster; replaces hand-rolled kubectl wrappers.
  • systems/fluent-bit — CNCF telemetry processor and forwarder; lightweight in-pod or DaemonSet deployment collecting application logs, kubelet logs, and Kubernetes events. The canonical Kubernetes ingestion point feeding the RAG pipeline.
  • systems/amazon-kinesis-data-streams — managed durable streaming substrate; the buffer between Fluent Bit's firehose and Lambda's embedding work; enables batching for cost.

Systems extended

  • systems/aws-eks — the investigation target. The troubleshooting-assistant container runs as a pod with a read-only service account.
  • systems/amazon-bedrock — hosts Titan Embeddings v2 for the RAG path and the LLM (unspecified in the post) for reasoning / kubectl-command-generation / final-answer synthesis.
  • systems/amazon-opensearch-service — OpenSearch Serverless as the RAM-backed vector store for the RAG deployment; k-NN plugin serves retrieval at query time.
  • systems/s3-vectors — cold-tier vector store alternative in the Strands deployment; 1024-dimensional embeddings; cost-optimized vs OpenSearch Serverless's in-memory model.
  • systems/amazon-titan-embeddings — specific model named as amazon.titan-embed-text-v2:0.
  • systems/aws-lambda — telemetry-normalization + embedding-generation compute in the RAG pipeline; batching explicitly recommended.
  • systems/model-context-protocol — the agent ↔ tool protocol used by Strands + EKS MCP Server in deployment option 2.
  • systems/aws-kms — encryption at rest (OpenSearch) and in transit (Kinesis).

Concepts introduced

  • concepts/agentic-troubleshooting-loop — iterative LLM ↔ tool-assistant investigation cycle; LLM proposes queries, tool assistant executes, output re-enters LLM context, repeats until the LLM judges enough context for resolution.

Concepts extended

Patterns introduced

  • patterns/allowlisted-read-only-agent-actions — constrain an LLM-driven agent's side effects to a static allowlist of safe verbs (kubectl get/describe/logs/events), enforced at both application layer and platform RBAC. Generalizes across compute fabrics (ECS describe-* / Lambda get-function-* / any platform-side read-only API surface).
  • patterns/telemetry-to-rag-pipeline — streaming telemetry into a vector store for LLM augmentation: Fluent Bit → Kinesis → Lambda+Bedrock embeddings → OpenSearch / S3 Vectors; log sanitization before embedding; batch at the Lambda layer for cost; allow hot (OpenSearch Serverless) / cold (S3 Vectors) tiering choice.
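The sanitize-before-embedding step in the telemetry-to-rag-pipeline pattern can be sketched as a redaction pass over each log line before it reaches the embedding model. The patterns below are illustrative, not the post's implementation; a real deployment would tune them to its own log formats and likely add entity-based PII detection:

```python
import re

# Illustrative redaction rules, not the post's actual implementation.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"(?i)\b(password|token|secret|apikey)\b[=:]?\s*\S+"), r"\1=<REDACTED>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
]

def sanitize(log_line: str) -> str:
    """Strip PII and secrets before embedding: vectors are derived artifacts
    and inherit the governance posture of their source data."""
    for pattern, replacement in REDACTIONS:
        log_line = pattern.sub(replacement, log_line)
    return log_line
```

Sanitization has to happen on the ingest side (in the Lambda stage), since nothing downstream of the vector store can un-embed a leaked secret.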
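The query-time half of the telemetry-to-rag-pipeline pattern (embed the user query, k-NN lookup, prompt augmentation) amounts to two small builders. The index field name `embeddings` and the prompt wording are assumptions, and the actual OpenSearch client call is omitted:

```python
def knn_query(query_vector, k=5):
    """OpenSearch k-NN request body for retrieving similar telemetry;
    'embeddings' is an assumed field name."""
    return {
        "size": k,
        "query": {"knn": {"embeddings": {"vector": query_vector, "k": k}}},
    }

def augment_prompt(user_query, hits):
    """Prompt augmentation: prepend the retrieved telemetry to the question
    before it goes to the reasoning LLM."""
    context = "\n".join(hit["_source"]["text"] for hit in hits)
    return (
        "Relevant historical telemetry from this cluster:\n"
        f"{context}\n\nOperator question: {user_query}"
    )
```

The same builders would work against S3 Vectors in the Strands deployment, with only the lookup call swapped.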

Patterns extended

  • patterns/specialized-agent-decomposition — Strands deployment's three-agent split (Orchestrator / Memory / K8s Specialist) exemplifies decomposing agentic responsibility into narrow tool-surface scopes; same shape as Databricks Storex agents and Cloudflare's Agent Lee team.

Architecture diagrams referenced

The post includes four figures (inline ![] CloudFront PNGs from the original post — not captured in the raw markdown):

  • Figure 1: multitude of telemetry sources (kubelet logs, app logs, events, metrics) in a cluster.
  • Figure 2: telemetry ingestion — Fluent Bit → Kinesis → Lambda → Bedrock embeddings → OpenSearch Serverless.
  • Figure 3: chatbot retrieval + augmentation flow — user query → vector search → augmented prompt → LLM → kubectl command generation.
  • Figure 4: iterative troubleshooting loop — LLM ↔ assistant cycle with a conclude / continue decision.

Operational numbers / scale cited

| Item | Value | Source |
|---|---|---|
| Team-knowledge challenge | 48% of orgs | 2024 Observability Pulse Report |
| Production-issue resolution >1 hour | 82% of teams | 2024 Observability Pulse Report |
| S3 Vectors embedding dimensionality | 1024-dim | Strands deployment |
| Embedding model | amazon.titan-embed-text-v2:0 | RAG deployment |
| Strands agents | 3 (Orchestrator, Memory, K8s Specialist) | Post |

Not disclosed: post-deployment MTTR reduction, cost per query token, OpenSearch-Serverless capacity (OCUs), Kinesis shards, Lambda concurrency, query latency (retrieval + generation), embedding-job throughput, production cluster sizes, eval or accuracy metrics, prompt-injection guardrail implementation, specific LLM model used for reasoning, kubectl command allowlist contents, RBAC role definitions.

Caveats

  • Reference-architecture post, not a production retrospective. No customer-facing deployment is described; architecture is demonstrated via the sample GitHub repo + two re:Invent / KubeCon talks cited at the end. Marketing-leaning in tone around deployment_type flexibility.
  • No evaluation data — no accuracy numbers, no MTTR delta, no user studies, no prompt-injection-resistance testing cited, no hallucination-rate discussion, no guardrails specifics.
  • Two architectures presented in parallel rather than compared — the reader is not told when to prefer RAG vs Strands, or what the cost/latency/quality trade-offs are beyond "S3 Vectors is cost-optimized".
  • Compute-fabric generality is asserted, not shown. ECS and Lambda are mentioned as extending naturally; no examples or pipeline variations are given.
  • LLM model for reasoning is unspecified — Titan Embeddings v2 is named for embeddings but the reasoning model (Claude / Titan / Llama / etc) is left open, which is a surprising omission for a reference architecture.
  • No failure-mode discussion for the iterative loop itself — what happens if the LLM enters a query-loop it can't terminate? What's the max-iteration cutoff? How are contradictory signals (stored telemetry says A, live kubectl says B) reconciled?
  • Sits one notch below the 2026-03-18 AWS DevOps Agent post architecturally — that one explicitly names the two-path K8s discovery methodology (concepts/telemetry-based-resource-discovery), the baseline-learning step, and the confidence-scored RCA ranking. This earlier post stops at "loop until enough context".
