Architecting conversational observability for cloud applications¶
AWS Architecture Blog reference-architecture post (2025-12-11) for a generative-AI-powered Kubernetes troubleshooting assistant. Companion piece to the later 2026-03-18 AWS DevOps Agent product launch — this one is a build-it-yourself blueprint with two alternate deployment topologies and a reference implementation in aws-samples/sample-eks-troubleshooting-rag-chatbot, whereas the 2026-03-18 post is the AWS-managed-service shape of the same idea. The two posts together pin the canonical wiki shape of AI-augmented observability on EKS: same problem, same signals, two vendor relationships.
Summary¶
Modern microservice applications on EKS / ECS / Lambda scatter telemetry across layers; engineers stitch logs, events, metrics, and live cluster state together manually during incidents, which drives up MTTR. The post builds a chatbot assistant that combines historical telemetry retrieved via vector search with real-time cluster state via allowlisted read-only kubectl, and iterates in an LLM ↔ cluster loop until it has enough context to propose a resolution. Two architectures are presented: (1) a RAG-based chatbot (Fluent Bit → Kinesis Data Streams → Lambda calling Titan Embeddings v2 → vectors in OpenSearch Serverless → Gradio web chatbot → troubleshooting assistant in the cluster), and (2) a Strands Agents SDK multi-agent system (Agent Orchestrator + Memory Agent + K8s Specialist; 1024-dim embeddings in S3 Vectors; EKS MCP Server exposing K8s operations as MCP tools; Slack bot as the UI; Pod Identity for AWS service access).
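The query-time RAG step described above (embed the user question with Titan Embeddings v2, k-NN lookup in OpenSearch, prompt augmentation) can be sketched as pure request-building logic. The post does not publish its code, so the helper names, the `embedding` field name, and the prompt wording here are illustrative assumptions; only the model ID `amazon.titan-embed-text-v2:0` and the 1024-dim choice come from the post.

```python
import json

# Assumed model ID is the one named in the post; everything else is a sketch.
EMBED_MODEL_ID = "amazon.titan-embed-text-v2:0"

def titan_embed_request(text: str, dimensions: int = 1024) -> str:
    """Body for a bedrock-runtime InvokeModel call against Titan Embeddings v2."""
    return json.dumps({"inputText": text, "dimensions": dimensions})

def knn_query(vector: list, k: int = 5, field: str = "embedding") -> dict:
    """OpenSearch k-NN query body over the telemetry index (field name assumed)."""
    return {"size": k, "query": {"knn": {field: {"vector": vector, "k": k}}}}

def augment_prompt(user_query: str, hits: list) -> str:
    """Prepend retrieved telemetry snippets to the user's question."""
    context = "\n".join(f"- {h}" for h in hits)
    return (
        "You are a Kubernetes troubleshooting assistant.\n"
        f"Relevant historical telemetry:\n{context}\n\n"
        f"Question: {user_query}"
    )
```

At runtime these bodies would be sent via boto3 `bedrock-runtime` and an OpenSearch client; the shapes above are the vendor-neutral core of the retrieval step.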
Key takeaways¶
- Two observability-layer stitching problems are collapsed into one interface. A Kubernetes operator traditionally runs kubectl describe, kubectl logs, kubectl get events, and cross-references Grafana dashboards + application logs in parallel terminals. The chatbot collapses both multi-layer telemetry correlation and real-time state queries behind one natural-language surface, bounded by what the allowlist permits. MTTR framing is the core business case: per the 2024 Observability Pulse Report cited in the post, 48% of organizations say lack of team knowledge is their biggest observability challenge and 82% say production-issue resolution takes >1h. (Source: sources/2025-12-11-aws-architecting-conversational-observability-for-cloud-applications)
- Telemetry-to-RAG is a distinct pipeline shape, not a generic ingestion pipeline. The diagrammed pipeline is Fluent Bit → Kinesis Data Streams → Lambda → Bedrock (Titan Embeddings v2) → OpenSearch Serverless. Every layer is deliberate: Fluent Bit streams telemetry from pods with low overhead; Kinesis batches + smooths spikes; Lambda does normalization and embedding in batches for cost (explicit "Pro tip" in the post); OpenSearch Serverless removes cluster sizing from the critical path. The RAG retrieval step at query time is embed(user_query) → k-NN lookup in OpenSearch → prompt augmentation — the same shape any retrieval-augmented chatbot uses, now over operational telemetry instead of product docs.
- Real-time cluster state is the second context input, not a
replacement for stored telemetry. The architecture deliberately combines historical telemetry (embedded in OpenSearch from the ingest pipeline) with live kubectl output from an in-cluster troubleshooting assistant. "This cycle gradually builds a richer picture of the issue by combining historical telemetry with real-time cluster state to speed up root cause analysis." Either signal alone is insufficient — the value is the fusion. Same insight as AWS DevOps Agent's K8s-API + OTel fusion but achieved with live kubectl as the real-time path.
- Allowlisted read-only kubectl is the agent-safety primitive. The troubleshooting assistant in the cluster "executes [kubectl commands] with a service account that has read-only permissions, following the principle of least privilege". Only kubectl operations on a static allowlist are permitted; the assistant cannot apply, patch, delete, or exec. This is the canonical allowlisted-read-only-agent-actions pattern — constraining the side-effect surface of an LLM-driven agent to a vetted set of verbs, with RBAC enforcement at the Kubernetes API server as a second line of defense (concepts/defense-in-depth). Note this is static allowlisting — the list is code, not LLM-chosen.
- RAG vs agentic-MCP is the current-state design fork. The post ships two deployments controlled by a single Terraform deployment_type variable: (1) classic RAG-based chatbot (default) where retrieval and kubectl execution are orchestrated by the chatbot app itself, and (2) Strands Agents SDK multi-agent system where Agent Orchestrator / Memory Agent / K8s Specialist each own a narrow scope (patterns/specialized-agent-decomposition), vectors live in S3 Vectors as 1024-dim embeddings for cost, and K8s operations are exposed via EKS MCP Server using the MCP protocol. The agentic shape replaces custom orchestration code with a standardized tool-call protocol, and replaces OpenSearch Serverless's RAM-heavy cost profile with S3's cold-tier vector storage. No quantitative comparison is offered.
- Iterative investigation, not single-shot prompting. The illustrated end-to-end flow is explicitly a loop: query → retrieve telemetry → LLM proposes kubectl commands → assistant runs them → output re-fed to LLM → LLM decides continue or conclude → (optional more rounds) → final resolution. This is the agentic troubleshooting loop primitive — LLM is the planner, the allowlist-constrained assistant is the hands, OpenSearch + live cluster are the eyes, and the stopping criterion is "enough context" (LLM-judged).
- Security discipline carried through the reference architecture. Named tactics: (a) strict kubectl allowlist + RBAC (read-only for pods/services/events/logs in specific namespaces); (b) sanitize application logs before embedding to prevent PII / secrets leaking into vectors; (c) KMS encryption for Kinesis in-transit and OpenSearch at-rest; (d) private subnets + VPC endpoints per Well-Architected Security Pillar; (e) validate user inputs against prompt injection. The log-sanitization rule is notable: "sanitizing application logs before embedding generation to help prevent sensitive information exposure" — embeddings are derived artifacts and inherit the governance posture of their source data.
- Per-service compute-layer generality is asserted but not shown. The post explicitly claims the approach extends to ECS and Lambda: "a similar approach can be extended to other compute services like Amazon ECS or AWS Lambda". The telemetry shape (logs + events + metrics) is universal; the in-cluster troubleshooting assistant would be replaced by a service-specific executor (aws ecs describe-*, aws logs filter-log-events, CloudWatch Logs Insights). Only EKS is demonstrated.
Systems introduced¶
- systems/strands-agents-sdk — open-source Python SDK for building agentic systems on AWS; multi-agent orchestration, tool calling, session management. Used in the post to build a three-agent system (Orchestrator / Memory / K8s Specialist).
- systems/eks-mcp-server — AWS-Labs-published Model Context Protocol server exposing Kubernetes / EKS operations as MCP tools. The agent-native interface to a cluster; replaces hand-rolled kubectl wrappers.
- systems/fluent-bit — CNCF telemetry processor and forwarder; lightweight in-pod or DaemonSet deployment collecting application logs, kubelet logs, and Kubernetes events. The canonical Kubernetes ingestion point feeding the RAG pipeline.
- systems/amazon-kinesis-data-streams — managed durable streaming substrate; the buffer between Fluent Bit's firehose and Lambda's embedding work; enables batching for cost.
Systems extended¶
- systems/aws-eks — the investigation target. The troubleshooting-assistant container runs as a pod with a read-only service account.
- systems/amazon-bedrock — hosts Titan Embeddings v2 for the RAG path and the LLM (unspecified in the post) for reasoning / kubectl-command-generation / final-answer synthesis.
- systems/amazon-opensearch-service — OpenSearch Serverless as the RAM-backed vector store for the RAG deployment; k-NN plugin serves retrieval at query time.
- systems/s3-vectors — cold-tier vector store alternative in the Strands deployment; 1024-dimensional embeddings; cost- optimized vs OpenSearch Serverless's in-memory model.
- systems/amazon-titan-embeddings — specific model named as amazon.titan-embed-text-v2:0.
- systems/aws-lambda — telemetry-normalization + embedding-generation compute in the RAG pipeline; batching explicitly recommended.
- systems/model-context-protocol — the agent ↔ tool protocol used by Strands + EKS MCP Server in deployment option 2.
- systems/aws-kms — encryption at rest (OpenSearch) and in transit (Kinesis).
Concepts introduced¶
- concepts/agentic-troubleshooting-loop — iterative LLM ↔ tool-assistant investigation cycle; LLM proposes queries, tool assistant executes, output re-enters LLM context, repeats until the LLM judges enough context for resolution.
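The loop primitive above can be reduced to a short control function. The post leaves the stopping criterion as "enough context" (LLM-judged) and never specifies a max-iteration cutoff, so `plan`, `execute`, and the `max_rounds` bound here are illustrative stand-ins, not APIs from the reference implementation.

```python
# Minimal sketch of the agentic troubleshooting loop: the LLM plans, an
# allowlist-constrained executor acts, and the output re-enters the LLM's
# context until the planner concludes or a (assumed) round cap is hit.

def troubleshoot(query, plan, execute, max_rounds=5):
    """Iterate LLM <-> cluster until the planner concludes.

    plan(query, context)  -> ("run", command) or ("conclude", resolution)
    execute(command)      -> command output string
    """
    context = []  # accumulated (command, output) pairs fed back to the LLM
    for _ in range(max_rounds):  # hard cutoff; the post leaves this unspecified
        action, payload = plan(query, context)
        if action == "conclude":
            return payload
        context.append((payload, execute(payload)))
    return "max rounds reached without a conclusion"
```

With a real LLM behind `plan`, each round's kubectl output widens the context window, which is exactly the "gradually builds a richer picture" behavior the post describes.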
Concepts extended¶
- concepts/observability — MTTR-centric view; conversational surface above the triad as an emerging layer, now with a self-built-vs-AWS-managed fork (this post's RAG/Strands blueprint vs AWS DevOps Agent).
- concepts/vector-similarity-search / concepts/vector-embedding — new domain: operational telemetry as embedding corpus rather than product documentation.
- concepts/least-privileged-access — K8s-RBAC-enforced read-only service account for the troubleshooting assistant.
- concepts/defense-in-depth — two-layer control (static kubectl allowlist in the assistant code + RBAC at the API server).
- concepts/blast-radius — the static allowlist narrows it to read-only failure modes.
Patterns introduced¶
- patterns/allowlisted-read-only-agent-actions — constrain an LLM-driven agent's side effects to a static allowlist of safe verbs (kubectl get/describe/logs/events), enforced at both application layer and platform RBAC. Generalizes across compute fabrics (ECS describe-* / Lambda get-function-* / any platform-side read-only API surface).
- patterns/telemetry-to-rag-pipeline — streaming telemetry into a vector store for LLM augmentation: Fluent Bit → Kinesis → Lambda+Bedrock embeddings → OpenSearch / S3 Vectors; log sanitization before embedding; batch at the Lambda layer for cost; allow hot (OpenSearch Serverless) / cold (S3 Vectors) tiering choice.
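The Lambda stage of the telemetry-to-rag-pipeline pattern (decode a Kinesis batch, sanitize before embedding, embed as a batch for cost) can be sketched as below. Kinesis delivers record payloads to Lambda base64-encoded under `Records[].kinesis.data`; the sanitization regexes are illustrative assumptions, since the post names the rule but not its implementation.

```python
import base64
import re

# Illustrative scrub rules; the post mandates sanitizing logs before
# embedding generation but does not publish the actual patterns.
SECRET_PATTERNS = [
    re.compile(r"(?i)(password|token|api[_-]?key)\s*[:=]\s*\S+"),
    re.compile(r"\b\d{1,3}(\.\d{1,3}){3}\b"),  # crude IPv4 scrub
]

def sanitize(line: str) -> str:
    """Redact secrets/PII so they never reach the embedding corpus."""
    for pat in SECRET_PATTERNS:
        line = pat.sub("[REDACTED]", line)
    return line

def decode_batch(event: dict) -> list:
    """Decode one Kinesis-triggered Lambda event into sanitized log lines,
    ready to embed in a single batched Bedrock call (the post's cost tip)."""
    return [
        sanitize(base64.b64decode(r["kinesis"]["data"]).decode("utf-8"))
        for r in event.get("Records", [])
    ]
```

Embeddings derived from unsanitized lines would inherit the secrets, which is why the scrub sits before the Bedrock call rather than at query time.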
Patterns extended¶
- patterns/specialized-agent-decomposition — Strands deployment's three-agent split (Orchestrator / Memory / K8s Specialist) exemplifies decomposing agentic responsibility into narrow tool-surface scopes; same shape as Databricks Storex agents and Cloudflare's Agent Lee team.
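The three-agent split can be illustrated with a toy router: each agent owns a narrow tool surface and the orchestrator only routes. The agent names come from the post; the task-kind routing mechanics and tool names are illustrative, not the Strands SDK's actual API.

```python
# Toy sketch of specialized-agent-decomposition: narrow per-agent tool
# scopes with a routing-only orchestrator. Not the Strands Agents SDK API.

class Agent:
    def __init__(self, name, tools):
        self.name = name
        self.tools = tools  # the only tools this agent may invoke

    def handles(self, task_kind):
        return task_kind in self.tools

class Orchestrator:
    def __init__(self, agents):
        self.agents = agents

    def route(self, task_kind):
        """Hand the task to the one agent whose tool surface covers it."""
        for agent in self.agents:
            if agent.handles(task_kind):
                return agent.name
        raise LookupError(f"no agent owns task kind: {task_kind}")

team = Orchestrator([
    Agent("memory", {"recall_incident", "store_summary"}),
    Agent("k8s_specialist", {"kubectl_get", "kubectl_describe", "kubectl_logs"}),
])
```

The narrow scopes double as a safety boundary: the Memory Agent simply has no cluster-facing tools to misuse, mirroring the allowlist idea at the team level.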
Architecture diagrams referenced¶
The post includes four figures (inline ![] CloudFront PNGs from the
original post — not captured in the raw markdown):
- Figure 1: multitude of telemetry sources (kubelet logs, app logs, events, metrics) in a cluster.
- Figure 2: telemetry ingestion — Fluent Bit → Kinesis → Lambda → Bedrock embeddings → OpenSearch Serverless.
- Figure 3: chatbot retrieval + augmentation flow — user query → vector search → augmented prompt → LLM → kubectl command generation.
- Figure 4: iterative troubleshooting loop — LLM ↔ assistant cycle with a conclude / continue decision.
Operational numbers / scale cited¶
| Item | Value | Source |
|---|---|---|
| Team-knowledge challenge | 48% of orgs | 2024 Observability Pulse Report |
| Production-issue resolution >1 hour | 82% of teams | 2024 Observability Pulse Report |
| S3 Vectors embedding dimensionality | 1024-dim | Strands deployment |
| Embedding model | amazon.titan-embed-text-v2:0 | RAG deployment |
| Strands agents | 3 (Orchestrator, Memory, K8s Specialist) | Post |
Not disclosed: post-deployment MTTR reduction, cost per query token, OpenSearch-Serverless capacity (OCUs), Kinesis shards, Lambda concurrency, query latency (retrieval + generation), embedding-job throughput, production cluster sizes, eval or accuracy metrics, prompt-injection guardrail implementation, specific LLM model used for reasoning, kubectl command allowlist contents, RBAC role definitions.
Caveats¶
- Reference-architecture post, not a production retrospective. No customer-facing deployment is described; architecture is demonstrated via the sample GitHub repo + two re:Invent / KubeCon talks cited at the end. Marketing-leaning in tone around deployment_type flexibility.
- No evaluation data — no accuracy numbers, no MTTR delta, no user studies, no prompt-injection-resistance testing cited, no hallucination-rate discussion, no guardrails specifics.
- Two architectures presented in parallel rather than compared — the reader is not told when to prefer RAG vs Strands, or what the cost/latency/quality trade-offs are beyond "S3 Vectors is cost-optimized".
- Compute-fabric generality is asserted, not shown. ECS and Lambda are mentioned as extending naturally; no examples or pipeline variations are given.
- LLM model for reasoning is unspecified — Titan Embeddings v2 is named for embeddings but the reasoning model (Claude / Titan / Llama / etc) is left open, which is a surprising omission for a reference architecture.
- No failure-mode discussion for the iterative loop itself — what happens if the LLM enters a query-loop it can't terminate? What's the max-iteration cutoff? How are contradictory signals (stored telemetry says A, live kubectl says B) reconciled?
- Sits one notch below the 2026-03-18 AWS DevOps Agent post architecturally — that one explicitly names the two-path K8s discovery methodology (concepts/telemetry-based-resource-discovery), the baseline-learning step, and the confidence-scored RCA ranking. This earlier post stops at "loop until enough context".
Source¶
- Original: https://aws.amazon.com/blogs/architecture/architecting-conversational-observability-for-cloud-applications/
- Raw markdown: raw/aws/2025-12-11-architecting-conversational-observability-for-cloud-applicat-aed39933.md
- Example repo: aws-samples/sample-eks-troubleshooting-rag-chatbot
- Supplementary: AWS re:Invent 2025 – Streamline Amazon EKS operations with Agentic AI; KubeCon – From Logs To Insights: Real-time Conversational Troubleshooting for Kubernetes with GenAI
Related¶
- systems/aws-devops-agent — AWS-managed-service peer to this blueprint; the self-build → AWS-managed-service evolution over ~3 months (this post 2025-12-11, DevOps Agent 2026-03-18).
- systems/bits-ai-sre — Datadog's SaaS peer to AWS DevOps Agent, a step further removed from this self-build blueprint.
- concepts/observability — parent concept where this pattern plugs in under the "agent-assisted debugging layer" section.
- concepts/telemetry-based-resource-discovery — the more advanced discovery methodology that the AWS DevOps Agent post introduces on top of this post's simpler "live kubectl + stored telemetry" shape.
- patterns/specialized-agent-decomposition — Strands deployment's three-agent split.
- sources/2026-03-18-aws-ai-powered-event-response-for-amazon-eks — the AWS-managed-service successor post.
- sources/2025-07-16-aws-amazon-s3-vectors-preview-launch — introduces S3 Vectors, the cold vector store used in the Strands deployment.