PATTERN
Telemetry-to-RAG pipeline¶
Intent¶
Build a streaming pipeline that continuously ingests operational telemetry (logs, events, metrics, traces) into a vector store so that an LLM-driven investigation agent can retrieve semantically similar past-incident signal and inject it into prompts at query time. Turns historical operational data into the retrieval corpus of a troubleshooting loop — "Retrieval-Augmented Generation over telemetry".
Context¶
Classic RAG indexes product documentation or knowledge articles and answers natural-language questions over them. For operational questions ("why is my pod stuck in pending?", "why did checkout error-rate spike at 14:32?"), the useful retrieval corpus is not documentation — it's the team's own telemetry. Past kubelet logs, prior events, resolved incidents, application logs, metric anomalies all contain relevant signal.
Unlike documentation, telemetry:
- streams in continuously — the pipeline must be always-on, not batch,
- has vastly higher volume — cost engineering matters per layer,
- can contain sensitive data — logs regularly leak PII, tokens, secrets; sanitization is not optional,
- is latency-sensitive for ingest — recent events must be retrievable during incidents that are still in progress.
Canonical wiki reference: AWS's conversational-observability blueprint (sources/2025-12-11-aws-architecting-conversational-observability-for-cloud-applications).
Mechanism¶
Typical AWS shape (other cloud shapes substitute equivalents):
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Fluent Bit │───▶│ Kinesis Data │───▶│ Lambda │
│ DaemonSet │ │ Streams │ │ normalize + │
│ in cluster │ │ (buffer) │ │ embed (batch)│
└──────────────┘ └──────────────┘ └──────┬───────┘
│
▼
┌──────────────┐
│ Bedrock │
│ Titan Embed │
│ text v2 │
└──────┬───────┘
│
▼
┌────────────────────────┐
│ Vector store: │
│ OpenSearch Serverless │
│ (hot, RAM-backed) │
│ OR S3 Vectors │
│ (cold, cost-optimized) │
└────────────────────────┘
Step-by-step:
- Collect telemetry in-cluster with a lightweight forwarder. Fluent Bit DaemonSet taps app logs, kubelet logs, and Kubernetes events. Low per-pod overhead; aggregates locally; forwards upstream.
- Buffer via a streaming substrate — Kinesis Data Streams decouples ingest spikes from embedding-compute capacity and provides durability during downstream outages.
- Normalize in a stateless compute tier — Lambda consumes Kinesis records, parses log lines / event records into a canonical shape, and sanitizes sensitive fields before embedding (stripping secrets, masking tokens, dropping PII fields).
- Embed in batches — the same Lambda calls Bedrock's embedding endpoint (Titan Embeddings v2 in the canonical reference) on a batch of normalized events. Explicit guidance from AWS: "for better performance and cost-efficiency, your Lambda functions should use batching when ingesting data from Kinesis, generating embeddings, and storing them in OpenSearch." Batching is the cost lever that makes this pipeline economical at scale.
- Store vectors with metadata — writes {vector, metadata} to OpenSearch Serverless (k-NN plugin) for hot, low-latency retrieval, or to S3 Vectors for cold, cost-optimized retrieval. Metadata includes timestamp, namespace, pod, severity — filter predicates at query time.
- Retrieve at query time — agent query → embed (same model) → k-NN search (with optional filters) → top-k telemetry snippets → augmented prompt → LLM.
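The normalize/sanitize/embed/store steps above can be sketched as a single Lambda handler. This is a minimal sketch, not the reference implementation: the Fluent Bit field names (`log`, `kubernetes.namespace_name`, etc.), the redaction regex, and the `embed_batch`/`index_batch` callables (standing in for the Bedrock embedding call and the OpenSearch bulk write) are all assumptions made so the control flow is self-contained and testable offline.

```python
import base64
import json
import re

# Secret-shaped substrings are scrubbed BEFORE embedding. This pattern is
# illustrative only; production sanitization needs a vetted, audited rule set.
SECRET_RE = re.compile(r"(api[_-]?key|token|password)\s*[:=]\s*\S+", re.IGNORECASE)

def sanitize(text: str) -> str:
    return SECRET_RE.sub(r"\1=[REDACTED]", text)

def normalize(raw: dict) -> dict:
    """Map a raw Fluent Bit record to a canonical shape (field names assumed)."""
    k8s = raw.get("kubernetes", {})
    return {
        "text": sanitize(raw.get("log", "")),
        "timestamp": raw.get("time"),
        "namespace": k8s.get("namespace_name", "unknown"),
        "pod": k8s.get("pod_name", "unknown"),
        "severity": raw.get("level", "info"),
    }

def handler(event, embed_batch, index_batch):
    """Kinesis-triggered entry point. `embed_batch` and `index_batch` are
    injected callables standing in for Bedrock embeddings and the
    OpenSearch bulk write."""
    docs = [normalize(json.loads(base64.b64decode(r["kinesis"]["data"])))
            for r in event["Records"]]
    # One embedding call for the whole batch -- the cost lever AWS calls out.
    vectors = embed_batch([d["text"] for d in docs])
    index_batch([{"vector": v, "metadata": d} for v, d in zip(vectors, docs)])
    return len(docs)
```

Note that sanitization runs inside `normalize`, before any text reaches the embedder: once a secret is embedded, recalling it from the vector store is far harder than never ingesting it.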
Design decisions and trade-offs¶
- Embedding model choice. Dimensionality / cost / quality trilemma. The canonical reference uses Titan Embeddings v2 (amazon.titan-embed-text-v2:0); the Strands variant uses 1024-dim embeddings optimized for S3 Vectors. Domain-adapted embeddings would likely beat general-purpose but are not attempted in the reference architecture.
- Hot vs cold vector store — concepts/hybrid-vector-tiering applies: OpenSearch Serverless is RAM-heavy and fast; S3 Vectors is cheap and cold. A production deployment may want patterns/cold-to-hot-vector-tiering — recent weeks in OpenSearch, older history in S3 Vectors.
- Sanitization is not optional. Embeddings inherit the governance posture of their source data. A log line with an accidentally-logged API key, embedded, is a secret-in-vector-store problem that later retrieval leaks into prompts. The concepts/sensitive-data-exposure boundary sits at the Lambda normalize step — once embedded, the cost of recall is high.
- Batch size vs freshness. Larger Lambda batches save embedding cost but delay when a given event becomes retrievable. Active incidents want small batches; steady-state ingest wants large batches. Tune per workload.
- Schema drift. Telemetry shape changes — new log formats, new event kinds. Normalization logic drifts or breaks silently, degrading retrieval quality without obvious signal. Requires monitoring of its own.
- Retention / cost ceiling. Vectors are not free per-record. A year of pod logs embedded is a large bill; deliberate retention / re-embedding cadence is needed.
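The batch-size-vs-freshness trade-off can be made concrete. `BatchSize` and `MaximumBatchingWindowInSeconds` are the actual Lambda event-source-mapping settings for a Kinesis trigger; the profile values below are placeholders to tune per workload, not recommendations.

```python
# Two illustrative event-source-mapping profiles for the Kinesis -> Lambda hop.
INCIDENT = {"BatchSize": 10, "MaximumBatchingWindowInSeconds": 1}
STEADY_STATE = {"BatchSize": 500, "MaximumBatchingWindowInSeconds": 60}

def flush_delay(profile: dict, arrival_rate_per_s: float) -> float:
    """Approximate seconds until a just-arrived event is handed to the
    embedder: the batch fills, or the batching window expires, whichever
    comes first (ignores Lambda cold starts and per-shard fan-out)."""
    time_to_fill = profile["BatchSize"] / arrival_rate_per_s
    return min(time_to_fill, profile["MaximumBatchingWindowInSeconds"])
```

At 5 events/s the incident profile flushes within about a second, while the steady-state profile can hold an event for up to a minute — cheaper per embedding call, but a minute-old blind spot during a live incident.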
Retrieval patterns that ride on top¶
- Plain k-NN — embed the user query, return the top-k closest vectors by cosine/Euclidean distance.
- Metadata-filtered k-NN — restrict search to a namespace, severity, or time window via metadata predicates. Often critical — an "error" query for my-service shouldn't retrieve errors from every other service.
- Hybrid retrieval — combine k-NN with a lexical search over log contents for queries that include specific identifiers (trace IDs, pod names, error codes); see concepts/hybrid-retrieval-bm25-vectors.
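A metadata-filtered k-NN query can be sketched as a small query builder. The field names (`embedding`, `namespace`, `timestamp`) are assumptions about the index mapping, not prescribed by the reference; the `filter` clause inside `knn` is the efficient-filtering form supported by recent OpenSearch k-NN engines.

```python
def knn_query(query_vector, k=5, namespace=None, since=None):
    """Build an OpenSearch k-NN query body with optional metadata filters.
    Field names ("embedding", "namespace", "timestamp") are illustrative."""
    filters = []
    if namespace:
        # Restrict to one service's namespace -- avoids retrieving
        # "error" hits from every other service.
        filters.append({"term": {"namespace": namespace}})
    if since:
        filters.append({"range": {"timestamp": {"gte": since}}})
    knn = {"vector": query_vector, "k": k}
    if filters:
        knn["filter"] = {"bool": {"must": filters}}
    return {"size": k, "query": {"knn": {"embedding": knn}}}
```

The same builder covers plain k-NN (no filters) and the filtered variant; hybrid retrieval would additionally wrap this in a `bool` query alongside a lexical `match` clause over the raw log text.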
When to use¶
- MTTR reduction via natural-language investigation is a stated operational goal.
- Telemetry is high-volume and multi-source enough that dashboards alone aren't cutting it.
- Engineers repeatedly re-ask the same question across incidents — RAG memoizes the retrieval step.
- Cross-team operational knowledge sharing is a pain point — embedded past-incident notes become reusable signal.
When not to use¶
- Low-volume systems where humans can grep logs directly — the pipeline is overkill.
- Hard-SLA real-time control loops — retrieval + LLM latency is seconds, not milliseconds; this is for investigation, not control.
- Teams without the budget to maintain embedding cost — the pipeline's running cost scales with telemetry, and telemetry only grows.
- Regulated data where embedding would violate data-locality constraints — vectors are derived artifacts, but still subject to sovereignty / residency rules in many jurisdictions.
Relationship to other patterns¶
- patterns/cold-to-hot-vector-tiering — how to compose OpenSearch Serverless + S3 Vectors rather than picking one.
- patterns/allowlisted-read-only-agent-actions — the downstream agent that consumes the retrieved context; the two patterns compose to form the full agentic-observability stack.
- patterns/auto-scaling-telemetry-collector — the upstream discipline on the Fluent Bit tier to avoid self-DoS'ing (concepts/monitoring-paradox).
- patterns/central-telemetry-aggregation — classic dashboard-centric telemetry aggregation; this pattern adds a parallel RAG path on top of the same data substrate.
Seen in¶
- sources/2025-12-11-aws-architecting-conversational-observability-for-cloud-applications — canonical wiki reference. Two variants: OpenSearch Serverless (default) and S3 Vectors (Strands agentic variant). Fluent Bit → Kinesis → Lambda + Bedrock embeddings is the prescribed ingest shape. Log sanitization is called out as a Security-pillar requirement. Batching is called out as a cost / performance requirement.
Related¶
- systems/amazon-opensearch-service
- systems/s3-vectors
- systems/amazon-kinesis-data-streams
- systems/fluent-bit
- systems/aws-lambda
- systems/amazon-bedrock
- systems/amazon-titan-embeddings
- concepts/vector-embedding
- concepts/vector-similarity-search
- concepts/observability
- concepts/hybrid-vector-tiering
- concepts/agentic-troubleshooting-loop
- patterns/cold-to-hot-vector-tiering
- patterns/allowlisted-read-only-agent-actions