PATTERN Cited by 1 source

Swarm of discovery agents for context prebuild

Intent

Run many parallel AI agents against a customer's data-surface-area (telemetry, codebase, document store) as a background task, producing a searchable corpus of structured context ahead of any user question — so that interactive agent queries are served by cheap retrieval over pre-computed memory instead of expensive on-demand exploration.

The "swarm" framing emphasises:

  • Parallelism across discovery subtasks — each agent walks a distinct slice of the surface area, and the slices compose into a full corpus.
  • Role specialisation — different pipeline stages need different agent capabilities (data-source enumeration, metric extraction, dependency inference, summarisation).
  • Background / amortised cost — the swarm runs on its own schedule, not in response to user queries, so inference cost is amortised across many user sessions.

Canonical instance (Grafana Assistant, 2026-05-01)

(Source: sources/2026-05-01-grafana-how-grafana-assistant-learns-your-infrastructure-before-you-even-ask)

Grafana's four-stage swarm pipeline for infrastructure memory:

  1. Data source discovery — identify all connected Prometheus, Loki, and Tempo data sources in the stack.
  2. Metrics scans — "Agents query your Prometheus data sources in parallel to find services, deployments, and infrastructure components."
  3. Enrichments via logs and traces — correlate Loki / Tempo data with corresponding Prometheus metrics, adding context about log formats, trace structures, service dependencies.
  4. Structured knowledge generation — per discovered service group, produce a five-category schema document.

Verbatim framing: "A swarm of AI agents does the heavy lifting."
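The four stages above can be sketched as a fan-out pipeline. This is a minimal asyncio sketch with hypothetical stub agents (`discover_sources`, `scan_metrics`, `enrich`, `summarise`) standing in for Grafana's undisclosed implementation; in practice each stub would be an LLM agent call.

```python
import asyncio

# Hypothetical stage functions; the real Grafana agents are not public.
async def discover_sources() -> list[str]:
    # Stage 1: enumerate connected data sources.
    return ["prometheus-prod", "prometheus-staging"]

async def scan_metrics(source: str) -> list[str]:
    # Stage 2: one agent per data source, run in parallel.
    return [f"{source}/api-gateway", f"{source}/billing"]

async def enrich(service: str) -> dict:
    # Stage 3: correlate logs/traces with the service's metrics.
    return {"service": service, "dependencies": []}

async def summarise(enriched: dict) -> dict:
    # Stage 4: emit one structured knowledge document per service group.
    return {"service": enriched["service"], "schema_version": 1}

async def run_swarm() -> list[dict]:
    sources = await discover_sources()
    scans = await asyncio.gather(*(scan_metrics(s) for s in sources))
    services = [svc for batch in scans for svc in batch]
    enriched = await asyncio.gather(*(enrich(s) for s in services))
    return list(await asyncio.gather(*(summarise(e) for e in enriched)))

docs = asyncio.run(run_swarm())
```

The parallelism lives in the `asyncio.gather` fan-outs at stages 2–4; stage 1 is the only inherently serial step.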

When to reach for this pattern

  • Large surface area that is expensive to walk on-demand: a stack with hundreds of services, a codebase with thousands of modules, a knowledge base with many documents. Per-query exploration doesn't scale.
  • User questions are predictably shaped against a precomputable answer schema. "What does X depend on?" → look up X's Dependencies chunk. Not a fit for open-ended, non-schematic queries.
  • Expensive LLM inference vs. cheap retrieval. Swarm runs once per refresh cycle at LLM cost; user queries are served by vector-DB lookup at commodity cost.
  • Substrate permits parallel enumeration. Data-source APIs, file system walks, database listings — all agent-walkable in parallel.

When to avoid it

  • Surface area is small. A handful of services / modules / documents — on-demand exploration is cheaper than pre-building a corpus.
  • Queries are highly novel. If users ask unanticipated questions that don't map onto a schema, precomputed memory helps less; the agent must fall back to raw exploration anyway.
  • Data changes faster than refresh cadence. If state changes within minutes (financial trading, real-time control systems), a weekly memory is actively misleading.
  • Inference cost is prohibitive. Swarm inference over thousands of service groups is real LLM spend; for small organisations it may be uneconomical.

Structural properties

Parallelism axis

The swarm parallelises across the surface, not across queries. Each user query hits a single vector-DB lookup; the parallelism is all in the background build.

Role decomposition

Grafana's swarm has four named roles (one per pipeline stage). Each role has narrower context than a general research agent:

  • Discovery agents only need to enumerate data sources.
  • Metric-scan agents only need to query Prometheus.
  • Enrichment agents only need to correlate three data sources.
  • Summarisation agents only need to synthesise the five-category schema.

Role specialisation is an instance of patterns/specialized-agent-decomposition at the background-batch altitude.

Output schema discipline

All summarisation agents produce output against the same typed schema (patterns/five-category-service-knowledge-schema). Without schema discipline, the corpus becomes a pile of incommensurable free-form summaries that semantic search can't navigate effectively.
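A minimal sketch of what schema discipline buys, using a Python dataclass with illustrative category names — the actual fields of patterns/five-category-service-knowledge-schema are not reproduced here:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class ServiceKnowledge:
    # Illustrative category names only; the real schema may differ.
    service: str
    overview: str = ""
    metrics: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)
    log_structure: str = ""
    trace_structure: str = ""

# Every summarisation agent emits the same shape, so the corpus can be
# navigated by field rather than by free-form text alone.
record = asdict(ServiceKnowledge(service="checkout", dependencies=["payments"]))
```

With a shared type like this, "look up X's Dependencies chunk" is a field access, not a semantic-search guess over prose.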

Background cadence

The swarm runs on its own schedule (weekly refresh in the Grafana instance) plus an optional manual trigger. User interactions never wait for the swarm.
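A minimal sketch of that cadence, assuming a simple timer-plus-event loop (Grafana's actual scheduler is not disclosed): the loop fires on a weekly timeout or a manual trigger, and user-facing code never joins on it.

```python
import threading

REFRESH_SECONDS = 7 * 24 * 3600   # weekly cadence, as in the Grafana instance
manual_trigger = threading.Event()
stop = threading.Event()

def refresh_loop(run_swarm):
    # Background loop: wakes on the weekly timer or a manual trigger.
    while not stop.is_set():
        manual_trigger.wait(timeout=REFRESH_SECONDS)
        if stop.is_set():
            break
        manual_trigger.clear()
        run_swarm()

runs = []
ran = threading.Event()

def fake_swarm():
    # Stand-in for the full four-stage pipeline.
    runs.append("refresh")
    ran.set()

t = threading.Thread(target=refresh_loop, args=(fake_swarm,), daemon=True)
t.start()
manual_trigger.set()               # manual trigger: refresh now, off-schedule
ran.wait(timeout=5)
stop.set(); manual_trigger.set()   # unblock the loop and shut it down
t.join(timeout=5)
```

The key property is that nothing in the interactive path ever blocks on `refresh_loop`; queries read whatever corpus the last completed refresh produced.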

Relationship to adjacent patterns

Pattern Relationship
patterns/precomputed-agent-context-files Parent pattern. Meta's per-module markdown files; same idea at a different substrate (code vs. telemetry) and a different output format (freeform markdown vs. typed schema).
patterns/parallel-subagent-execution-for-latency Latency-time counterpart. Parallel subagents reduce per-query latency; swarm-of-discovery reduces aggregate cost across many queries by moving compute offline.
patterns/specialized-agent-decomposition Role shape. Each pipeline stage is a specialised agent; composition makes the swarm.
patterns/five-category-service-knowledge-schema Output contract. The schema the swarm's summarisation stage produces.

Failure modes

  1. Partial pipeline failure. If stage 3 (enrichment) fails for a service, stage 4 produces a memory with empty Dependencies + Log-structure fields. Handling strategy undisclosed for Grafana Assistant.
  2. Swarm-scale inference cost. Re-extracting every service group weekly is real LLM spend; cost-per-stack is undisclosed.
  3. Concurrent-refresh race. If a manual trigger runs alongside the automatic weekly cycle, partial memories may be visible mid-refresh. Concurrency discipline not disclosed.
  4. Un-instrumented coverage gap. Agents can only discover services that emit metrics — the telemetry-as-substrate precondition. Silent services are invisible.
  5. Discovery-agent over-access. The swarm must run with enough privilege to read all data sources; this privilege is higher than any individual user's. ACL propagation (concepts/acl-propagated-agent-memory) at retrieval time is the compensating control.
  6. Summarisation hallucination. Stage-4 LLM agents may hallucinate metric names or dependencies not present in the underlying telemetry. Validation discipline (check every claimed metric name exists in Prometheus before writing memory) undisclosed.
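A sketch of the validation discipline failure mode 6 calls for. Prometheus's label-values endpoint (`/api/v1/label/__name__/values`) really does list every metric name in the TSDB; the `validate_memory` check and the document shape are hypothetical:

```python
import json
import urllib.request

def known_metric_names(prom_url: str) -> set[str]:
    # The Prometheus label-values API returns all metric names.
    with urllib.request.urlopen(f"{prom_url}/api/v1/label/__name__/values") as r:
        return set(json.load(r)["data"])

def validate_memory(doc: dict, names: set[str]) -> list[str]:
    # Return any metric the stage-4 agent claimed that the TSDB
    # does not actually contain, before the memory is written.
    return [m for m in doc.get("metrics", []) if m not in names]

# Usage with a fixed name set standing in for a live Prometheus:
bad = validate_memory(
    {"service": "checkout", "metrics": ["http_requests_total", "made_up_metric"]},
    names={"http_requests_total", "process_cpu_seconds_total"},
)
```

A corpus writer could reject or flag any document where `validate_memory` returns a non-empty list, closing the hallucination gap at write time rather than at query time.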

Seen in
