PATTERN
Weekly refresh cadence for agent context¶
Intent¶
Rebuild an AI agent's precomputed context corpus on a weekly background cadence, with a manual-trigger escape hatch for deliberate refresh after planned infrastructure changes. The weekly bound explicitly trades inference cost against memory staleness — fast enough that most infrastructure evolution is absorbed within one iteration, slow enough that per-refresh cost is amortised across many user interactions.
Canonical framing (Grafana Assistant, 2026-05-01)¶
(Source: sources/2026-05-01-grafana-how-grafana-assistant-learns-your-infrastructure-before-you-even-ask)
"The whole process refreshes automatically on a weekly cadence, so your assistant's understanding of your infrastructure stays current as your environment evolves."
And the escape hatch:
"You can also trigger a manual scan if you want to refresh the knowledge base ahead of the next automatic cycle."
Why weekly (and not faster or slower)¶
The cadence decision balances four competing pressures:
- Inference cost per refresh. Re-extracting every service group's five-category schema involves LLM calls for every summary. Daily refresh is 7× the weekly cost; hourly is 168× (see the cost sketch after this list).
- Infrastructure evolution rate. Most production infrastructure changes on a cadence of days-to-weeks (new service deploys, renamed metrics, new dependencies). Weekly matches this natural rhythm.
- Staleness tolerance per use case. Agent queries about service topology and metric names tolerate a week's lag — the answer is "mostly right" even when the topology changed 3 days ago. Real-time queries ("is this service up right now?") bypass memory and query telemetry directly.
- Compute window availability. Weekly refresh runs as a batch job at low-utilisation hours; hourly would need always-on compute.
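A back-of-envelope way to see the cost multiplier. The group count and per-group price below are illustrative assumptions, not Grafana-disclosed figures:

```python
# Back-of-envelope cost model. 500 service groups and $0.02 of
# inference per re-extraction are illustrative assumptions.
REFRESHES_PER_WEEK = {"weekly": 1, "daily": 7, "hourly": 24 * 7}

def weekly_inference_cost(groups: int, cost_per_group: float, cadence: str) -> float:
    """Total extraction cost per week at a given refresh cadence."""
    return groups * cost_per_group * REFRESHES_PER_WEEK[cadence]

for cadence in ("weekly", "daily", "hourly"):
    print(cadence, weekly_inference_cost(500, 0.02, cadence))
# weekly 10.0    -> baseline
# daily 70.0    -> 7x weekly
# hourly 1680.0 -> 168x weekly
```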
Faster cadences (daily, hourly) break down when:
- Inference cost dominates, and most service groups haven't changed anyway.
- Users observe the system's continual re-extraction and get anxious (the "no scheduled jobs to manage" UX commitment cuts both ways).
Slower cadences (monthly, quarterly) break down when:
- Engineering velocity is high — a team shipping multiple services per week will see stale memory as soon as they onboard the Assistant.
- Manual-trigger becomes the default instead of the escape hatch, defeating the auto-refresh pitch.
Weekly lands in the sweet spot for most engineering organisations. It's neither a load-bearing theoretical optimum nor a rigid requirement — just a reasonable equilibrium that Grafana has committed to publicly.
Mechanism¶
Three composing elements:
1. Automatic weekly cycle¶
Runs unattended. User takes no action. Grafana's UX commitment is "no setup steps, no configuration files, no scheduled jobs to manage." The refresh scheduler is Grafana-internal infrastructure, not surfaced to the customer.
2. Manual trigger¶
"You can also trigger a manual scan if you want to refresh the knowledge base ahead of the next automatic cycle." Intended for two scenarios: - After a planned rollout. A team launches a new service and wants the Assistant to pick it up before the next automatic cycle. - After observing stale memory. A user notices the Assistant referencing a renamed metric and wants to force a rebuild.
3. Atomic visibility (implied)¶
The Grafana post doesn't discuss this explicitly, but mid-refresh visibility (partial memories returned from semantic search while a rebuild is in progress) would be a user-visible failure mode. Best-practice implementations atomically swap the memory corpus — either the old corpus or the new, never a partial hybrid.
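One way to get this guarantee is a pointer-swap design: the rebuild populates a fresh corpus off to the side and publishes it in a single assignment. A sketch, with `MemoryStore` and `extract_all` as hypothetical names:

```python
# Sketch of atomic corpus visibility via pointer swap (hypothetical API).
import threading

class MemoryStore:
    def __init__(self):
        self._corpus = {}                 # always a complete snapshot
        self._lock = threading.Lock()     # guards the swap, not reads

    def query(self, key):
        # Readers see either the old corpus or the new one,
        # never a partially rebuilt hybrid.
        return self._corpus.get(key)

    def refresh(self, extract_all):
        new_corpus = extract_all()        # expensive; built off to the side
        with self._lock:
            self._corpus = new_corpus     # single atomic publish
```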
When to apply this pattern¶
- Context substrate evolves on days-to-weeks cadence. Infrastructure topology, code-module structure, customer service inventories — all fit.
- Extraction is expensive. LLM inference, large-file parsing, complex aggregations — anything where the per-refresh cost matters.
- Retrieval latency is critical. Users expect sub-second responses; on-demand extraction would be unacceptably slow.
- Memory freshness doesn't need to be real-time. Staleness on the order of days is acceptable; the agent can fall back to live queries for real-time questions.
When to avoid¶
- Substrate changes on a seconds-to-minutes cadence. Agent memory would be continuously wrong; use on-demand extraction or streaming updates instead.
- Memory is cheap to build. If re-extraction is millisecond-class, build on demand and skip the refresh complexity.
- User expectations of real-time memory. Financial, safety-critical, or compliance-sensitive contexts where a week-old memory is a material risk.
Contrast with alternative refresh strategies¶
| Strategy | When to use | Drawback |
|---|---|---|
| On-demand | Small corpus, fast extraction | Per-query latency cost |
| Streaming | Event-driven substrate | Complex consistency logic |
| Hourly / daily refresh | Fast-evolving substrate | High inference cost |
| Weekly refresh (this pattern) | Days-to-weeks evolution | Stale memory for up to a week |
| Monthly+ refresh | Static / rarely-changing substrate | Stale after any release |
| Manual-only refresh | User controls cadence | Users forget; memory drifts |
Weekly-with-manual-escape is the default sweet spot for most engineering infrastructure.
Relationship to freshness concerns¶
Weekly cadence bounds memory staleness, but staleness still exists. The concepts/context-file-freshness concept page records Meta's observation that "context that goes stale causes more harm than no context" — an agent confidently referencing a renamed metric is worse than an agent admitting ignorance.
Mitigations (not all disclosed for Grafana Assistant; the first is sketched after this list):
- Per-chunk freshness metadata. Each memory chunk carries its extraction timestamp; agents can disclaim stale memories in responses.
- Cross-check at query time. Before responding, the agent verifies a claimed metric name still exists in Prometheus.
- Manual-trigger surfacing. Users near a fresh rollout are reminded to manually refresh.
- Detect memory rot from empty responses. If queries for a known-good service return no memory, trigger an incremental rebuild.
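A sketch of the first mitigation, per-chunk freshness metadata. The chunk fields and the seven-day staleness threshold are assumptions for illustration; the post does not disclose a chunk format:

```python
# Hypothetical memory chunk carrying its extraction timestamp.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=7)  # aligned with the weekly refresh cadence

@dataclass
class MemoryChunk:
    text: str
    extracted_at: datetime  # set when the extraction pipeline wrote the chunk

    def staleness_disclaimer(self) -> str | None:
        """Text the agent can attach when citing a chunk older than one cycle."""
        age = datetime.now(timezone.utc) - self.extracted_at
        if age > STALE_AFTER:
            return f"(memory is {age.days} days old; verify against live telemetry)"
        return None
```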
Failure modes¶
- Hot change between refreshes. A metric renamed on Tuesday is stale in memory until the next Sunday refresh (five days here; the worst case approaches a full week). Manual trigger is the user-accessible remediation; the Assistant itself may not detect the drift.
- Refresh failures. A failed weekly refresh leaves the corpus stale for two cycles. Monitoring discipline undisclosed.
- Concurrent refresh conflict. Manual trigger during automatic cycle; atomicity undisclosed.
- Partial-result visibility. Without atomic swap, users see mid-refresh inconsistency.
- Cadence discovered by users. Users who learn the weekly cadence time their manual triggers to beat the automatic cycle, turning the escape hatch into the default.
Seen in¶
- sources/2026-05-01-grafana-how-grafana-assistant-learns-your-infrastructure-before-you-even-ask — canonical wiki instance of weekly-refresh cadence with manual-trigger escape hatch. "The whole process refreshes automatically on a weekly cadence … You can also trigger a manual scan if you want to refresh the knowledge base ahead of the next automatic cycle." Applies to the Grafana Assistant infrastructure-memory corpus populated by the swarm-of-discovery-agents pipeline.
Related¶
- patterns/swarm-of-discovery-agents-for-context-prebuild
- patterns/precomputed-agent-context-files
- patterns/five-category-service-knowledge-schema
- patterns/self-maintaining-context-layer
- concepts/agent-infrastructure-memory
- concepts/context-file-freshness
- concepts/telemetry-as-discovery-substrate
- systems/grafana-assistant