PATTERN
Hub / Worker / Dashboard agent service¶
Intent¶
Productise an agent loop — especially a long-running, multi-agent, multi-step one — as a three-component service:
- Hub — service API + persistent-storage interface + metrics. Single source of truth for investigations, events, and per-invocation records.
- Worker — queue consumer. Pulls queued agent tasks from the Hub, runs the agent loop, emits an event stream back through the Hub API.
- Dashboard — real-time observer + replay interface. Staff interface for launching ad-hoc investigations, watching running ones live, and drilling into per-invocation records for debugging.
Event flow: Worker → Hub → Dashboard (through the Hub's event API, which bridges persistent state and live consumers).
Canonicalised by Slack's Security Engineering team as the production architecture of Slack Spear, their security-investigation agent service. Replaced a prototype that used a "coding agent CLI" as the execution harness (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents).
Slack's verbatim statement¶
"Our prototype used a coding agent CLI as an execution harness, but that wasn't suitable for a practical implementation. We needed an interface that would let us observe investigations occurring in realtime, view and share past investigations, and launch ad-hoc investigations. Critically, we needed a way of integrating the system into our existing stack, allowing investigations to be triggered by our existing detection tools."
(Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents)
Component responsibilities¶
Hub¶
Verbatim: "The hub provides the service API and an interface to persistent storage. Besides the usual CRUD-like API, the hub also provides a metrics endpoint so we can visualise system activity, token usage, and manage cost."
Hub owns:
- Investigation records (one per alert / ad-hoc query).
- Event stream persistence (every phase transition, every model invocation, every persona hand-off).
- Metrics (system activity, token usage, cost by investigation / phase / persona).
- Integration surfaces — the API that detection tools call to queue an investigation, and the API the Dashboard calls to subscribe to events.
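The Hub's surfaces can be sketched as a single in-memory class. This is an illustrative reduction, not Slack's implementation: the method names (`create_investigation`, `claim_task`, `append_event`, `metrics`) and record shapes are assumptions standing in for the undisclosed Hub API.

```python
import itertools
import time
from collections import defaultdict, deque

class Hub:
    """Minimal in-memory Hub: task queue, event persistence, metrics."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._queue = deque()               # queued investigation ids
        self._investigations = {}           # id -> investigation record
        self._events = defaultdict(list)    # id -> persisted event stream

    # Integration surface: detection tools queue an investigation.
    def create_investigation(self, alert):
        inv_id = next(self._ids)
        self._investigations[inv_id] = {"id": inv_id, "alert": alert, "status": "queued"}
        self._queue.append(inv_id)
        return inv_id

    # Worker surface: pull the next queued task (None when queue is empty).
    def claim_task(self):
        if not self._queue:
            return None
        inv_id = self._queue.popleft()
        self._investigations[inv_id]["status"] = "running"
        return self._investigations[inv_id]

    # Worker surface: append one event to the durable stream.
    def append_event(self, inv_id, event):
        event = {"ts": time.time(), **event}
        self._events[inv_id].append(event)
        return event

    # Dashboard surface: replay the persisted stream.
    def events(self, inv_id):
        return list(self._events[inv_id])

    # Metrics endpoint: aggregate activity and token usage across runs.
    def metrics(self):
        tokens = sum(e.get("tokens", 0) for evs in self._events.values() for e in evs)
        return {"investigations": len(self._investigations), "total_tokens": tokens}
```

In a real deployment these methods would sit behind the service API and persistent storage, but the division of surfaces (integration in, worker pull, event append, dashboard replay, metrics out) is the load-bearing part.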
Worker¶
Verbatim: "Investigation workers pick up queued investigation tasks from the API. Investigations produce an event stream which is streamed back to the hub through the API. Workers can be scaled to increase throughput as needed."
Worker owns:
- Pulling queued tasks from the Hub's queue.
- Running the agent loop (e.g. the Director / Expert / Critic phase machine — see patterns/director-expert-critic-investigation-loop).
- Emitting the event stream back through the Hub API as the loop progresses.
- Horizontal scalability — workers scale with throughput; state lives in the Hub.
Workers are stateless (modulo the current investigation's in-memory working set); all durable state is in Hub storage.
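The worker's whole job fits in one loop: poll, run, stream events, repeat. A hedged sketch, assuming a Hub client exposing hypothetical `claim_task` / `append_event` calls (Slack does not publish its API), with the agent loop injected as a generator of events:

```python
import time

def run_worker(hub, run_agent_loop, idle_sleep=1.0, max_tasks=None):
    """Stateless worker loop: poll the Hub for queued tasks, run the
    agent loop, and stream events back through the Hub API. All durable
    state lives in the Hub, so throughput scales by running more copies
    of this loop."""
    done = 0
    while max_tasks is None or done < max_tasks:
        task = hub.claim_task()          # pull the next queued investigation
        if task is None:
            time.sleep(idle_sleep)       # nothing queued; back off and retry
            continue
        inv_id = task["id"]
        hub.append_event(inv_id, {"type": "started"})
        # run_agent_loop yields one event per phase transition / invocation
        for event in run_agent_loop(task):
            hub.append_event(inv_id, event)
        hub.append_event(inv_id, {"type": "completed"})
        done += 1
```

Note that nothing here survives a worker crash except what was already streamed to the Hub — which is exactly the property that makes the pool horizontally scalable.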
Dashboard¶
Verbatim: "The Dashboard is used by staff to interact with the service. Running investigations can be observed in real-time, consuming the event stream from the hub. Additionally the dashboard provides management tools, letting us view the details of each model invocation. This capability is invaluable when debugging the system."
Dashboard owns:
- Launching ad-hoc investigations (staff UI → Hub API).
- Live observation — subscribing to the Hub's event stream for a running investigation.
- Replay — browsing past investigations and drilling into per-invocation records.
- Model-invocation inspection — per-call input / output / schema / timing, invaluable for debugging.
- Investigation sharing — verbatim "view and share past investigations."
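Per-invocation drill-down is essentially a fold over the persisted event stream: pair each invocation's start and end events into an inspectable record. A sketch under assumed event shapes (`invocation_start` / `invocation_end` with a `call_id` are illustrative names, not Slack's schema):

```python
def invocation_records(events):
    """Pair model-invocation start/end events from a persisted stream
    into per-call records (persona, input, output, latency) suitable
    for Dashboard drill-down and debugging."""
    open_calls, records = {}, []
    for e in events:
        if e["type"] == "invocation_start":
            open_calls[e["call_id"]] = e
        elif e["type"] == "invocation_end":
            start = open_calls.pop(e["call_id"])
            records.append({
                "call_id": e["call_id"],
                "persona": start["persona"],
                "input": start["input"],
                "output": e["output"],
                "latency": e["ts"] - start["ts"],
            })
    return records
```

Because the records are derived from the durable stream rather than worker memory, the same view serves both live observation and after-the-fact replay.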
Mechanism¶
┌────────────────┐
│Detection tools │─queue─►┌──────┐
└────────────────┘ │ │
│ Hub │
┌────────────────┐ │ │
│Staff Dashboard │◄──────►│ │◄───events─┐
└────────────────┘ api │ │ │
└──┬───┘ │
│ │
│ │
poll │ │
│ │
▼ │
┌─────────┐ │
│ Worker │──events──┘
│ (pool) │
└─────────┘
Event lifecycle:
- Detection tool POSTs an alert to the Hub → investigation queued.
- Worker long-polls the Hub, pulls the next queued task, begins the agent loop.
- As the loop runs, every phase transition + per-persona invocation generates an event, streamed to the Hub.
- Dashboard clients subscribed to the investigation's event stream receive events in real time.
- Events are persisted to Hub storage; once complete, the investigation is browsable + replayable via Dashboard.
- Metrics endpoint exposes aggregate system activity + token usage + cost for operator oversight.
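Steps 3–5 hinge on the Hub doing two things with every incoming event: persist it for replay and fan it out to live Dashboard subscribers. A minimal sketch of that bridge (names are illustrative, not from the source):

```python
import queue

class EventBus:
    """Hub-side bridge between durable event storage and live
    Dashboard subscribers: every published event is persisted for
    replay AND fanned out to each open subscription."""

    def __init__(self):
        self.store = []          # durable log (replay source)
        self.subscribers = []    # live consumers (Dashboard sessions)

    def subscribe(self):
        q = queue.Queue()
        self.subscribers.append(q)
        return q

    def publish(self, event):
        self.store.append(event)       # 1. persist for replay
        for q in self.subscribers:     # 2. push to live viewers
            q.put(event)

bus = EventBus()
live = bus.subscribe()                 # a Dashboard opens a live view
bus.publish({"phase": "triage"})       # Worker emits via the Hub
bus.publish({"phase": "report"})
```

A late-joining Dashboard client reads `store` to catch up, then consumes its queue for the live tail — which is why completed investigations stay browsable with no worker involvement.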
Why this shape (vs. alternatives)¶
vs. CLI-as-harness¶
Slack's explicit starting point. It works on one developer's laptop but fails as a product:
- No multi-user live observation (only the person who ran the CLI can see output).
- No integration with detection tools (CLIs don't accept incoming alerts).
- No replay / share of past runs.
- No aggregate metrics across runs.
vs. monolithic web app¶
Bundling worker + API + dashboard into one process works at small scale but:
- Can't scale workers independently — API and Dashboard have different scale profiles than compute-heavy workers.
- Dashboard ↔ worker failures coupled — dashboard outage takes out investigation throughput.
- Deployment coupling — changing the UI redeploys the worker.
Separating the three tiers decouples these concerns.
vs. just-a-queue¶
A pure queue-and-workers system (no Hub API, no Dashboard) runs the agent loop but doesn't productise observation, replay, debugging, or sharing — the four capabilities Slack explicitly cites as the reasons for the service architecture.
When to reach for it¶
- Long-running agent loops (security investigations, deep research, incident response). Short single-shot agents don't need this shape.
- Multi-user workflow — on-call rotation, team review, shared inspection.
- Integration with other systems — alerts from detection tools, tickets from workflow systems, notifications out.
- Cost/usage matters — Hub metrics are the operator's only view into aggregate model spend across runs.
- Debuggability is load-bearing — production multi-agent systems fail in subtle ways; per-invocation Dashboard drill-down is often the only way to reconstruct what happened.
Composes with¶
- patterns/director-expert-critic-investigation-loop — the natural agent-loop shape the Worker runs.
- patterns/one-model-invocation-per-task — per-task invocations naturally become per-event records the Hub persists and the Dashboard inspects.
- patterns/durable-event-log-as-agent-audit-envelope — the Hub's event-stream persistence + per-invocation records are an agent audit envelope. This pattern is the productisation shape; the audit-envelope pattern is the durability discipline.
- patterns/snapshot-replay-agent-evaluation — stored investigations become replay corpora for agent-quality evaluation.
- patterns/four-component-agent-production-stack (Redpanda Openclaw) — the Redpanda four-component governance stack (Gateway + Audit + Token vault + Sandboxed compute) is the enterprise substrate; Hub/Worker/Dashboard is the single-agent-service productisation shape that sits on top. The two patterns operate at different altitudes and compose.
Contrasts¶
- vs. patterns/four-component-agent-production-stack — different altitude. Four-component is cross-cutting governance substrate; Hub/Worker/Dashboard is an individual agent service's shape.
- vs. workflow engine (Temporal / Cadence / Airflow) — a workflow engine could back the Worker tier (Slack doesn't disclose whether theirs does). The pattern describes the service topology independent of what executes the workflow.
- vs. serverless functions per task — per-task lambdas lose the per-investigation stateful working-set Slack's Director keeps in the journal. Workable for short-lived, stateless agent flows; not for long investigations.
Tradeoffs¶
- Three components to deploy + operate — each tier needs its own CI/CD, monitoring, on-call. Small teams may find this overweight.
- Event-stream consistency — events must reach Hub durably; lost events produce incomplete Dashboard replay.
- Hub as scaling bottleneck — all events, API calls, dashboard subscriptions funnel through Hub. Hub design must be throughput-sized.
- Dashboard auth + access control — staff Dashboard has broad visibility; auth story matters.
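One common discipline for the event-stream consistency tradeoff above (an assumption of ours, not something Slack describes) is to give each investigation's events monotonic sequence numbers, making worker retries idempotent and lost events detectable as gaps:

```python
class DurableStream:
    """Per-investigation sequence numbers: a retried worker send never
    double-writes, and a lost event shows up as a hole in the replay
    rather than a silently incomplete Dashboard view."""

    def __init__(self):
        self.events = {}   # seq -> event

    def append(self, seq, event):
        if seq in self.events:     # retry of an already-durable event
            return False
        self.events[seq] = event
        return True

    def gaps(self):
        if not self.events:
            return []
        top = max(self.events)
        return [s for s in range(1, top + 1) if s not in self.events]
```

The Dashboard can then surface `gaps()` on a replayed investigation instead of presenting a truncated stream as complete.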
Seen in¶
- systems/slack-spear — canonical first wiki instance. Three components explicitly named (Hub, Worker, Dashboard). Event stream flows worker → hub → dashboard. Dashboard supports live observation, past-investigation sharing, and per-invocation debugging. Replaces prototype's coding-agent-CLI harness. (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents)