Structured journaling tool¶
Definition¶
A structured journaling tool is an agent tool whose only job is to accumulate typed entries into an append-only log that represents an agent's working memory. Entries are constrained to a small enum of types (e.g. decision, observation, finding, question, action, hypothesis), auto-annotated with execution-context metadata (phase / round / timestamp), and rendered back to the agent — and to peer agents — as chronology on subsequent invocations.
The structure is deliberately thin: the tool "does nothing more than accumulate entries" (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications). There is no query language, no scoring, no retrieval ranking — the shape is pure append + typed + auto-annotated.
Named and canonicalised in Slack's post *Managing context in long-run agentic applications* as the Director's planning tool in Spear.
Slack's six entry types (verbatim)¶
| Type | Purpose | Example |
|---|---|---|
| decision | Strategic choices | "Focus investigation on authentication anomalies rather than network activity" |
| observation | Patterns noticed | "Multiple failed logins preceded the successful authentication" |
| finding | Confirmed facts | "User authenticated from IP 203.0.113.45, not in historical baseline" |
| question | Open items | "Was the VPN connection established before or after the suspicious activity?" |
| action | Steps taken/planned | "Requested Cloud Expert to examine EC2 instance activity" |
| hypothesis | Working theories | "This pattern suggests credential stuffing rather than account compromise" |
Plus optional: priority, follow-up actions, citation references to evidential artifacts. Auto-annotated with: phase (discovery / trace / conclude), round number, timestamp.
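A minimal entry schema consistent with the table above might look like this (a sketch: everything beyond the six types, the named optional fields, and the phase/round/timestamp annotations is an assumption, not from the post):

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional

class EntryType(Enum):
    DECISION = "decision"
    OBSERVATION = "observation"
    FINDING = "finding"
    QUESTION = "question"
    ACTION = "action"
    HYPOTHESIS = "hypothesis"

@dataclass(frozen=True)
class JournalEntry:
    type: EntryType
    text: str
    # Auto-annotated by the tool at append time, not supplied by the agent.
    phase: str                     # discovery / trace / conclude
    round: int
    timestamp: datetime
    # Optional fields named in the post.
    priority: Optional[str] = None
    follow_up: Optional[str] = None
    citations: tuple = ()          # references to evidential artifacts
```

Freezing the dataclass mirrors the append-only intent: an entry, once written, is never mutated.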
Why structure matters¶
Free-form agent scratchpads have three failure modes the structured-journaling-tool shape is designed to avoid:
1. Type erasure kills downstream reasoning¶
An agent that logs "I think the user authenticated from a new IP and this is suspicious and I should investigate further" has conflated a finding (new IP), a hypothesis (suspicion), and an action (further investigation) into a single sentence. Later, when a peer agent or the same agent on a later round tries to consume this, the three claims are at different epistemic altitudes and must be manually re-disentangled — typically via another LLM call to interpret the scratchpad, which reintroduces hallucination risk.
Typed entries make the altitudes explicit: findings are confirmed facts, observations are patterns, hypotheses are theories, actions are next steps. The agent has to decide the altitude when writing.
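The disentangling can be made concrete: the conflated sentence above decomposes into three typed entries, each consumable at its own altitude (the paraphrases are illustrative):

```python
# One free-form sentence, three epistemic altitudes, three entries:
entries = [
    ("finding",    "User authenticated from a new IP"),
    ("hypothesis", "The new-IP authentication is suspicious"),
    ("action",     "Investigate the new-IP authentication further"),
]

# A consumer can now pick out the confirmed facts structurally,
# with no LLM call needed to re-disentangle the claims.
confirmed = [text for t, text in entries if t == "finding"]
```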
2. No auto-annotation → no temporal reasoning¶
Without phase/round/timestamp auto-annotation, the Journal can't support "what did we know at round 3?" or "what did the Director decide in the discovery phase?" These are load-bearing questions during debugging + supervisor review. Auto-annotation is cheaper than asking the agent to include its own metadata (the tool knows what phase+round it was invoked in; the agent doesn't need to).
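Auto-annotation can live entirely in the tool, as in this sketch (the `Journal` class and its field names are assumptions; the point is that the agent supplies only type and text):

```python
from datetime import datetime, timezone

class Journal:
    """Append-only log that stamps phase/round/timestamp itself."""

    def __init__(self):
        self.entries = []
        self.phase = "discovery"   # advanced by the orchestrator, not the agent
        self.round = 0

    def append(self, entry_type: str, text: str):
        # The tool knows its execution context; the agent does not
        # need to (and cannot) supply or forge these annotations.
        self.entries.append({
            "type": entry_type,
            "text": text,
            "phase": self.phase,
            "round": self.round,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def at_round(self, n: int):
        """'What did we know at round n?'"""
        return [e for e in self.entries if e["round"] <= n]
```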
3. Free-form scratchpad rots under compaction¶
As the Journal grows, any agent receiving it in-prompt faces a context-budget problem (see concepts/context-engineering). Free-form scratchpads can only be compacted by summarisation (lossy, hallucination-prone) or truncation (loses early state). Typed entries can be filtered: show me only the decisions, show me the findings from the last two rounds, show me the open questions — all without re-invoking a model.
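Because entries are typed and round-stamped, each of those filters is a one-liner over plain data (the dict shape here is an assumed minimal encoding):

```python
journal = [
    {"type": "decision", "round": 1, "text": "Focus on auth anomalies"},
    {"type": "finding",  "round": 2, "text": "New IP 203.0.113.45"},
    {"type": "question", "round": 3, "text": "VPN before or after?"},
    {"type": "finding",  "round": 4, "text": "Failed logins preceded success"},
]

# "Show me only the decisions"
decisions = [e for e in journal if e["type"] == "decision"]

# "Show me the findings from the last two rounds"
last_round = max(e["round"] for e in journal)
recent_findings = [e for e in journal
                   if e["type"] == "finding" and e["round"] > last_round - 2]

# "Show me the open questions"
open_questions = [e for e in journal if e["type"] == "question"]
```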
What the tool does not do¶
- Not a database. No queries, no indices, no search. It's an append-only list with type tags.
- Not a scoring surface. The journal entries themselves are not credibility-scored. Scoring happens elsewhere (see concepts/credibility-scoring-rubric).
- Not authoritative for execution state. The Journal is what the Director thinks; the execution event stream is what the system does. Two separate artifacts (see patterns/three-channel-context-architecture for how they relate).
- Not compacted by the tool. The tool itself doesn't summarise, filter, or prune. Compaction, if needed, happens in the orchestration layer.
Why append-only (even if not strictly mandated)¶
The post's phrasing ("nothing more than accumulate entries") strongly implies append-only semantics. This has two architectural consequences:
- Replay-friendly. Re-running an investigation from a point-in-time snapshot of the Journal is deterministic — nothing can have been silently overwritten.
- Corrections become observations. A mistaken finding doesn't get edited; it gets superseded by a later entry (e.g. a new observation that "finding X from round 2 is contradicted by tool result Y in round 4"). This preserves the investigation's reasoning history.
If entries could be edited in place, the Journal would be a working-memory artifact, not a working-memory history — a subtler but architecturally different thing.
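A supersession under append-only semantics might look like this (the `supersedes` field is an assumed convention, not something the post specifies):

```python
journal = [
    {"round": 2, "type": "finding",
     "text": "User authenticated via VPN"},   # later shown to be wrong
]

# Round 4: the mistaken finding is not edited in place; a new
# observation supersedes it, preserving the reasoning history.
journal.append({
    "round": 4, "type": "observation",
    "text": "Finding from round 2 is contradicted by tool result Y",
    "supersedes": 0,   # index of the superseded entry (assumed convention)
})
```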
Who consumes the Journal¶
In Slack's Spear architecture (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications):
- The Director reads its own Journal on every invocation to remember prior decisions and open questions. This is the "short notes for working memory" use case.
- The Experts receive the current Journal content rendered as chronology in their prompt. System prompts include "guidance that explains the Director's role, their relationship to the Director, the purpose of the Journal, and how to interpret it." So Experts know what questions the Director has asked before, what hypotheses are active, and what actions are pending.
- The Critic receives the Journal as input to the Timeline task — the Timeline is synthesised from three sources: "The most recent Review, the previous Critic's Timeline, the Director's Journal."
The Journal is thus shared state across all agents, but authored only by the Director. Reads are broad; writes are narrow. This is the canonical architectural lever that turns working memory into shared narrative.
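Rendering "as chronology" for a peer agent's prompt can be very simple (the line format below is an assumption; the post specifies only that the Journal is rendered back as chronology):

```python
def render_chronology(entries):
    """Flatten the Journal into a chronological transcript for
    in-prompt inclusion (line format is an assumed convention)."""
    return "\n".join(
        f"[{e['phase']} / round {e['round']}] {e['type'].upper()}: {e['text']}"
        for e in entries
    )

journal = [
    {"phase": "discovery", "round": 1, "type": "decision",
     "text": "Focus investigation on authentication anomalies"},
    {"phase": "trace", "round": 3, "type": "question",
     "text": "Was the VPN connection established before the activity?"},
]
print(render_chronology(journal))
```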
Contrasts¶
- vs. conversation-history buffer — classic LLM conversation state is a flat message list. The journaling tool is explicitly structured, typed, and auto-annotated.
- vs. scratchpad prompt pattern — free-form scratchpad sections in a prompt have no type system, no phase annotations, no peer-agent readability guarantees.
- vs. event log — an event log captures what the system did; the Journal captures what the agent thought. Different artifacts, complementary roles.
- vs. vector memory — vector memory stores embeddings of past conversation for semantic retrieval; the journaling tool stores typed structured entries for direct in-prompt rendering, no retrieval model involved.
- vs. rule-based blackboard (classical AI) — architecturally the closest historical relative. Blackboard systems had typed facts, rule-driven agents, and a shared substrate. The journaling tool is a minimalist modern re-implementation scoped to a single planning agent's output.
When to reach for it¶
- Long-running multi-agent tasks where a single agent's working memory must survive across many model invocations.
- Planner-role agents (Directors, Orchestrators, Coordinators) whose output is decisions and open questions, not leaf outputs.
- Peer-agent coordination where peer agents need to consume the planner's reasoning state without re-deriving it from scratch.
- Audit / supervision contexts where a human supervisor must later reconstruct why the planner chose a path.
When not to reach for it¶
- Single-turn tasks. The infrastructure cost of a journaling tool is not earned by a one-shot query.
- Non-planner agents. Experts and Critics in Spear do not have journaling tools — their job is to produce findings / scores, not to plan. Adding journaling to every agent in a multi-agent system is overkill.
- Heavily mutable reasoning state. If the reasoning state is best modelled as a tree that gets pruned + grafted, a journal-append model is a poor fit; a dedicated state store is better.
Seen in¶
- systems/slack-spear — canonical first wiki instance. Six-type Journal owned by the Director, read by all three persona categories. "The Journal captures decisions, observations, hypotheses, and open questions in a structured format. It serves as the Director's working memory." Auto-annotated with phase/round/timestamp. (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications)
Related¶
- patterns/three-channel-context-architecture
- patterns/director-expert-critic-investigation-loop
- concepts/investigation-phase-progression
- concepts/prompt-is-not-control
- concepts/no-message-history-carry-forward
- concepts/online-context-summarisation
- concepts/context-engineering
- concepts/structured-output-reliability