No message-history carry-forward¶
Definition¶
No-message-history-carry-forward is the architectural discipline of not carrying raw accumulated LLM message history between agent invocations in a long-running multi-agent loop. Instead, each invocation receives curated state artifacts (journals, summaries, timelines, scored findings) produced by prior invocations — the raw message transcript is never passed forward.
The load-bearing canonical claim (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications):
"Besides these resources, we do not pass any message history forward between agent invocations. Collectively, these channels provide a means of online context summarisation, negating the need for extensive message histories."
And crucially, the position that this is not just a token-budget optimisation:
"Even if context windows were infinitely large, passing message history between rounds would not necessarily be desirable: the accumulated context could impede the agents' capacity to respond appropriately to new information."
Canonicalised by Slack's Security Engineering team for Spear's long-running multi-agent investigation architecture.
Why carrying history forward is the default¶
Agent frameworks (Pydantic AI, LangChain, LangGraph, Autogen, and similar) typically manage state for users by accumulating message history between API calls. The shape is:
invocation 1: [sys_prompt, user_msg, assistant_msg1]
invocation 2: [sys_prompt, user_msg, assistant_msg1, tool_call, tool_result, assistant_msg2]
invocation 3: [sys_prompt, user_msg, assistant_msg1, tool_call, tool_result, assistant_msg2, ..., assistant_msgN]
This is the default because it's the simplest thing that works for short-running agents. The LLM API is stateless; to provide continuity, the caller provides the full history every call.
For short-run applications (a chatbot turn, a single RAG query) this is fine — the context doesn't overflow, the history is coherent, the agent's output quality is stable.
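The accumulation shape above can be sketched as a loop. This is a hypothetical minimal agent loop, not any real framework's API: `call_llm` and `run_tool` are stand-in stubs (here the stub "model" just reports how much history it was sent, to make the growth visible).

```python
def call_llm(messages: list[dict]) -> dict:
    """Stub LLM: reports how much history it received (illustrative only)."""
    n = len(messages)
    # pretend the model calls a tool on the first step, then answers
    tool_call = "search" if n < 4 else None
    return {"role": "assistant", "content": f"seen {n} messages", "tool_call": tool_call}

def run_tool(name: str) -> str:
    return f"result of {name}"

def run_agent(system_prompt: str, user_msg: str, max_steps: int = 10) -> str:
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_msg},
    ]
    for _ in range(max_steps):
        reply = call_llm(messages)          # full history re-sent on every call
        messages.append(reply)
        if reply["tool_call"] is None:      # done: no more tool use
            return reply["content"]
        messages.append({"role": "tool", "content": run_tool(reply["tool_call"])})
        # messages only ever grows: nothing is summarised or dropped
    return messages[-1]["content"]
```

The key property is in the loop body: `messages` is append-only, and the whole list is re-sent every call.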
Why the default breaks for long-run agents¶
Slack's Spear investigations "can span hundreds of inference requests and generate megabytes of output." At that scale, three failure modes co-occur:
1. Context-window overflow¶
The naive message-history-carry-forward will eventually exceed any model's context window. Even GPT-4's 128k context or Claude's 200k won't hold hundreds of tool-call + result pairs, each running hundreds to thousands of tokens.
2. Cost + latency scaling¶
Even below the hard context limit, cost and latency scale with token count. A 100k-token prompt costs 10x what a 10k-token prompt costs, and takes meaningfully longer to process. For a hundred-invocation investigation, this compounds.
3. Quality degradation before the limit¶
Approaching the context window limit (not just hitting it) degrades response quality. This is the "context rot" phenomenon — models get confused, start repeating earlier findings, lose the thread. Slack flags this verbatim: "Even approaching an agent's context window limit can degrade the quality of responses."
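The cost-scaling point in failure mode 2 is easy to make concrete with back-of-envelope arithmetic. The numbers here are assumptions for illustration (a 2k-token base prompt, ~1k tokens of tool call + result added per invocation), not measurements from Spear:

```python
def total_prompt_tokens(invocations: int, base: int = 2_000, growth: int = 1_000) -> int:
    # invocation i re-sends base + i * growth tokens of accumulated history
    return sum(base + i * growth for i in range(invocations))

flat = 100 * 2_000                    # curated-state: roughly constant prompt size
carried = total_prompt_tokens(100)    # carry-forward: prompt grows every call
print(flat, carried)                  # carried is ~26x flat under these assumptions
```

Linear per-invocation growth means quadratic total token spend across an investigation, which is why the gap widens with every additional round.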
Why the "even with infinite context" argument matters¶
The infinite-context thought experiment is load-bearing because it separates two distinct motivations:
- Token budget pressure — you can't fit everything, so something must go.
- Cognitive load management — even if you could fit everything, flooding the agent with accumulated context crowds out its ability to respond to new information.
Slack's position is that both motivations justify no-carry-forward. This elevates the pattern from "clever token-saving hack" to "architectural principle for multi-agent loops".
The intuition is that agents are attention-limited even when they have plenty of context budget. An agent presented with 50 pages of tool-call history and one new finding will give the new finding appropriate weight less reliably than an agent presented with a one-page summary and one new finding. The summary acts as curation, not just compression.
What carries forward instead (Slack's three channels)¶
The alternative to raw message history is curated state artifacts produced by the agents themselves:
- Director's Journal — typed entries (decision / observation / finding / question / action / hypothesis) the Director produces and peer agents read.
- Critic's Review — annotated findings with credibility scores against a 5-level rubric (see concepts/credibility-scoring-rubric).
- Critic's Timeline — consolidated chronological narrative built from the most credible findings (see concepts/narrative-coherence-as-hallucination-filter).
Together they are the three-channel context architecture (see patterns/three-channel-context-architecture). Each channel is an order of magnitude smaller than the raw message history, and each is pre-digested for a specific consumer's needs.
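A sketch of the three channels as typed records. The `EntryKind` values mirror the journal entry types named above and the 5-level score range mirrors the rubric; the class names and field layout are assumptions for illustration, not Spear's actual schema:

```python
from dataclasses import dataclass, field
from enum import Enum

class EntryKind(Enum):
    DECISION = "decision"
    OBSERVATION = "observation"
    FINDING = "finding"
    QUESTION = "question"
    ACTION = "action"
    HYPOTHESIS = "hypothesis"

@dataclass
class JournalEntry:            # Director's Journal: typed, append-only entries
    kind: EntryKind
    text: str

@dataclass
class ScoredFinding:           # Critic's Review: finding + credibility score
    finding: str
    credibility: int           # 1..5 against the 5-level rubric

@dataclass
class Timeline:                # Critic's Timeline: consolidated narrative
    events: list[str] = field(default_factory=list)

@dataclass
class InvestigationState:      # everything that carries forward -- no raw transcript
    journal: list[JournalEntry] = field(default_factory=list)
    review: list[ScoredFinding] = field(default_factory=list)
    timeline: Timeline = field(default_factory=Timeline)
```

Note what `InvestigationState` deliberately lacks: a `messages` field. The raw transcript has no home in the carried-forward state.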
What this looks like operationally¶
Instead of the ever-growing concatenated transcript shown earlier, Spear's pattern is:
Director_invocation_N:
[system, current_journal, latest_timeline, phase_state]
Expert_invocation_N:
[system, domain_prompt, current_journal, directors_question]
Critic_Review_invocation_N:
[system, rubric, experts_findings, tool_introspection_tools]
Critic_Timeline_invocation_N:
[system, timeline_rules, prev_timeline, latest_review, current_journal]
Each invocation sees exactly what it needs, and no invocation sees the concatenated message history of earlier invocations.
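The per-role prompt shapes above can be sketched as assembly functions. Everything here is hypothetical (the system-prompt strings and the rendering of artifacts into one user message are assumptions); the point is structural: each builder takes curated artifacts, and none takes a message history:

```python
DIRECTOR_SYSTEM = "You are the Director."      # illustrative system prompts
EXPERT_SYSTEM = "You are a domain Expert."

def director_prompt(journal: str, timeline: str, phase: str) -> list[dict]:
    # Director sees journal + latest timeline + phase state -- no transcript
    return [
        {"role": "system", "content": DIRECTOR_SYSTEM},
        {"role": "user",
         "content": f"Journal:\n{journal}\n\nTimeline:\n{timeline}\n\nPhase: {phase}"},
    ]

def expert_prompt(domain_prompt: str, journal: str, question: str) -> list[dict]:
    # Expert sees its domain prompt, the journal, and the Director's question
    return [
        {"role": "system", "content": EXPERT_SYSTEM},
        {"role": "user",
         "content": f"{domain_prompt}\n\nJournal:\n{journal}\n\nQuestion: {question}"},
    ]
```

Every invocation is freshly constructed from state, so each prompt is two messages regardless of how many invocations preceded it.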
When this pattern applies¶
- Long-running multi-agent loops. Security investigations, incident response, deep research, compliance audits — tasks that can span hundreds of invocations.
- Tasks with a natural "working memory" artifact. Anywhere a journal or timeline is a reasonable way to summarise progress — carry that forward instead of the raw chatter.
- Tasks with multiple specialised roles. Planner + Expert + Critic or similar. Role-specialised views of state are more useful to each role than the shared raw history.
- Tasks where quality matters more than raw trace fidelity. If you need a perfect replay of what happened, persist the event stream; but don't feed it back into agent prompts.
When not to apply¶
- Short conversations. Chatbot turns where the total history fits comfortably; message-history-carry-forward is simpler and works.
- Tasks where all prior turns are genuinely relevant. Some coding-agent tasks need full file-diff context; some math-solving tasks need all prior steps. Don't prune what the agent genuinely needs.
- Single-agent loops. The overhead of defining a journal schema + summary artifact + compaction rules is not earned by a single agent with a short loop.
Contrasts¶
- vs. sliding-window context — sliding-window approaches truncate the oldest messages. They still carry raw history; they just bound it. Slack's pattern replaces raw history entirely.
- vs. summarisation middleware — some frameworks auto-summarise the oldest N turns when the context window fills. This is history-carry-forward with emergency compression; Slack's pattern is no-history-carry-forward by design.
- vs. vector memory / RAG — vector memory retrieves semantically-relevant past context. Slack's pattern is structurally-relevant past context (what did the Director decide? what's the current timeline?).
- vs. explicit agent-state libraries — some frameworks (LangGraph's TypedDict state, Autogen's GroupChat state) support typed state carry-forward. Slack's pattern is that principle applied at investigation scale with explicit journals + scored findings + consolidated timelines.
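The first two contrasts reduce to a small structural difference, sketched here with a placeholder `summarise` standing in for the real agent-produced curation (journal / review / timeline):

```python
def sliding_window(history: list[str], keep_last: int = 4) -> list[str]:
    # sliding window: still raw messages, merely bounded in count
    return history[-keep_last:]

def no_carry_forward(history: list[str]) -> list[str]:
    # no-carry-forward: one curated artifact, zero raw turns
    return [summarise(history)]

def summarise(history: list[str]) -> str:
    # stand-in for the real curation step (journal / review / timeline)
    return f"summary of {len(history)} messages"
```

Both bound the next prompt's size, but the window forwards verbatim transcript while no-carry-forward forwards only a derived artifact.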
Seen in¶
- systems/slack-spear — canonical first wiki instance. Three context channels (Journal / Review / Timeline) carry all inter-invocation state. Message history is explicitly not carried forward. Justified both by context-budget pressure and by cognitive-load management. (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications)
Related¶
- concepts/context-engineering
- concepts/structured-journaling-tool
- concepts/online-context-summarisation
- concepts/credibility-scoring-rubric
- concepts/narrative-coherence-as-hallucination-filter
- concepts/llm-hallucination
- patterns/three-channel-context-architecture
- patterns/director-expert-critic-investigation-loop
- patterns/one-model-invocation-per-task