PATTERN
Timeline assembly from scored findings¶
Intent¶
After a critic has scored per-finding credibility, run a separate task that assembles those findings into a consolidated chronological narrative with explicit consolidation rules, a narrative-coherence score, and a capped gap-identification output.
The pattern is the second stage of a two-stage hallucination filter:
- First stage — per-finding credibility scoring against an evidence-based rubric (see concepts/credibility-scoring-rubric). Gates at the individual-claim level.
- Second stage — this pattern. Assembles surviving findings into a coherent narrative and prunes findings that don't fit. Gates at the story level.
Canonicalised by Slack's Security Engineering team as the Critic's Timeline task in Spear (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications).
Mechanism¶
Inputs (three artifacts)¶
Slack's Timeline task takes three inputs (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications):
- The most recent Review — per-finding credibility scores from this round.
- The previous Critic's Timeline — the running chronology from prior rounds.
- The Director's Journal — the Director's planning state (context for what questions were being asked).
The fold¶
The output is a new Timeline that incorporates new findings while preserving prior narrative structure:
new_timeline = assemble(
    prev_timeline,
    filter(this_review, score >= plausibility_threshold),
    current_journal
)
This is a fold over investigation rounds — each round's Timeline is a function of the previous round's Timeline plus the current round's Review + Journal. Because each merge does bounded work, the fold scales linearly in rounds, not quadratically as re-summarising the full transcript would.
See concepts/online-context-summarisation for the broader pattern.
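The fold can be sketched in a few lines of Python. This is a structural sketch only: `Finding`, `assemble`, and the 0.7 threshold are illustrative stand-ins (Slack hasn't published the actual threshold), and `assemble` here is a placeholder for the model call that applies the consolidation rules.

```python
from dataclasses import dataclass

PLAUSIBILITY_THRESHOLD = 0.7  # illustrative cutoff, not Slack's published value

@dataclass
class Finding:
    claim: str
    score: float  # per-finding credibility assigned by the Review

def assemble(prev_timeline: list[str], credible: list[Finding], journal: str) -> list[str]:
    # Placeholder for the model invocation that merges new credible findings
    # into the running chronology under the four consolidation rules.
    return prev_timeline + [f.claim for f in credible]

def run_investigation(rounds: list[tuple[list[Finding], str]]) -> list[str]:
    timeline: list[str] = []
    for review, journal in rounds:
        # The fold: each round does bounded work on (prev_timeline, review, journal).
        credible = [f for f in review if f.score >= PLAUSIBILITY_THRESHOLD]
        timeline = assemble(timeline, credible, journal)
    return timeline
```

Each iteration touches only the previous Timeline and the current round's artifacts, which is what keeps the cost linear in rounds.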
Consolidation rules (four, explicit)¶
Slack specifies four consolidation rules (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications):
- Include only events supported by credible citations — speculation doesn't belong on the Timeline. The per-finding credibility score is a membership test.
- Remove duplicate entries describing the same event — an event shouldn't appear twice because two Experts mentioned it.
- When timestamps conflict, prefer sources with stronger evidence — a log-entry timestamp beats an inferred time.
- Maintain chronological ordering based on best available evidence — events must flow logically in time.
Four rules, no more. Small enough to fit in a prompt, short enough to audit, concrete enough to explain the output to a human reviewer.
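The four rules are concrete enough to sketch deterministically. In Spear they are prompt instructions judged by the model, not code; the field names below (`event_key`, `score`, `evidence_strength`, `timestamp`) and the 0.7 gate are assumptions for illustration.

```python
def consolidate(entries: list[dict]) -> list[dict]:
    """Deterministic sketch of the four consolidation rules."""
    # Rule 1: include only events supported by credible citations.
    credible = [e for e in entries if e["score"] >= 0.7]

    # Rules 2 + 3: remove duplicates describing the same event; when copies
    # conflict, keep the one backed by the strongest evidence.
    by_event: dict[str, dict] = {}
    for e in credible:
        key = e["event_key"]
        if key not in by_event or e["evidence_strength"] > by_event[key]["evidence_strength"]:
            by_event[key] = e

    # Rule 4: maintain chronological ordering on best available timestamps.
    return sorted(by_event.values(), key=lambda e: e["timestamp"])
```

The point of the sketch is the shape: a membership gate, a keyed merge with evidence-weighted conflict resolution, then a sort.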
Narrative-coherence scoring¶
The assembled Timeline is scored against a second 5-level rubric (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications):
| Score | Label | Meaning |
|---|---|---|
| 0.9-1.0 | Trustworthy | Strong corroboration across multiple sources, consistent timestamps, no significant gaps |
| 0.7-0.89 | Highly-plausible | Good evidence support, minor gaps present, mostly consistent Timeline |
| 0.5-0.69 | Plausible | Some uncertainty in event ordering, notable gaps exist |
| 0.3-0.49 | Speculative | Poor evidence support, significant gaps, conflicted narrative |
| 0.0-0.29 | Invalid | No evidence, confounding inconsistencies present |
Same numeric bands as the per-finding credibility rubric but rebased from credibility-per-claim to coherence-of-the-whole-story. A Timeline is not trustworthy because each finding is trustworthy; it's trustworthy because the findings fit together.
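Because the bands are fixed, mapping a score to its rubric label is a small lookup. A minimal sketch of the table above:

```python
def coherence_label(score: float) -> str:
    """Map a narrative-coherence score to its 5-level rubric label."""
    bands = [
        (0.9, "Trustworthy"),
        (0.7, "Highly-plausible"),
        (0.5, "Plausible"),
        (0.3, "Speculative"),
        (0.0, "Invalid"),
    ]
    for floor, label in bands:
        if score >= floor:
            return label
    raise ValueError("score must be in [0, 1]")
```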
Gap identification (capped at top-3)¶
See concepts/gap-identification-top-n.
The Timeline identifies three categories of gaps:
- Evidential — missing data that would strengthen conclusions.
- Temporal — unexplained periods between events.
- Logical — events that don't fit the emerging narrative.
Capped at top 3. The cap is architectural: not top-5, not all, not exhaustive. Forces triage in the Critic and produces actionable next-round input for the Director.
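The cap is easy to express mechanically. The source specifies the three categories and the top-3 cap but not how the Critic ranks gaps, so the `priority` field below is an assumption standing in for the Critic's judgement.

```python
GAP_CATEGORIES = ("evidential", "temporal", "logical")

def top_gaps(gaps: list[dict], cap: int = 3) -> list[dict]:
    """Keep only the highest-priority gaps, across all three categories."""
    valid = [g for g in gaps if g["category"] in GAP_CATEGORIES]
    return sorted(valid, key=lambda g: g["priority"], reverse=True)[:cap]
```

Everything past the cap is dropped on the floor, which is exactly the triage the pattern is after.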
Why a separate task from Review¶
Slack explicitly separates the Timeline task from the Review task. The Review is token-intensive + tool-call-heavy (methodology audit via the introspection suite). The Timeline operates entirely on data in the prompt — no tool calls.
Slack's intuition (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications):
"Whereas the Review task is token intensive and requires the correct use of many tools, Timeline assembly operates entirely on data in the prompt. The intuition is that the more narrowly scoped task leaves a greater capacity for reasoning in the problem domain, rather than methods of data gathering or judgements of Expert methodology."
This is patterns/one-model-invocation-per-task applied to the Critic's two distinct jobs:
- Review = methodology audit + per-finding credibility
- Timeline = narrative assembly + consistency + gaps
Combining them would ask one model to simultaneously audit tool-use AND weave a coherent chronology — two different cognitive loads.
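The separation amounts to two narrowly-scoped invocations per round instead of one combined pass. A minimal sketch, where `call_model`, its `tools` parameter, and `"introspection_suite"` are stand-ins, not Slack's actual API:

```python
def critic_round(review_prompt: str, timeline_prompt: str, call_model) -> tuple[str, str]:
    """Run the Critic's two jobs as separate model invocations."""
    # Review: token-intensive, tool-call-heavy methodology audit.
    review = call_model(review_prompt, tools="introspection_suite")
    # Timeline: operates entirely on data in the prompt, no tool calls.
    timeline = call_model(timeline_prompt + review, tools=None)
    return review, timeline
```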
Hallucination-filtering claim¶
The canonical load-bearing claim (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications):
"The Timeline task raises the bar for hallucinated findings by enforcing narrative coherence. To be preserved, each finding must be consistent with the full chain of evidence; findings that contradict or lack support from the broader narrative are pruned. A hallucination can only survive this process if it is more coherent with the body of evidence than any real observation it competes with."
See concepts/narrative-coherence-as-hallucination-filter for the full argument. The structural intuition: truthful observations are consistent with reality and therefore fit together; hallucinations are uncorrelated fabrications and probabilistically don't cohere with multiple other real observations.
Operational properties¶
- Timeline length is bounded. By consolidation rule 2 (dedup), the Timeline doesn't grow linearly with rounds — new evidence either updates existing events, adds new events, or gets pruned as duplicate.
- Previous Timeline is authoritative for already-assembled events. Rule 3 (prefer stronger evidence) + rule 4 (chronological ordering) mean that an event's position in the previous Timeline is the default for the new Timeline, modulo new evidence.
- Gap identification is the signal for next-round questions. Director reads the top-3 gaps + the Timeline; next question often targets one of the gaps.
- Narrative-coherence score is an investigation health metric. A Timeline sitting at 0.83 (Highly-plausible) across several rounds is signal that the investigation is converging; a Timeline below 0.5 is signal to pivot or gather more evidence.
When to reach for this pattern¶
- Long-running investigations where per-round findings need to be woven into a running narrative.
- Hallucination-sensitive tasks where per-finding credibility scoring alone isn't enough; you need a second filter at the story level.
- Multi-agent loops where the Director (or another planner) needs a consolidated summary, not raw scored findings.
- Audit / supervisor-review contexts where a human needs to consume the investigation's current state at any point in a coherent form.
When not to reach for it¶
- Tasks without natural chronology. If the investigation isn't fundamentally about "what happened when," Timeline assembly is a forced shape.
- Short loops. A 2-3 round loop doesn't accumulate enough material to justify the Timeline task overhead.
- Tasks where first-stage credibility scoring is enough. If hallucination risk is low and per-finding credibility is adequate, skip the second stage.
Composes with¶
- patterns/director-expert-critic-investigation-loop — the loop shape this pattern sits inside. Timeline is written at the end of each round; Director consumes it at the start of the next round's decision phase.
- patterns/three-channel-context-architecture — this pattern is the third channel (Timeline) of three.
- patterns/critic-tool-call-introspection-suite — produces the credibility-scored Review that feeds this pattern.
- concepts/credibility-scoring-rubric — first-stage filter providing the score >= plausibility_threshold membership test.
- concepts/narrative-coherence-as-hallucination-filter — the theoretical framing of why this pattern works.
- concepts/gap-identification-top-n — the top-3 gap cap that shapes the Timeline output.
- patterns/one-model-invocation-per-task — the principle of keeping Review and Timeline as separate invocations.
Contrasts¶
- vs. single-pass credibility + assembly — combine scoring and narrative assembly into one Critic pass. Simpler, but it conflates two different cognitive loads and produces worse output on both.
- vs. monotonic append Timeline — just append each round's findings in time-order. Faster, but no dedup, no evidence-weighted conflict resolution, no coherence scoring.
- vs. unbounded gap listing — list every gap identified, no cap. Produces reader fatigue and downstream paralysis (see concepts/gap-identification-top-n).
- vs. vector-memory summaries — retrieval-driven summary assembly. Can produce summaries but doesn't enforce narrative coherence or consolidation rules explicitly.
Tradeoffs¶
- Timeline assembly model-call latency. Adds a Critic invocation per round on top of the Review invocation. Mitigated by Timeline being in-prompt-only (no tool calls) and therefore cheaper than Review.
- Consolidation-rule drift. The four rules are prompt text; as investigations evolve or the model changes, the rules may need re-tuning.
- Second rubric to maintain. Two distinct 5-level rubrics (credibility + coherence) double the rubric-calibration surface.
- Top-3 gap cap loses information. Gaps 4-N never surface to the Director; if the top-3 triage is wrong, downstream decisions suffer.
- Timeline task can itself hallucinate. Mitigated by (a) narrow scope keeping tokens low, (b) mid-tier model with stronger capability, (c) explicit consolidation rules forcing structure. But not eliminated.
Seen in¶
- systems/slack-spear — canonical first wiki instance. Timeline task runs after each Review, rebuilding the Timeline from (prev_timeline, latest_review, journal). Four consolidation rules + 5-level narrative-coherence rubric + top-3 gaps across three categories. Specimen extract shows a 0.83 (Highly-plausible) Timeline for a false-positive alert investigation. Slack's verbatim hallucination-filtering claim: "A hallucination can only survive this process if it is more coherent with the body of evidence than any real observation it competes with." (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications)
Related¶
- systems/slack-spear
- patterns/director-expert-critic-investigation-loop
- patterns/three-channel-context-architecture
- patterns/critic-tool-call-introspection-suite
- patterns/one-model-invocation-per-task
- concepts/narrative-coherence-as-hallucination-filter
- concepts/credibility-scoring-rubric
- concepts/gap-identification-top-n
- concepts/weakly-adversarial-critic
- concepts/online-context-summarisation
- concepts/no-message-history-carry-forward