CONCEPT Cited by 2 sources
Knowledge pyramid (model tiering)¶
Definition¶
A knowledge pyramid is a multi-agent architecture pattern that tiers model cost against task cognitive load: cheap models at the fan-out leaves where work is tool-call-heavy and data-voluminous; mid-tier models in the middle where the task is reviewing, condensing, and scoring; the most expensive models at the apex where the task is strategic planning over an already-condensed input. Total cost stays bounded because the expensive apex consumes a small, pre-digested context rather than the raw torrent.
Named and canonicalised by Slack's Security Engineering team in their Streamlining security investigations with agents post (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents).
The canonical three tiers (Slack's Spear system)¶
   ┌──────────────┐
   │   Director   │  ← Highest cost
   │    (apex)    │    Task: strategic planning
   │              │    Input: condensed timeline
   └──────────────┘
           ▲
   ┌───────┴───────┐
   │    Critic     │  ← Mid-cost
   │   (middle)    │    Task: review, score, condense
   │               │    Input: raw findings + tool calls
   └───────────────┘
       ▲       ▲
┌──────┴──┐ ┌──┴──────┐
│ Expert A│ │ Expert B│  ← Lowest cost
│ Expert C│ │ Expert D│    Task: tool calls, data analysis
└─────────┘ └─────────┘    Input: raw data-source responses
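The three-tier flow can be sketched in a few lines of Python. This is a minimal illustration of the shape, not Slack's implementation; the `Finding` structure, the credibility threshold, and the tier/model labels are all assumptions:

```python
from dataclasses import dataclass

# Hypothetical tier-to-model mapping; Slack discloses only
# "low, medium, and high-cost models", not model families.
TIER_MODELS = {"expert": "low-cost", "critic": "mid-cost", "director": "high-cost"}

@dataclass
class Finding:
    expert: str
    text: str
    credibility: float  # assigned by the Critic's rubric

def run_round(question, experts, critic, director, threshold=0.5):
    """One fan-out -> condense -> plan round through the pyramid."""
    # Leaves: cheap models do the token-heavy tool-call work in parallel.
    raw = [f for e in experts for f in e(question)]
    # Middle: the mid-cost Critic scores each finding; only credible
    # findings survive into the condensed timeline.
    scored = critic(raw)
    timeline = [f for f in scored if f.credibility >= threshold]
    # Apex: the high-cost Director sees only the condensed timeline,
    # never the raw tool output.
    return director(timeline)
```

The key invariant is in the last step: the expensive model's input is the filtered timeline, so its token bill is decoupled from leaf-level data volume.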
Slack's verbatim framing:
"At the bottom of the knowledge pyramid, domain experts generate investigation findings by interrogating complex data sources, requiring many tool calls. Analyzing the returned data can be very token-intensive. Next, the Critic's review identifies the most interesting findings from that set [...] it assembles an up to date investigation timeline, integrating the running investigation timeline and newly gathered findings into a coherent narrative. The condensed timeline, consisting only of the most credible findings, is then passed back to the Director. This design allows us to strategically use low, medium, and high-cost models for the expert, critic, and director functions, respectively."
(Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents)
Why it works¶
- Token volume is inverted against reasoning density. The most tokens flow through the leaves (raw tool returns); the apex sees only a pre-digested summary. Spending the apex's high-cost model on already-compressed input gives the reasoning-intensive tier its best ROI.
- Tool-call-heavy work doesn't need a frontier model. The cognitive load at the leaves is "call tool, read response, emit finding" — a task well-served by cheap fast models whose reasoning ceiling is adequate. Paying frontier-model rates for tool-call orchestration is waste.
- Reviewing is easier than generating. Consistent with the drafter-evaluator pattern's core insight — "Spotting errors is simpler than perfect generation" — the Critic in the middle tier can be smaller than the apex and still catch the Experts' mistakes because its task is bounded to rubric application, not open-ended generation.
- Strategic decisions deserve the top tier. The Director's phase-progression + next-question decisions shape the whole investigation; errors compound. Paying the apex rate here is high-leverage.
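The first point can be made concrete with back-of-the-envelope arithmetic. All rates and token volumes below are illustrative placeholders, not figures from Slack's posts:

```python
# Illustrative per-1K-token rates and per-round token volumes;
# none of these numbers come from Slack's posts.
RATE = {"low": 0.0005, "mid": 0.003, "high": 0.015}   # $ per 1K tokens
TOKENS = {"experts": 400_000, "critic": 60_000, "director": 8_000}

def cost(rates):
    # Total dollar cost of one round under a given tier->rate assignment.
    return sum(TOKENS[tier] / 1_000 * rates[tier] for tier in TOKENS)

pyramid = cost({"experts": RATE["low"], "critic": RATE["mid"], "director": RATE["high"]})
uniform = cost({"experts": RATE["high"], "critic": RATE["high"], "director": RATE["high"]})

print(f"pyramid: ${pyramid:.2f}  uniform: ${uniform:.2f}")
```

With these made-up numbers the pyramid round costs about $0.50 against $7.02 for the uniform-frontier baseline: the 400K leaf tokens dominate the bill, so that is exactly where the cheap rate matters most.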
Applicability¶
The pattern generalises beyond security investigations to any multi-agent system with a clear fan-out → review → strategic-decision shape:
- Code review. Leaves: per-domain reviewers (security, perf, style). Middle: a consolidating judge pass. Apex: a final coordinator that decides severity + what to surface (see patterns/coordinator-sub-reviewer-orchestration for Cloudflare's variant, which flattens middle+apex into a single coordinator-with-judge-pass).
- Data pipelines. Leaves: per-column classifiers / rule checkers. Middle: schema-level consistency review. Apex: pipeline-level orchestration.
- Research / deep-research agents. Leaves: per-source retrievers/summarisers. Middle: cross-source synthesis. Apex: research-plan progression + question reformulation.
- LLM-as-judge evaluation harnesses. Leaves: per-case raters (cheap). Middle: rubric-aggregators. Apex: eval-suite trend interpretation / triage.
Contrasts¶
- vs. LLM cascade — cascade is a "try cheap first, escalate on low confidence" shape along a single agent's trajectory. Knowledge pyramid is a "cheap everywhere there's fan-out, expensive where there's synthesis" shape across a multi-agent topology. Both reduce cost; they compose.
- vs. frontier-model minion delegation — in minion delegation, the frontier model directs cheaper models as sub-agents for specific sub-tasks. Knowledge pyramid is an architectural discipline about which tier lives where in a multi-agent graph, rather than a run-time delegation decision.
- vs. patterns/cheap-approximator-with-expensive-fallback — same cost-optimisation intent, different topology. The fallback shape branches; the pyramid aggregates.
- vs. uniform model tier — the naive "use the best model for everything" baseline pays frontier rates for token-intensive low-reasoning leaf work. The pyramid is the explicit architectural rejection of that waste.
Design levers¶
Each tier's model choice is a knob; moving between tiers has cost/quality implications:
- Push apex cheaper — risk: Director's phase-progression becomes sloppy; investigations terminate prematurely or loop.
- Push middle cheaper — risk: Critic misses Expert hallucinations / mis-scores credibility; cascades bad findings upward. Slack's "weakly adversarial" stance assumes the Critic has enough capability to catch the Experts — pushing the middle tier too cheap breaks the premise. See concepts/weakly-adversarial-critic.
- Push leaves cheaper — risk: Experts hallucinate tool calls or misinterpret data; the Critic tier exists precisely as the rest of the system's hedge against this failure mode.
- Per-phase variation — Slack discloses "flexibility to vary the model invocation parameters by phase" — the trace phase can use a bigger Expert model + more tokens than discovery. See patterns/phase-gated-investigation-progression.
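Slack's per-phase flexibility can be expressed as a phase-keyed configuration table. The phase names, model labels, and token budgets here are illustrative assumptions; the posts disclose the flexibility, not the values:

```python
# Illustrative phase-keyed invocation parameters; Slack discloses
# only "flexibility to vary the model invocation parameters by
# phase", not these names or numbers.
PHASE_CONFIG = {
    "discovery": {"expert_model": "low-cost", "max_tokens": 2_000},
    "trace":     {"expert_model": "mid-cost", "max_tokens": 8_000},
}

def expert_params(phase):
    # Fall back to the cheapest configuration for unrecognised phases.
    return PHASE_CONFIG.get(phase, PHASE_CONFIG["discovery"])
```

The point of the table is that tier assignment need not be static: the same leaf role can be granted a stronger model and a larger budget exactly in the phases where the work is harder.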
Caveats¶
- Slack's tier labels are qualitative. The post characterises the three tiers only as "low, medium, and high-cost models" without naming model families or disclosing cost ratios.
- Assumes the middle tier is reliable at condensing. If the Critic can't reliably distil leaf findings into a timeline, the apex receives garbage and expensive Director tokens are wasted. The Critic's rubric (Slack's credibility-scoring rubric) is the lever that enforces this reliability.
- Assumes the work decomposes cleanly. A task that can't be cleanly partitioned into fan-out + review + apex-decide may not fit the pyramid shape; different multi-agent shapes (pipelines, mesh, peer-to-peer) may fit better.
Seen in¶
- systems/slack-spear — canonical first wiki instance. Director (high-cost) → Critic (mid-cost) → 4 Experts (Access, Cloud, Code, Threat — low-cost). Slack states verbatim that the tier assignment is a deliberate cost-management strategy. (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents) The second post refines the Critic-tier justification: the Critic "only reviews submitted findings rather than the entire Expert run," keeping its token count bounded — which lets Slack assign a stronger mid-tier model than cost alone would suggest. Disclosed: the stronger-model choice is motivated by research showing stronger models "err less frequently" (cites arxiv 2411.04368). The three-channel context architecture (Journal + Review + Timeline) is the plumbing that keeps the Critic's token budget manageable enough to afford the mid-tier bump. (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications)
Related¶
- patterns/director-expert-critic-investigation-loop
- patterns/three-channel-context-architecture
- patterns/timeline-assembly-from-scored-findings
- patterns/specialized-agent-decomposition
- patterns/multi-round-critic-quality-gate
- patterns/drafter-evaluator-refinement-loop
- concepts/weakly-adversarial-critic
- concepts/credibility-scoring-rubric
- concepts/narrative-coherence-as-hallucination-filter
- concepts/frontier-model-minion-delegation
- concepts/llm-cascade
- patterns/cheap-approximator-with-expensive-fallback