CONCEPT Cited by 2 sources

Knowledge pyramid (model tiering)

Definition

A knowledge pyramid is a multi-agent architecture pattern that tiers model cost against task cognitive load: cheap models at the fan-out leaves where work is tool-call-heavy and data-voluminous; mid-tier models in the middle where the task is reviewing, condensing, and scoring; the most expensive models at the apex where the task is strategic planning over an already-condensed input. Total cost stays bounded because the expensive apex consumes a small, pre-digested context rather than the raw torrent.

Named and canonicalised by Slack's Security Engineering team in their "Streamlining security investigations with agents" post (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents).

The canonical three tiers (Slack's Spear system)

          ┌──────────────┐
          │   Director   │   ← Highest cost
          │   (apex)     │      Task: strategic planning
          │              │      Input: condensed timeline
          └──────────────┘
       ┌───────┴───────┐
       │     Critic    │    ← Mid-cost
       │   (middle)    │       Task: review, score, condense
       │               │       Input: raw findings + tool calls
       └───────────────┘
          ▲       ▲
   ┌──────┴──┐ ┌──┴──────┐
   │ Expert A│ │ Expert B│  ← Lowest cost
   │ Expert C│ │ Expert D│     Task: tool calls, data analysis
   └─────────┘ └─────────┘     Input: raw data-source responses

Slack's verbatim framing:

"At the bottom of the knowledge pyramid, domain experts generate investigation findings by interrogating complex data sources, requiring many tool calls. Analyzing the returned data can be very token-intensive. Next, the Critic's review identifies the most interesting findings from that set [...] it assembles an up to date investigation timeline, integrating the running investigation timeline and newly gathered findings into a coherent narrative. The condensed timeline, consisting only of the most credible findings, is then passed back to the Director. This design allows us to strategically use low, medium, and high-cost models for the expert, critic, and director functions, respectively."

(Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents)
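The tier wiring above can be sketched as plain functions. This is a minimal illustration, not Slack's implementation: `call_model` is a stand-in for a real LLM API call, and the model names are placeholders for whatever low/mid/high-cost models a system assigns to each tier.

```python
EXPERT_MODEL = "cheap-model"       # leaves: tool-call-heavy, token-heavy work
CRITIC_MODEL = "mid-model"         # middle: review, score, condense
DIRECTOR_MODEL = "frontier-model"  # apex: strategic planning over condensed input

def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real LLM API call; returns a tagged echo so the
    # wiring can be exercised without network access.
    return f"{model}: {prompt[:40]}"

def run_expert(domain: str, raw_data: str) -> str:
    # Leaf tier: interrogate a data source, emit a finding.
    # The raw, token-heavy data stays at this tier.
    return call_model(EXPERT_MODEL, f"[{domain}] analyse:\n{raw_data}")

def run_critic(findings: list[str], timeline: str) -> str:
    # Middle tier: score the findings, keep the credible ones, and fold
    # them into an updated, condensed timeline.
    joined = "\n".join(findings)
    return call_model(
        CRITIC_MODEL,
        f"Review findings; update timeline.\nTimeline:\n{timeline}\nFindings:\n{joined}",
    )

def run_director(condensed_timeline: str) -> str:
    # Apex tier: plan the next investigative step. Note the input is
    # only the pre-digested timeline, never the raw leaf data.
    return call_model(DIRECTOR_MODEL, f"Plan next step given:\n{condensed_timeline}")
```

The key invariant is in the signatures: `run_director` never sees raw data-source responses, only the Critic's condensed timeline, which is what keeps the expensive apex affordable.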

Why it works

  • Token volume is inverted against reasoning density. The most tokens flow through the leaves (raw tool returns); the apex sees only a pre-digested summary. Spending the apex's high-cost model on already-compressed input gives the reasoning-intensive tier its best ROI.
  • Tool-call-heavy work doesn't need a frontier model. The cognitive load at the leaves is "call tool, read response, emit finding" — a task well-served by cheap fast models whose reasoning ceiling is adequate. Paying frontier-model rates for tool-call orchestration is waste.
  • Reviewing is easier than generating. Consistent with the drafter-evaluator pattern's core insight — "Spotting errors is simpler than perfect generation" — the Critic in the middle tier can be smaller than the apex and still catch the Experts' mistakes because its task is bounded to rubric application, not open-ended generation.
  • Strategic decisions deserve the top tier. The Director's phase-progression + next-question decisions shape the whole investigation; errors compound. Paying the apex rate here is high-leverage.
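The first bullet's cost inversion is easy to make concrete with arithmetic. All token counts and per-million-token prices below are invented for illustration; they are not Slack's figures. The point is the shape: most tokens flow through the cheap tier, so a uniform-frontier baseline pays the top rate on the bulk of the volume.

```python
# Hypothetical per-investigation token volumes by tier.
TOKENS = {"experts": 500_000, "critic": 50_000, "director": 5_000}

# Hypothetical prices in dollars per million tokens.
PRICE_PER_M = {"cheap": 0.50, "mid": 3.00, "frontier": 15.00}

def cost(tokens: int, price_per_m: float) -> float:
    return tokens * price_per_m / 1_000_000

# Baseline: the best model for everything.
uniform = cost(sum(TOKENS.values()), PRICE_PER_M["frontier"])

# Pyramid: match each tier's price to its token volume.
tiered = (cost(TOKENS["experts"], PRICE_PER_M["cheap"])
          + cost(TOKENS["critic"], PRICE_PER_M["mid"])
          + cost(TOKENS["director"], PRICE_PER_M["frontier"]))

print(f"uniform is {uniform / tiered:.1f}x the tiered cost")
```

Under these made-up numbers the uniform baseline costs roughly an order of magnitude more, almost all of it spent running frontier-rate inference over leaf-tier tool returns.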

Applicability

The pattern generalises beyond security investigations to any multi-agent system with a clear fan-out → review → strategic-decision shape:

  • Code review. Leaves: per-domain reviewers (security, perf, style). Middle: a consolidating judge pass. Apex: a final coordinator that decides severity + what to surface (see patterns/coordinator-sub-reviewer-orchestration for Cloudflare's variant, which flattens middle+apex into a single coordinator-with-judge-pass).
  • Data pipelines. Leaves: per-column classifiers / rule checkers. Middle: schema-level consistency review. Apex: pipeline-level orchestration.
  • Research / deep-research agents. Leaves: per-source retrievers/summarisers. Middle: cross-source synthesis. Apex: research-plan progression + question reformulation.
  • LLM-as-judge evaluation harnesses. Leaves: per-case raters (cheap). Middle: rubric-aggregators. Apex: eval-suite trend interpretation / triage.

Contrasts

  • vs. LLM cascade — cascade is a "try cheap first, escalate on low confidence" shape along a single agent's trajectory. Knowledge pyramid is a "cheap everywhere there's fan-out, expensive where there's synthesis" shape across a multi-agent topology. Both reduce cost; they compose.
  • vs. frontier-model minion delegation — in minion delegation, the frontier model directs cheaper models as sub-agents for specific sub-tasks. Knowledge pyramid is an architectural discipline about which tier lives where in a multi-agent graph, rather than a run-time delegation decision.
  • vs. patterns/cheap-approximator-with-expensive-fallback — same cost-optimisation intent, different topology. The fallback shape branches; the pyramid aggregates.
  • vs. uniform model tier — the naive "use the best model for everything" baseline pays frontier rates for token-intensive low-reasoning leaf work. The pyramid is the explicit architectural rejection of that waste.
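The cascade contrast is worth making concrete, since the two shapes compose: a pyramid leaf can run an internal cascade along its own trajectory while the pyramid's tier assignment is unchanged. The sketch below assumes a hypothetical `call_with_confidence` helper and a fixed confidence threshold; real systems would derive confidence from logprobs or a self-assessment pass.

```python
def call_with_confidence(model: str, prompt: str) -> tuple[str, float]:
    # Stand-in for an LLM call that also reports answer confidence.
    # Hard-coded here so the control flow is exercisable: the cheap
    # model is "unsure", the mid model is "confident".
    confidence = 0.4 if model == "cheap-model" else 0.9
    return f"{model}: {prompt[:30]}", confidence

def cascading_expert(prompt: str, threshold: float = 0.7) -> str:
    # Cascade: try cheap first, escalate on low confidence. This all
    # happens inside one leaf of the pyramid; the Critic above it is
    # unaffected by which model ultimately answered.
    answer, conf = call_with_confidence("cheap-model", prompt)
    if conf < threshold:
        answer, conf = call_with_confidence("mid-model", prompt)
    return answer
```

The cascade decides *when* to spend more on one trajectory; the pyramid decides *where* in the topology spending is worthwhile at all.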

Design levers

Each tier's model choice is a knob; moving between tiers has cost/quality implications:

  • Push apex cheaper — risk: Director's phase-progression becomes sloppy; investigations terminate prematurely or loop.
  • Push middle cheaper — risk: Critic misses Expert hallucinations / mis-scores credibility; cascades bad findings upward. Slack's "weakly adversarial" stance assumes the Critic has enough capability to catch the Experts — pushing the middle tier too cheap breaks the premise. See concepts/weakly-adversarial-critic.
  • Push leaves cheaper — risk: Experts hallucinate tool calls or misinterpret data; the Critic is the system's downstream hedge against this.
  • Per-phase variation — Slack discloses "flexibility to vary the model invocation parameters by phase" — the trace phase can use a bigger Expert model + more tokens than discovery. See patterns/phase-gated-investigation-progression.
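The per-phase lever can be expressed as a small lookup table. This is a hedged sketch of what "vary the model invocation parameters by phase" might look like in code; the phase names follow the discovery/trace framing above, and the model names and token budgets are invented, not Slack's configuration.

```python
# Per-phase invocation parameters for the Expert tier. The trace phase
# gets a stronger model and a larger token budget than discovery.
PHASE_CONFIG = {
    "discovery": {"expert_model": "cheap-model", "max_tokens": 2_000},
    "trace":     {"expert_model": "mid-model",   "max_tokens": 8_000},
}

def expert_params(phase: str) -> dict:
    # Unknown phases fall back to the cheapest configuration, so a
    # misnamed phase degrades cost, not correctness.
    return PHASE_CONFIG.get(phase, PHASE_CONFIG["discovery"])
```

Keeping the knob in one table makes the cost/quality trade-off per phase explicit and auditable, rather than scattering model choices across the agent code.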

Caveats

  • Slack's tier labels are qualitative. The post characterises the three tiers only as "low, medium, and high-cost models" without naming model families or disclosing cost ratios.
  • Assumes the middle tier is reliable at condensing. If the Critic can't reliably distil leaf findings into a timeline, the apex receives garbage and expensive Director tokens are wasted. The Critic's scoring rubric is Slack's lever for enforcing this reliability.
  • Assumes the work decomposes cleanly. A task that can't be cleanly partitioned into fan-out + review + apex-decide may not fit the pyramid shape; different multi-agent shapes (pipelines, mesh, peer-to-peer) may fit better.

Seen in

  • systems/slack-spear — canonical first wiki instance. Director (high-cost) → Critic (mid-cost) → 4 Experts (Access, Cloud, Code, Threat — low-cost). Slack states verbatim that the tier assignment is a deliberate cost-management strategy. (Source: sources/2025-12-01-slack-streamlining-security-investigations-with-agents) The second post refines the Critic-tier justification: the Critic "only reviews submitted findings rather than the entire Expert run," keeping its token count bounded — which lets Slack assign a stronger mid-tier model than cost alone would suggest. Disclosed: the stronger-model choice is motivated by research showing stronger models "err less frequently" (cites arXiv 2411.04368). The three-channel context architecture (Journal + Review + Timeline) is the plumbing that keeps the Critic's token budget manageable enough to afford the mid-tier bump. (Source: sources/2026-04-13-slack-managing-context-in-long-run-agentic-applications)