PATTERN

Five-questions knowledge extraction

Intent

Extract the tribal knowledge the AI coding agent actually needs per module by forcing five targeted questions that drive the depth of the extraction — rather than asking for "documentation" of the module and accepting a surface narrative.

The five questions

Meta's framework (Source: sources/2026-04-06-meta-how-meta-used-ai-to-map-tribal-knowledge-in-large-scale-data-pipelines):

| # | Question | What it extracts |
|---|----------|------------------|
| 1 | What does this module configure? | Surface purpose; module-level comments + docs |
| 2 | What are the common modification patterns? | Recent-commit rhythm; what engineers actually change |
| 3 | What are the non-obvious patterns that cause build failures? | Pure tribal knowledge; failure-oriented |
| 4 | What are the cross-module dependencies? | Subsystem-coupling invariants |
| 5 | What tribal knowledge is buried in code comments? | Inline gotchas, "DO NOT REMOVE" markers, TODOs with blast-radius warnings |
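Meta's write-up states the questions as prose; as a minimal sketch (the keys and structure here are illustrative, not Meta's actual implementation), they could be encoded as structured prompts for an analyst agent:

```python
# The five questions as structured prompts for a module analyst agent.
# Keys are illustrative labels, not part of Meta's published framework.
FIVE_QUESTIONS = [
    ("purpose",       "What does this module configure?"),
    ("modifications", "What are the common modification patterns?"),
    ("failures",      "What are the non-obvious patterns that cause build failures?"),
    ("dependencies",  "What are the cross-module dependencies?"),
    ("comments",      "What tribal knowledge is buried in code comments?"),
]

# Ordered surface-to-deep: each question's prompt can be prefixed with the
# answers to the earlier ones so context accumulates across the sequence.
for key, question in FIVE_QUESTIONS:
    print(f"{key}: {question}")
```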

Meta's explicit finding: Question 5 produced the deepest learnings — 50+ non-obvious patterns like hidden intermediate naming conventions and append-only deprecated-enum rules. "None of this had been written down before."

Why these five specifically

The questions are ordered from surface to deep, and from feature-oriented to failure-oriented:

  • Q1 (purpose) is the question most documentation answers; it sets context for the rest.
  • Q2 (modifications) reveals the typical agent task shape — agents are more likely to be adding-a-field than designing-a-new-module.
  • Q3 (failure patterns) is the first tribal-knowledge question and the one that produces silent-wrong-output mitigations.
  • Q4 (dependencies) forces cross-module invariants into view; these rarely live in any one file.
  • Q5 (comment-buried knowledge) closes the loop by literally re-reading the code to find annotations engineers left behind.

The shape is deliberately failure-first, not feature-first — the agent needs to know what to avoid more than what's available.

How to run it

Each module analyst agent:

  1. Reads all files in the module + recent commit history.
  2. Answers each of the five questions in order.
  3. Emits structured output → feeds the writer agents.

Run in parallel across modules — Meta runs 11 module analyst agents simultaneously in one session.
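The read → answer → emit loop and the parallel fan-out could be sketched as follows. This assumes a hypothetical `ask_llm` call and invented module names; Meta's actual agent harness is not described at this level of detail.

```python
# Sketch of the module-analyst fan-out: one analyst per module, run in parallel.
# ask_llm is a placeholder for a real model call; module names are illustrative.
from concurrent.futures import ThreadPoolExecutor

QUESTIONS = [
    "What does this module configure?",
    "What are the common modification patterns?",
    "What are the non-obvious patterns that cause build failures?",
    "What are the cross-module dependencies?",
    "What tribal knowledge is buried in code comments?",
]

def ask_llm(prompt: str) -> str:
    # Stand-in for an actual LLM call.
    return f"[answer to: {prompt[:40]}...]"

def analyze_module(module: str) -> dict:
    """One analyst agent: read the module + commits, answer the five questions in order."""
    context = f"files and recent commit history of {module}"  # stand-in for real reads
    answers = {q: ask_llm(f"Given {context}: {q}") for q in QUESTIONS}
    return {"module": module, "answers": answers}  # structured output for the writers

modules = ["ingestion", "transforms", "publishing"]  # illustrative
with ThreadPoolExecutor(max_workers=11) as pool:     # Meta ran 11 analysts at once
    reports = list(pool.map(analyze_module, modules))
```

Threads (rather than processes) suffice here because each analyst is I/O-bound, waiting on model and filesystem calls.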

Output shape

Answers map 1:1 to sections in the downstream compass-not-encyclopedia context file:

| Question | Context-file section |
|----------|----------------------|
| Q1 + Q2 | Quick Commands |
| Q2 + Q4 | Key Files |
| Q3 + Q5 | Non-Obvious Patterns (the highest-value section) |
| Q4 | See Also |

This alignment is not accidental — the five-questions framework is designed to feed the four-section file format.

Contrast with documentation approaches

| Approach | Primary question | Output shape |
|----------|------------------|--------------|
| Divio framework (tutorials / how-to / reference / explanation) | "How do I teach this?" | Four doc categories |
| Javadoc / RustDoc / pydoc | "What does this method do?" | API-surface docs |
| README-driven development | "How would someone adopt this?" | Adoption-oriented narratives |
| Five-questions framework (this) | "What breaks if the agent doesn't know this?" | Failure-oriented navigation files |

The five-questions framework is distinctive in orienting around failure modes — questions 3 and 5 both target knowledge whose absence produces silent wrong output.

Tradeoffs

  • Lossy by design — the framework skips architecture / design-history / aesthetic axes. Suitable for agent context, not for learning the system as a human.
  • Requires mature code + commits + comments — on a newly written module, Q5 returns empty. The framework is extraction, not generation.
  • Q3 is the hardest question — reliably identifying "non-obvious patterns that cause build failures" requires either an analyst with deep context or a large-context model that has read adjacent failure postmortems.

Applicable beyond Meta's case

  • Onboarding docs for new engineers — same failure-first shape, but answered by humans.
  • Migration guides — the five questions applied to the migrating codebase produce the pre-migration invariant list.
  • Runbooks for operators — Q3 ("what breaks") is the runbook's core; Q1 + Q2 set context.

Meta's fifth apply-it-yourself step specifically names the framework as reusable: "Use the 'five questions' framework. Have agents (or engineers) answer: what does it do, how do you modify it, what breaks, what depends on it, and what's undocumented?"
