Agent spawn parallel exploration
Problem
Performance-engineering-class work — finding hotspots in a large codebase, proposing optimisations to Rust/C/C++/Go hot paths — is hypothesis-limited at the early-exploration phase. An engineer knows the codebase is slow somewhere but doesn't know where; a single-agent session is biased toward wherever the first prompt points the agent, and individual agents hyperfixate (concepts/agent-hyperfixation-failure-mode) on their first hypothesis.
At the same time, supervising a single agent tightly limits iteration throughput — an engineer can only review one stream of agent output at a time.
The pattern
Fan out N unattended agents, each with a prompt variation targeting a different area of the hypothesis space. Review results asynchronously — next morning, on break, after a meeting — extract the subset that survives end-to-end reality checks, discard the rest.
The tolerance for a high failure rate is load-bearing: because agents run unattended (no marginal human cost per agent), a ~62 % failure rate is fine if the survivors are independently useful.
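As a minimal sketch of the fan-out step: one background process per prompt variant, then walk away. The `agent` CLI and its `run --detach` arguments are placeholders for whatever harness is in use, and the slot-filled template is modelled on the verbatim prompt quoted below — this is an illustration of the pattern shape, not a real tool.

```rust
use std::process::{Child, Command};

/// Base template with one slot, modelled on the verbatim prompt.
const BASE: &str = "Look for a performance speedup in our Rust code. \
                    It has to be something that is well-tested, and on our hot path. \
                    Make sure to add benches to check your work. \
                    I'm particularly interested in our {area}.";

/// One prompt per hypothesis area: same template, one slot swapped.
fn build_prompts(areas: &[&str]) -> Vec<String> {
    areas.iter().map(|a| BASE.replace("{area}", a)).collect()
}

/// Fan out one unattended background process per prompt.
/// `agent run --detach` is a stand-in for a real harness CLI; agents
/// that fail to spawn are simply dropped, matching the pattern's
/// tolerance for per-agent failure.
fn spawn_unattended(prompts: &[String]) -> Vec<Child> {
    prompts
        .iter()
        .filter_map(|p| {
            Command::new("agent")
                .arg("run")
                .arg("--detach")
                .arg(p)
                .spawn()
                .ok()
        })
        .collect()
}
```

The review step is deliberately absent: survivors are inspected in batch when the human returns, not streamed.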
Canonical wiki instance
Quoted verbatim from Anthony Shew's 2026-04-21 Turborepo post:
"I spun up 8 background coding agents from my phone before bed, each targeting a different part of the Rust codebase I suspected was too slow.
'Look for a performance speedup in our Rust code. It has to be something that is well-tested, and on our hot path. Make sure to add benches to check your work. I'm particularly interested in our hashing code.'
In each prompt, I replaced the part of the codebase I was interested in with a new target. I was curious what the agents would accomplish with plenty of ambiguity, as a baseline.
By morning, 3 of the 8 had produced outputs that I could turn into shippable wins."
The three survivors:
- PR #11872 — ~25 % wall-clock reduction from hashing a `HashMap` by reference instead of cloning it.
- PR #11874 — ~6 % win from swapping `twox-hash` → `xxhash-rust`.
- PR #11878 — replacement of an unnecessary Floyd-Warshall algorithm with multi-source DFS (not on the hot path, but still a shippable improvement).
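The first survivor's shape — borrow what you hash instead of cloning it — is worth illustrating. The following is an illustrative sketch, not the actual PR #11872 code: `HashMap` has no `Hash` impl and unspecified iteration order, so the sketch hashes entries in sorted key order.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Before (shape of the problem): an owned parameter forces callers to
// clone the map just to compute its hash.
fn hash_owned(map: HashMap<String, String>) -> u64 {
    hash_borrowed(&map)
}

// After: borrow the map, so no clone lands on the hot path. Entries are
// hashed in sorted key order because HashMap iteration order is unspecified.
fn hash_borrowed(map: &HashMap<String, String>) -> u64 {
    let mut keys: Vec<&String> = map.keys().collect();
    keys.sort();
    let mut h = DefaultHasher::new();
    for k in keys {
        k.hash(&mut h);
        map[k].hash(&mut h);
    }
    h.finish()
}
```

The signature change is the whole win: callers that previously wrote `hash_owned(map.clone())` now write `hash_borrowed(&map)`.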
3 of 8 = ~37 % yield is the canonical unattended-agent baseline datum at the prompt quality documented here.
Prompt-variation strategy
Each of the 8 prompts was the same base template with one slot swapped:
- `hashing code` (as in the verbatim example)
- other areas of the codebase Shew suspected were too slow
The variations create independent first-hypothesis generations so the 8 agents don't all converge on the same conclusion. This is a deliberate mitigation for hyperfixation — a single agent will commit to its first idea, but 8 agents with different slot-fills commit to 8 different ideas, and the engineer picks from the survivors.
Properties
- Parallel-exploration, serial-review. All 8 agents run simultaneously; review happens when the human returns (next morning).
- Failure-tolerant. A ~62 % failure rate is fine because the human discards bad output. The only cost of a failed agent is the compute spent.
- Prompt-hypothesis coupled. Quality of survivors scales with prompt quality + codebase knowledge. Shew's "area of codebase I suspected was too slow" is doing real work — random areas would have lower yield.
- Low supervision cost. No per-iteration human gate; review is batch-mode after all agents finish.
Limits / failure modes
The 5 unattended-agent failure modes that produce the 62 % failure rate (from the same post's retrospective):
- No dogfood-loop awareness.
- Hyperfixation on first idea.
- Microbenchmark chasing (97 % microbench / 0.02 % real-world).
- No regression tests written.
- No `--profile` flag usage — proposing hashing optimisations without profiling the actual binary.
Not all failures exhibit all five; the 5 of 8 non-shippable agents exhibited some mix.
Not a substitute for supervised iteration
Shew's own conclusion verbatim: "The agents running unattended produced some good wins, but I could tell this wouldn't be sustainable. We needed stronger testing, and a better verification loop. I had to be more involved."
Parallel exploration is a one-time hypothesis generator for early-phase exploration; once the hot path is identified, supervised Plan-Mode-then-implement is a better execution pattern.
The two compose: spawn-parallel for "where are the hot paths", then supervised-loop for "implement optimisations on the identified hot paths".
Composition hints
- Overnight window. 8 hours of unattended execution makes efficient use of sleep time; review is 30-60 minutes in the morning.
- Phone-spawnable harnesses. Modern agent harnesses (Claude Code, Codex, some Cursor configurations) support remote-triggered background sessions, making the "from my phone before bed" framing load-bearing — no laptop open required.
- Prompt templates + a CSV of slot-fills. Keeps the variation discipline explicit and allows systematic coverage of the hypothesis space.
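The template-plus-CSV discipline can be sketched as below. The single-column CSV layout (a header row, then one slot-fill per line) and the `{area}` placeholder are assumptions for illustration, not a documented format; a real setup would use a proper CSV parser.

```rust
/// Expand a prompt template against a single-column CSV of slot-fills.
/// Assumes a header row and one hypothesis area per subsequent line;
/// a sketch, not a full CSV parser.
fn prompts_from_csv(csv: &str, template: &str) -> Vec<String> {
    csv.lines()
        .skip(1) // header row, e.g. "area"
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(|area| template.replace("{area}", area))
        .collect()
}
```

Keeping the slot-fills in a file (rather than ad hoc edits to eight prompts) is what makes coverage of the hypothesis space auditable after the fact.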
Relation to other exploration patterns
- patterns/multi-candidate-generation (ML-domain; recsys pattern at model-output altitude) — sibling at a different altitude. Both fan out generations to explore the hypothesis space, then select survivors.
- patterns/ai-generated-fix-forward-pr — Meta's fix-forward-PR pattern for automated vulnerability remediation; different validation gate (security fix correctness) but similar pattern shape.
Anti-patterns
- Spawn N identical-prompt agents. Identical prompts converge on the same first hypotheses; hyperfixation returns in aggregate. Vary the prompts.
- Spawn without end-to-end validation plan. Unvalidated agent outputs are the microbench-optimisation pathology at 8× scale.
- Spawn for production-critical execution. Parallel exploration is for hypothesis generation; execution on hot-path code in production systems needs supervised iteration.
- Take all 3-of-8 survivors at face value. The Floyd-Warshall replacement (PR #11878) was off the hot path — "not on the hot path of `turbo run`, but my prompts didn't specify which hot path, did they? Fair." Survivors still need end-to-end validation.
Seen in
- Making Turborepo 96 % faster (Vercel, 2026-04-21) — canonical wiki instance; 8 agents spawned overnight; 3 shippable (~37 % yield); prompt-variation discipline explicit; five unattended failure modes reviewed.
Related
- patterns/plan-mode-then-implement-agent-loop — the sibling supervised pattern; compose sequentially.
- concepts/agent-hyperfixation-failure-mode — the failure mode that parallel prompt variation mitigates.
- concepts/microbenchmark-vs-end-to-end-gap — the failure mode that persists into the survivors unless end-to-end validation gates it.
- patterns/multi-candidate-generation — adjacent altitude; ML-model fan-out instance.
- patterns/ai-generated-fix-forward-pr — adjacent altitude; automated fix PR instance.