
PATTERN

Agent spawn parallel exploration

Problem

Performance-engineering-class work — finding hotspots in a large codebase, proposing optimisations to Rust/C/C++/Go hot paths — is hypothesis-limited at the early-exploration phase. An engineer knows the codebase is slow somewhere but doesn't know where; a single-agent session is biased toward wherever the first prompt points the agent, and individual agents hyperfixate (concepts/agent-hyperfixation-failure-mode) on their first hypothesis.

At the same time, supervising a single agent tightly limits iteration throughput — an engineer can only review one stream of agent output at a time.

The pattern

Fan out N unattended agents, each with a prompt variation targeting a different area of the hypothesis space. Review results asynchronously — next morning, on break, after a meeting — extract the subset that survives end-to-end reality checks, discard the rest.

The tolerance for high failure rate is load-bearing: because agents run unattended (no marginal human cost per agent), a ~60 % failure rate is fine if survivors are independently useful.
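The fan-out-then-filter shape can be sketched in a few lines. This is illustrative only: `run_agent` is a stub standing in for a real unattended harness call (which would return a patch plus benchmark evidence), and the prompt strings are placeholders, not the post's actual prompts.

```python
def run_agent(prompt: str) -> dict:
    """Placeholder for an unattended agent run; a real harness would
    return a patch plus benchmark evidence, not a stub record."""
    return {"prompt": prompt, "passes_e2e_check": False}

def fan_out(prompts):
    # No per-agent human gate: launch everything, review later in batch.
    return [run_agent(p) for p in prompts]

def keep_survivors(results):
    # High failure rate is tolerated by design; discard anything that
    # fails an end-to-end reality check at review time.
    return [r for r in results if r["passes_e2e_check"]]

# Eight prompt variations, one per hypothesized area (placeholders here).
prompts = [f"Find a speedup in area {i}" for i in range(8)]
results = fan_out(prompts)
print(len(results), len(keep_survivors(results)))  # prints: 8 0
```

The asymmetry is the point: launching is cheap and parallel, while review is a single batch pass over whatever survived.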

Canonical wiki instance

From Anthony Shew's 2026-04-21 Turborepo post verbatim:

"I spun up 8 background coding agents from my phone before bed, each targeting a different part of the Rust codebase I suspected was too slow.

'Look for a performance speedup in our Rust code. It has to be something that is well-tested, and on our hot path. Make sure to add benches to check your work. I'm particularly interested in our hashing code.'

In each prompt, I replaced the part of the codebase I was interested in with a new target. I was curious what the agents would accomplish with plenty of ambiguity, as a baseline.

By morning, 3 of the 8 had produced outputs that I could turn into shippable wins."

The three survivors:

  • PR #11872 — ~25 % wall-clock reduction from hashing by reference instead of cloning a HashMap.
  • PR #11874 — ~6 % win from swapping twox-hash for xxhash-rust.
  • PR #11878 — replacement of an unnecessary Floyd-Warshall algorithm with multi-source DFS (not on the hot path but still a shippable improvement).

3 of 8 = ~37 % yield is the canonical unattended-agent baseline datum at the prompt quality documented here.

Prompt-variation strategy

Each of the 8 prompts was the same base template with one slot swapped:

  • hashing code (as in the verbatim example)
  • the seven other areas of the codebase Shew suspected were too slow, one per agent

The variations give each agent an independent first hypothesis, so the 8 agents don't all converge on the same conclusion. This is a deliberate mitigation for hyperfixation: a single agent commits to its first idea, but 8 agents with different slot-fills commit to 8 different ideas, and the engineer picks from the survivors.
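The slot-swap discipline amounts to one base template with one interchangeable slot. A minimal sketch, using the verbatim base prompt; the non-hashing slot-fills below are invented stand-ins, since the post names only "hashing code" explicitly.

```python
# Verbatim base prompt from the post, with the target area made a slot.
BASE = (
    "Look for a performance speedup in our Rust code. It has to be "
    "something that is well-tested, and on our hot path. Make sure to "
    "add benches to check your work. I'm particularly interested in our {slot}."
)

# First fill is from the post; the rest are hypothetical examples.
slot_fills = ["hashing code", "task graph traversal", "file globbing"]
prompts = [BASE.format(slot=s) for s in slot_fills]

# Every prompt shares the same constraints (well-tested, hot path,
# benches); only the first-hypothesis target differs per agent.
for p in prompts:
    print(p)
```

Because the constraints are held constant, any divergence in agent behaviour is attributable to the slot-fill, which is exactly the independent-hypothesis property the pattern needs.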

Properties

  • Parallel-exploration, serial-review. All 8 agents run simultaneously; review happens when the human returns (next morning).
  • Failure-tolerant. 60 % failure rate is fine because the human discards bad output. The only cost of a failed agent is the compute spent.
  • Prompt-hypothesis coupled. Quality of survivors scales with prompt quality + codebase knowledge. Shew's "area of codebase I suspected was too slow" is doing real work — random areas would have lower yield.
  • Low supervision cost. No per-iteration human gate; review is batch-mode after all agents finish.

Limits / failure modes

The 5 unattended-agent failure modes that produce the 62 % failure rate (from the same post's retrospective):

  1. No dogfood-loop awareness.
  2. Hyperfixation on first idea.
  3. Microbenchmark chasing (97 % microbench / 0.02 % real-world).
  4. No regression tests written.
  5. No --profile flag usage — proposing hashing optimisations without profiling the actual binary.

Not all failures exhibit all five; the 5 of 8 non-shippable agents exhibited some mix.

Not a substitute for supervised iteration

Shew's own conclusion verbatim: "The agents running unattended produced some good wins, but I could tell this wouldn't be sustainable. We needed stronger testing, and a better verification loop. I had to be more involved."

Parallel exploration is a one-time hypothesis generator for early-phase exploration; once the hot path is identified, supervised Plan-Mode-then-implement is a better execution pattern.

The two compose: spawn-parallel for "where are the hot paths", then supervised-loop for "implement optimisations on the identified hot paths".

Composition hint

  • Overnight window. 8 hours of unattended execution makes efficient use of sleep time; review is 30-60 minutes in the morning.
  • Phone-spawnable harnesses. Modern agent harnesses (Claude Code, Codex, some Cursor configurations) support remote-triggered background sessions, making the "from my phone before bed" framing load-bearing — no laptop open required.
  • Prompt templates + a CSV of slot-fills. Keeps the variation discipline explicit and allows systematic coverage of the hypothesis space.
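The template-plus-CSV hint can be sketched concretely. The CSV columns, file contents, and template wording below are all hypothetical; the point is that prompt variation lives in data rather than in ad-hoc editing, so coverage of the hypothesis space is auditable.

```python
import csv
from io import StringIO

# Hypothetical template; in practice this would be the full base prompt.
TEMPLATE = (
    "Look for a performance speedup in our {area}. "
    "Add benches to check your work."
)

# Stand-in for a slot-fills.csv tracked alongside the template.
CSV_TEXT = """area
hashing code
dependency graph
cache restoration
"""

rows = list(csv.DictReader(StringIO(CSV_TEXT)))
prompts = [TEMPLATE.format(**row) for row in rows]
print(len(prompts))  # prints: 3
```

Adding a row to the CSV adds an agent; reviewing the CSV afterwards shows at a glance which parts of the hypothesis space were and were not probed.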

Relation to other exploration patterns

  • patterns/multi-candidate-generation (ML-domain; recsys pattern at model-output altitude) — sibling at a different altitude. Both fan out generations to explore hypothesis space then select survivors.
  • patterns/ai-generated-fix-forward-pr — Meta's fix-forward-PR pattern for automated vulnerability remediation; different validation gate (security fix correctness) but similar pattern shape.

Anti-patterns

  • Spawn N identical-prompt agents. Identical prompts converge on identical first-hypotheses; hyperfixation returns in aggregate. Vary the prompts.
  • Spawn without end-to-end validation plan. Unvalidated agent outputs are the microbench-optimisation pathology at 8× scale.
  • Spawn for production-critical execution. Parallel exploration is for hypothesis generation; execution on hot-path code in production systems needs supervised iteration.
  • Take all 3-of-8 survivors at face value. The Floyd-Warshall replacement (PR #11878) was off the hot path ("not on the hot path of turbo run, but my prompts didn't specify which hot path, did they? Fair."). Survivors still need end-to-end validation.

Seen in

  • Making Turborepo 96 % faster (Vercel, 2026-04-21) — canonical wiki instance; 8 agents spawned overnight; 3 shippable (37 % yield); prompt-variation discipline explicit; five unattended failure modes reviewed.