ATLASSIAN 2026-05-22

Atlassian — From Ambiguous Questions to Action: Research Mode in Rovo Dev CLI

Summary

A 2026-05-22 Atlassian Engineering blog post introducing Research Mode, a structured multi-agent investigation workflow inside the Rovo Dev CLI designed for "questions that are bigger than 'edit this file'": cross-source investigations spanning Jira work history, code in Bitbucket, product and architectural context in Confluence, and decision history in pull requests. The architectural contribution is a research-as-workflow pattern with seven named steps: (1) analyze the query to identify the global objective and the decision the research should support; (2) decompose into 2-4 focused research domains; (3) pick a search direction per domain, either top-down (start broad and narrow) or bottom-up (start in code or PRs and trace outward); (4) delegate to domain subagents, each with a clearly scoped objective and a bounded set of sources, reporting back to the main workflow without taking actions or making changes; (5) run another round if needed for sequential dependencies (e.g. first find causes, then research mitigations); (6) synthesise a final report ("merge findings by theme, resolve overlaps, cite sources, and recommend next steps"); and (7) hand off to action options (create Jira epics/tasks, export deeper reports, share a Confluence summary, create follow-up Confluence sub-pages).
The post frames the design move as "from better prompts to a research workflow": prompt-tuning helped Rovo Dev "ask better questions, search more sources, and write cleaner summaries", but "the core problem was not just prompt quality. It was research strategy." Research Mode borrows from the explicit research process (define question → split into subquestions → gather evidence → compare findings → synthesise) and maps subquestions to research domains as the unit of subagent dispatch.
The direction-per-domain lever is novel: "the agent has a deliberate search sequence instead of searching every source the same way", using top-down for domains where broad context narrows to specifics and bottom-up for domains where code/PR ground truth must be established before tracing back to product/architectural context. Direction is not a hard boundary: "a domain can use both top-down and bottom-up research when needed".
Permission-scoped read-only investigation is the second load-bearing design property: "Built on the permissions you already have, Research Mode only reads from the sources a developer already has access to. It runs on demand when a developer invokes it for a specific question, not in the background, and not across teammates' private work. The agent's scope is bounded by the same access controls that govern Jira, Confluence, and Bitbucket today." In other words, agents do not amplify permissions, do not run continuously, and do not act on the developer's behalf without explicit follow-up.
Quantitative outcome: in sessions where users invoked Research Mode, more than 84% rated the experience as helpful, with the appropriate caveat that Atlassian does "not treat this as causal proof, since Research Mode is often used for different kinds of tasks than standard Rovo Dev sessions."
Five named lessons close the post: (a) discovery is a first-class developer workflow ("many valuable engineering tasks start before implementation"); (b) search direction matters (top-down and bottom-up uncover different evidence); (c) subagents need clear domain boundaries ("parallelism helps only when each agent has a focused scope"); (d) some research needs multiple rounds ("cause and mitigation, problem and solution, or context and recommendation are often sequential"); (e) the final artifact matters ("a report with themes, references, and next actions is more useful than raw search results").
The closing thesis: "The next frontier for AI developer tools isn't writing more code for developers. It's helping developers cut through the noise of fragmented tools so they can focus on the judgment calls only they can make." This post is the first-party Atlassian disclosure of Rovo Dev's investigation-mode architecture and is the canonical wiki instance for the research-domain-decomposition pattern, the top-down-vs-bottom-up search-direction-per-domain lever, and the dependency-sequenced multi-round investigation shape.
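
The first three steps (analyze, decompose, pick a direction) can be read as producing a small plan object before any subagent runs. The sketch below is one hedged way to model that plan. The class and field names (`ResearchPlan`, `ResearchDomain`, `Direction`) and the example question are illustrative assumptions; the post does not disclose Rovo Dev's internal data structures.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    TOP_DOWN = "top-down"    # start broad, then narrow
    BOTTOM_UP = "bottom-up"  # start in code or PRs, then trace outward


@dataclass(frozen=True)
class ResearchDomain:
    # Hypothetical shape of one focused area of investigation.
    name: str
    objective: str            # clearly scoped objective
    sources: tuple[str, ...]  # bounded set of sources to consult
    direction: Direction


@dataclass(frozen=True)
class ResearchPlan:
    global_objective: str
    decision_supported: str
    domains: tuple[ResearchDomain, ...]

    def __post_init__(self) -> None:
        # The post says to split the work into 2-4 focused areas.
        if not 2 <= len(self.domains) <= 4:
            raise ValueError(f"expected 2-4 research domains, got {len(self.domains)}")
        for domain in self.domains:
            if not domain.sources:
                raise ValueError(f"domain {domain.name!r} has no sources")


# Made-up example question, for illustration only.
plan = ResearchPlan(
    global_objective="Why do checkout timeouts keep recurring?",
    decision_supported="Whether to fund a mitigation workstream",
    domains=(
        ResearchDomain("incident history", "find recurring reports",
                       ("jira", "confluence"), Direction.TOP_DOWN),
        ResearchDomain("retry logic", "establish what the code does",
                       ("bitbucket", "jira"), Direction.BOTTOM_UP),
    ),
)
```

Validating the domain count at construction time keeps the bound from the post enforceable in one place; whether Research Mode enforces it as a hard limit is not stated.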

Key takeaways

  1. Discovery is a first-class developer workflow, not pre-coding overhead. "Many valuable engineering tasks start before implementation." The pain points the post enumerates are all investigation tasks that precede any code change: "Which work items are ready for automation? Why does this problem happen, and what mitigations exist? Which bugs or security issues keep resurfacing across work items? What decision led to this architecture? What evidence do we have for a launch or roadmap decision?" These are not "edit this file" tasks — they are tasks where "the hard part is figuring out where to look: Jira for work history, code for implementation truth, Confluence for product context, or pull requests for how decisions landed." Treating discovery as a first-class workflow (rather than as the developer's responsibility before they invoke an agent) is the load-bearing reframing of what AI dev tooling should do. Canonical wiki instance of concepts/discovery-as-developer-workflow (Source).

  2. Decomposing an ambiguous question into 2-4 research domains is the unit of work, not the unit of query. "We borrowed from how research is normally done: define the question, split it into subquestions, gather evidence, compare findings, and write a synthesis. In Research Mode, those subquestions become research domains." The architectural move: a research domain is a focused area of investigation with a clearly scoped objective and a bounded set of sources to consult — "split the work into 2-4 focused areas". The 2-4 bound is not arbitrary: it caps the parallelism (and therefore the synthesis cost) at a level where the supervisor can compare and merge findings without overflow. Each domain becomes a subagent dispatch unit: a "focused investigator with a clearly scoped objective and a bounded set of sources to consult". Canonical wiki instance of concepts/research-domain-decomposition — the named operational unit for ambiguous-query decomposition (Source).

  3. Search direction is a per-domain decision, not a global one. "For some domains, the right path is top-down... For other domains, the right path is bottom-up: start in code or pull requests to understand what the system actually does, move into Jira to understand the work history, then trace back to Confluence to find the broader product or architectural context. The direction is not a hard boundary. A domain can use both top-down and bottom-up research when needed. The important part is that the agent has a deliberate search sequence instead of searching every source the same way." The novel architectural lever: search direction is a property of the research domain, not a property of the agent or the system. Top-down domains start broad and narrow (e.g. Confluence → Jira → code); bottom-up domains start at ground truth (code/PRs) and trace outward to context (Jira work history → Confluence product/architecture). This lets the agent avoid redundant searches — domains with clear specs benefit from top-down, domains where "what the system actually does" differs from documentation benefit from bottom-up. Canonical wiki instance of concepts/top-down-vs-bottom-up-research-direction — the "deliberate search sequence" lever for multi-source investigation (Source).

  4. Subagents need clear domain boundaries — parallelism without scope is noise. "Subagents need clear domain boundaries. Parallelism helps only when each agent has a focused scope." This is the same architectural argument Cloudflare's vulnerability-discovery harness makes for coverage on large repos: a single exhaustive agent fails because of context-rot and tool-selection noise; many narrow agents in parallel succeed because each one's scope fits its context window and tool inventory. Atlassian's framing extends this from coverage-style problems (find every vulnerability) to investigation-style problems (answer one ambiguous research question). The principle is the same: scope before parallelism. The 2-4 domain bound supports this — three or four focused investigators is a manageable supervisor problem; ten or twenty independent searches is not. Direct sibling of patterns/parallel-narrow-agents-over-exhaustive applied to research workflows (Source).

  5. Some research needs multiple rounds because dependencies are sequential. "Some questions are sequential: first find causes, then research mitigations." And: "cause and mitigation, problem and solution, or context and recommendation are often sequential." The named pattern: dependency-sequenced multi-round research — when a second-stage investigation requires findings from the first stage as input, parallelism doesn't help and the workflow needs a serial barrier between rounds. The article's mermaid flow diagram explicitly models this: Round 1 produces "findings" → a decision branch "Need dependency follow-up?" → if yes, Round 2 "e.g. mitigation after cause" → final report. The architectural primitive is a supervisor-controlled pause between rounds where the synthesis from round 1 becomes the seed input for round 2's domain decomposition. This generalises beyond two rounds — though the post only names two — and is structurally distinct from the parallel-only fan-out shape covered in earlier multi-agent retrospectives. Canonical wiki instance of concepts/dependency-sequenced-multi-round-research (Source).

  6. The final artifact is a synthesis report with next-actions, not raw search results. "The final artifact matters. A report with themes, references, and next actions is more useful than raw search results." And from the workflow description: "Synthesize a final report. Merge findings by theme, resolve overlaps, cite sources, and recommend next steps." And: "After the report, Research Mode presents next-step options: create Jira epics and tasks for each workstream, export deeper reports for specific topics, share a summary on Confluence, or create follow-up Confluence sub-pages when a topic needs more detail." The architectural property: the output schema of Research Mode is not "here are the search results" — it's "here is a themed synthesis with citations and a list of next actions". The synthesis step is itself agent work — merging across domains, resolving overlaps, citing back to sources — and is the load-bearing differentiator between an investigation agent and a search agent. The next-actions surface (Jira epic, deeper report export, Confluence page, sub-page) is the action-handoff layer that keeps the agent read-only while making it easy for the human to convert findings into work. Canonical wiki instance of patterns/synthesis-over-raw-search-results (Source).

  7. Read-only, on-demand, permission-scoped — three independent design properties. "Built on the permissions you already have, Research Mode only reads from the sources a developer already has access to. It runs on demand when a developer invokes it for a specific question, not in the background, and not across teammates' private work. The agent's scope is bounded by the same access controls that govern Jira, Confluence, and Bitbucket today." And: "Subagents report back to the main workflow, they don't take actions or make changes on their own. Developers review the synthesised report before any follow-up steps (creating a Jira ticket, publishing a Confluence page, etc.) are taken." Three orthogonal properties: (a) read-only — subagents cannot mutate state; (b) on-demand — runs only when invoked, not as a background poller; (c) permission-scoped — bounded by the user's existing ACLs across Jira, Confluence, and Bitbucket — "not across teammates' private work". The composition of these three is the safety envelope that lets the agent operate over a sensitive cross-source corpus (work history, product context, code) without amplifying permissions or running continuously. The action-handoff layer (next-step options after the report) is the deliberate break between the read-only research phase and any state-mutating follow-up — the human gets to review the synthesis before any Jira ticket is created or Confluence page is published. Canonical wiki instance of patterns/read-only-on-demand-investigation-agent (Source).

  8. The architectural shape is supervisor + per-domain subagents + cross-domain synthesis — not a single agent loop. The mermaid flow makes this explicit:

    Ambiguous question
    Research domains (2-4)
    Domain A: top-down ─┐
    Domain B: bottom-up ─┼─→ Round 1 findings
    Domain C: top-down ─┘
    {Need dependency follow-up?}
        ├── Yes → Round 2: e.g. mitigation after cause ─┐
        └── No                                          │
                                                    Final report
                                            Final report + references
                              Jira tickets / deep-dive reports / Confluence pages
    

    The supervisor is the named coordinator that decomposes, dispatches, runs the round-2 decision, and synthesises. The subagents are the focused investigators with bounded sources. The synthesis is a separate agent step, not a folded-into-supervisor responsibility. This is structurally similar to multi-agent supervisor routing but with two differences: (i) the subagents are collaborators (their findings are merged) not alternatives (one is selected); (ii) there is an explicit second round gate, not a single-shot dispatch. Canonical wiki instance of patterns/domain-decomposed-research-workflow (Source).

  9. More than 84% rated helpful, but the post explicitly disclaims causal interpretation. "In sessions where users invoked Research Mode, more than 84% rated the experience as helpful. We do not treat this as causal proof, since Research Mode is often used for different kinds of tasks than standard Rovo Dev sessions. Still, it is an encouraging signal that structured research makes Rovo Dev more effective when users need serious context investigation before acting." Two notable disclosures: (a) the single-number outcome metric is a helpfulness rating, not task completion or time saved, and the post does not claim "X% faster investigations" or "Y fewer Jira tickets to file the right work"; (b) the selection-bias caveat is explicitly acknowledged: users who invoke Research Mode are choosing it for tasks they think it suits, so the helpfulness rating reflects fit-to-task as much as agent quality. Disclosing the caveat is itself a signal: no controlled comparison is reported, and the post reads as a measured retrospective rather than a marketing claim (Source).

  10. The closing thesis reframes AI dev tooling as a fragmentation problem, not a code-volume problem. "The next frontier for AI developer tools isn't writing more code for developers. It's helping developers cut through the noise of fragmented tools so they can focus on the judgment calls only they can make." The architectural framing is information integration before code generation: across Jira (work history), Bitbucket (code + PRs), Confluence (product/architectural context), the fragmentation of where evidence lives is the load-bearing problem — and a developer's time is better spent on judgement calls (which fix is right, what should be automated, which decision actually shipped) than on information assembly (where does this evidence live, who can answer this). Research Mode is positioned as the architectural answer to fragmentation rather than as a new code-generation primitive. This positions the agent against discovery work rather than against implementation work — a structural complement, not a competitor (Source).
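
The supervisor shape in takeaways 5 and 8 (parallel fan-out per domain, a follow-up gate, then a themed synthesis) can be sketched with `asyncio`. This is a minimal illustration under stated assumptions: `investigate` is a stub standing in for a real subagent, the follow-up decision is passed in by the caller rather than made by a model, synthesis is a plain function rather than an agent step, and every name and finding here is hypothetical.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    domain: str
    theme: str
    claim: str
    source: str  # citation back to where the evidence was found


async def investigate(domain: str, seed: tuple[Finding, ...] = ()) -> list[Finding]:
    """Hypothetical subagent stub: returns findings, takes no actions."""
    await asyncio.sleep(0)  # stand-in for real read-only searches
    if seed:  # a round-2 domain builds on round-1 findings
        return [Finding(domain, "mitigation", f"mitigate: {f.claim}", "confluence:runbook")
                for f in seed]
    return [Finding(domain, "cause", f"{domain} is implicated", f"jira:{domain}")]


async def run_round(domains: list[str], seed: tuple[Finding, ...] = ()) -> list[Finding]:
    # Fan out one subagent per domain and wait for all of them to report back.
    batches = await asyncio.gather(*(investigate(d, seed) for d in domains))
    return [finding for batch in batches for finding in batch]


def synthesise(findings: list[Finding]) -> dict[str, list[Finding]]:
    # Merge by theme and drop duplicate (claim, source) pairs.
    report: dict[str, list[Finding]] = defaultdict(list)
    seen: set[tuple[str, str]] = set()
    for finding in findings:
        key = (finding.claim, finding.source)
        if key not in seen:
            seen.add(key)
            report[finding.theme].append(finding)
    return dict(report)


async def research(domains: list[str], followup_domains: list[str]) -> dict[str, list[Finding]]:
    round1 = await run_round(domains)
    findings = list(round1)
    if followup_domains:  # the "Need dependency follow-up?" gate
        # Serial barrier: round 2 starts only after round 1 has finished.
        findings += await run_round(followup_domains, seed=tuple(round1))
    return synthesise(findings)


report = asyncio.run(research(["retries", "timeouts"], ["mitigations"]))
```

In this sketch the round-1 findings are handed to round 2 unmodified; the post leaves open how the real supervisor decides that a second round is needed and what it passes forward.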

Architectural numbers + operational notes (from source)

  • Research domain count: 2-4 focused research domains per ambiguous question (verbatim: "Split the work into 2-4 focused areas"). The bound is small enough that the supervisor's synthesis problem stays tractable.
  • Search-direction options per domain: top-down (broad → narrow, typically Confluence → Jira → code) or bottom-up (ground truth → context, typically code/PRs → Jira → Confluence). Direction is not a hard boundary; "a domain can use both top-down and bottom-up research when needed."
  • Round count: 1 or 2 rounds per investigation. Round 2 is invoked only when "dependencies exist"; "first find causes, then research mitigations" is the canonical example. The article does not name a 3+ round shape, though the mermaid flow does not formally exclude one.
  • Permission-scope rule: "Research Mode only reads from the sources a developer already has access to." — bounded by Jira, Confluence, and Bitbucket access controls. "Not in the background, and not across teammates' private work."
  • Action-handoff options after synthesis (verbatim list): "create Jira epics and tasks for each workstream", "export deeper reports for specific topics", "share a summary on Confluence", "create follow-up Confluence sub-pages when a topic needs more detail".
  • Outcome metric: more than 84% helpfulness rating in sessions where Research Mode was invoked. The post explicitly says it does not treat this "as causal proof", because Research Mode is often used for different kinds of tasks than standard sessions.
  • Mermaid flow diagram in source shows the supervisor → 2-4 domain subagents (with per-domain top-down/bottom-up tag) → Round 1 findings → dependency-follow-up decision → optional Round 2 → final report → action-options handoff. This is the canonical reference architecture for the pattern.
  • No internal architecture disclosure: the post does not name the LLM(s) used, the model-routing strategy, the per-domain context budget, the synthesis-step prompt structure, the latency envelope per domain, or the cost envelope per session. The disclosure is at the workflow-shape altitude — what the components are and how they compose — not at the implementation altitude.
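
The "deliberate search sequence" can be pictured as an ordering over a domain's bounded source set. The sketch below assumes the typical orderings listed above (Confluence → Jira → code for top-down, the reverse for bottom-up); the function name and source labels are illustrative, and it does not model domains that mix both directions.

```python
# Typical orderings as summarised above; treated here as assumptions, not a spec.
ORDER = {
    "top-down": ("confluence", "jira", "code"),
    "bottom-up": ("code", "jira", "confluence"),
}


def search_sequence(direction: str, sources: set[str]) -> list[str]:
    """Order a domain's bounded source set according to its search direction."""
    return [source for source in ORDER[direction] if source in sources]


# A bottom-up domain limited to code and Jira starts at ground truth.
bottom_up_steps = search_sequence("bottom-up", {"jira", "code"})
# A top-down domain with all three sources starts broad.
top_down_steps = search_sequence("top-down", {"confluence", "jira", "code"})
```

Keeping the direction as data attached to the domain, rather than as a property of the agent, matches the framing that two domains in the same investigation can search in opposite orders.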

Systems extracted

New wiki page:

  • systems/rovo-dev-research-mode — the named structured-investigation feature inside Rovo Dev CLI. Composes a supervisor (decomposes the ambiguous question, dispatches subagents, runs the round-2 gate, synthesises) + 2-4 per-domain investigator subagents (each with a top-down or bottom-up search direction) + a synthesis step (themed merge, citation, next-actions). Read-only, on-demand, permission-scoped. Canonical first-party Atlassian disclosure of Rovo Dev's investigation-mode architecture.

Extended (cross-link added):

  • systems/rovo-dev — adds Research Mode as a named structured-investigation feature alongside the existing implementation-loop features (skills, sub-agents, prompt shortcuts like !review-pr, PR-bot comments). Reinforces the "product-integrated end-to-end SDLC coverage" property by extending the agent surface from build/edit-the-file to investigate-before-acting. The mermaid-flow shape (supervisor + per-domain subagents + dependency-sequenced rounds + synthesis report + action-handoff) is the canonical multi-agent shape on Atlassian's agent platform.
  • systems/jira — adds Research Mode as a read-only consumer of Jira work-history data, bounded by user permissions; the action-handoff layer creates Jira epics/tasks for synthesised workstreams. Reinforces Jira as a load-bearing source of work-history context for cross-source investigation.
  • systems/bitbucket — adds Research Mode as a read-only consumer of Bitbucket code + PR data, especially for bottom-up search direction ("start in code or pull requests to understand what the system actually does"). PRs are explicitly named as a primary source for "how decisions landed."

Concepts extracted

New wiki pages:

  • concepts/research-domain-decomposition — the named operational unit for ambiguous-query decomposition: split an ambiguous question into 2-4 focused research domains, each with a clearly scoped objective and a bounded source set. The 2-4 bound caps parallelism and synthesis cost. Each domain becomes a subagent dispatch unit. Distinct from generic specialised agent decomposition by being per-query (not per-domain-of-the-product) and bounded in count.
  • concepts/top-down-vs-bottom-up-research-direction — the per-domain choice of search direction: top-down (broad → narrow, typically Confluence → Jira → code) or bottom-up (ground truth → context, typically code/PRs → Jira → Confluence). Direction is a property of the research domain, not of the agent or system. The architectural lever: "the agent has a deliberate search sequence instead of searching every source the same way."
  • concepts/dependency-sequenced-multi-round-research — the named investigation shape where a second-stage investigation requires findings from the first stage as input. Canonical example: "first find causes, then research mitigations." The supervisor pauses between rounds; the round-1 synthesis becomes the seed for round-2 domain decomposition. Distinct from purely-parallel multi-agent fan-out by having an explicit serial barrier.
  • concepts/discovery-as-developer-workflow — the framing of discovery / investigation as a first-class developer workflow, distinct from implementation. The pain points Research Mode addresses ("Which work items are ready for automation? Why does this problem happen, and what mitigations exist? What decision led to this architecture?") are investigation tasks that precede code changes. Reframes AI dev tooling away from "writing more code" toward "helping developers cut through the noise of fragmented tools so they can focus on the judgment calls only they can make."

Extended (cross-link added):

  • concepts/specialized-agent-decomposition — adds Research Mode as a per-query instance of specialised-agent decomposition: domains are not pre-defined storage/database/network specialists; they are decomposed at query time from the ambiguous question. Reinforces that decomposition can be dynamic (per-query) or static (per-domain-of-the-product) and the two are structurally similar but operationally distinct.
  • concepts/agentic-troubleshooting-loop — adds Research Mode as a multi-domain sibling of the single-agent troubleshooting loop: where the troubleshooting loop is one LLM iterating against one telemetry surface, Research Mode is one supervisor decomposing across multiple sources (Jira + Confluence + Bitbucket) with per-domain subagents. Both share the "investigation-before-action" shape; Research Mode generalises to multi-source.

Patterns extracted

New wiki pages:

  • patterns/domain-decomposed-research-workflow — the canonical multi-agent investigation pattern named by Research Mode: ambiguous question → 2-4 research domains (per-domain top-down or bottom-up direction) → Round 1 findings → dependency-follow-up gate → optional Round 2 → synthesis report → action-handoff options. The supervisor is the named coordinator; subagents are the focused investigators; synthesis is a separate agent step. Canonical instance: Rovo Dev Research Mode.
  • patterns/synthesis-over-raw-search-results — the design property that the output schema of an investigation agent is a themed synthesis with citations and next-actions, not a list of raw search results. Synthesis is its own agent step (merge by theme, resolve overlaps, cite sources, recommend next steps). The next-actions surface (e.g. "create Jira epic", "share Confluence page") is the action-handoff layer that keeps the agent read-only while making outcomes actionable.
  • patterns/read-only-on-demand-investigation-agent — three orthogonal design properties composed into a single safety envelope: read-only (subagents cannot mutate state) + on-demand (invoked per-question, not background poller) + permission-scoped (bounded by the user's existing ACLs across all sources). The composition is what lets the agent operate over sensitive cross-source corpora without amplifying permissions or running continuously. The action-handoff layer is the deliberate break between the read-only phase and any state-mutating follow-up.
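
The read-only, permission-scoped envelope and the review-before-action handoff can be illustrated with two small guards. This is a sketch under assumptions: `ScopedReader`, `ProposedAction`, `execute`, and the sample corpus are hypothetical, and the real product relies on the access controls of Jira, Confluence, and Bitbucket rather than an in-process check.

```python
from dataclasses import dataclass


@dataclass
class ScopedReader:
    """Hypothetical read-only view bounded by the invoking user's access."""
    user_acl: frozenset[str]      # sources the user can already read
    corpus: dict[str, list[str]]  # stand-in for Jira/Confluence/Bitbucket content

    def read(self, source: str) -> list[str]:
        if source not in self.user_acl:
            raise PermissionError(f"user has no access to {source!r}")
        return list(self.corpus.get(source, []))
    # There is deliberately no write method: a subagent holding this
    # object has no way to change state in the underlying sources.


@dataclass
class ProposedAction:
    kind: str       # e.g. "create_jira_epic"
    summary: str
    approved: bool = False


def execute(action: ProposedAction) -> str:
    # State-changing follow-ups run only after the developer has approved them.
    if not action.approved:
        raise RuntimeError("action requires explicit developer approval")
    return f"executed {action.kind}: {action.summary}"


reader = ScopedReader(frozenset({"jira"}),
                      {"jira": ["PROJ-1"], "confluence": ["design page"]})
action = ProposedAction("create_jira_epic", "Stabilise checkout retries")
```

Calling `reader.read("confluence")` raises `PermissionError` because that source is outside the user's scope, and `execute(action)` raises until `action.approved` is set. Note that the scope is only as tight as the access list it is given, which is the composition risk raised in the caveats.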

Extended (cross-link added):

  • patterns/parallel-narrow-agents-over-exhaustive — adds Research Mode as the investigation-task sibling of the coverage-task pattern. Cloudflare's vulnerability-discovery harness uses many narrow agents to achieve coverage of a code surface; Atlassian's Research Mode uses few narrow agents (2-4) to achieve answer quality on one ambiguous question. The shared principle — "scope before parallelism" — applies across both task classes.
  • patterns/multi-agent-supervisor-routing — adds Research Mode as a collaborative sibling of the routing pattern. In supervisor routing, the supervisor selects one sub-agent per query (alternatives, not collaborators). In Research Mode, the supervisor dispatches all domain subagents and merges their outputs (collaborators, not alternatives). Both share the named-supervisor + named-subagents shape; the difference is selection-vs-merge.
  • patterns/parallel-subagent-execution-for-latency — adds Research Mode as a sibling pattern with an additional dependency-sequenced second round. Pure parallel fan-out (KYC validation case) optimises for max(sub-agent-latency) instead of sum(sub-agent-latency); Research Mode does the same for round 1 but adds a serial barrier when round-2 depends on round-1 findings (cause → mitigation), trading latency for completeness in dependency-sequenced cases.
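
The latency trade-off in the last bullet can be made concrete with a small model: within a round the wall-clock cost is the slowest subagent, and the serial barrier adds the slowest subagent of the next round. The function names and the timings below are made up for illustration, and the model ignores supervisor and synthesis overhead.

```python
from typing import Sequence


def parallel_rounds_latency(*rounds: Sequence[float]) -> float:
    """Parallel within each round, serial between rounds: sum of per-round maxima."""
    return sum(max(r) for r in rounds if r)


def fully_serial_latency(*rounds: Sequence[float]) -> float:
    """Baseline where every subagent runs one after another."""
    return sum(sum(r) for r in rounds)


# Hypothetical per-subagent durations in seconds.
round1 = [40.0, 55.0, 30.0]  # three domains investigated in parallel
round2 = [25.0]              # one follow-up domain that depends on round 1

with_barrier = parallel_rounds_latency(round1, round2)  # 55.0 + 25.0 = 80.0
serial = fully_serial_latency(round1, round2)           # 150.0
```

Under these made-up numbers the second round costs 25 extra seconds compared with a single parallel round, which is the price paid for findings that depend on round 1.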

Caveats + open questions

  • No internal architecture disclosure: the post is at the workflow-shape altitude — what the components are and how they compose — but does not disclose the LLM(s) used, model-routing strategy, per-domain context budget, synthesis-step prompt structure, latency envelope per domain, cost envelope per session, or the supervisor's decision logic for "need round 2?".
  • 84% helpfulness has explicit selection-bias caveat: users self-select into Research Mode for tasks they think it suits; the rating reflects fit-to-task as much as agent quality. The post does not disclose any baseline comparison (e.g. helpfulness rating for the same task class without Research Mode).
  • Synthesis step quality is not measured: the post claims synthesis matters more than raw results but does not disclose how synthesis quality is evaluated. The end-of-post action-handoff (Jira epic, Confluence page) implicitly assumes the synthesis is good enough to act on; the post does not surface a synthesis-rejection or synthesis-revision flow.
  • Round count is named at 2 but not bounded: "Some questions are sequential: first find causes, then research mitigations." The mermaid flow diagram shows a single follow-up round; the post does not address whether 3+ rounds are supported, or what the supervisor's stopping criterion is.
  • No comparison to single-agent baseline: the post motivates Research Mode against "prompt tuning helped, but the core problem was research strategy" but does not quantify the gap between a well-tuned single-agent loop and Research Mode on the same investigation tasks.
  • Permission-scope assumes correct ACL configuration upstream: Research Mode is bounded by the user's existing Jira/Confluence/Bitbucket ACLs, which means investigations inherit any over-permissive ACL configuration in those upstream systems. The post does not address this composition risk.
  • Tier-3 source caveat: this is an Atlassian Engineering blog post about a Rovo Dev product feature; included on the wiki because it has substantive workflow-architecture content (mermaid flow diagram, named multi-step decomposition, explicit design rationale, named lessons) rather than being a marketing case study. The architectural content is at the workflow-pattern altitude, not the production-infra-at-scale altitude — a future post disclosing the synthesis-step prompts, the round-2 decision model, the per-domain LLM routing, or the production latency/cost envelope would be a high-value follow-up ingest.

Source
