---
title: How We Cut up to 80% of Engineering “Chores” Using AI Agents in Jira
source: Atlassian Engineering
source_slug: atlassian
url: https://www.atlassian.com/blog/development/ai-agents-jira-engineering-maintenance
published: 2026-06-01
fetched: 2026-06-02T14:01:23+00:00
ingested: true
---

Our Jira engineering team was spending more time than we’d like focused on KTLO (keeping the lights on) tasks – the small, but important maintenance tasks nobody _wants_ to spend time on. This includes work like cleaning up old feature flags, chasing flaky tests, fixing identified vulnerabilities, addressing accessibility issues, and chipping away at a long tail of bugs.

As a team, we wanted to shift our time towards enabling our engineering org to move faster with best-in-class tools, not just keep tech debt at bay. So we devised a way to use agents, Jira, and workflows to cut the time spent on key repetitive tasks by up to 80%.

Jira is the heart of our strategy. Each work item is a record of the tasks we need to complete and acts as a prompt for agents. All the context an agent needs is shared using the work item, [Atlassian’s Teamwork Graph](https://www.atlassian.com/platform/teamwork-graph), and the explicit instructions we include in our workflow automations.

Our team has spent years fixing these exact categories of issues. That pattern recognition is what makes delegation to agents possible. We know what good cleanup looks like, so we can define clear parameters, build review checkpoints, and design a human-in-the-loop system that produces code meeting our standards.

For our team, Jira isn’t just where work is tracked. It is where agents are given context, assigned work, and where our team controls what agentic work is actually merged into our codebase.

Here are two examples of applying this framework to automating some of our engineering “chores”:

## How to fix flaky tests faster with AI agents in Jira

**Flaky tests** are one of those maintenance problems that look small in isolation, but create real drag over time. They interrupt builds, reduce trust in CI, slow down delivery, and pull engineers away from product work.

Our team previously spent two hours resolving each flaky test, and we encountered roughly one flaky test per day, sometimes more. Each one required an engineer to inspect the CI failure, reproduce the issue locally or in CI-like conditions, determine whether the problem was in the test or the product code, and prepare a fix.

Now that we’ve implemented agentic workflows with Jira, we save roughly one engineering week every month, which means we’ve reduced eng hours spent on flaky tests by up to 80%.
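
As a rough sanity check on those figures, the arithmetic can be sketched as below. The two hours per test, one test per day, and 80% reduction come from the numbers above; the 20 working days per month and the 40-hour engineering week are assumptions added here for illustration only.

```python
# Figures taken from the post
HOURS_PER_FLAKY_TEST = 2
FLAKY_TESTS_PER_DAY = 1          # "roughly one per day, sometimes more"
REDUCTION = 0.80                 # upper bound of "up to 80%"

# Assumptions made for this sketch (not stated in the post)
WORKING_DAYS_PER_MONTH = 20
HOURS_PER_ENGINEERING_WEEK = 40


def monthly_baseline_hours() -> float:
    """Hours per month spent on flaky tests before automation."""
    return HOURS_PER_FLAKY_TEST * FLAKY_TESTS_PER_DAY * WORKING_DAYS_PER_MONTH


def monthly_savings_hours(reduction: float = REDUCTION) -> float:
    """Hours per month saved for a given fractional reduction."""
    return monthly_baseline_hours() * reduction


if __name__ == "__main__":
    saved = monthly_savings_hours()
    weeks = saved / HOURS_PER_ENGINEERING_WEEK
    print(f"baseline: {monthly_baseline_hours():.0f} h/month")
    print(f"saved:    {saved:.0f} h/month (about {weeks:.1f} engineering weeks)")
```

Under these assumptions the baseline is 40 hours a month, so an 80% reduction is about 32 hours, which is in line with "roughly one engineering week".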

### How we give agents the right context

To reduce that manual work, our team analyzed the flaky test problems we had handled over time. We looked for common root causes and repeatable fix patterns: asynchronous timing issues, race conditions, unstable test setup, unreliable mocks, page state problems, visual rendering differences, and other patterns that appeared again and again.

We then turned those learnings into **reusable agent skills**. Rather than using one generic workflow for every flaky test, the skill can look at the type of test involved and apply specialised instructions for that category.

For example:

  * **Unit test** specialist: focuses on asynchronous timing issues, mocks, fake timers, and test isolation.
  * **Integration test** specialist: focuses on browser automation issues, network races, page stability, and environment setup.
  * **Visual regression** specialist: focuses on deterministic rendering, snapshot updates, image diffs, and visual test stability.
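
A minimal sketch of how a test could be routed to one of these specialists is shown below. The focus areas mirror the list above; the skill names, the `Skill` type, and the filename conventions used for classification are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str                      # hypothetical skill identifier
    focus_areas: tuple[str, ...]   # taken from the specialist list above


SKILLS = {
    "unit": Skill(
        "unit-test-specialist",
        ("asynchronous timing", "mocks", "fake timers", "test isolation"),
    ),
    "integration": Skill(
        "integration-test-specialist",
        ("browser automation", "network races", "page stability", "environment setup"),
    ),
    "visual": Skill(
        "visual-regression-specialist",
        ("deterministic rendering", "snapshot updates", "image diffs", "visual test stability"),
    ),
}


def classify_test(path: str) -> str:
    """Guess the test category from the file path.

    The naming conventions here are assumptions for the sketch;
    a real repository would use its own.
    """
    lowered = path.lower()
    if ".vr." in lowered or "visual" in lowered:
        return "visual"
    if ".integration." in lowered or "/integration/" in lowered:
        return "integration"
    return "unit"


def select_skill(path: str) -> Skill:
    """Pick the specialist skill for the failing test at `path`."""
    return SKILLS[classify_test(path)]


if __name__ == "__main__":
    skill = select_skill("tests/integration/board.spec.ts")
    print(skill.name, "->", ", ".join(skill.focus_areas))
```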


To ensure our agents can diagnose the correct issue, each skill also includes reproduction instructions. For example, our agents can run the failing test repeatedly under slower or CPU-throttled conditions to mimic CI conditions as closely as possible. This helps the agent reproduce intermittent failures that may not show up during a single local test run.
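
The repeated-run idea can be sketched as a small loop that executes a test command many times and counts failures. This is an illustration under stated assumptions, not the actual skill: `reproduce`, `ReproductionResult`, and the default run count are hypothetical, and CPU throttling is left out because it would normally come from the environment the command runs in rather than from the loop itself.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ReproductionResult:
    runs: int
    failures: int

    @property
    def is_flaky(self) -> bool:
        # Intermittent means some runs passed and some failed.
        return 0 < self.failures < self.runs


def reproduce(command: list[str], runs: int = 20, timeout_s: float = 300.0) -> ReproductionResult:
    """Run `command` `runs` times and count non-zero exits and timeouts as failures."""
    failures = 0
    for _ in range(runs):
        try:
            completed = subprocess.run(command, capture_output=True, timeout=timeout_s)
            if completed.returncode != 0:
                failures += 1
        except subprocess.TimeoutExpired:
            # A hung test is treated as a failed run.
            failures += 1
    return ReproductionResult(runs=runs, failures=failures)


if __name__ == "__main__":
    # Stand-in for a real test command: a Python process that always exits 0.
    result = reproduce([sys.executable, "-c", "raise SystemExit(0)"], runs=3)
    print(f"{result.failures}/{result.runs} runs failed, flaky={result.is_flaky}")
```

A result where every run fails points to a deterministic break rather than flakiness, which is why `is_flaky` only reports mixed outcomes.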

### Automating issue triage using Jira’s workflows

When a ticket is created, our workflow starts by delegating triage to an agent, which uses a custom prompt to verify whether the issue is real. If it looks like a false positive, the agent stops and summarises that outcome in a comment on the original Jira work item. That way, the engineer reviewing the ticket can quickly see what the agent did and what it discovered without digging.

If the problem is reproducible, the agent applies the relevant fix pattern, prepares a code change, leaves a comment for the engineering team and opens a draft pull request for an engineer to review.

The key is that the agent handles the repetitive first pass: investigation, diagnosis, and a proposed fix. Engineers validate the change before it is merged.
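
The branching described in this section can be modelled as a small decision function. This is a sketch only: `WorkItem`, `TriageOutcome`, and `handle_ticket` are hypothetical names, the work item is an in-memory object rather than a real Jira record, and no Jira or pull request API is called.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class TriageOutcome(Enum):
    FALSE_POSITIVE = "false_positive"
    REPRODUCED = "reproduced"


@dataclass
class WorkItem:
    key: str                                   # e.g. a Jira work item key
    comments: list[str] = field(default_factory=list)
    draft_pr_title: Optional[str] = None       # set only when a fix is proposed


def handle_ticket(item: WorkItem, outcome: TriageOutcome, summary: str) -> WorkItem:
    """Apply the first-pass workflow: comment always, open a draft PR only if reproduced."""
    if outcome is TriageOutcome.FALSE_POSITIVE:
        # Stop early and record what was found for the reviewing engineer.
        item.comments.append(f"Triage: likely false positive. {summary}")
        return item

    # Reproducible: record the diagnosis and propose a fix for human review.
    item.comments.append(f"Triage: reproduced. {summary}")
    item.draft_pr_title = f"[{item.key}] Fix flaky test (draft, needs engineer review)"
    return item


if __name__ == "__main__":
    item = handle_ticket(
        WorkItem(key="EXAMPLE-1"),
        TriageOutcome.REPRODUCED,
        "Fails intermittently when run repeatedly.",
    )
    print(item.comments[-1])
    print(item.draft_pr_title)
```

Keeping the pull request in draft state reflects the checkpoint above: nothing merges until an engineer has validated the change.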

What used to require hours of manual investigation can now become minutes of review.

![](https://atlassianblog.wpengine.com/wp-content/uploads/2026/06/workflow-td-1-462x1400.png)

## How we automate stale feature flags cleanup with AI agents

Feature flags are invaluable for gradual rollouts and safe experimentation; however, if the code they gate is not regularly kept up-to-date, dead code accumulates and impacts performance, reliability, and developer productivity.

On a large, multi-product codebase, cleanup is harder than “just removing the code.” A flag might be fully rolled out for some customers but still active for others due to compliance requirements, release tracks, or experiment holdouts. Piecing together a flag’s true state across systems was manual and error-prone. The work was tedious, time-consuming, and repetitive — perfect for agents.

We built a system in Jira to automate the bulk of this work. So far, it’s responsible for more than 500 merged PRs in the past 70 days.

### How to use Jira work items as prompts

Using our experience as a guide, we created a heuristic to identify stale feature flags in the codebase and capture the context an agent needs to begin. We run a daily cron job that creates and updates a Jira work item for every stale flag, including:

  * **Flag name and type:** the unique identifier of the flag, along with what kind of flag it is (rollout gate, experiment…)
  * **Repository and code references:** the exact repo, file paths, and line numbers where the flag appears.
  * **Desired final state:** what the code should look like once the flag is removed. For example, for a rollout gate, this is typically the “on” or “off” branch to preserve. For experiments, it may be the winning cohort, a specific variant’s behavior, or a custom path defined by the experiment owner.
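
One way to picture a work item that doubles as a prompt is a small record that renders the three fields above into a description. The field layout, class names, and every example value below are hypothetical; the actual work item format is not shown in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeReference:
    path: str
    line: int


@dataclass(frozen=True)
class StaleFlagWorkItem:
    flag_name: str
    flag_type: str                         # e.g. "rollout gate" or "experiment"
    repository: str
    references: tuple[CodeReference, ...]
    desired_final_state: str

    def to_description(self) -> str:
        """Render the fields as plain text an agent could read as its task."""
        refs = "\n".join(f"- {ref.path}:{ref.line}" for ref in self.references)
        return (
            f"Flag: {self.flag_name} ({self.flag_type})\n"
            f"Repository: {self.repository}\n"
            f"Code references:\n{refs}\n"
            f"Desired final state: {self.desired_final_state}"
        )


if __name__ == "__main__":
    # All values below are made up for illustration.
    item = StaleFlagWorkItem(
        flag_name="example_rollout_gate",
        flag_type="rollout gate",
        repository="example-org/example-repo",
        references=(CodeReference("src/example.ts", 42),),
        desired_final_state="Keep the 'on' branch and delete the 'off' branch.",
    )
    print(item.to_description())
```

Because the daily job both creates and updates work items, a deterministic rendering like this makes it easy to overwrite a description when a flag's references change.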


### Delegating work items to agents in Jira

To ensure accountability and code quality, we’ve designed a human-in-the-loop system. An engineer reviews the work items created, then delegates the work to an agent by updating the status, which triggers a workflow that passes a [custom system prompt to the agent with cleanup instructions](https://support.atlassian.com/jira-software-cloud/docs/collaborate-on-work-items-with-ai-agents/#Add-an-agent-to-workflow-transitions).

Atlassian hosts thousands of repositories owned by hundreds of teams, each with its own codebase and conventions, so a one-size-fits-all approach doesn’t work. We’ve encoded our cleanup experience into repository-specific agent skills, and the system prompt gives each agent a clear fallback path:

  1. If available, use the repository’s existing cleanup procedure that gives the agent purpose-built guidance for that codebase
  2. Flag repositories that could benefit from a dedicated skill, and provide the repo owners with instructions to generate a cleanup procedure
  3. Fall back to a generic cleanup skill that works across most codebases
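
The fallback path above can be sketched as a lookup with a default. This is one possible reading, not the actual system prompt logic: it assumes that any repository without a dedicated procedure is the kind that "could benefit from a dedicated skill", and `plan_cleanup`, `CleanupPlan`, and the skill names are hypothetical.

```python
from dataclasses import dataclass

GENERIC_SKILL = "generic-flag-cleanup"  # hypothetical name for the generic skill


@dataclass(frozen=True)
class CleanupPlan:
    skill: str                      # which cleanup procedure the agent should follow
    suggest_dedicated_skill: bool   # whether to nudge repo owners to create one


def plan_cleanup(repository: str, repo_skills: dict[str, str]) -> CleanupPlan:
    """Choose a cleanup procedure for `repository`.

    `repo_skills` maps repository names to their dedicated cleanup procedure.
    """
    dedicated = repo_skills.get(repository)
    if dedicated is not None:
        # Step 1: purpose-built guidance exists, so use it.
        return CleanupPlan(skill=dedicated, suggest_dedicated_skill=False)
    # Steps 2 and 3: flag the repository for its owners, then use the generic skill.
    return CleanupPlan(skill=GENERIC_SKILL, suggest_dedicated_skill=True)


if __name__ == "__main__":
    skills = {"example-org/frontend": "frontend-flag-cleanup"}
    print(plan_cleanup("example-org/frontend", skills))
    print(plan_cleanup("example-org/unknown-service", skills))
```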


Now, every cleanup consistently results in a high-quality PR, and future instructions continuously improve agent decision-making.

## What we’ve learned

The pattern behind both examples is the same: take work your team already knows how to do well, encode that knowledge into structured Jira work items, agent instructions, and agent skills, and let agents handle the first pass while engineers stay in control of what ships. If your team is spending engineering hours on work that follows a repeatable pattern, that’s a signal you have an opportunity to implement agentic automations.