PATTERN Cited by 1 source

Agent skill with fallback chain

Intent

When an AI agent runs across many repositories with divergent conventions, give the agent a three-tier fallback chain for finding the right orchestration skill to apply:

  1. Repo-specific skill — purpose-built guidance for this codebase.
  2. Skill-creation prompt — if no repo-specific skill exists, flag the repo as needing one and emit owner-facing instructions for how to author it.
  3. Generic fallback skill — a baseline procedure that works across most codebases.

The middle tier is the architectural innovation: the agent's "no skill exists" failure mode is converted into an operational signal that prompts the owning team to author a skill.

Canonical articulation

Atlassian's stale-feature-flag cleanup workflow:

*"Atlassian hosts thousands of repositories owned by hundreds of teams, each with their own codebases and conventions so a 'one-size-fits-all' doesn't work. We've encoded our cleanup experience into repository-specific agent skills, and the system prompt gives each agent a clear fallback path:

  1. If available, use the repository's existing cleanup procedure that gives the agent purpose-built guidance for that codebase
  2. Flag repositories that could benefit from a dedicated skill, and provide the repo owners with instructions to generate a cleanup procedure
  3. Fallback to a generic cleanup skill that works across most codebases

Now, every cleanup results consistently in a high quality PR and future instructions to continuously improve agent decision making."* (Source: sources/2026-06-01-atlassian-how-we-cut-up-to-80-of-engineering-chores-using-ai-agents-in)

Shape

   Agent invoked on work item (e.g. clean up flag X in repo R)
   ┌────────────────────────────────────────┐
   │ TIER 1: Look for repo-specific skill   │
   │ Path: <R>/.agent-skills/cleanup.md     │
   │     (or wherever skills live in R)     │
   └────────────────────────────────────────┘
       found ─┴─ not found
        │             │
        │             ▼
        │      ┌──────────────────────────────────────┐
        │      │ TIER 2: Skill-creation prompt        │
        │      │  - Flag repo R as needing a skill    │
        │      │  - Comment on work item with         │
        │      │    instructions for repo owner to    │
        │      │    author cleanup.md                 │
        │      │  - Continue with TIER 3              │
        │      └──────────────────────────────────────┘
        │             │
        │             ▼
        │      ┌──────────────────────────────────────┐
        │      │ TIER 3: Generic cleanup skill        │
        │      │  - Apply baseline procedure          │
        │      │  - Higher uncertainty; CI / human    │
        │      │    review catches misses             │
        │      └──────────────────────────────────────┘
        ▼             │
   Apply skill ──────┴────► Open draft PR + comment on Jira
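
The flow above can be sketched in Python. This is a minimal sketch under stated assumptions, not Atlassian's implementation: the names resolve_skill, SkillChoice, GENERIC_SKILL and the flag_repo callback are hypothetical, and the skill path comes from the diagram, which itself notes that skills may live elsewhere in a repo.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Hypothetical baseline procedure; the real generic skill is not disclosed.
GENERIC_SKILL = "Inline the chosen branch, then remove the flag plumbing."

# Path taken from the diagram; a given repo may keep its skills elsewhere.
SKILL_RELATIVE_PATH = Path(".agent-skills") / "cleanup.md"


@dataclass(frozen=True)
class SkillChoice:
    tier: int          # 1 = repo-specific, 3 = generic fallback
    instructions: str  # skill text handed to the agent
    flagged: bool      # True when tier 2 fired for this repo


def resolve_skill(repo_root: Path, flag_repo: Callable[[Path], None]) -> SkillChoice:
    """Pick the skill for one work item using the three-tier chain."""
    skill_file = repo_root / SKILL_RELATIVE_PATH
    if skill_file.is_file():
        # Tier 1: purpose-built guidance exists for this codebase.
        return SkillChoice(1, skill_file.read_text(encoding="utf-8"), False)
    # Tier 2: record the gap so the owner is prompted to author a skill...
    flag_repo(repo_root)
    # ...then tier 3: continue with the generic procedure.
    return SkillChoice(3, GENERIC_SKILL, True)
```

Passing flag_repo as a callback keeps the sketch independent of where the flag is recorded (a work-item comment, a queue, a log); tier 2 never blocks the work item, it only adds a side effect before tier 3 runs.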

Why three tiers (not two)

A two-tier design (repo-specific OR generic) loses the operational signal about which repos would benefit from a dedicated skill. The middle tier — "flag this repo" — does two things:

  1. Per-repo signal. The list of repos hitting tier 2 is the prioritised authoring backlog for skill creation.
  2. Self-bootstrapping. The instructions for how to author a cleanup skill are surfaced at the moment of need — to the repo owner, on the work item that hit the gap. The owner sees "the agent could have done a better job here if you had a skill — here's how to write one" in the same UI they're reviewing the PR in.

This is structurally similar to error-as-affordance UX: the failure to find a skill is itself a call to action to create one.
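
One way to turn tier-2 hits into an authoring backlog is to count them per repo. The sketch below assumes a hypothetical event log in which each entry is the name of a repo that hit tier 2; the source does not describe how the flags are actually stored, and the repo names are invented.

```python
from collections import Counter
from typing import Iterable


def authoring_backlog(tier2_hits: Iterable[str]) -> list[tuple[str, int]]:
    """Rank repos by how often the agent fell through to tier 2.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = Counter(tier2_hits)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical repo names, for illustration only.
hits = ["billing-api", "search-ui", "billing-api", "auth-svc", "billing-api", "search-ui"]
backlog = authoring_backlog(hits)
# billing-api (3 hits) ranks first, then search-ui (2), then auth-svc (1).
```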

Why repo-specific skills are necessary

"Atlassian hosts thousands of repositories owned by hundreds of teams, each with their own codebases and conventions so a 'one-size-fits-all' doesn't work."

A generic flag-cleanup skill can describe "find the conditional, inline the chosen branch, remove the flag config" — but per-codebase specifics matter:

  • Custom flag wrapper APIs (e.g. if (Flags.isEnabled('x')) vs if (FF.evaluate('x', user)) vs an annotation framework).
  • Test conventions (which tests assert flag behaviour, what cleanup is needed).
  • Lint / formatter configuration that affects the diff.
  • Breaking-change discipline (some repos require deprecation warnings before removal).
  • Flag-config service entry removal (some repos clean up config service alongside code; some don't).

The repo-specific skill encodes the per-codebase variability so the agent's diff matches the codebase's existing style, which makes human review faster and the merge gate cleaner.
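
To illustrate why the wrapper API alone forces per-repo guidance, here is a sketch in which the repo-specific skill contributes the call patterns to search for. The generic pattern and the FF.evaluate override are assumptions modelled on the examples above, and regular expressions stand in for whatever code search the agent really uses.

```python
import re

# Assumed generic pattern: matches Flags.isEnabled('x') style calls only.
GENERIC_PATTERNS = [r"Flags\.isEnabled\(\s*['\"]{flag}['\"]\s*\)"]

# Hypothetical repo-specific skill data: this repo wraps flags in FF.evaluate.
REPO_PATTERNS = [r"FF\.evaluate\(\s*['\"]{flag}['\"]\s*,[^)]*\)"]


def find_flag_uses(source: str, flag: str, patterns: list[str]) -> list[str]:
    """Return every snippet in `source` that reads `flag` via a known wrapper."""
    found = []
    for template in patterns:
        regex = re.compile(template.replace("{flag}", re.escape(flag)))
        found.extend(match.group(0) for match in regex.finditer(source))
    return found


code = "if (FF.evaluate('new-checkout', user)) { renderNew(); }"
generic_hits = find_flag_uses(code, "new-checkout", GENERIC_PATTERNS)  # finds nothing
repo_hits = find_flag_uses(code, "new-checkout", REPO_PATTERNS)        # finds the call
```

With only the generic pattern the agent would conclude the flag is unused in this file; the repo-supplied pattern is what lets it locate the conditional to inline.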

Skill substrate

The pattern assumes a substrate where skills can be authored, discovered, and loaded by the agent. Atlassian uses Rovo Dev skills (sibling to Claude Code skills, GitHub Copilot custom instructions). The wiki's canonical sibling for skill discovery is concepts/agent-skills-discovery (Cloudflare RFC for /.well-known/agent-skills/index.json), an open-web analogue of "agent looks up where the skill is".

Reported outcome

"Now, every cleanup results consistently in a high quality PR and future instructions to continuously improve agent decision making."

Combined with the daily cron and the status-transition trigger, this fallback chain is a load-bearing component of the outcome reported for Atlassian's stale-flag-cleanup workflow: 500+ merged PRs in 70 days.

Caveats

  • Tier 2 instruction quality matters a lot. If the "how to author a cleanup skill" prompt is unclear, owners will ignore it and the long tail of low-quality tier-3 generic-skill PRs grows. The post doesn't disclose the format / quality of the instructions.
  • Tier 1 skill staleness. As repos evolve, repo-specific skills go stale; without versioning / freshness alarms, the tier-1 skill might be worse than the tier-3 generic skill. No freshness contract is described.
  • Tier 2 is opt-in for the repo owner. If the owner never acts on the "create a skill" prompt, that repo stays on tier 3 forever. There's no escalation path described.
  • No instrumented per-tier success rate. "High quality PR" is asserted without a per-tier breakdown. Hypothetically, tier-1 skills could reach 90% acceptance while the tier-3 fallback reaches 60%; the aggregate would still look healthy, but it would hide the tier-3 tail, which is what matters operationally.
  • Fallback assumes a known generic procedure exists. For flag cleanup the generic procedure is well-defined (inline the chosen branch, remove flag plumbing). For some keep-the-lights-on (KTLO) categories there may not be a meaningful tier-3 generic skill, so the pattern doesn't generalise to "any KTLO category".
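
The per-tier breakdown that the post does not report could be computed as sketched below. The record shape (tier, merged) and the sample numbers are invented for illustration; they mirror the hypothetical 90% / 60% split in the caveat above and are not Atlassian data.

```python
from collections import defaultdict
from typing import Iterable


def acceptance_by_tier(prs: Iterable[tuple[int, bool]]) -> dict[int, float]:
    """Map each tier to the fraction of its PRs that were merged."""
    merged: dict[int, int] = defaultdict(int)
    total: dict[int, int] = defaultdict(int)
    for tier, was_merged in prs:
        total[tier] += 1
        merged[tier] += int(was_merged)
    return {tier: merged[tier] / total[tier] for tier in sorted(total)}


# Invented sample: 10 tier-1 PRs (9 merged) and 10 tier-3 PRs (6 merged).
sample = [(1, True)] * 9 + [(1, False)] + [(3, True)] * 6 + [(3, False)] * 4
rates = acceptance_by_tier(sample)
overall = sum(was_merged for _, was_merged in sample) / len(sample)
# rates is {1: 0.9, 3: 0.6}; overall is 0.75, which hides the tier-3 tail.
```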

Adjacent patterns

  • Capability-based skill bundles — give the agent a few bundled skills to choose from based on declared capabilities. Same shape, different selector axis.
  • Per-team skill registry — owner-team has their own curated skill set, fallback to org-wide. Same shape, different granularity.
