PATTERN Cited by 1 source
Tri-mode opt-in test execution¶
Context¶
A team is adding a new test class (accessibility, visual regression, slow integration tests, flaky-but-useful checks) to an existing CI pipeline. The team wants:
- Developers can run it on-demand on their branch when they need feedback.
- A scheduled regression run catches drift outside the PR-per-PR cycle.
- The check is optionally enabled on significant PRs where the risk of merging without it matters.
What they don't want:
- The cost (time, noise, flakiness) paid by every PR at launch.
- Two separate pipelines to maintain.
- Branching logic scattered across N CI configs.
The pattern¶
Gate the new test class behind a single default-off environment flag (`A11Y_ENABLE`, `VISUAL_TEST_ENABLE`, etc.). Compose three execution modes from the same flag:
| Mode | Activation | Trigger |
|---|---|---|
| On-demand local | Developer sets flag locally | Manual: `A11Y_ENABLE=true pnpm test` |
| Scheduled nightly | Flag set in scheduled pipeline | Cron-triggered in CI |
| Opt-in CI gate | Flag set per-PR in CI config | Set by commit-message tag, label, or PR-template opt-in |
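```typescript
// Single toggle at the test-framework level.
// Only the exact string 'true' enables the checks, so an unset flag means off.
const a11yEnabled = process.env.A11Y_ENABLE === 'true';

// In the fixture helper (written here as a standalone function so it parses on its own):
async function runAxeAndSaveViolations(): Promise<void> {
  if (!a11yEnabled) return; // no-op when disabled
  // ... actually run audit
}
```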
At Slack, where Buildkite is the CI system, the scheduled mode is a daily pipeline that sets `A11Y_ENABLE=true` and pipes the output into a Slack alert channel.
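The three rows of the table can all collapse into one flag value before any test code runs. The following is a minimal sketch of that resolution, not Slack's implementation: the `[a11y]` commit-message tag, the `run-a11y` label, and the `BuildContext` shape are hypothetical names chosen for illustration. It assumes the local and scheduled modes set `A11Y_ENABLE=true` directly, and that only the opt-in CI gate needs to be derived from PR metadata.

```typescript
// Sketch only: the tag and label names are hypothetical, not taken from the source.
const OPT_IN_TAG = '[a11y]';     // hypothetical commit-message tag
const OPT_IN_LABEL = 'run-a11y'; // hypothetical PR label

interface BuildContext {
  env: Record<string, string | undefined>; // e.g. process.env
  commitMessage?: string;
  prLabels?: string[];
}

// Resolve the single flag value that every downstream consumer reads.
function resolveFlag(ctx: BuildContext): 'true' | 'false' {
  // On-demand local and scheduled nightly: the flag is set explicitly.
  if (ctx.env.A11Y_ENABLE === 'true') return 'true';

  // Opt-in CI gate: the PR asks for the checks via a tag or a label.
  const tagged = (ctx.commitMessage ?? '').includes(OPT_IN_TAG);
  const labelled = (ctx.prLabels ?? []).includes(OPT_IN_LABEL);
  if (tagged || labelled) return 'true';

  // Default-off: anything else, including an unset or malformed flag.
  return 'false';
}
```

A CI bootstrap step could call `resolveFlag` once and export the result as `A11Y_ENABLE`, so the test framework never needs to know which mode triggered the run.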
Why default-off with three modes¶
- Default-off respects baseline PR time. No PR pays the cost unless opted in.
- Local on-demand lets developers iterate without CI round-trips. Flakiness feedback is fastest locally.
- Scheduled regression catches drift from landed PRs — the set of accessibility bugs that enter the codebase between full sweeps.
- Opt-in CI gate is the hedge for high-risk PRs — where the team knows the blast radius justifies the test cost.
- Single flag keeps config simple — no N-environment matrix, no pipeline variant explosion.
Graduation path¶
A new test class typically graduates through this ladder:
1. Local-only / on-demand: the team builds and validates the checks and removes false positives.
2. Nightly scheduled: add a daily regression run; build operational confidence in the signal.
3. Opt-in CI: high-risk PRs (release candidates, refactors of load-bearing code) enable the flag.
4. Default-on non-blocking: every PR runs the checks; violations surface but don't block merges.
5. Default-on blocking (scoped): a small focused subset (e.g. Slack's planned "small blocking test suite ... dedicated to the flows of core features ... with a focus on keyboard navigation") blocks merges.
The flag lets the team progress through this ladder without structural changes — the same code, just different activation policy.
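One way to picture "same code, different activation policy" is a small decision function in which the ladder stage is the only input that changes over time. This is an illustrative sketch: the stage names mirror the ladder above, while the `Trigger`, `RunRequest`, and `decide` names are assumptions introduced here rather than anything described by the source.

```typescript
// Rungs of the graduation ladder, in order. Identifiers are illustrative.
type Stage =
  | 'local-only'
  | 'nightly'
  | 'opt-in-ci'
  | 'default-on-non-blocking'
  | 'default-on-blocking';

type Trigger = 'local' | 'scheduled' | 'pr';

interface RunRequest {
  trigger: Trigger;
  flagSet: boolean;           // the flag was 'true' for this invocation
  inBlockingSubset?: boolean; // the test belongs to the small blocking suite
}

interface Decision {
  run: boolean;
  blocking: boolean;
}

const LADDER: Stage[] = [
  'local-only',
  'nightly',
  'opt-in-ci',
  'default-on-non-blocking',
  'default-on-blocking',
];

function decide(stage: Stage, req: RunRequest): Decision {
  // From the fourth rung onward, PRs run the checks without opting in.
  const defaultOnForPrs =
    LADDER.indexOf(stage) >= LADDER.indexOf('default-on-non-blocking');
  const run = req.flagSet || (req.trigger === 'pr' && defaultOnForPrs);

  // Only the scoped subset blocks merges, and only on the final rung.
  const blocking =
    run &&
    req.trigger === 'pr' &&
    stage === 'default-on-blocking' &&
    req.inBlockingSubset === true;

  return { run, blocking };
}
```

Moving up a rung then means changing the `stage` value in one place; the fixture helper and the checks themselves stay untouched.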
Composes with other rollout levers¶
- patterns/severity-gated-violation-reporting — reduces the output volume; tri-mode reduces the invocation surface. Orthogonal.
- patterns/exclusion-list-for-known-issues-and-out-of-scope-rules — reduces the audit input; tri-mode gates whether the audit runs at all. Orthogonal.
Failure modes¶
- Flag drift across configs. If the flag is set in some pipeline files but not others, coverage is unpredictable. Mitigation: grep the CI configs for the flag name during reviews.
- Forgetting to enable. On-demand mode is self-service, so developers forget to run it. Mitigation: an editor plugin or local-dev alias that reminds them.
- Scheduled run drift. If the nightly run breaks and the team doesn't notice (because it's non-blocking), the scheduled signal decays silently. Mitigation: alert on nightly pipeline failure separate from violation surfacing.
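Two of these mitigations are mechanical enough to sketch. The first function below is a scripted version of the "grep for the flag name" review step; the second separates "the nightly pipeline itself broke" from "the nightly pipeline ran and found violations" so the two can be routed to different alerts. All names here (`auditFlagUsage`, `classifyNightly`, `NightlyResult`) are hypothetical, and the config contents are assumed to have been read from disk by the caller.

```typescript
const FLAG = 'A11Y_ENABLE';

// Flag drift: list which CI config files mention the flag and which do not.
// This is a plain substring match, so it finds mentions, not the value assigned.
function auditFlagUsage(
  configs: Record<string, string>, // file name -> file contents
  flag: string = FLAG,
): { setsFlag: string[]; missingFlag: string[] } {
  const setsFlag: string[] = [];
  const missingFlag: string[] = [];
  for (const [file, contents] of Object.entries(configs)) {
    (contents.includes(flag) ? setsFlag : missingFlag).push(file);
  }
  return { setsFlag, missingFlag };
}

// Scheduled run drift: keep pipeline failure distinct from violation reporting.
type NightlyOutcome = 'clean' | 'violations' | 'infrastructure-failure';

interface NightlyResult {
  exitedCleanly: boolean; // did the pipeline itself complete?
  violationCount: number; // violations reported by the audit
}

function classifyNightly(result: NightlyResult): NightlyOutcome {
  // A broken run says nothing about violations, so it is checked first.
  if (!result.exitedCleanly) return 'infrastructure-failure';
  return result.violationCount > 0 ? 'violations' : 'clean';
}
```

Routing `'infrastructure-failure'` to a different channel than `'violations'` is what keeps a silently broken nightly run from looking like a clean one.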
Generalisation¶
Applies to any new test class being rolled out to an existing CI pipeline:
- Integration test suite that's slow but high-signal.
- Visual regression snapshots that are flaky until baselined.
- Load tests that are expensive to run per-PR.
- Security scans that take minutes per run.
- Mutation tests that are too slow to run per-PR.
Canonical rule: one flag, three modes, default-off, graduation path from opt-in to default-on.
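Because the rule does not depend on what the test class actually checks, the gate can be written once and reused. Below is a minimal sketch of such a helper; `makeGate` and the `Env` type are hypothetical names, and `VISUAL_TEST_ENABLE` is used only because it appears as an example flag above.

```typescript
type Env = Record<string, string | undefined>;

// Build a gate for one test class.
// Default-off: only the exact string 'true' enables it.
function makeGate(flagName: string, env: Env) {
  const enabled = env[flagName] === 'true';
  return {
    enabled,
    // Run `check` only when the flag is on; otherwise do nothing and
    // return undefined. `check` may return a Promise, which is passed through.
    run<T>(check: () => T): T | undefined {
      return enabled ? check() : undefined;
    },
  };
}

// Usage sketch (in a real suite `env` would typically be process.env):
const visual = makeGate('VISUAL_TEST_ENABLE', { VISUAL_TEST_ENABLE: 'true' });
const load = makeGate('LOAD_TEST_ENABLE', {}); // hypothetical flag, unset so disabled

visual.run(() => 'snapshot compared'); // executes the callback
load.run(() => 'load test started');   // skipped, returns undefined
```

Each new test class then costs one flag name rather than a new pipeline variant.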
Seen in¶
- sources/2025-01-07-slack-automated-accessibility-testing-at-slack: Slack's `A11Y_ENABLE` flag gates Axe checks. "By default, we set the flag to false, preventing unnecessary runs." Three explicit modes: on-demand local, scheduled Buildkite nightly (pipes into a Slack alert channel), optional CI gate for significant changes.
Related¶
- systems/buildkite — the CI substrate hosting Slack's scheduled mode.
- patterns/severity-gated-violation-reporting — composes.
- patterns/exclusion-list-for-known-issues-and-out-of-scope-rules — composes.
- patterns/a11y-checks-via-playwright-fixture-extension — the code-level integration surface the flag gates.