SLACK 2025-01-07 Tier 2

Slack: Automated Accessibility Testing at Slack

Summary

Retrospective from Slack's Frontend Test Frameworks team on a project, started in 2022, to add automated accessibility checks to Slack's desktop test infrastructure. The checks use Axe Core, integrated through the @axe-core/playwright binding into Slack's existing Playwright E2E framework.

The post documents two failed integration attempts before the final design. The first baked Axe into React Testing Library's render and was blocked by Slack's customised Jest setup. The second baked Axe into Playwright's Locator interaction methods and was blocked by Locator auto-wait semantics, which guarantee that individual elements are ready but not that the whole page is. The landing design extends Playwright fixtures: an a11y.runAxeAndSaveViolations() helper hangs off Slack's existing custom slack fixture, and test authors invoke it explicitly once each page or view has fully rendered.

Beyond the specific Axe+Playwright binding, the post canonicalises four reusable architectural primitives: (1) fixture extension as the integration surface for cross-cutting testing concerns; (2) an exclusion-list-backed known-issue filter (selectors for ticketed known bugs plus selectors for rules that don't apply to Slack's design conventions); (3) severity-gated violation reporting (report only violations at Axe's critical impact level at first and defer serious / moderate / minor as explicit future work, a canonical anti-alert-fatigue lever); and (4) tri-mode opt-in test execution via one environment flag (A11Y_ENABLE: off by default, opt-in locally, scheduled nightly in Buildkite, optional CI gate for significant changes).

Disclosed production numbers: 91 tests in the initial accessibility test suite (mirroring Slack's critical-functionality suite), configured as non-blocking, so violations surface but don't block merges. Violations flow into a Slack alert channel that wraps a Jira workflow for automatic ticket creation, pre-labelled automated accessibility and filed under a Jira Epic for triage by Slack's internal accessibility team. Audit tooling, a script behind an audit environment flag that screenshots every page where an Axe call fires, detects call-site duplication.

This is a borderline Tier-2 ingest, with the same disposition as the 2024-06-19 Slack Enzyme→RTL codemod ingest. The subject is developer productivity and test-framework integration rather than distributed-systems internals, but the post canonicalises reusable architectural primitives (fixture extension for cross-cutting concerns, severity-gated reporting, tri-mode opt-in execution, alert-to-auto-ticket triage) that generalise to integrating any automated check into a live test suite.

Key takeaways

Baking Axe into the test framework's interaction primitives was investigated and rejected in two different frameworks. Slack first tried wrapping RTL's render with a custom function that also ran Axe. This was blocked by Slack's customised Jest setup ("Running accessibility checks through a separate Jest configuration worked, but would require developers to write tests specifically for accessibility, which we wanted to avoid. Reworking our custom Jest setup was deemed too tricky and not worth the time and resource investment"). Slack then tried baking Axe into Playwright's Locator interaction methods (clicks, navigation). This was blocked because Playwright's Locator auto-wait only guarantees "the readiness of individual elements, not the overall page", so Axe would run against partially rendered pages. The alternative, a deprecated waitForElement-style explicit wait, would "caus[e] performance degradation, potential duplication of errors, and conflicts with the abstraction model that Playwright follows." Canonical failed-abstraction pattern: the framework primitive that makes normal tests stable is the same primitive that makes whole-page validation unreliable. (Source: sources/2025-01-07-slack-automated-accessibility-testing-at-slack)
Landing architecture: Axe as a method on the existing custom Playwright fixture, invoked explicitly once the page is ready. Slack's Playwright framework already exposes a custom slack fixture ("provides access to all of our API calls, UI views, workflows and utilities related to Slack ... without having to go through the setup process every time"). The accessibility helper was mounted as slack.utils.a11y.runAxeAndSaveViolations(), and test authors add a single call after each new view/page renders. This is the canonical fixture-extension-as-integration-surface pattern: Playwright's fixture model is the load-bearing extension point for adding cross-cutting testing concerns without restructuring individual tests (each gains only one-line calls) or modifying the framework core.
Axe scope is WCAG 2.1 Levels A and AA via Axe's tag set. Slack configured AxeBuilder with defaultTags = ['wcag2a', 'wcag2aa', 'wcag21a', 'wcag21aa'], canonicalised as the WCAG 2.1 A/AA scope-picker. All four tags are needed because Axe's wcag21a / wcag21aa tags mark only the success criteria added in WCAG 2.1, while the WCAG 2.0 A/AA criteria carry wcag2a / wcag2aa. Axe ships a set of tags (wcag2a, wcag2aa, wcag21a, wcag21aa, wcag22aa, best-practice, EN-301-549, etc.), and AxeBuilder.withTags([...]) is the well-typed way to constrain which rule set the audit uses.
Exclusion list has two categorically distinct buckets. constants.ACCESSIBILITY.AXE_EXCLUDED_SELECTORS carries two kinds of entries: (a) "Known accessibility issues – issues that we are aware of and have already been ticketed", i.e. selectors pinned to open Jira tickets and removed from the exclude list once the underlying bug is fixed; (b) "Rules that don't apply – Axe rules outside of the scope of how Slack is designed for accessibility", i.e. selectors for design-convention elements where the Axe rule doesn't match Slack's design-system ground truth. Canonical exclusion-list-backed known-issue filter: one list, two axes. The known-issue axis has a fix-forward telemetry arc; the out-of-scope axis is structurally permanent until the design convention changes.
Severity-gated reporting: critical only at launch, with the rest deferred as explicit future work. Axe reports four impact levels (critical, serious, moderate, minor). Slack's filter is violations.filter(v => ['critical'].includes(v.impact)), canonicalised as severity-gated violation reporting (also filed as severity-filtered violation reporting as scope throttle). Verbatim: "We chose to report only the violations deemed Critical according to the WCAG. Serious, Moderate, and Mild are other possible severity levels that we may add in the future." (These impact levels are Axe's taxonomy rather than WCAG's, and Axe names the lowest one minor.) Canonical anti-alert-fatigue lever: when a new system launches, capping the volume of automated-check signal matters as much as coverage, because a flood of violations at launch collapses into noise and gets ignored.
Tri-mode opt-in execution via a single environment flag. Slack gates accessibility checks behind A11Y_ENABLE, default false. Three execution modes compose from this one flag: on-demand local (a developer flips the flag manually on their branch), scheduled nightly (a daily regression run configured in Buildkite, piping results into a Slack alert channel), and optional CI gate (the flag can be enabled in CI pipelines "for thorough testing before merging significant changes"). Canonicalised as tri-mode opt-in test execution: an opt-in default plus three execution surfaces composed from one boolean flag. Contrast an always-on model (noise and CI time cost on every PR) and a nightly-only model (no on-demand developer mode, no per-PR opt-in gate).
Violations-to-Jira auto-ticket workflow via Slack alert channel. The nightly Buildkite run pipes results into a Slack alert channel where a Slack workflow spins up a Jira ticket with a pre-populated template, "automatically labeled with automated accessibility and placed in a Jira Epic for triaging." Verbatim surface:

A11Y Automation Bug Ticket Creator
Automatically create JIRA bug tickets for A11Y automation violations
Hi there, Would you like to create a new JIRA defect?
Button clicked.
A new JIRA bug ticket, A11YAUTO-37, was created.

Canonicalised as alert-channel-to-Jira auto-ticket workflow: violation-detection output terminates in a pre-populated tracking ticket rather than in a free-form alert. The alert becomes the ticket-creation UI, reducing triage friction.
91 non-blocking tests at launch, mirroring the critical-flow suite. Slack's initial accessibility test suite "mirrored our suite for testing critical functionality at Slack": 91 tests in total, configured as non-blocking ("developers would see the test results, but a failure or violation would not prevent them from merging their code to production"). Canonical rollout shape: a new automated-check class ships alongside (not inside) the critical-functionality suite, non-blocking first, with an explicit future path to a small blocking suite (verbatim: "a small blocking test suite. These tests will be dedicated to the flows of core features within Slack, with a focus on keyboard navigation"). This mirrors the canonical deployment curve for any new signal-producing CI gate.
Call-site placement discipline plus audit tooling for duplication detection. Axe calls are placed "directly after a button click ... or a link that leads to navigation ... after signing in as a second user ... after a redirect", i.e. one check per new view/page reached. Verbatim constraint: "It was important to make sure the same view wasn't being analyzed twice in one test, or potentially twice across multiple tests with the same UI flow." Duplicated checks add, in the post's words, "unnecessary error messages and saved artifacts, and slow down our tests." Slack built an audit environment flag plus a script that screenshots every page where an Axe call fires and saves the images to a folder for manual comparison. The team self-flags this as "more manual than we like", with explicit future work toward AI-assisted auditing and AI-assisted check placement ("potentially leaning on AI assistance to perform the audit for us – or have AI add our accessibility calls to each new page/view, thereby eliminating the need to perform any kind of audit at all").
Automated testing is explicitly bounded: it does not replace manual testing or usability studies. Canonical automated-vs-manual testing complementarity framing for a11y specifically: "Automated tools can overlook nuanced accessibility issues that require human judgment, such as screen reader usability. Additionally, these tools can also flag issues that don't align with the product's specific design considerations." Slack's broader a11y strategy includes "involving people with disabilities early in the design process, conducting design and prototype review with these users, and performing manual testing across all of the assistive technologies we support." Automated testing is one layer, not the strategy.
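The takeaways above describe one helper built from several layers: an explicit call after render, the WCAG 2.1 A/AA tag set, a two-bucket exclusion list, a critical-only filter, and a default-off A11Y_ENABLE gate. A minimal sketch that stacks those layers, assuming a simplified result type that keeps only the axe-core fields it reads. To stay runnable without a browser, the Axe call is injected as a function; in a real Playwright fixture it would wrap AxeBuilder from @axe-core/playwright. The Exclusion shape, treating only "true" as enabling, createSlackFixture, and the sample selectors and ticket key are hypothetical, not Slack's code.

```typescript
// Simplified stand-ins for the axe-core result fields this sketch reads.
type Impact = 'minor' | 'moderate' | 'serious' | 'critical';

interface Violation {
  id: string; // axe rule id, e.g. "button-name"
  impact?: Impact | null; // axe can report a null impact
  help: string;
  nodes: { target: string[] }[];
}

interface AnalyzeOptions {
  tags: string[];
  exclude: string[];
}

// Injected Axe call (assumption). With Playwright it would create
// new AxeBuilder({ page }), call withTags(options.tags), call
// exclude(selector) once per excluded selector, and return await analyze().
type Analyze = (options: AnalyzeOptions) => Promise<{ violations: Violation[] }>;

// Hypothetical two-bucket exclusion entry.
type Exclusion =
  | { kind: 'known-issue'; selector: string; ticket: string }
  | { kind: 'out-of-scope'; selector: string; reason: string };

type Env = Record<string, string | undefined>;

const DEFAULT_TAGS = ['wcag2a', 'wcag2aa', 'wcag21a', 'wcag21aa'];
const REPORTED_IMPACTS: Impact[] = ['critical']; // widen later, e.g. add 'serious'

// Default-off gate; accepting only the string "true" is an assumption.
function a11yEnabled(env: Env): boolean {
  return env.A11Y_ENABLE === 'true';
}

function createA11yUtils(analyze: Analyze, exclusions: Exclusion[], env: Env) {
  const savedViolations: Violation[] = [];
  return {
    savedViolations,
    // Test authors call this explicitly after a new page/view has rendered.
    async runAxeAndSaveViolations(): Promise<Violation[]> {
      if (!a11yEnabled(env)) return []; // checks stay off unless opted in
      const { violations } = await analyze({
        tags: DEFAULT_TAGS,
        // Both buckets are excluded the same way; the kind only matters for upkeep.
        exclude: exclusions.map((e) => e.selector),
      });
      const reported = violations.filter(
        (v) => v.impact != null && REPORTED_IMPACTS.includes(v.impact),
      );
      savedViolations.push(...reported);
      return reported;
    },
  };
}

// Hypothetical fixture factory; only the utils.a11y namespace is modelled.
function createSlackFixture(analyze: Analyze, exclusions: Exclusion[], env: Env) {
  return { utils: { a11y: createA11yUtils(analyze, exclusions, env) } };
}

// Usage with a fake analyzer that records its options and returns canned results.
const analyzeCalls: AnalyzeOptions[] = [];
const fakeAnalyze: Analyze = async (options) => {
  analyzeCalls.push(options);
  return {
    violations: [
      { id: 'button-name', impact: 'critical', help: 'Buttons must have discernible text', nodes: [{ target: ['#send'] }] },
      { id: 'color-contrast', impact: 'serious', help: 'Elements must meet minimum color contrast ratio thresholds', nodes: [{ target: ['.hint'] }] },
    ],
  };
};

const exclusions: Exclusion[] = [
  { kind: 'known-issue', selector: '.legacy-menu', ticket: 'A11Y-123' },
  { kind: 'out-of-scope', selector: '.decorative-banner', reason: 'design convention' },
];

const slack = createSlackFixture(fakeAnalyze, exclusions, { A11Y_ENABLE: 'true' });
// await slack.utils.a11y.runAxeAndSaveViolations() resolves to the button-name violation only.
```

Injecting the analyzer is a design choice for the sketch: it keeps the gating, exclusion and severity logic testable without launching a browser, while the fixture namespace stays the single place test authors touch.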

Systems extracted

systems/axe-core: Deque Systems' accessibility testing engine; the rule set and runtime that actually audits a DOM for WCAG violations. Deque's stated design goal is to minimise false positives, in contrast to less curated rule sets.
systems/axe-core-playwright — @axe-core/playwright npm package binding Axe Core into Playwright. Provides AxeBuilder with .exclude(selector), .withTags([...]), and .analyze().
systems/playwright — Microsoft's cross-browser E2E testing framework, the landing integration surface for Slack's Axe checks (extended with axe-core-playwright + a11y helper via fixture).
systems/react-testing-library — considered as integration surface via render-wrapping; rejected due to Slack's customised Jest setup.
systems/jest — Slack's customised Jest setup was the blocker for the RTL-render-wrapping approach.
systems/buildkite — runs the daily a11y regression suite; pipes results into a Slack alert channel.
systems/jira — receives auto-created tickets from the Slack alert-channel workflow; labels them automated accessibility and slots them under a triage Epic.

Concepts extracted

concepts/wcag-2-1-a-aa-scope — the WCAG 2.1 Level A + AA rule set as the conventional starting scope for automated a11y checks; expressed in Axe via the wcag2a / wcag2aa / wcag21a / wcag21aa tag set.
concepts/automated-vs-manual-testing-complementarity — automated a11y checks explicitly bounded: they miss screen-reader UX judgment and can flag design-intentional departures; manual testing with users-with-disabilities is the complement, not the substitute.
concepts/playwright-locator-auto-wait: Playwright's Locator guarantees individual-element readiness (before acting, it auto-waits until the target element is attached, visible, stable, able to receive events and enabled), not whole-page readiness. This is the load-bearing abstraction for test stability, but also the blocker for baking whole-page audits into Locator interaction methods.
concepts/severity-filtered-violation-reporting — report only the highest-severity bucket at a new-system launch; defer lower severities to explicit future work. Canonical anti-alert-fatigue lever.
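The WCAG 2.1 A/AA scope concept above reduces to tag-based rule selection. A minimal sketch, assuming simplified rule metadata: a rule runs if it carries at least one of the requested tags, which is how axe's tag-based runOnly option (and AxeBuilder.withTags) selects rules. The sample tag lists are trimmed for illustration rather than copied from a specific axe-core release.

```typescript
// Simplified rule metadata; real axe-core rules carry many more fields.
interface RuleMeta {
  id: string;
  tags: string[];
}

// WCAG 2.0 A/AA tags plus the tags for criteria added in WCAG 2.1.
const WCAG21_A_AA: readonly string[] = ['wcag2a', 'wcag2aa', 'wcag21a', 'wcag21aa'];

// A rule is in scope if it carries at least one requested tag.
function rulesInScope(rules: readonly RuleMeta[], tags: readonly string[]): string[] {
  const wanted = new Set(tags);
  return rules.filter((rule) => rule.tags.some((tag) => wanted.has(tag))).map((rule) => rule.id);
}

// Illustrative, trimmed tag lists.
const sampleRules: RuleMeta[] = [
  { id: 'image-alt', tags: ['cat.text-alternatives', 'wcag2a', 'wcag111'] },
  { id: 'color-contrast', tags: ['cat.color', 'wcag2aa', 'wcag143'] },
  { id: 'autocomplete-valid', tags: ['cat.forms', 'wcag21aa', 'wcag135'] },
  { id: 'target-size', tags: ['cat.sensory-and-visual-cues', 'wcag22aa', 'wcag258'] },
  { id: 'region', tags: ['cat.keyboard', 'best-practice'] },
];

const inScope = rulesInScope(sampleRules, WCAG21_A_AA);
// ['image-alt', 'color-contrast', 'autocomplete-valid']

const wcag20Only = rulesInScope(sampleRules, ['wcag2a', 'wcag2aa']);
// ['image-alt', 'color-contrast']: the 2.1-only rule drops out
```

The second call shows why the tag list needs the wcag21* entries: a rule for a criterion introduced in WCAG 2.1, like autocomplete-valid here, is invisible to a 2.0-only tag set, while WCAG 2.2 and best-practice rules stay out of scope either way.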

Patterns extracted

patterns/a11y-checks-via-playwright-fixture-extension — Playwright's fixture model is the load-bearing extension point for adding cross-cutting testing concerns (a11y, perf, visual regression) to an existing E2E suite; extend the custom fixture with a cross-cutting utility namespace (fixture.utils.a11y.*) rather than refactoring individual tests or the framework core.
patterns/exclusion-list-for-known-issues-and-out-of-scope-rules: a two-axis selector exclusion list of (a) known-issue, ticketed selectors (fix-forward telemetry arc) and (b) rules-don't-apply selectors (structurally permanent until the design convention changes). Applied before anything is reported, so the automation signal stays high.
patterns/severity-gated-violation-reporting — filter violations by severity bucket before surfacing; launch with only the highest bucket (critical in Axe's taxonomy) and explicitly defer lower buckets as future work.
patterns/tri-mode-opt-in-test-execution — three execution modes (local on-demand / scheduled nightly / opt-in CI gate) composed from one default-off environment flag. Enables gradual rollout of a new automated-check class without forcing every PR to pay its cost.
patterns/alert-channel-to-jira-auto-ticket-workflow — violation detection output terminates in a pre-populated Jira ticket (with triage-ready label + Epic placement) rather than in a free-form alert. The alert becomes the ticket-creation UI; triage friction collapses.
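The alert-channel-to-Jira pattern above turns each violation into a pre-populated ticket rather than a free-form alert. A dependency-free sketch of the payload-building step, assuming a hypothetical Finding shape, a hyphenated label spelling and a placeholder epic key; the post does not show Slack's workflow internals or its Jira fields. Grouping findings by rule and selector is an added assumption, not something the post describes; it keeps one ticket per distinct violation.

```typescript
// Minimal fields a nightly run might forward to the alert channel (assumed shape).
interface Finding {
  testName: string;
  ruleId: string;
  impact: string;
  help: string;
  helpUrl: string;
  target: string; // selector of the offending node
}

// Hypothetical pre-populated ticket; field names are illustrative, not Jira's REST schema.
interface TicketDraft {
  summary: string;
  description: string;
  labels: string[];
  epicKey: string;
}

// Jira labels cannot contain spaces, so the hyphenated spelling is assumed.
const A11Y_LABEL = 'automated-accessibility';
const TRIAGE_EPIC_KEY = 'A11Y-EPIC'; // placeholder, not a real epic key

// One draft per distinct (rule, selector) pair: a violation hit by several
// tests becomes one ticket that lists every test where it was seen.
function draftTickets(findings: Finding[]): TicketDraft[] {
  const groups = new Map<string, { finding: Finding; tests: Set<string> }>();
  for (const f of findings) {
    const key = `${f.ruleId}|${f.target}`;
    const group = groups.get(key);
    if (group) group.tests.add(f.testName);
    else groups.set(key, { finding: f, tests: new Set([f.testName]) });
  }
  return Array.from(groups.values(), ({ finding, tests }) => ({
    summary: `[A11Y ${finding.impact}] ${finding.ruleId}: ${finding.help}`,
    description: [
      `Selector: ${finding.target}`,
      `Seen in: ${Array.from(tests).join(', ')}`,
      `Rule docs: ${finding.helpUrl}`,
    ].join('\n'),
    labels: [A11Y_LABEL],
    epicKey: TRIAGE_EPIC_KEY,
  }));
}

// Example: two tests hit the same button-name violation, one test hits label.
const drafts = draftTickets([
  { testName: 'send message', ruleId: 'button-name', impact: 'critical', help: 'Buttons must have discernible text', helpUrl: 'https://example.com/rules/button-name', target: '#send' },
  { testName: 'edit message', ruleId: 'button-name', impact: 'critical', help: 'Buttons must have discernible text', helpUrl: 'https://example.com/rules/button-name', target: '#send' },
  { testName: 'search', ruleId: 'label', impact: 'critical', help: 'Form elements must have labels', helpUrl: 'https://example.com/rules/label', target: '#search-input' },
]);
// drafts.length === 2; the button-name draft lists "send message, edit message".
```

Submitting the draft, whether through a Slack workflow button as in the post or some other route, is left out because the post does not describe that step.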

Operational numbers

91 tests in the initial accessibility test suite, mirroring Slack's critical-flow test suite.
Non-blocking at launch (violations surface, don't gate merges).
Daily regression cadence via Buildkite.
Scope: WCAG 2.1 Level A + AA.
Severity: critical only at launch (Axe's critical impact level); serious / moderate / minor (the post's "Mild") deferred.
Year of initial setup: 2022.

Caveats

No absolute coverage metric: 91 tests is the count, but the post does not disclose what percentage of Slack's user flows they actually cover.
No hit-rate numbers: the post doesn't disclose how many distinct violations the suite has found, what fraction were known-vs-new, or what fraction of tickets filed through the A11YAUTO workflow reached fixed-in-production state.
No severity-distribution disclosure: the rationale for critical-only (anti-alert-fatigue) is stated, but the volume of serious / moderate / minor violations that would have surfaced had those levels been enabled is not quantified.
No audit-tooling hit-rate: the screenshot-every-Axe-call audit script is described but the frequency of duplication it catches is not disclosed.
AI-assisted auditing and AI-assisted check placement are explicit future work, not shipped, and the post does not say what, if anything, was prototyped.
The planned small blocking suite focused on keyboard navigation is explicit future work at the time of the post (2025-01-07), not shipped.
No disclosure of how the exclusion list is maintained at scale — how entries are added, how known-issue entries are retired when tickets close, how drift is prevented between the exclusion list and the set of open accessibility-team Jira tickets.
No discussion of Axe rule version drift over time (Axe rules change across versions; a rule that flagged a selector at Axe v4.2 may be deprecated / renamed at v4.8, invalidating exclusion-list entries).

Source

Original: https://slack.engineering/automated-accessibility-testing-at-slack/
Raw markdown: raw/slack/2025-01-07-automated-accessibility-testing-at-slack-f1055fc2.md
HN discussion: news.ycombinator.com/item?id=42628934 (131 points)

companies/slack — Slack Engineering blog (Tier-2).
sources/2024-06-19-slack-ai-powered-conversion-from-enzyme-to-react-testing-library — companion Slack Frontend Test Frameworks ingest; same team, same pattern of canonicalising reusable architectural primitives for test-framework integration at scale. The Enzyme→RTL codemod uses the same Jest runner and same RTL that this post investigated and rejected as the a11y integration surface.
systems/axe-core-playwright · systems/axe-core · systems/playwright
patterns/a11y-checks-via-playwright-fixture-extension · patterns/exclusion-list-for-known-issues-and-out-of-scope-rules · patterns/severity-gated-violation-reporting · patterns/tri-mode-opt-in-test-execution · patterns/alert-channel-to-jira-auto-ticket-workflow
concepts/wcag-2-1-a-aa-scope · concepts/automated-vs-manual-testing-complementarity · concepts/playwright-locator-auto-wait · concepts/severity-filtered-violation-reporting