Customer impact hours metric¶
Definition¶
A customer impact hours metric is a reliability program's top-line metric: the summed hours of customer-visible impact from high-severity incidents plus a filtered subset of medium-severity incidents, scoped by cause (change-triggered / external / capacity / etc.).
The metric is deliberately an imperfect analog of customer sentiment: it is far cheaper to compute than direct customer-sentiment survey data, while correlating well enough that project-level improvements move it.
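The definition above can be sketched as a small computation over incident records. This is a minimal illustration, not Slack's actual schema: the field names (`severity`, `impact_hours`, `cause`, `selected`) and category values are assumptions chosen to mirror the definition.

```python
from dataclasses import dataclass

# Hypothetical incident record; field names are illustrative, not Slack's schema.
@dataclass
class Incident:
    severity: str           # "high" | "medium" | "low"
    impact_hours: float     # hours of customer-visible impact
    cause: str              # "change" | "external" | "capacity" | ...
    selected: bool = False  # set by the post-hoc impact-analysis curation pass

def customer_impact_hours(incidents, cause_scope="change"):
    """Sum impact hours from high-severity plus curated medium-severity
    incidents, scoped to one cause category."""
    return sum(
        i.impact_hours
        for i in incidents
        if i.cause == cause_scope
        and (i.severity == "high" or (i.severity == "medium" and i.selected))
    )

incidents = [
    Incident("high", 4.0, "change"),
    Incident("medium", 0.5, "change", selected=True),
    Incident("medium", 2.0, "change", selected=False),  # filtered out by curation
    Incident("high", 3.0, "external"),                  # outside the cause scope
]
print(customer_impact_hours(incidents))  # → 4.5
```

Note that the `selected` flag is the human-curation output, not something computable from the record itself; the function only consumes it.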
Canonical disclosure¶
Slack's 2025-10-07 Deploy Safety retrospective canonicalises the definition, verbatim (Source: sources/2025-10-07-slack-deploy-safety-reducing-customer-impact-from-change):
"Hours of customer impact from high severity and selected medium severity change-triggered incidents."
"Selected" means filtered by post-hoc impact analysis: Slack severity levels convey impending or current impact, not final impact, so a human curation pass is required to distill the medium-severity incidents down to the ones that actually mattered.
The three-layer chain¶
Slack articulates the metric's position in a three-layer chain:
- Customer sentiment is the truth the program is trying to move. Direct measurement requires surveys, NPS, support-ticket sentiment — slow, noisy, expensive.
- Program metric is the imperfect analog — defined well enough to be computed from existing incident data; close enough to customer sentiment that moving it correlates with moving sentiment.
- Project metric is per-project — "did automatic rollback reduce MTTR on Webapp backend?" — measurable in the specific project's substrate.
Verbatim on the coupling (Source: sources/2025-10-07-slack-deploy-safety-reducing-customer-impact-from-change):
"They're all connected, but it's challenging to know for a specific project how much it is going to move the top line metric."
This is load-bearing for how projects are justified and evaluated: you cannot directly attribute a specific quarterly top-line move to a specific project. You measure the project metric during the project; you measure the program metric in aggregate after a 3-6 month lag; and you accept a loose causal chain.
The four metric-design criteria¶
Slack names four explicit criteria:
- Measure results. Not effort, not output. Results.
- Understand what is measured (real vs analog). The metric is an analog; don't confuse it with the underlying truth (customer sentiment).
- Consistency in measurement, especially subjective portions. The "selected medium severity" filter is the subjective portion — apply consistently across quarters.
- Continually validate the measurement matches customer sentiment — "with the leaders having the direct conversations with customers". The program metric's legitimacy is only as good as its ongoing validation against the sentiment it's trying to analog.
Operational trade-offs in metric design¶
- Incident-count vs incident-hours. A count is easier to compute but doesn't distinguish a 2-minute blip from a 4-hour outage; hours weight incidents by duration.
- All-incidents vs severity-filtered. Counting all incidents over-counts noise from low-severity, internal-only incidents; severity filtering requires a sustained curation discipline.
- All-cause vs change-triggered. A program investing in deploy safety should track only the change-triggered subset (to avoid credit/blame for external-cause moves); a full reliability program tracks all-cause.
- Trailing vs leading. Incident-derived metrics are trailing: you measure what has already happened. Leading indicators (canary metrics during deploy, alert volume, change-fail rate) are available faster but correlate more weakly with customer sentiment.
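The count-vs-hours trade-off can be made concrete with toy numbers (purely illustrative, not from the source): the same incident set yields a count that treats a blip and an outage identically, while hours are dominated by the outage.

```python
# Illustrative durations in hours: two ~2-minute blips and one 4-hour outage.
durations = [2 / 60, 2 / 60, 4.0]

incident_count = len(durations)  # count weights all three incidents equally
impact_hours = sum(durations)    # hours are dominated by the 4-hour outage

print(incident_count)            # → 3
print(round(impact_hours, 2))    # → 4.07
```

A quarter with many short blips and a quarter with one long outage can have identical counts; only the hours variant separates them.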
See concepts/trailing-metric-patience for the patience discipline required when the metric trails delivery.
Why "imperfect analog" is the right framing¶
The metric's imperfection is a feature, not a bug:
- Perfect would be unmeasurable. True customer sentiment cannot be measured continuously at scale.
- Direct would be slow. NPS / survey cadence is quarterly; incident data is daily.
- Pure proxy would decouple. A metric unconnected to customer impact (e.g., deploy-count, MTTR-over-all-incidents) optimises against the substrate without moving sentiment.
The metric sits at the intersection of "measurable from existing data, daily" and "moves when customer sentiment moves." The three-layer-chain framing names exactly why: the chain is the cost of measurability.
Caveats¶
- Selection bias in "selected medium severity". The curation pass is a human judgement call; a change of reviewer from quarter to quarter can make the metric non-comparable across quarters.
- Severity-level drift. If the org's severity scale shifts (e.g., sev-2s reclassified as sev-3s after a policy change), the metric moves without any underlying change in customer impact.
- Root-cause attribution. Change-triggered-only scope requires per-incident cause classification, which is post-hoc and sometimes contested.
- The metric can be gamed. Over-filtering medium-severity incidents reduces the metric without improving customer sentiment. The "continually validate" criterion is the check.
- No per-region / per-customer-segment decomposition in the canonical Slack disclosure. A customer segment with 100% breakage can be invisible if their traffic is small in aggregate.
Seen in¶
- sources/2025-10-07-slack-deploy-safety-reducing-customer-impact-from-change — Slack's canonical program-metric choice; the three-layer chain framing; the "selected medium severity" filter; the four design criteria.