CONCEPT Cited by 1 source

Customer impact hours metric

Definition

A customer impact hours metric is a reliability program's top-line metric defined as summed hours of customer-visible impact from high-severity and filtered medium-severity incidents, scoped by cause (change-triggered / external / capacity / etc.).

The metric is deliberately an imperfect analog of customer sentiment — it's cheaper than direct customer-sentiment survey data while correlating well enough that project-level improvements move it.

Canonical disclosure

Slack's 2025-10-07 Deploy Safety retrospective canonicalises the definition verbatim (Source: sources/2025-10-07-slack-deploy-safety-reducing-customer-impact-from-change):

"Hours of customer impact from high severity and selected medium severity change-triggered incidents."

"Selected" means filtered by post-hoc impact analysis: Slack severity levels convey impending or current impact, not final impact, so a human curation pass is required to distill medium-severity incidents down to the ones that actually mattered.
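The definition and the curation filter can be sketched together. This is a minimal illustration, not Slack's implementation: the `Incident` record, the numeric severity scale, and the `confirmed_customer_impact` flag standing in for the human curation pass are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Incident:
    severity: int                 # assumed scale: 1 = high, 2 = medium
    cause: str                    # "change-triggered", "external", "capacity", ...
    start: datetime
    end: datetime
    confirmed_customer_impact: bool = False  # set by the post-hoc curation pass

def customer_impact_hours(incidents, cause="change-triggered"):
    """Sum hours of customer-visible impact: all high-severity incidents,
    plus medium-severity incidents the curation pass confirmed."""
    total = timedelta()
    for inc in incidents:
        if inc.cause != cause:
            continue
        if inc.severity == 1 or (inc.severity == 2 and inc.confirmed_customer_impact):
            total += inc.end - inc.start
    return total.total_seconds() / 3600
```

Scoping by `cause` is what makes the same computation serve either a deploy-safety program (change-triggered only) or a full reliability program (all-cause).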

The three-layer chain

Slack articulates the metric's position in a three-layer chain:

Customer sentiment <-> Program Metric <-> Project Metric
  • Customer sentiment is the truth the program is trying to move. Direct measurement requires surveys, NPS, support-ticket sentiment — slow, noisy, expensive.
  • Program metric is the imperfect analog — defined well enough to be computed from existing incident data; close enough to customer sentiment that moving it correlates with moving sentiment.
  • Project metric is per-project — "did automatic rollback reduce MTTR on Webapp backend?" — measurable in the specific project's substrate.

Verbatim on the coupling (Source: sources/2025-10-07-slack-deploy-safety-reducing-customer-impact-from-change):

"They're all connected, but it's challenging to know for a specific project how much it is going to move the top line metric."

This is load-bearing for how projects are justified and evaluated: you cannot directly attribute a specific quarterly top-line move to a specific project. You measure the project metric during the project; you measure the program metric in aggregate after a 3-6 month lag; you accept a loose causal chain.

The four metric-design criteria

Slack names four explicit criteria:

  1. Measure results. Not effort, not output. Results.
  2. Understand what is measured (real vs analog). The metric is an analog; don't confuse it with the underlying truth (customer sentiment).
  3. Consistency in measurement, especially subjective portions. The "selected medium severity" filter is the subjective portion — apply consistently across quarters.
  4. Continually validate that the measurement matches customer sentiment "with the leaders having the direct conversations with customers". The program metric's legitimacy is only as good as its ongoing validation against the sentiment it is trying to analog.

Operational trade-offs in metric design

  • Incident-count vs incident-hours. Count is easier to compute but doesn't distinguish a 2-min blip from a 4-hour outage. Hours weight by duration.
  • All-incidents vs severity-filtered. All-incidents over-counts noise from low-severity internal-only incidents; severity-filtered requires a curation discipline.
  • All-cause vs change-triggered. A program investing in deploy safety should track only the change-triggered subset (to avoid credit/blame for external-cause moves); a full reliability program tracks all-cause.
  • Trailing vs leading. Incident-derived metrics are trailing: you measure what has already happened. Leading indicators (canary metrics during deploy, alert volume, change-fail rate) are available faster but correlate more weakly with customer sentiment.
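The count-vs-hours trade-off is easy to see with numbers. A toy illustration with made-up durations (not Slack data):

```python
# Hypothetical quarter: two brief blips and two real outages, in hours.
durations_h = [0.03, 4.0, 0.05, 2.5]

count = len(durations_h)   # 4: a 2-minute blip counts the same as a 4-hour outage
hours = sum(durations_h)   # 6.58: each incident weighted by its duration

# Eliminating both blips halves the count but moves hours by about 1%,
# which is why hours tracks customer impact more faithfully than count.
blip_free = [d for d in durations_h if d >= 0.25]
```

An incident-count program would celebrate fixing the blips; an impact-hours program correctly directs attention at the two long outages.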

See concepts/trailing-metric-patience for the patience discipline required when the metric trails delivery.

Why "imperfect analog" is the right framing

The metric's imperfection is feature, not bug:

  • Perfect would be unmeasurable. True customer sentiment cannot be measured continuously at scale.
  • Direct would be slow. NPS / survey cadence is quarterly; incident data is daily.
  • Pure proxy would decouple. A metric unconnected to customer impact (e.g., deploy-count, MTTR-over-all-incidents) optimises against the substrate without moving sentiment.

The metric sits at the intersection of "measurable from existing data, daily" and "moves when customer sentiment moves." The three-layer-chain framing names exactly why: the chain is the cost of measurability.

Caveats

  • Selection bias in "selected medium severity". The curation pass is a human judgement call; quarter-to-quarter reviewer change can make the metric non-comparable.
  • Severity-level drift. If the org's severity scale shifts (more sev-2s reclassified as sev-3s because of policy change), the metric moves without any underlying customer impact change.
  • Root-cause attribution. Change-triggered-only scope requires per-incident cause classification, which is post-hoc and sometimes contested.
  • The metric can be gamed. Over-filtering medium-severity incidents reduces the metric without improving customer sentiment. The "continually validate" criterion is the check.
  • No per-region / per-customer-segment decomposition in the canonical Slack disclosure. A customer segment with 100% breakage can be invisible if their traffic is small in aggregate.
