CONCEPT Cited by 1 source

Trailing-metric patience

Definition

Trailing-metric patience is the organisational discipline of committing to a reliability program whose top-line metric lags project delivery by months: holding confidence and investment steady while the metric has yet to move, and gathering mid-stream sub-signals that confirm the program is at least functioning.

The discipline is load-bearing for any reliability program where the metric is derived from incident occurrence: incidents must happen or not happen before the metric moves, and their distribution is long-tailed and bursty.

Canonical disclosure

Slack's 2025-10-07 Deploy Safety retrospective canonicalises the discipline. Verbatim (Source: sources/2025-10-07-slack-deploy-safety-reducing-customer-impact-from-change):

*"Patience with trailing metrics and faith that you have the right process even when some projects don't succeed

  • Using a measurement with multiple months of delay from work delivery will need patience.
  • Gather metrics to know if the improvement is at least functioning well (e.g., issue detection) whilst waiting for full results.
  • Faith that you've made the best decisions you can with the information you have at the time and the agility to change the path once results are confirmed."*

Concrete lag disclosure from the retrospective: Slack observed a 3-6 month lag from project delivery to full impact visibility on the program metric.

Why the lag exists

The lag is structural, not methodological:

  1. Incident-derived metrics depend on incidents happening. A program that reduces change-triggered-incident-hours only moves the metric when a change-triggered incident is averted or shortened — and incidents are bursty, not uniformly distributed. Several-quarter windows are required to distinguish signal from noise.
  2. Deploy of the fix comes before deploys that test it. A new automatic-rollback capability cannot prove itself until future deploys trigger regressions that it catches. The fix must be used, not merely available.
  3. Rollout is itself phased. The capability is often rolled out across services / teams / regions, with per-team adoption cost. Full coverage takes quarters.
  4. Behaviour change takes time. Engineer comfort with automatic rollback, manual-rollback tooling fluency, and team processes take sustained practice — see patterns/always-be-failing-over-drill.
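Point 1 can be made concrete with a small simulation. This is an illustrative sketch, not Slack's data: `monthly_incident_hours`, the 10-slot incident draw, and the ~8-hour mean duration are all assumptions chosen to produce a bursty, long-tailed series.

```python
import random

random.seed(0)

def monthly_incident_hours(rate, months):
    """Bursty incident-hours series: most months are quiet, a few carry
    large bursts. (Illustrative distribution, not Slack's actual data.)"""
    hours = []
    for _ in range(months):
        # Incident count per month: up to 10 opportunities, each firing at `rate`.
        n = sum(1 for _ in range(10) if random.random() < rate)
        # Each incident lasts exponentially-distributed hours, mean ~8h.
        hours.append(sum(random.expovariate(1 / 8) for _ in range(n)))
    return hours

before = monthly_incident_hours(rate=0.30, months=24)  # pre-program baseline
after = monthly_incident_hours(rate=0.20, months=24)   # ~33% fewer incidents

def mean(xs):
    return sum(xs) / len(xs)

# One quarter of post-delivery data is dominated by burst noise; only a
# multi-quarter window starts to separate the real reduction from variance.
print("one quarter after vs baseline:", round(mean(after[:3]), 1), "vs", round(mean(before), 1))
print("two years after  vs baseline:", round(mean(after), 1), "vs", round(mean(before), 1))
```

Even with a genuine ~33% underlying reduction baked in, any single quarter of the "after" series can sit above the baseline mean, which is exactly why the metric needs several-quarter windows before it is trusted.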

Three required disciplines under patience

The retrospective names three disciplines the program must maintain when the metric lags:

  1. Acknowledge the lag explicitly. Publicly set the expectation that full results will take months. "Using a measurement with multiple months of delay from work delivery will need patience." If the exec sponsor expects work delivered in the first month of a quarter to show a metric delta by the third, the program dies.
  2. Gather mid-stream sub-signals. "Gather metrics to know if the improvement is at least functioning well (e.g., issue detection) whilst waiting for full results." Canonical sub-signals: alert-fire count, false-positive rate, detection time, rollback-trigger count, manual-rollback usage count. These do not move the top-line metric but confirm the program is functioning.
  3. Hold faith with the agility to change course. "Faith that you've made the best decisions you can with the information you have at the time and the agility to change the path once results are confirmed." Faith is not stubbornness — the program's investment strategy (see patterns/invest-widely-then-double-down-on-impact) is premised on curtailing projects that under-perform once results arrive.
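Discipline 2 amounts to tracking a small record of "is it at least functioning?" signals per project while the top-line sits still. A minimal sketch, with field names mirroring the canonical sub-signals above (the class and its numbers are hypothetical, not a Slack API):

```python
from dataclasses import dataclass, field

@dataclass
class SubSignals:
    """Mid-stream sub-signals for one reliability project.
    These never move the top-line metric; they only confirm function."""
    alert_fires: int = 0
    false_positives: int = 0
    detection_times_s: list = field(default_factory=list)
    rollback_triggers: int = 0
    manual_rollbacks: int = 0

    def false_positive_rate(self):
        # Share of alert fires that were spurious; 0.0 if nothing fired yet.
        return self.false_positives / self.alert_fires if self.alert_fires else 0.0

    def median_detection_time_s(self):
        xs = sorted(self.detection_times_s)
        return xs[len(xs) // 2] if xs else None

# Usage: confirm the new rollback capability is firing and mostly correct,
# months before any top-line incident-hours movement is visible.
s = SubSignals(alert_fires=40, false_positives=6,
               detection_times_s=[90, 120, 45, 300],
               rollback_triggers=12, manual_rollbacks=3)
print(s.false_positive_rate())        # 0.15
print(s.median_detection_time_s())    # 120
```

The point of the shape is that each field is observable within days of rollout, so a review can say "functioning, keep waiting" rather than "no metric movement, cut it".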

Program-management implications

  • OKR structure. Quarterly OKRs on the top-line metric are miscalibrated for trailing metrics. A better shape is sub-signal OKRs (per-project) + annual/biannual top-line review.
  • Executive review cadence. Slack disclosed "Executive reviews every 4-6 weeks to ensure continued alignment and to seek support where needed" — note the cadence is not top-line-metric-driven; it is alignment-driven.
  • Communication to engineering staff. Slack explicitly flags this as an area they're working to improve: "it hasn't always been clear to general engineering staff revealing an opportunity for better alignment." Management understood the patience-with-trailing-metrics discipline; individual engineers less so.

Structural tension with project-metric incentives

The three-layer chain (Customer sentiment <-> Program Metric <-> Project Metric; see concepts/customer-impact-hours-metric) is the structural source of this tension:

  • Engineers want project-metric feedback ("did my work move the thing I can control?"), which resolves in weeks.
  • The program wants top-line-metric movement ("did the customer experience improve?"), which resolves in quarters.

The verbatim framing: "especially difficult for engineers who prefer a more concrete feedback loop with hard data – i.e., 'How much does my work or concept change customer sentiment?'"

The discipline is to accept that "how much my work moves the top line" may never be cleanly decomposable — and keep executing while the top-line moves on its own cadence.

Anti-patterns the discipline guards against

  • Premature cancellation. Cutting investment because the metric hasn't moved by Q2 when the lag is Q3-Q4.
  • Metric-chasing whiplash. Swapping metrics every quarter because the current one isn't moving — which resets the lag clock.
  • Under-investing in mid-stream signals. A program with only a trailing metric is flying blind for its entire feedback-loop-length window.
  • Over-investing in per-project-metric-to-top-line attribution. The project-to-program attribution is fundamentally loose; formal attribution models (statistical regression, counterfactual analysis) rarely survive the noise in incident data.
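The last anti-pattern can be seen in a back-of-envelope signal-to-noise calculation. The numbers below are assumptions chosen for illustration (a ~2 incident-hour/month per-project effect against a ~15-hour monthly burst standard deviation), not Slack's figures:

```python
# Illustrative: per-project effect size vs. month-to-month burst noise.
per_project_effect = 2.0   # incident-hours/month one project shaves (assumed)
burst_std = 15.0           # std dev of monthly incident-hours (assumed)

# SNR for attributing one project's effect from n months of data:
# the standard error of the mean shrinks only as sqrt(n).
for n in (3, 12, 48):
    snr = per_project_effect / (burst_std / n ** 0.5)
    print(f"{n:>2} months: per-project SNR = {snr:.2f}")
```

Under these assumptions even four years of data leaves per-project SNR below 1, which is why formal attribution models rarely beat the simple posture of trusting the sub-signals and the aggregate trend.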

Relationship to wiki primitives

  • concepts/customer-impact-hours-metric — the canonical Slack-shaped trailing metric that requires this discipline.
  • concepts/change-triggered-incident-rate — the denominator-side metric that shares the same lag shape.
  • concepts/feedback-control-loop-for-rollouts — the complementary leading-signal substrate; deploys are rollout-gated on fast feedback loops, while the program is evaluated on trailing metrics — the two operate at different time scales.
  • concepts/dora-metrics — DORA's four metrics have a similar trailing-metric character (change-fail-rate, MTTR) and the same patience requirement applies to any DORA-metric-driven reliability program.
  • concepts/observability — the substrate-quality prerequisite: you cannot produce useful mid-stream sub-signals without adequate observability instrumentation.
