Skip to content

CONCEPT Cited by 1 source

Time to know vs. time to recover

Definition

Time to know (TTK) and time to recover (TTR) are two adjacent but distinct incident-response KPIs. They partition the operational lifecycle of an incident:

  • TTK — from detection (paged / alerted) to understanding (a probable cause + blast radius is known to the responder).
  • TTR — from detection to mitigation complete (service back within SLO or fully recovered).

The adjacent industry terms are MTTD (mean time to detect — upstream of TTK), MTTK / MTTI (mean time to know / identify), and MTTR (mean time to recover). The Expedia STAR post uses TTK / TTR verbatim as the load-bearing optimisation target:

"Our objective with this service was to minimize the time to know (TTK) and time to recover (TTR). By enabling rapid analysis of observability data and evaluation of hypotheses, this service proved to be a valuable time-saving tool." (Source: sources/2026-04-28-expedia-expedias-service-telemetry-analyzer)

Why the split matters

TTK and TTR are compressible by different interventions:

KPI Compressed by Canonical tool
TTK Faster / richer diagnosis Observability (concepts/observability), automated RCA (concepts/automated-root-cause-analysis), STAR-style analyzers
TTR Faster action once diagnosed Pre-approved runbooks (concepts/incident-playbook), auto-remediation, feature-flag rollback, deploy-reverse

A system that only compresses TTR (e.g. one-click rollback) doesn't help if responders don't know what to roll back. A system that only compresses TTK (e.g. better dashboards) doesn't help if the diagnosis reveals a change outside the responder's mitigation toolkit.

STAR optimises the TTK side — it is a diagnosis tool, not a remediation tool.

Seen in

  • Expedia STAR (2026-04-28) — canonical wiki instance. STAR names TTK + TTR explicitly as its headline KPIs for the incident-investigation use case. Positions automated telemetry analysis + LLM-assisted hypothesis evaluation as a TTK lever. No numbers published — the post is a design retrospective, not a measurement study.
Last updated · 433 distilled / 1,256 read