
CONCEPT Cited by 1 source

Customer-facing SLA

Definition

A customer-facing SLA is a quantitative promise stated in metrics the customer directly perceives, not in internal system metrics. For a shared SQL query service such as Presto at Meta scale, the two headline examples from sources/2023-07-16-highscalability-lessons-learned-running-presto-at-meta-scale are:

  • Queueing time — how long a submitted query sits before a cluster starts executing it.
  • Query failure rate — what fraction of submitted queries fail.

Both are things a data analyst, ML engineer, or product-team user notices directly; neither is a CPU, disk, or network metric.
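As a minimal sketch of how these two metrics could be computed from per-query logs: the `QueryRecord` type and its field names are hypothetical (the source does not describe Meta's logging schema), and the nearest-rank percentile is just one common convention.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class QueryRecord:
    """Hypothetical per-query log record; field names are assumptions."""
    submitted_at: float           # seconds since epoch when the user submitted the query
    started_at: Optional[float]   # when a cluster began executing it; None if it never started
    failed: bool                  # True if the query ended in failure


def queueing_times(records: Sequence[QueryRecord]) -> List[float]:
    """Queueing time per query: submission until a cluster starts executing it."""
    return [r.started_at - r.submitted_at for r in records if r.started_at is not None]


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (an assumed convention, not taken from the source)."""
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def failure_rate(records: Sequence[QueryRecord]) -> float:
    """Fraction of submitted queries that failed."""
    if not records:
        return 0.0
    return sum(r.failed for r in records) / len(records)


records = [
    QueryRecord(submitted_at=0.0, started_at=2.0, failed=False),
    QueryRecord(submitted_at=1.0, started_at=6.0, failed=False),
    QueryRecord(submitted_at=2.0, started_at=3.0, failed=True),
    QueryRecord(submitted_at=3.0, started_at=None, failed=True),  # never left the queue
]
p95_queueing = percentile(queueing_times(records), 95)
observed_failure_rate = failure_rate(records)
```

Both numbers are derived purely from what the user experiences (submit, wait, succeed or fail); no host-level metric enters the calculation.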

Why it matters at scale

"Defining SLAs around important metrics like queueing time and query failure rate in a manner that tracks customer pain points becomes crucial as Presto is scaled up. When there is a large number of users, the lack of proper SLAs can greatly hinder efforts to mitigate production issues because of confusion in determining the impact of an incident." — Meta, on scaling Presto (sources/2023-07-16-highscalability-lessons-learned-running-presto-at-meta-scale)

Three structural roles the customer-facing SLA plays:

  1. Shared definition of "bad." Without it, an oncall and a product-team user cannot even agree whether an incident is ongoing.
  2. Alert trigger. Monitoring fires when SLA metrics breach; this is the trigger for oncall analyzers and for automated remediation.
  3. Prioritization of engineering investment. Where the SLA is breached regularly, automation (canary pipelines, bad-host drain, gateway throttling) gets prioritized; where it is not, ad-hoc fixes suffice.
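The three roles above can be sketched with one shared breach check. This is an illustrative sketch only: `SlaTargets`, `WindowMetrics`, and the threshold values are hypothetical placeholders, since the source does not state Meta's actual targets or tooling.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SlaTargets:
    """Hypothetical SLA targets; the numbers used below are placeholders."""
    max_p95_queueing_s: float
    max_failure_rate: float


@dataclass(frozen=True)
class WindowMetrics:
    """Customer-facing metrics measured over one monitoring window."""
    p95_queueing_s: float
    failure_rate: float


def breaches(metrics: WindowMetrics, targets: SlaTargets) -> List[str]:
    """Roles 1 and 2: a single definition of "bad" that also serves as the alert trigger."""
    breached = []
    if metrics.p95_queueing_s > targets.max_p95_queueing_s:
        breached.append("queueing_time")
    if metrics.failure_rate > targets.max_failure_rate:
        breached.append("failure_rate")
    return breached


def breach_fraction(windows: Sequence[WindowMetrics], targets: SlaTargets) -> float:
    """Role 3: how often the SLA is breached, as an input to prioritizing automation."""
    if not windows:
        return 0.0
    return sum(1 for w in windows if breaches(w, targets)) / len(windows)


targets = SlaTargets(max_p95_queueing_s=60.0, max_failure_rate=0.01)  # placeholder values
history = [
    WindowMetrics(p95_queueing_s=12.0, failure_rate=0.002),
    WindowMetrics(p95_queueing_s=95.0, failure_rate=0.004),
    WindowMetrics(p95_queueing_s=20.0, failure_rate=0.030),
    WindowMetrics(p95_queueing_s=15.0, failure_rate=0.001),
]
should_alert = bool(breaches(history[-1], targets))
regular_breach_share = breach_fraction(history, targets)
```

Because the on-call engineer and the user both read the same `breaches` result, there is no separate argument about whether an incident is in progress; the same function feeds the alert and the longer-term breach history.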

Distinguishing from system SLOs

  • System SLO: "coordinator p99 memory < 80%". Operator-facing.
  • Customer-facing SLA: "p95 queueing time < N seconds". User-facing.

The first is a leading indicator for the second but not a substitute. Meta's insight: tie the alerting and automation stack to the customer-facing numbers, and let system SLOs be diagnostic supporting evidence.
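One way to picture this split, as a sketch under assumptions: the alert fires only on the customer-facing number, and any violated system SLOs are attached as diagnostic context. The `Alert` type, the `evaluate` function, and the SLO names are hypothetical and are not taken from Meta's stack.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Alert:
    """Hypothetical alert payload."""
    reason: str
    diagnostics: List[str] = field(default_factory=list)


def evaluate(
    p95_queueing_s: float,
    queueing_target_s: float,
    system_slo_ok: Dict[str, bool],
) -> Optional[Alert]:
    """Alert only when the customer-facing SLA is breached.

    Violated system SLOs never page on their own here; they are
    attached to the alert as supporting evidence for diagnosis.
    """
    if p95_queueing_s <= queueing_target_s:
        return None
    violated = sorted(name for name, ok in system_slo_ok.items() if not ok)
    return Alert(
        reason=f"p95 queueing time {p95_queueing_s:.1f}s exceeds target {queueing_target_s:.1f}s",
        diagnostics=violated,
    )


# Hypothetical system SLO status: the coordinator memory SLO is violated.
slo_status = {"coordinator_p99_memory_below_80pct": False, "worker_disk_ok": True}

quiet = evaluate(p95_queueing_s=10.0, queueing_target_s=60.0, system_slo_ok=slo_status)
paged = evaluate(p95_queueing_s=90.0, queueing_target_s=60.0, system_slo_ok=slo_status)
```

In the first call the memory SLO is violated but users are not yet affected, so nothing fires; in the second, the same violated SLO appears in `paged.diagnostics` as a lead for the on-call engineer.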

Seen in
