CONCEPT Cited by 1 source

Service Level Objective (SLO)

A Service Level Objective (SLO) is a target reliability level for a service, expressed as a threshold over a Service Level Indicator (SLI) — e.g. "99.9% of requests return within 200ms over a rolling 28-day window". Popularised by the Google SRE book (2016), SLOs are the primary quantitative instrument SRE uses to make reliability a first-class engineering concern.
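To make the definition concrete, here is a minimal sketch of how the example SLO above could be evaluated. It assumes the caller has already selected the latencies that fall inside the rolling 28-day window; the function names, the return shape, and the choice to treat an empty window as compliant are illustrative assumptions, not part of any cited source. The error budget is taken in its usual sense: the share of requests (1 - target) that may miss the threshold.

```python
def count_good(latencies_ms, threshold_ms=200.0):
    """Count requests that returned within the latency threshold."""
    return sum(1 for latency in latencies_ms if latency <= threshold_ms)


def evaluate_slo(latencies_ms, target=0.999, threshold_ms=200.0):
    """Compare the SLI (fraction of fast requests) against the SLO target.

    latencies_ms is assumed to hold only requests from the SLO window
    (for example the last 28 days); windowing is left to the caller.
    """
    total = len(latencies_ms)
    good = count_good(latencies_ms, threshold_ms)
    bad = total - good

    # Assumption: a window with no traffic counts as meeting the SLO.
    sli = good / total if total else 1.0

    # Error budget expressed in requests: how many may miss the threshold.
    budget = (1.0 - target) * total
    budget_consumed = bad / budget if budget > 0 else 0.0

    return {
        "sli": sli,
        "met": sli >= target,
        "budget_consumed": budget_consumed,  # 1.0 means the budget is spent
    }


if __name__ == "__main__":
    window = [100.0] * 9995 + [500.0] * 5
    print(evaluate_slo(window))
```

Reporting the consumed share of the budget alongside the pass/fail flag is a design choice: it gives an early signal while the SLO is still being met.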

Why the primitive matters

Without SLOs, reliability is either (a) implicit and therefore unmanaged, or (b) expressed as contractual uptime percentages that do not reflect user experience. SLOs make the target explicit, measurable, and aligned with user-visible behaviour (latency, error rate, freshness) rather than component liveness.

Common failure mode — engineer-defined, PM-ignored

Zalando's 2016 SRE retrospective names this directly:

"Many services did have SLOs defined and collected, but it still did not end up influencing the software development process. The vast majority of SLOs were defined through initiatives from Engineers. … Product Managers had a hard time establishing a link between the different SLOs and their own expectations for the products they are responsible for." (Source: )

The lesson: SLOs that engineers define but product managers do not own do not change product decisions. SLO adoption is a socio-technical problem, not a purely technical one.

Seen in

  • — canonical "SLOs defined but ignored" case study.
  • sources/2022-04-27-zalando-operation-based-slos: names the canonical formula SLO = Symptom + Target, and canonicalises operation-based SLOs together with error-budget burn-rate alerting via MWMBR (Multi-Window Multi-Burn-Rate).
  • SLO as the priority-class-assignment criterion. Zalando's Communication Platform explicitly frames its move to per-priority AIMD admission control as SLO-driven: "business stakeholders have requested to ensure that we process the communication which supports critical business operations within the given service level objectives (SLOs)." The three priority classes exist because some event types (order confirmations) are SLO-protected and others (marketing) are not. This is a canonical instance of an SLO directly shaping the parameter table of a production admission-control policy: the per-priority AIMD coefficients {P1: +15/×0.8, P2: +10/×0.6, P3: +5/×0.4} are designed so that P1 stays flat through load episodes regardless of how much P3 has to shed.
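The per-priority coefficient table in the last item can be illustrated with a small sketch. It reads each pair as "additive increase / multiplicative decrease", which is the usual AIMD meaning but is an interpretation of the notation rather than something the source spells out. The class name, the unit of the limit, the initial value, the floor and ceiling, and the single shared overload signal are all hypothetical; the real system's signal and units are not given here.

```python
from dataclasses import dataclass


@dataclass
class AimdLimit:
    """Admission limit for one priority class (units are an assumption)."""
    limit: float
    increase: float          # added to the limit after a healthy interval
    decrease: float          # multiplied into the limit after an overloaded one
    floor: float = 1.0       # hypothetical lower bound
    ceiling: float = 1000.0  # hypothetical upper bound

    def update(self, overloaded: bool) -> float:
        if overloaded:
            self.limit = max(self.floor, self.limit * self.decrease)
        else:
            self.limit = min(self.ceiling, self.limit + self.increase)
        return self.limit


# Coefficients as quoted above: (additive increase, multiplicative decrease).
COEFFICIENTS = {"P1": (15, 0.8), "P2": (10, 0.6), "P3": (5, 0.4)}


def make_limits(initial: float = 100.0) -> dict:
    """Create one AIMD limit per priority class, all starting at `initial`."""
    return {
        name: AimdLimit(initial, inc, dec)
        for name, (inc, dec) in COEFFICIENTS.items()
    }


def step(limits: dict, overloaded: bool) -> dict:
    """Apply one control interval to every class; return the new limits."""
    return {name: lim.update(overloaded) for name, lim in limits.items()}


if __name__ == "__main__":
    limits = make_limits()
    for overloaded in (True, True, True, False, False):
        print(overloaded, step(limits, overloaded))
```

In this simplified form P1 still backs off under overload, only more gently than P2 and P3, and it recovers fastest. Keeping P1 fully flat, as the source describes, would presumably depend on how the overload signal is attributed to each class, which this sketch does not model.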