SYSTEM Cited by 2 sources
Zalando SLO Reporting Tool¶
The SLO Reporting Tool is an internal Zalando tool for tracking Service Level Objectives (SLOs). It uses canonical SLIs (a shared catalogue of Service Level Indicators) and the Service Tier classification to organise reporting, alert aggressiveness, and error-budget prioritisation.
Origin¶
Built by the Zalando Digital Experience SRE team in 2018 as part of the Service Tier rollout effort. Explicitly scoped to the DX department — "Services in other departments were not included in this effort and there was no mandate for them to adopt the new Service Tier definitions. Attempting to roll this out for the entire company (>4000 services) would not be feasible." (sources/2021-09-20-zalando-tracing-sres-journey-part-ii)
What it does¶
"Service Tier definitions were published. To help with the Service Tiers, a new SLO reporting tool was developed. The new tool defined canonical SLIs and used the tier classification."
Three responsibilities:
- Canonical SLI catalogue: a standardised, opinionated list of what availability, latency, and correctness mean across the DX services. It avoids the "every team invents its own SLI" fragmentation that plagued pre-2018 SLOs at Zalando.
- Tier-aware reporting: services are classified by Service Tier, and the tool's reports, dashboards, and alerts key off the tier attribute.
- Portfolio view: aggregates SLO compliance across services so leadership can see which tiers and teams carry reliability debt and which have budget surplus.
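As a rough illustration of how tier-aware reporting and a portfolio view could fit together, the Python sketch below groups services by tier and ranks them by remaining error budget. Everything in it is hypothetical: the source does not publish the canonical SLI catalogue, the per-tier targets, or the tool's data model, so `TIER_TARGETS`, `ServiceReport`, `portfolio_view`, and the sample numbers are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical per-tier availability targets (not taken from the source).
TIER_TARGETS = {1: 0.999, 2: 0.995, 3: 0.99}


@dataclass(frozen=True)
class ServiceReport:
    name: str
    team: str
    tier: int
    good_events: int   # events that satisfied the canonical availability SLI
    total_events: int

    @property
    def availability(self) -> float:
        # With no traffic there is nothing to violate, so report 100%.
        return self.good_events / self.total_events if self.total_events else 1.0

    @property
    def budget_remaining(self) -> float:
        """Fraction of the error budget left; negative means overspent."""
        allowed = 1.0 - TIER_TARGETS[self.tier]
        spent = 1.0 - self.availability
        return 1.0 - spent / allowed


def portfolio_view(reports):
    """Group services by tier, listing the worst remaining budget first."""
    by_tier = defaultdict(list)
    for report in reports:
        by_tier[report.tier].append(report)
    return {
        tier: sorted(group, key=lambda r: r.budget_remaining)
        for tier, group in sorted(by_tier.items())
    }


reports = [
    ServiceReport("checkout", "team-a", 1, 999_500, 1_000_000),
    ServiceReport("payments", "team-b", 1, 998_000, 1_000_000),
    ServiceReport("wishlist", "team-c", 3, 995_000, 1_000_000),
]
view = portfolio_view(reports)
for tier, group in view.items():
    for r in group:
        print(f"tier {tier} {r.name} ({r.team}): {r.budget_remaining:+.0%} budget left")
```

Because every service is measured against the same SLI and a target derived only from its tier, the ranking within a tier is a like-for-like comparison, and a reviewer can start with the Tier-1 service that has overspent the most.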
Why it mattered¶
SLOs had existed at Zalando since 2016, but before the SLO Reporting Tool they were not used for prioritisation:
"Despite the growing number of SLOs, they were still not used to help the teams strike a balance between feature development and operational improvements."
Two gaps the tool closed:
- Structure the portfolio. Without tiering, 100 teams' SLO dashboards are a flat wall; with tiering, reviewers can focus on Tier-1 reliability first.
- Compare like-for-like. Canonical SLIs let Tier-1 checkout be compared to Tier-1 payments on the same axis; bespoke per-team SLIs made cross-service comparisons impossible.
What followed: the pivot to Operation-Based SLOs¶
In 2019 Zalando pivoted from service-based SLOs (what this tool tracked) to Operation-Based SLOs. The pivot happened because:
- Compound user-journey availability doesn't equal any single service's SLO.
- Measurement at the Critical Business Operation (CBO) level became feasible once Distributed Tracing was deployed fleet-wide.
- Adaptive Paging provided the routing layer that makes operation-level alerts actionable.
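The first reason can be made concrete with a little arithmetic. The sketch below assumes a user journey that needs every service in a chain to succeed and that failures are independent; both are simplifying assumptions for illustration, and the five-service chain and the 99.9% figure are hypothetical rather than taken from the source.

```python
import math


def journey_availability(service_availabilities):
    """Availability of a journey that needs every listed service to succeed.

    Assumes the services are called in series and fail independently,
    so the individual availabilities multiply.
    """
    return math.prod(service_availabilities)


# Five hypothetical services, each exactly meeting a 99.9% availability SLO.
services = [0.999] * 5
journey = journey_availability(services)
print(f"per-service: {services[0]:.2%}, journey: {journey:.2%}")
```

Under these assumptions every service meets its own SLO, yet the journey is only about 99.5% available, which is why measuring the operation directly tells a different story from any single service's report.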
The tier classification used by the SLO Reporting Tool still applies at the operation level (Tier-1 operations get tight SLOs, Tier-3 operations get looser ones), but the unit of SLO shifted from service to operation.
The successor, the Service Level Management Tool, was built to operate at the CBO altitude and drives Adaptive Paging via error-budget burn-rate thresholds. The two tools coexisted during rollout: even teams that adopted CBOs did not immediately disable their cause-based alerts, so both systems ran in parallel until the Operation-Based SLO framework proved itself numerically. The evidence came from the SRE department's dogfooding in sources/2022-04-27-zalando-operation-based-slos: the false-positive rate fell from 56% to 0%, alert volume fell from 2 to 0.14 alerts/day, and 30+ cause-based alerts were retired.
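For readers unfamiliar with burn-rate thresholds, the following Python sketch shows the general idea. It is not the Service Level Management Tool's implementation: the source does not state which windows or thresholds Adaptive Paging uses, so the two-window check, the 14.4 threshold (a value commonly quoted in multi-window burn-rate alerting), and the function names `burn_rate` and `should_page` are all assumptions.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Speed at which the error budget is being consumed.

    1.0 means the budget would last exactly the SLO period;
    10.0 means it would be gone in a tenth of that time.
    """
    if total_events == 0:
        return 0.0
    error_rate = bad_events / total_events
    return error_rate / (1.0 - slo_target)


def should_page(long_window_rate: float, short_window_rate: float,
                threshold: float) -> bool:
    """Page only if both windows burn faster than the threshold.

    The long window shows the burn is significant; the short window
    shows it is still happening right now.
    """
    return long_window_rate >= threshold and short_window_rate >= threshold


# Hypothetical operation with a 99.9% SLO.
long_rate = burn_rate(bad_events=150, total_events=10_000, slo_target=0.999)
short_rate = burn_rate(bad_events=20, total_events=1_000, slo_target=0.999)
page = should_page(long_rate, short_rate, threshold=14.4)
print(f"long={long_rate:.1f}x short={short_rate:.1f}x page={page}")
```

Alerting on budget consumption rather than on individual causes is one plausible reason a symptom-based setup can retire many cause-based alerts: a single check per operation fires only when users are measurably affected.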
Scope caveats¶
- DX department only — never rolled out company-wide. The >4,000-service scale made a mandatory company-wide rollout infeasible for the 7-person SRE team.
- Service-level granularity — predates and conflicts with the 2019 operation-based pivot.
- No public documentation of the canonical SLI catalogue contents — the blog post names the tool but doesn't enumerate the SLIs.
Seen in¶
- sources/2021-09-20-zalando-tracing-sres-journey-part-ii — names the SLO Reporting Tool built for the 2018 Service Tier rollout; DX-scoped; superseded in mindset by operation-based SLOs in 2019.
- sources/2022-04-27-zalando-operation-based-slos — canonicalises the tooling-layer supersession: a new Service Level Management Tool built to serve Operation-Based SLOs replaces this tool as Zalando's primary SLO platform, though the two coexisted during the rollout period.
Related¶
- concepts/service-tier-classification — the classification scheme this tool consumes.
- concepts/operation-based-slo — the 2019 evolution that shifted SLO granularity from service to operation.
- concepts/critical-business-operation
- systems/zalando-service-level-management-tool — the operation-based successor.
- companies/zalando