PATTERN Cited by 2 sources

Grassroots SRE rollout

Grassroots SRE rollout is the bottom-up pattern for introducing Site Reliability Engineering in an organisation: a small coalition of SRE-interested engineers pitches the discipline to management, wins charter, and drives initial adoption — without a top-down mandate, pre-existing SRE department, or outside hire.

Shape

Coalition forms: a handful of engineers who have read the Google SRE book (or equivalent) recognise that the org's pain matches the book's problem statements.
Pitch management: present the pain points (on-call overload, inconsistent reliability, no SLOs) and SRE as a solution. Emphasise concrete wins (e.g. standardised on-call tooling) over ideology.
Structural debate: decide the SRE team shape. Options typically include a central team, per-team embeds, or a per-product-cluster team (Zalando's choice).
Baseline primitives rollout: SLOs, SLIs, an SLO reporting tool, and workshops on reliability patterns (retries, circuit breakers, fallbacks), applied to critical services first.
Forcing function: pair the rollout with a visible deadline (peak event, annual planning cycle) to create adoption urgency. In the Zalando case, the deadline was Cyber Week preparation.
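
To make the SLI and SLO primitives from the rollout step concrete, here is a minimal Python sketch of the comparison an SLO reporting tool performs: measure an SLI as the fraction of good events, then check it against the SLO target. This is not Zalando's tool. The `SLO` class, the `sli` and `report` functions, the `checkout-availability` name, and the 99.9% target are all hypothetical, and treating a zero-traffic window as meeting the objective is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """A service-level objective for one reporting window (hypothetical shape)."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of events must be good


def sli(good_events: int, total_events: int) -> float:
    """Service-level indicator: the fraction of events that were good."""
    if total_events == 0:
        # Assumption: a window with no traffic counts as fully good.
        return 1.0
    return good_events / total_events


def report(slo: SLO, good_events: int, total_events: int) -> dict:
    """Compare the measured SLI against the SLO target for one window."""
    measured = sli(good_events, total_events)
    return {
        "slo": slo.name,
        "target": slo.target,
        "sli": measured,
        "met": measured >= slo.target,
        "bad_events_observed": total_events - good_events,
    }


if __name__ == "__main__":
    # Hypothetical numbers for a critical service over one window.
    availability = SLO(name="checkout-availability", target=0.999)
    print(report(availability, good_events=999_500, total_events=1_000_000))
```

Keeping the SLO as plain data, separate from the measurement, is one way to let product managers own the target while engineers own the instrumentation. That split matters for the failure modes described below.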

When it works

Culture is open to change, so the coalition can pitch upward without friction. Zalando explicitly names this as the enabler: "Zalando is a company that does not shy away from change. It's a core part of the company's DNA." (Source: sources/2021-09-12-zalando-tracing-sres-journey-in-zalando-part-i)
Pain is acute enough that engineering managers will sponsor upward without mandating.
The coalition has credibility — engineers with incident scar tissue, not newcomers.

When it fails

Product management stays uninvolved. SLOs get defined by engineers and ignored by PMs. Service-level targets don't influence roadmap decisions, so the discipline doesn't change behaviour. Zalando's 2016 attempt failed on exactly this axis.
No owner for cross-cutting primitives. Without a chartered team, observability infrastructure, SLO reporting tooling, and training materials are nobody's day job, so they decay.
Coalition burnout. Volunteers run the program alongside their regular work, so attrition from the coalition undoes the progress already made.

Known failure case

Zalando 2016 → 2017 — coalition formed, management bought in, SRE structure debated, SLOs rolled out, Reliability Workshops held. Attempt stalled because SLOs never became a PM primitive and senior management preferred team-owned on-call. Resolution was a pivot to concepts/you-build-it-you-run-it; SRE re-emerged differently in Parts II & III of the retrospective. (Source: sources/2021-09-12-zalando-tracing-sres-journey-in-zalando-part-i)

Known success case

Zalando 2017 → 2020 (second attempt, retrospected in the 2020 Cyber Week post) — after the 2016 attempt stalled and the ownership model shifted, grassroots production-readiness reviews and OpenTracing rollout seeded Phase 2 of the three-phase evolution. The pattern can fail once and still seed a later success. (Source: sources/2020-10-07-zalando-how-zalando-prepares-for-cyber-week)

Seen in

sources/2021-09-12-zalando-tracing-sres-journey-in-zalando-part-i — canonical failed-first-attempt.
sources/2020-10-07-zalando-how-zalando-prepares-for-cyber-week — canonical successful re-attempt.