PATTERN Cited by 1 source

SRE team per Product Cluster

SRE team per Product Cluster is the organisational shape that positions one SRE team at the granularity of a Product Cluster (a grouping of 5–20 delivery teams working on a related product domain) — rather than one central SRE team for the whole company, or one embedded SRE per delivery team.

The three alternatives

Zalando's 2016 retrospective names the three options it debated:

  1. One central SRE team: rejected. Zalando already had 1,000+ engineers, and no single central team could cover that surface. This option fails at scale for any org past a few hundred engineers.
  2. One SRE per delivery team (embed): rejected. "The scope would be too large for the lone SREs. Not to mention that, over time, they'd likely become the Ops engineer for the team they were in." (Source: sources/2021-09-12-zalando-tracing-sres-journey-in-zalando-part-i). This option fails because a lone SRE tends to be absorbed into the host team's day-to-day ops work.
  3. One SRE team per Product Cluster: chosen. Gives SREs end-to-end responsibility over one domain without an unmanageably wide scope.

Why the middle ground

The Product Cluster granularity is the point at which:

  • SREs have enough context to be effective (one domain, not the whole company).
  • SREs are numerous enough (a team, not a lone embed) to avoid absorption into a single delivery team's ops work.
  • The number of SRE teams scales with product growth rather than company headcount.
  • Cross-cluster patterns emerge that a dedicated SRE department can later own (see Phase 3 of concepts/sre-organizational-evolution).

Reporting chain

The Google SRE workbook's guidance that reliability work is a specialised role pairs with this pattern: each Product Cluster SRE team reports into the SRE chain, not into product delivery. Keeping the reporting line separate guards against the same regression-to-Ops failure mode that affects lone embedded SREs.

When to use

  • The org is large enough that a central SRE team cannot cover the surface (roughly a few hundred engineers or more).
  • Product domains are cohesive enough that "one team per cluster" is a meaningful granularity.
  • There is organisational appetite to staff multiple SRE teams, rather than a single Head of SRE plus contractors.
