PATTERN Cited by 1 source
Unified SRE team over federated
Unified SRE team over federated names the structural choice to consolidate multiple SRE teams from different departments / reporting chains into a single team under one reporting chain, once the costs of federation (inconsistent guidance, organizational confusion, priority drift) exceed the benefits (departmental context, local presence). It is a Phase-2 → Phase-3 transition marker.
When to apply this pattern
- Multiple SRE teams currently sit in different departments with overlapping charters: observability, incident response, production readiness reviews (PRRs), and reliability enablement.
- Other teams ask the SRE teams the same questions and get different answers because each SRE team's local context diverges.
- Federated teams recently ran a shared SRE Program for one to three annual cycles and the program is now diverging rather than converging.
- Cross-cutting reliability primitives (distributed tracing, adaptive paging, SLO reporting, shared alert conventions) need a single owner with long-term charter.
- Team size is still small enough (sub-15 engineers) that a single team can cover the combined scope without creating its own coordination problem.
Don't apply when
- Teams' charters are genuinely orthogonal. If one team runs DB reliability and the other runs frontend RUM, the motivation for merging is weaker.
- The departmental-presence benefit outweighs consistency costs. Heavy-duty platforms (e.g. payments) may need an embedded team with deep product context that a company-wide team cannot provide.
- Merger cost exceeds friction cost. Cross-department mergers are politically expensive; federation plus a stronger Program can sometimes be cheaper.
- The merged team would exceed span-of-control limits. Once the unified team needs ≥2 reporting lines internally, the "unified" benefit partially reverts.
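The two checklists above can be read as a rough decision heuristic. The following is a minimal sketch, not a method from the source: the `FederationState` fields, the `assess_merge` function, and the rule "at least one driver and no blockers" are all hypothetical. The headcount ceiling of 15 is borrowed from the "sub-15 engineers" criterion and stands in for the span-of-control check, which is itself an approximation.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: the "sub-15 engineers" criterion doubles as the span-of-control limit.
MAX_UNIFIED_HEADCOUNT = 15


@dataclass(frozen=True)
class FederationState:
    """Hypothetical snapshot of a federated SRE setup; every field is illustrative."""
    overlapping_charters: bool
    inconsistent_answers: bool
    program_diverging: bool
    primitives_need_single_owner: bool
    combined_headcount: int
    orthogonal_charters: bool = False
    embedded_context_critical: bool = False
    merger_cost_exceeds_friction: bool = False


def assess_merge(state: FederationState) -> tuple[bool, list[str], list[str]]:
    """Return (recommend_merge, drivers, blockers) for a federated setup."""
    driver_checks = [
        ("overlapping charters", state.overlapping_charters),
        ("inconsistent answers to the same questions", state.inconsistent_answers),
        ("shared program is diverging", state.program_diverging),
        ("cross-cutting primitives need one owner", state.primitives_need_single_owner),
    ]
    blocker_checks = [
        ("charters are orthogonal", state.orthogonal_charters),
        ("embedded departmental context is critical", state.embedded_context_critical),
        ("merger cost exceeds friction cost", state.merger_cost_exceeds_friction),
        ("merged team exceeds size ceiling",
         state.combined_headcount >= MAX_UNIFIED_HEADCOUNT),
    ]
    drivers = [label for label, present in driver_checks if present]
    blockers = [label for label, present in blocker_checks if present]
    # Assumption: any single blocker vetoes the merge; one driver is enough to consider it.
    return bool(drivers) and not blockers, drivers, blockers


# Values loosely modelled on the Zalando case described on this page (7 SREs).
state = FederationState(True, True, True, True, combined_headcount=7)
recommend, drivers, blockers = assess_merge(state)
print(recommend, blockers)
```

In practice these judgments are qualitative and political rather than boolean; the sketch only makes explicit that the "don't apply" conditions act as vetoes while the "apply" conditions accumulate as motivation.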
Canonical example: Zalando 2019
"In early 2019 both SRE teams were officially united into a single team in the DF department (the department of one of the original teams). With this merger, SRE now had a single voice in the company." — sources/2021-09-20-zalando-tracing-sres-journey-part-ii
The drivers:
- Inconsistent guidance to internal customers. Teams asking two SRE teams the same question got different answers. "Teams in Zalando would seek out guidance from SRE, not knowing which team to reach out to, or even that there were 2 separate teams."
- Different reporting chains, different priorities. DX SRE's leadership had different priorities than DF SRE's leadership; when chains diverged, the Program stalled.
- Post-program drift. The 2018 SRE Program ended after Cyber Week 2018; in 2019 each team would otherwise have reverted to its department's backlog, breaking the shared roadmap.
The outcome:
- Merged into DF department, 7 SREs total.
- One voice on reliability, observability, and incident response.
- The team began to own its own product roadmap rather than only responding to requests.
- Transition from "consulting + kitchen sink" in 2018 to "own our own products + still carry significant consulting load" in 2019.
Implementation
- Choose the landing department carefully. Zalando chose DF because its charter was already company-wide; DX's was department-specific. The receiving department's charter should match the merged team's ambition.
- Migrate the program artifacts, not just the org chart: the shared backlog, the operational processes, and the cross-cutting documentation all move.
- Replace the Program's charter with the unified team's charter. The unified team needs an explicit long-term mandate (not time-bounded) from leadership; without it the merger becomes just a bigger version of the Program that ends after the next Cyber Week.
- Retain department liaisons without embedding. Zalando's unified 2019 team remained in demand from departments, so careful battle-picking continued as the governance pattern, with requests prioritised along three dimensions: likelihood of success, company priorities, and enablement value.
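The three prioritisation dimensions named in the last bullet (likelihood of success, company priorities, enablement value) can be sketched as a simple scoring function. The source names the dimensions but not how they are combined, so everything else here is an assumption: the 1 to 5 scales, the multiplicative score, the `Request` type, the `pick_battles` helper, and the example requests are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical incoming request to the unified SRE team."""
    name: str
    likelihood_of_success: int  # assumed scale: 1 (low) to 5 (high)
    company_priority: int       # assumed scale: 1 (low) to 5 (high)
    enablement_value: int       # assumed scale: 1 (low) to 5 (high)


def score(request: Request) -> int:
    # Assumption: multiply the dimensions so a low value on any one of them
    # drags the whole score down. The source does not specify a formula.
    return (
        request.likelihood_of_success
        * request.company_priority
        * request.enablement_value
    )


def pick_battles(requests: list[Request], capacity: int) -> list[Request]:
    """Return the `capacity` highest-scoring requests, best first."""
    return sorted(requests, key=score, reverse=True)[:capacity]


# Illustrative backlog; names and scores are made up.
backlog = [
    Request("tracing rollout", 4, 5, 5),
    Request("one-off dashboard", 5, 2, 1),
    Request("SLO reporting", 3, 4, 4),
]
for request in pick_battles(backlog, capacity=2):
    print(request.name, score(request))
```

A product rather than a sum is one possible design choice: it penalises requests that are easy to deliver but enable nobody else, which matches the intent of declining low-leverage consulting work. A weighted sum would be an equally defensible reading.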
Trade-offs
| Dimension | Federated + Program | Unified team |
|---|---|---|
| Guidance consistency | Risk of divergence | One source of truth |
| Department context | Deep | Shallower, from liaisons |
| Roadmap ownership | Program-bounded | Unified long-term |
| Political cost | Low (lightweight) | Higher (re-org) |
| Scale ceiling | Low (diverges after 2+ years) | Medium (caps at around 15 engineers) |
| Investment in cross-cutting primitives | Hard to sustain past program end | Natural home |
Failure modes of the unified team
- "Ivory tower SRE" — the unified team becomes prescriptive without context, and product teams stop listening. Mitigated by continuing PRRs, embedded reviewers on critical launches, guild-based knowledge exchange.
- Demand exceeds capacity. Unified means every team now knows exactly who to ask; capacity becomes the bottleneck. Zalando explicitly kept hiring rigorous and declined to compromise on the bar even under demand pressure: "it was agreed that we would not compromise our hiring."
- Platform ownership ambiguity. If the merger doesn't also clarify who owns the cross-cutting primitives (tracing infra, SLO tool, paging platform), the unified team may end up indirectly owning all of them without the resourcing.
Seen in
- sources/2021-09-20-zalando-tracing-sres-journey-part-ii — 2019 Zalando SRE team merger; canonical narrated example of the Phase-2 → Phase-3 transition.
Related
- concepts/sre-program — the structure this pattern supersedes when federation stops working.
- concepts/sre-organizational-evolution — the broader phase model.
- concepts/production-readiness-review
- companies/zalando