PATTERN Cited by 1 source

Unified SRE team over federated

Unified SRE team over federated names the structural choice to consolidate multiple SRE teams from different departments / reporting chains into a single team under one reporting chain, once the costs of federation (inconsistent guidance, organizational confusion, priority drift) exceed the benefits (departmental context, local presence). It is a Phase-2 → Phase-3 transition marker.

When to apply this pattern

  • Multiple SRE teams currently in different departments with overlapping charters (observability, incident response, PRR, reliability enablement).
  • The SRE teams are asked the same questions by other teams and arrive at different answers because their local context diverges.
  • Federated teams recently ran a shared SRE Program for one to three annual cycles and the program is now diverging rather than converging.
  • Cross-cutting reliability primitives (distributed tracing, adaptive paging, SLO reporting, shared alert conventions) need a single owner with a long-term charter.
  • Team size is still small enough (sub-15 engineers) that a single team can cover the combined scope without creating its own coordination problem.

Don't apply when

  • Teams' charters are genuinely orthogonal. If one team runs DB reliability and the other runs frontend RUM, the motivation for merging is weaker.
  • The departmental-presence benefit outweighs consistency costs. Heavy-duty platforms (e.g. payments) may need an embedded team with deep product context that a company-wide team cannot provide.
  • Merger cost exceeds friction cost. Cross-department mergers are politically expensive; federation + a stronger Program can sometimes be cheaper.
  • The merged team would exceed span-of-control limits. Once the unified team needs ≥2 reporting lines internally, the "unified" benefit partially reverts.
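
The two checklists above can be read as a simple decision rule. The following is a minimal sketch of that rule, not a procedure described in the source: the `Situation` fields, the `should_unify` function, and the choice to require at least one positive signal are all assumptions made for illustration. Only the 15-engineer ceiling comes from the text.

```python
from dataclasses import dataclass

# "sub-15 engineers" is the size ceiling named in the checklist above.
MAX_HEADCOUNT = 15


@dataclass
class Situation:
    """Hypothetical summary of a federated SRE setup (field names are illustrative)."""
    overlapping_charters: bool
    inconsistent_answers: bool
    program_diverging: bool
    primitives_need_single_owner: bool
    combined_headcount: int
    needs_embedded_product_context: bool
    merger_cost_exceeds_friction: bool


def should_unify(s: Situation) -> tuple[bool, list[str]]:
    """Return (recommendation, blockers). Any blocker vetoes the merger."""
    blockers = []
    if not s.overlapping_charters:
        blockers.append("charters are orthogonal")
    if s.needs_embedded_product_context:
        blockers.append("departmental presence outweighs consistency")
    if s.merger_cost_exceeds_friction:
        blockers.append("merger cost exceeds friction cost")
    if s.combined_headcount >= MAX_HEADCOUNT:
        blockers.append("merged team would exceed span-of-control limits")

    # Assumption: one positive signal is enough once no blocker applies.
    signals = sum([
        s.inconsistent_answers,
        s.program_diverging,
        s.primitives_need_single_owner,
    ])
    return (not blockers and signals >= 1), blockers


# Hypothetical input: two small teams, 7 engineers combined.
example = Situation(
    overlapping_charters=True,
    inconsistent_answers=True,
    program_diverging=True,
    primitives_need_single_owner=True,
    combined_headcount=7,
    needs_embedded_product_context=False,
    merger_cost_exceeds_friction=False,
)
recommend, blockers = should_unify(example)
print(recommend, blockers)
```

In practice these inputs are judgment calls rather than booleans. The sketch only makes explicit that the "don't apply" conditions act as vetoes while the "apply" conditions act as supporting signals.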

Canonical example — Zalando 2019

"In early 2019 both SRE teams were officially united into a single team in the DF department (the department of one of the original teams). With this merger, SRE now had a single voice in the company."sources/2021-09-20-zalando-tracing-sres-journey-part-ii

The drivers:

  • Inconsistent guidance to internal customers. Teams asking two SRE teams the same question got different answers. "Teams in Zalando would seek out guidance from SRE, not knowing which team to reach out to, or even that there were 2 separate teams."
  • Different reporting chains, different priorities. DX SRE's leadership had different priorities than DF SRE's leadership; when chains diverged, the Program stalled.
  • Post-program drift. The 2018 SRE Program ended after Cyber Week 2018; in 2019 each team would otherwise have reverted to its own department's backlog, breaking the shared roadmap.

The outcome:

  • Merged into the DF department, 7 SREs total.
  • One voice on reliability, observability, and incident response.
  • The team started owning its own product roadmap (not just responding to requests).
  • Transition from "consulting + kitchen sink" in 2018 to "own our own products + still carry significant consulting load" in 2019.

Implementation

  • Choose the landing department carefully. Zalando chose DF because its charter was already company-wide; DX's was department-specific. The receiving department's charter should match the merged team's ambition.
  • Migrate the program artifacts, not just the org chart — the shared backlog, the operational processes, the cross-cutting documentation all move.
  • Replace the Program's charter with the unified team's charter. The unified team needs an explicit long-term mandate (not time-bounded) from leadership; without it the merger becomes just a bigger version of the Program that ends after the next Cyber Week.
  • Retain department liaisons without embedding. Zalando's unified 2019 team remained in demand from departments; careful battle-picking (three-dimensional prioritisation: likelihood of success, company priorities, enablement value) continued as the governance pattern.
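
The battle-picking step can be sketched as a small scoring exercise over the three dimensions named above. This is an illustration under stated assumptions, not Zalando's actual process: the 1 to 5 scale, the unweighted sum, the `Request` and `pick_battles` names, and the sample requests are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """An incoming request to the SRE team, scored 1 (low) to 5 (high) per dimension.

    The scale and the equal weighting are assumptions for this sketch.
    """
    name: str
    likelihood_of_success: int
    company_priority: int
    enablement_value: int

    def score(self) -> int:
        # Unweighted sum of the three dimensions.
        return (
            self.likelihood_of_success
            + self.company_priority
            + self.enablement_value
        )


def pick_battles(requests: list[Request], capacity: int) -> list[Request]:
    """Return the highest-scoring requests that fit the team's capacity."""
    ranked = sorted(requests, key=lambda r: r.score(), reverse=True)
    return ranked[:capacity]


# Hypothetical backlog.
backlog = [
    Request("Distributed tracing rollout", 4, 5, 5),        # score 14
    Request("Custom dashboard for a single team", 5, 2, 1),  # score 8
    Request("SLO reporting tool", 3, 4, 5),                  # score 12
]
chosen = pick_battles(backlog, capacity=2)
print([r.name for r in chosen])
```

A team could weight the dimensions differently, for example favouring enablement value when consulting load is high. The point is only that requests are ranked against explicit criteria and the cut-off is set by capacity.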

Trade-offs

Dimension                              | Federated + Program              | Unified team
Guidance consistency                   | Risk of divergence               | One source of truth
Department context                     | Deep                             | Shallower, from liaisons
Roadmap ownership                      | Program-bounded                  | Unified long-term
Political cost                         | Low (lightweight)                | Higher (re-org)
Scale ceiling                          | Low (diverges after 2+ years)    | Medium (caps at around 15 people)
Investment in cross-cutting primitives | Hard to sustain past program end | Natural home

Failure modes of the unified team

  • "Ivory tower SRE" — the unified team becomes prescriptive without context, and product teams stop listening. Mitigated by continuing PRRs, embedded reviewers on critical launches, guild-based knowledge exchange.
  • Demand exceeds capacity. Unified means every team now knows exactly who to ask; capacity becomes the bottleneck. Zalando explicitly kept hiring rigorous and declined to compromise on the bar even under demand pressure: "it was agreed that we would not compromise our hiring."
  • Platform ownership ambiguity. If the merger doesn't also clarify who owns the cross-cutting primitives (tracing infra, SLO tool, paging platform), the unified team may end up indirectly owning all of them without the resourcing.

Seen in

  • sources/2021-09-20-zalando-tracing-sres-journey-part-ii
