Skip to content

CONCEPT Cited by 1 source

"You build it, you run it"

"You build it, you run it" is the ownership model where the team that develops a service is also responsible for operating it — including on-call, incident response, and capacity planning. The phrase is attributed to Werner Vogels (Amazon CTO) in the 2006 ACM Queue interview where he described Amazon's engineering culture.

The model collapses the traditional Dev / Ops hand-off: no separate operations team runs what developers build, so reliability decisions (SLOs, alerting, observability, rollback) are the developer's problem by default.

Origin stories in the wiki

This wiki has two canonical origin stories for the ownership model:

  • Amazon (2006) — Werner Vogels's articulation, covered via companies/allthingsdistributed and the culture-of- ownership claims throughout the AWS corpus.
  • Zalando (2016) — explicitly names "you build it, you run it" as the consequence of the grassroots SRE rollout failing. The quote:

    "The way that was chosen to kickstart that capability building was by putting each delivery team on-call for the critical services they owned. This decision was fundamental to properly establish the 'you build it, you run it' mentality we still have today." (Source: sources/2021-09-12-zalando-tracing-sres-journey-in-zalando-part-i)

The Zalando data point is particularly interesting: the ownership model was not ideologically chosen. It emerged when the SRE rollout stalled and senior management needed a pragmatic alternative. This contradicts the usual framing of "you build it, you run it" as a values-first architectural choice.

Trade-offs

  • Pro: accountability is unambiguous. No hand-off, no finger-pointing, faster feedback from operation into design.
  • Pro: forces developers to care about observability, graceful degradation, runbooks — so the code is built for operation from day one.
  • Con: on-call burden on developers is real and must be managed (rotation size, alert quality, training).
  • Con: without SRE enablement (shared primitives, observability platform, best-practice diffusion), each team reinvents the same reliability wheel at different quality levels.

Seen in

Last updated · 476 distilled / 1,218 read