PATTERN Cited by 1 source
Dogfood as adoption proof¶
Dogfood as adoption proof is the organizational pattern of a platform / SRE team applying its own new framework to its own services for a bounded trial period, measuring the before/after numbers, and publishing those numbers as the final lever to convince sceptical downstream teams to adopt the framework. The trial is explicitly scoped, explicitly time-bounded, and designed to produce quantitative evidence — not just a qualitative "it works for us too."
Canonical Zalando instance¶
The canonical instance is the Zalando SRE team's 2021 dogfood of its own Operation-Based SLO framework:
"Everything we described so far seems to make perfect sense. And as we explained it to several teams, no one seemed to make any argument against it. But still, we were not seeing the initiative gaining the momentum we expected. Even teams that did adopt CBOs, weren't disabling their cause based alerts. Something was missing. We needed the data to support our claims [...] That's what we set out to do, by dogfooding the process within the department." — sources/2022-04-27-zalando-operation-based-slos
Result: over 3 months, within the SRE department:
- False-positive rate: 56% → 0%
- Alert workload: 2 → 0.14 alerts/day (~93% reduction)
- Alerts disabled: 30+ cause-based alerts retired
- User-facing incidents missed during trial: 0
Published numbers became the adoption lever: once other departments could see the measured gain, objections collapsed.
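The "~93% reduction" figure follows directly from the two alert-rate numbers above. A minimal sketch of the arithmetic, where the helper name `relative_reduction` is illustrative and not from the source:

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a metric dropped from `before` to `after`."""
    if before <= 0:
        raise ValueError("baseline must be positive to compute a relative reduction")
    return (before - after) / before


# Figures quoted above: alerts per day before and after the trial.
alert_reduction = relative_reduction(before=2.0, after=0.14)
print(f"alert workload reduction: {alert_reduction:.0%}")

# False-positive rate, 56% -> 0%: a 56-percentage-point drop,
# which is a 100% relative reduction.
fp_reduction = relative_reduction(before=0.56, after=0.0)
print(f"false-positive reduction: {fp_reduction:.0%}")
```

Note the difference between the two headline numbers: the alert workload is reported as a relative reduction, while the false-positive rate is reported as two absolute percentages.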
When the pattern fires¶
- A framework is technically sound but failing to gain adoption. Teams have no counter-argument but also no reason to invest in the switch. The pattern's diagnostic, in Zalando's words: "no one seemed to make any argument against it. But still, we were not seeing the initiative gaining the momentum we expected."
- The platform team owns services the framework applies to. SRE teams typically own observability platforms, load-testers, build systems — all fair dogfood targets.
- Quantitative before/after is possible. Frameworks with countable outcomes (alert volume, MTTR, deploys-per-week) dogfood cleanly; abstract-benefit frameworks ("better architecture") don't.
Four preconditions¶
- A bounded time window (Zalando: 3 months / one quarter) — long enough for the effect to manifest, short enough for the team to commit. Open-ended dogfood is indistinguishable from "we use it too."
- A baseline measurement before the dogfood starts. The headline number is the delta, which requires a before. Zalando had the 56% false-positive rate pre-rollout, making the 0% endpoint meaningful.
- A weekly / cadenced review during the trial — Zalando ran a "weekly operational review meeting" to curate which cause-based alerts could be safely disabled. Without the cadence, the framework gets applied but the cause-based alerts aren't disabled → no delta → no measurement.
- Leadership backing to retire existing practice. Alerts don't disable themselves; senior-engineer authority is needed to say "this cause-based alert is redundant now, delete it." The pattern requires a unified-SRE-team structure or equivalent cross-team authority.
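The four preconditions can be treated as a checklist to run before a trial starts. The sketch below is one hypothetical way to encode them; the `DogfoodTrial` class, its field names, and the example dates are assumptions for illustration, not something the source describes.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DogfoodTrial:
    start: date
    end: Optional[date]                  # None means open-ended
    baseline_value: Optional[float]      # metric measured before the trial starts
    review_interval_days: Optional[int]  # e.g. 7 for a weekly review
    has_leadership_backing: bool

    def missing_preconditions(self) -> list[str]:
        """Return the preconditions this trial plan does not yet satisfy."""
        missing = []
        if self.end is None or self.end <= self.start:
            missing.append("bounded time window")
        if self.baseline_value is None:
            missing.append("baseline measurement")
        if not self.review_interval_days or self.review_interval_days <= 0:
            missing.append("cadenced review")
        if not self.has_leadership_backing:
            missing.append("leadership backing")
        return missing


# Placeholder dates: the source only says the trial ran for 3 months.
planned = DogfoodTrial(
    start=date(2021, 1, 1),
    end=date(2021, 4, 1),
    baseline_value=0.56,       # pre-rollout false-positive rate
    review_interval_days=7,    # weekly operational review
    has_leadership_backing=True,
)
open_ended = DogfoodTrial(date(2021, 1, 1), None, None, 7, True)

print(planned.missing_preconditions())
print(open_ended.missing_preconditions())
```

A plan that returns a non-empty list is closer to "we use it too" than to a trial that can produce a publishable delta.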
Why it works¶
- Numbers trump narratives. Engineering teams defer to measured outcomes; "we reduced false positives from 56% to 0%" is argumentatively unanswerable in a way that "this framework is better because it is user-experience-centred" isn't.
- Proves the adoption cost is low. The dogfood team, not the prospective adopter, absorbs the switching cost. If dogfood succeeds in 3 months with a 7-person SRE team, it's probably feasible for a larger product team in ≤6 months.
- Surfaces unreported frictions. The team will discover integration issues, documentation gaps, and UI pains that external adopters would hit. Fixing these pre-adoption dramatically reduces the adoption cost for downstream teams.
- Builds a reference customer. Downstream adopters can say "the SRE team runs on it" as internal credibility.
Anti-patterns¶
- Dogfood without measurement. Dogfood for its own sake is cheap "eating our own cooking" theatre. The deliverable is the measurement.
- Dogfood on a toy service. Dogfooding on a service nobody relies on proves nothing. Zalando dogfooded on production observability — on-call cared about the numbers.
- Open-ended dogfood. Runs forever, never publishes results. The pattern works because there's a publication at the end.
- Dogfood as primary rollout strategy. The pattern is a "framework is stuck" unblocker, not a default launch plan. If you can get adoption without dogfood (executive mandate, obvious benefit, large-scale pilot with a design partner), do that.
- Cherry-picked metrics. Publish the baseline the team set at the start; don't redefine the metric mid-trial. Dogfood credibility comes from the before/after comparison being defined before the trial starts.
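One way to guard against redefining the metric mid-trial is to write the metric down as a single function before the trial starts and apply that same function to both the baseline window and the trial window. The sketch below assumes a hypothetical `Alert` record with an `actionable` flag; the alert names and sample records are made up for illustration and are not Zalando's data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    actionable: bool  # True if the page pointed at a real user-facing problem


def false_positive_rate(alerts: list[Alert]) -> float:
    """Share of alerts that were not actionable; defined as 0.0 for an empty window."""
    if not alerts:
        return 0.0
    return sum(not a.actionable for a in alerts) / len(alerts)


# Illustrative records only.
baseline_window = [
    Alert("disk-usage-high", actionable=False),
    Alert("checkout-errors", actionable=True),
    Alert("cpu-saturation", actionable=False),
    Alert("login-latency", actionable=True),
]
trial_window = [
    Alert("checkout-slo-burn", actionable=True),
    Alert("login-slo-burn", actionable=True),
]

before = false_positive_rate(baseline_window)
after = false_positive_rate(trial_window)
print(f"false-positive rate: {before:.0%} -> {after:.0%}")
```

Because both windows go through the same function, any change in the reported number reflects a change in the alerts, not a change in the definition.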
Relationship to other adoption patterns¶
- vs. platform mandate. Mandate works when leadership has the authority and willingness to compel adoption. Dogfood works when they don't — or when mandate would be politically expensive.
- vs. design partner. A design partner is an external team adopting the framework early in exchange for influence on its design. Dogfood is internal and doesn't give the dogfood team design authority (they already have it).
- vs. incremental opt-in. Gradual opt-in with per-team adoption is often what dogfood unlocks; the pattern produces the evidence that opt-in is worth it.
- vs. red-team / chaos engineering. Chaos engineering is the form dogfood takes for resilience frameworks: "run your own failure injection against your own systems first before asking others to." The two are structurally identical.
Prerequisites the pattern shares with Zalando's instance¶
- A unified SRE team: federated SRE across departments cannot coordinate a multi-quarter dogfood with consistent measurement. The 2019 Zalando SRE merger is an explicit prerequisite.
- Ownership of the tooling stack the framework is built on. Zalando's SRE owned Observability Platform → they could ship Adaptive Paging + Service Level Management Tool + MWMBR configuration → they could dogfood.
- A forcing function for alert-pruning discipline. Weekly operational reviews + executive air cover for retiring cause-based alerts. Without this the dogfood measures the overlap of both alert systems, not the new one's noise reduction.
Seen in¶
- sources/2022-04-27-zalando-operation-based-slos — canonical instance. SRE department self-applies the Operation-Based SLO / Adaptive Paging / MWMBR stack for 3 months, publishes 56% → 0% false-positive rate, 2 → 0.14 alerts/day, 30+ alerts disabled, 0 user-facing incidents missed. Explicitly framed as "dogfooding the process within the department" to break adoption stall.
Related¶
- concepts/operation-based-slo — the framework dogfooded.
- concepts/symptom-based-alerting
- concepts/critical-business-operation
- concepts/error-budget
- concepts/alert-fatigue — what the measured numbers targeted.
- patterns/unified-sre-team-over-federated — the structural prerequisite.
- companies/zalando