PATTERN Cited by 1 source
Treat data as code deployment¶
Apply the same deployment rigor to data changes that you apply to code changes: canary analysis, automated rollback, blast-radius control, and gated publishing — because data can break production without any code change.
The pattern¶
For every data deployment:
- Stage the new data version to a validation environment
- Canary it against production traffic, comparing a behavioral metric between the candidate and the live data version
- Gate publishing on the canary passing, and block automatically on regression
- Roll back to the last known-good data version on failure
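The four steps above can be sketched as a small gate around the publish step. This is a minimal in-memory illustration, not Netflix's implementation: `DataVersion`, `GatedPublisher`, `success_rate`, the `records` shape, and the `max_drop` threshold are all hypothetical names chosen for the example, and a list of sampled keys stands in for mirrored production traffic.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataVersion:
    version_id: str
    records: Dict[str, str]  # assumed shape: lookup key -> value served to customers


def success_rate(version: DataVersion, sample_keys: List[str]) -> float:
    """Behavioral metric: fraction of sampled lookups that return a usable value."""
    if not sample_keys:
        raise ValueError("need at least one sampled request")
    usable = sum(1 for key in sample_keys if version.records.get(key))
    return usable / len(sample_keys)


class GatedPublisher:
    """Hypothetical publisher that only promotes data versions that pass a canary."""

    def __init__(self, known_good: DataVersion, max_drop: float = 0.01):
        self.live = known_good
        self.history = [known_good]  # published versions, oldest first
        self.max_drop = max_drop     # assumed tolerance for metric regression

    def deploy(self, candidate: DataVersion, sample_keys: List[str]) -> bool:
        # Stage + canary: score the candidate and the live version
        # on the same sampled requests.
        baseline = success_rate(self.live, sample_keys)
        observed = success_rate(candidate, sample_keys)
        # Gate: block publishing when the metric regresses beyond the tolerance.
        if baseline - observed > self.max_drop:
            return False
        self.history.append(candidate)
        self.live = candidate
        return True

    def rollback(self) -> DataVersion:
        """Restore the previous published version, e.g. after a late-detected failure."""
        if len(self.history) > 1:
            self.history.pop()
        self.live = self.history[-1]
        return self.live
```

In this sketch a candidate that fails the canary never becomes `live`, so the blast radius of a bad version is limited to the sampled canary requests; `rollback()` covers the case where a problem is only found after publishing.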
Motivation¶
Netflix's motivating incident:
"No code had been deployed. No configuration had changed. But, a manual mitigation action taken during a previous incident had inadvertently corrupted a data feed. … Our sophisticated code canary deployments had caught nothing."
The gap: organisations invest heavily in code deployment safety (CI/CD, canary analysis, progressive rollout, feature flags) but often treat data pipeline outputs as "just data" — published without equivalent gates.
When the pattern applies¶
- Data that changes frequently (every few minutes, not quarterly)
- Data that impacts customers directly when corrupt
- Data where emergent issues in the final transformed output are possible even when each input source validates independently
- Data whose corruption may not surface as a technical error (for example, an HTTP 200 response carrying wrong content)
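The last two conditions can be illustrated with a toy example. In the sketch below, two hypothetical feeds (`catalog` and `availability`) each pass their own validation, yet their join is empty because the ID formats differ; a handler serving that output would still answer with status 200. All names and the `coverage` metric are assumptions made for the illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def validate_source(rows: List[Row]) -> bool:
    """Per-source check: every row carries a non-empty id."""
    return all(row.get("id") for row in rows)


def transform(catalog: List[Row], availability: List[Row]) -> List[Row]:
    """Final output: catalog rows that also appear in the availability feed."""
    available_ids = {row["id"] for row in availability}
    return [row for row in catalog if row["id"] in available_ids]


def serve(output: List[Row]) -> Tuple[int, List[Row]]:
    """Toy handler: returns 200 regardless of whether the content is useful."""
    return 200, output


def coverage(output: List[Row], catalog: List[Row]) -> float:
    """Behavioral metric on the transformed output: share of catalog rows still served."""
    return len(output) / len(catalog) if catalog else 0.0


catalog = [{"id": "101"}, {"id": "102"}]
# Valid on its own, but uses a different id format than the catalog.
availability = [{"id": "title-101"}, {"id": "title-102"}]

output = transform(catalog, availability)
status, body = serve(output)
```

Here `validate_source` returns `True` for both feeds and `status` is 200, while `coverage(output, catalog)` is 0.0. Only a check on the final transformed output, or on a metric derived from it, exposes the problem.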
Netflix's framing¶
"This effort wasn't just about building a validation system, it was about recognizing that data deployments deserve the same rigor as code deployments. Just because something isn't a binary doesn't mean it can't break production."
Self-assessment questions (from the post)¶
- What's your MTTD for data corruption?
- Can you validate with production traffic safely?
- How would you detect emergent issues in transformed data?
- What behavioral metric most closely indicates customer impact in your domain?