PATTERN Cited by 1 source
Sticky canary session affinity¶
Use session-affinity routing during canary experiments to ensure each user stays consistently in either the baseline or canary group for the experiment's duration — preventing cross-contamination and enabling clean metric comparison.
The pattern¶
- When a user's first request hits the experiment, assign them to baseline or canary based on a deterministic key (user ID, session token)
- All subsequent requests from that user route to the same group for the experiment window
- Metric comparison between groups is meaningful because each user saw a consistent experience
- On experiment completion, sticky routing is released
Why stickiness matters for data canaries¶
Without session affinity, a user might:
- See the new (potentially corrupt) data on one request
- See the baseline (correct) data on the next request
- The alternation makes it impossible to attribute behavioral changes (e.g., "did the user stop watching because of bad data, or just navigated away?")
Netflix's framing:
"Sticky canaries use session affinity to guarantee that once a user's traffic is routed to the baseline or canary clusters, it stays there for the duration of the experiment window. This prevents cross-contamination from concurrent chaos experiments, ensuring a clean apples-to-apples comparison between data versions."
Relationship to A/B testing¶
Sticky canary session affinity is structurally identical to A/B test assignment — users are partitioned into treatment groups and held there. The difference is in the decision: A/B tests compare product outcomes; sticky canaries compare system health and make a binary pass/fail decision that gates data publishing.
Trade-offs¶
- ✅ Clean apples-to-apples comparison
- ✅ Prevents concurrent-experiment interference
- ❌ Users in the canary group experience the (potentially bad) data for the full experiment window
- ❌ Small blast radius is critical — only ~0.2% of traffic should be routed through validation