Weighted-sum strategy migration¶
When gradually migrating between two algorithms that produce the same shape of numeric output (e.g. two load-balancing strategies producing endpoint weights), blend their outputs via a percentage feature flag rather than flag-gating which algorithm's output each client uses. Every client then sees the same blended output at any instant, regardless of feature-gate bucketing.
Problem¶
The naïve rollout: flip a feature flag per client, bucketing clients into "old-strategy" and "new-strategy" groups. At 30% rollout, 30% of clients route by the new weights and 70% by the old. That works fine for most A/B tests.
For load balancing it doesn't: different clients would route the same request class to different backends, creating routing inconsistency:
- A sticky-routing contract (session affinity, consistent hashing for shard correctness) can break across the bucketed cohorts.
- Cache-warming assumptions can get invalidated on the 70% side when the 30% side moves traffic.
- Metrics attribution becomes tangled (was this outage caused by the new strategy, or did its 30% slice of traffic coincidentally hit something else?).
Pattern¶
Have each strategy's control plane write its outputs into separate entries in a shared store:
routing-db/{service}/strategy-A/endpoint-1 → weight
routing-db/{service}/strategy-A/endpoint-2 → weight
...
routing-db/{service}/strategy-B/endpoint-1 → weight
routing-db/{service}/strategy-B/endpoint-2 → weight
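A minimal sketch of the write side, using an in-memory dict as a stand-in for the shared routing DB. The `publish_weights` helper, the `orders` service name, and the weight values are illustrative, not from the source; only the key layout follows the entries above.

```python
# Stand-in for the shared routing DB; key layout mirrors the entries above.
routing_db = {}

def publish_weights(service, strategy, weights):
    """One strategy's control plane writes its per-endpoint weights under
    its own key prefix, so neither strategy overwrites the other."""
    for endpoint, weight in weights.items():
        routing_db[f"routing-db/{service}/{strategy}/{endpoint}"] = weight

# Each strategy publishes independently (hypothetical service and weights):
publish_weights("orders", "strategy-A", {"endpoint-1": 200, "endpoint-2": 50})
publish_weights("orders", "strategy-B", {"endpoint-1": 100, "endpoint-2": 150})
```

Because the entries are distinct, clients can always read both strategies' full output sets side by side.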
Every client reads both sets on every update, plus a percentage α (a shared feature flag), and computes, per endpoint:
blended_weight = α × new_weight + (1 − α) × old_weight
α = 0 → pure old; α = 1 → pure new. All clients see the same α and the same weights; there is no per-client bucketing.
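The per-client blend can be sketched as follows (the `blend_weights` name and the weight values are hypothetical; α and both weight sets would come from the shared flag and the routing DB):

```python
def blend_weights(old_weights, new_weights, alpha):
    """Linearly blend two strategies' per-endpoint weights.

    alpha = 0 -> pure old strategy; alpha = 1 -> pure new strategy.
    Every client computing this from the same inputs gets the same
    answer, so there is no per-client bucketing.
    """
    return {
        ep: alpha * new_weights[ep] + (1 - alpha) * old_weights[ep]
        for ep in old_weights
    }

old = {"endpoint-1": 200, "endpoint-2": 50}   # old strategy's output (illustrative)
new = {"endpoint-1": 100, "endpoint-2": 150}  # new strategy's output (illustrative)

blend_weights(old, new, 0.3)  # {"endpoint-1": 170.0, "endpoint-2": 80.0}
```

At α = 0 the result is exactly the old weights, which is what makes rollback a single flag flip.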
The migration is driven by changing α over time:
- Start at α = 0 for weeks while the new strategy computes weights (so you can compare).
- Ramp α to 5%, 10%, 25%, 50%, 100% on the operator's schedule.
- Roll back instantly by setting α back to 0.
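The ramp is pure data: only α changes, and every client re-weights identically. A sketch for a single endpoint, with illustrative weights and an illustrative schedule:

```python
old_w, new_w = 200.0, 100.0  # one endpoint's weight under the old and new strategy

# blended = alpha * new + (1 - alpha) * old, recomputed each time the flag moves.
# The final 0.0 is the instant-rollback case: back to pure old output.
schedule = [0.0, 0.05, 0.25, 0.5, 1.0, 0.0]
blended = [a * new_w + (1 - a) * old_w for a in schedule]
# -> [200.0, 195.0, 175.0, 150.0, 100.0, 200.0]
```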
What this does and doesn't give you¶
Does:
- Consistent routing across the fleet at every α. No two clients disagree about which backend sees what fraction of traffic.
- Instant rollback. Single flag flip, no client redeploys.
- Observability of "halfway" states. Each strategy's weights are separately visible in the routing DB, so you can see how each would route without actually committing.
- Parallel correctness proving. Run both for weeks at α = 0, compare outputs offline, only then start ramping.
Doesn't:
- Give you a true A/B in the "measure strategy A's behavior vs strategy B's behavior in isolation" sense — the blend is always mixed. That's the tradeoff for consistency.
- Work when the strategies produce qualitatively different output shapes (e.g. one produces weights, another produces shard-routing decisions). Both must share a numeric surface amenable to linear combination.
- Solve the underlying "is the new strategy safe at 100%?" question — it just gives a smooth rollout path.
When to use it¶
- Migrating between LB algorithms that both produce per-endpoint weights.
- Migrating between ranking / scoring models whose outputs compose linearly.
- Any strategy migration where routing consistency across clients matters more than isolation per cohort.
When not to use it¶
- A/B experiments where you want isolated measurement of each arm's effect. Use patterns/ab-test-rollout instead.
- Qualitatively different strategies that don't share a numeric output space.
- Single-decision-per-request migrations (e.g. which database to query) where blending makes no sense.
Related migration patterns¶
- patterns/ab-test-rollout — client-bucketed A/B for isolated measurement; complementary, not competing.
- patterns/dual-write-migration — for write-path migrations.
- patterns/staged-rollout — the overall framing this fits inside.
- patterns/achievable-target-first-migration — strategy for sequencing the rollout at an org level.
Seen in¶
- sources/2024-10-28-dropbox-robinhood-in-house-load-balancing — Robinhood's migration between round-robin and PID-based load balancing. Both strategies' LBS instances write their own endpoint weights into distinct routing-DB entries; clients blend with a percentage gate. Example from the post: endpoint A weighted 100 under PID and 200 under round-robin; at 30% PID feature-gate the client sees
100 × 0.3 + 200 × 0.7 = 170. Stated value: "every client sees the same weight assignment for endpoints while gradually migrating to the new load balancing strategy."