PATTERN Cited by 1 source

Fork retirement via A/B test

Intent

Retire a long-divergent internal fork of an OSS library by running legacy + upstream side-by-side in the same binary via a dual-stack shim, then flipping each consumer (app / feature / use case) from legacy to upstream via an A/B experiment. Once every consumer has been migrated successfully, delete the legacy code. The shim itself stays in production as the ongoing-upgrade substrate.

Motivation

A long-divergent internal fork is a liability (see concepts/internal-fork-divergence). But upgrading every consumer to upstream in one shot is unsafe when:

  • Consumers are heterogeneous (50+ RTC use cases across Messenger, Instagram, Cloud Gaming, Meta Quest).
  • User populations are diverse (billions of users, long-tail devices, many network conditions).
  • Rollback is expensive (full binary ship cycles).
  • Regressions in critical code paths (RTC in this case) are user-visible and must be caught cohort-by-cohort.

The resolution: migrate per-consumer, each with its own A/B experiment, each with its own go/no-go verdict. The dual-stack shim is the substrate that makes this possible — it lets the same binary serve both the legacy and the upstream path, gated by a runtime flag that the experiment framework controls.
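
As a minimal sketch of the idea (not Meta's implementation), the shim can be pictured as an object that holds both stacks and picks one per call from a runtime flag. Every name here (`Flavor`, `LegacyStack`, `UpstreamStack`, `DualStackShim`, `ASSIGNMENTS`, `flavor_for`) is hypothetical, and a static table stands in for the experiment framework.

```python
from enum import Enum
from typing import Callable


class Flavor(Enum):
    LEGACY = "legacy"
    UPSTREAM = "upstream"


class LegacyStack:
    """Stand-in for the forked library (hypothetical)."""

    def create_call(self, call_id: str) -> str:
        return f"legacy:{call_id}"


class UpstreamStack:
    """Stand-in for the upstream library (hypothetical)."""

    def create_call(self, call_id: str) -> str:
        return f"upstream:{call_id}"


class DualStackShim:
    """Holds both implementations and dispatches each call on a runtime flag."""

    def __init__(self, flavor_for: Callable[[str], Flavor]) -> None:
        # flavor_for is the hook the experiment framework would control.
        self._flavor_for = flavor_for
        self._stacks = {
            Flavor.LEGACY: LegacyStack(),
            Flavor.UPSTREAM: UpstreamStack(),
        }

    def create_call(self, user_id: str, call_id: str) -> str:
        flavor = self._flavor_for(user_id)
        return self._stacks[flavor].create_call(call_id)


# Assumption: a static table stands in for real experiment assignments.
ASSIGNMENTS = {"alice": Flavor.UPSTREAM}


def flavor_for(user_id: str) -> Flavor:
    # Users outside the experiment stay on the legacy path.
    return ASSIGNMENTS.get(user_id, Flavor.LEGACY)


shim = DualStackShim(flavor_for)
print(shim.create_call("alice", "c1"))  # upstream:c1
print(shim.create_call("bob", "c2"))    # legacy:c2
```

Callers only ever see the shim's interface, so moving a cohort from legacy to upstream is a flag change rather than a new binary.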

Structure

  1. Build the dual-stack shim — see patterns/shim-for-dual-stack-ab-testing. Both legacy and upstream versions of the library are statically linked into the app binary; per-call dispatch is chosen by a global flavor enum or per-experiment flag.
  2. Instrument per-consumer experiments. Each app / feature / use case gets its own A/B experiment with the metrics that matter for it (call-join time, CPU, crash rate, bitrate, user-visible quality). The experiment framework flips the shim's flavor flag per user cohort.
  3. Roll out upstream per-consumer, mitigating regressions. Work app-by-app. When a cohort shows a regression, investigate it, fix it (either backport the legacy behavior into upstream via a feature branch, or fix the upstream code with an upstream contribution), and rerun the experiment. Ship only when the cohort is clean.
  4. Delete the legacy code paths in each shipped consumer. Once an app is fully on upstream, the legacy flavor can be excluded from that app's build. Binary shrinks by the legacy flavor's contribution.
  5. Leave the shim + upstream flavor in production. The shim now proxies a single flavor (upstream) per app, but the infrastructure remains so the next upstream release can be A/B'd the same way. "Today, the shim approach is used in some applications so we can continuously upgrade the internal WebRTC code with the latest upstream updates."
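
Steps 2 and 3 can be sketched as deterministic cohort assignment plus a per-consumer go/no-go check. This is an illustration under stated assumptions, not the experiment framework described in the source: `in_treatment`, `Metric`, `verdict`, the hash-bucketing scheme, and the tolerance values are all hypothetical, every metric is treated as "lower is better" with a nonzero control value, and a real verdict would also need statistical significance testing.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


def in_treatment(experiment: str, user_id: str, treatment_pct: int) -> bool:
    """Bucket a user into 0..99 by hash; the user is in treatment if bucket < pct.

    Hashing experiment and user together keeps a user's assignment stable
    within one experiment and independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < treatment_pct


@dataclass(frozen=True)
class Metric:
    name: str
    control: float             # legacy flavor, must be nonzero
    treatment: float           # upstream flavor
    max_regression_pct: float  # tolerated relative increase (lower is better)


def verdict(metrics: List[Metric]) -> Tuple[str, List[str]]:
    """Return ("go", []) if nothing regresses past tolerance, else ("no-go", names)."""
    failed = []
    for m in metrics:
        change_pct = (m.treatment - m.control) / m.control * 100.0
        if change_pct > m.max_regression_pct:
            failed.append(m.name)
    return ("go" if not failed else "no-go", failed)


# Made-up numbers for one consumer, purely for illustration.
results = [
    Metric("cpu_pct", control=10.0, treatment=9.0, max_regression_pct=1.0),
    Metric("crash_rate", control=1.0, treatment=1.05, max_regression_pct=1.0),
]
print(verdict(results))  # ('no-go', ['crash_rate'])
```

A "no-go" sends the consumer back to step 3: fix the regression, rerun the experiment, and ship only once the verdict is "go".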

Canonical instance: Meta × libwebrtc (2026-04-09)

  • Fork divergence: years behind upstream Chromium libwebrtc at peak.
  • Migration scope: 50+ RTC use cases.
  • Version progression: launched webrtc/latest at M120 (2022-ish); currently at M145 (2025-ish). "Years behind" → "living at head" in ~3 years.
  • Per-app outcomes (the ranges vary by app):
    • Up to 10% CPU drop from upstream's efficiency improvements.
    • Up to 3% crash-rate improvement.
    • 100–200 KB compressed binary-size reduction.
    • Deprecated libraries (e.g. usrsctp) retired with the legacy stack.
  • Ongoing shim use: apps that continue on the shim get each new upstream release A/B tested against the prior release.
  • Residual internal patches: tracked via patterns/external-feature-branch-repo-for-monorepo-patches on a separate Git repo, submit-ready upstream.

Why not just upstream everything and go

patterns/upstream-the-fix is the default escape route and works when all the fork's value can be upstreamed. Meta used it successfully for the FFmpeg fork (features upstreamed in 6.0-8.0 + 7.0+, then the fork retired). For WebRTC, some fork behavior couldn't be upstreamed immediately (internal components, platform integrations, hardware-specific features) — the dual-stack-then-delete path handles that subset while still eventually converging on upstream.

The two patterns compose naturally: upstream everything you can, dual-stack-A/B-and-delete what you can't, keep the residual via patterns/keep-infrastructure-specific-patches-internal on feature branches.

Consequences

  • Permanent dual-stack-capable infrastructure. The shim becomes a long-lived asset. Plan for maintenance + CI cost.
  • Test matrix doubles during migration. Both flavors must be exercised. The matrix shrinks again, consumer by consumer, as each one retires the legacy flavor.
  • Experiment framework must be trustworthy. The whole migration hinges on the experiment framework flipping the flavor flag correctly per user cohort. A flavor-flag bug can mix legacy + upstream state unpredictably.
  • Delete discipline matters. Without a deliberate per-consumer deletion step, the legacy code sticks around long after it's live-traffic-free, re-creating the fork-divergence problem on a smaller scale. Delete as each migration succeeds.
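
One way to guard against the mixed-state failure mode in the flavor-flag bullet above is to read the flag once when a session is created and latch it for the session's lifetime. The source does not say how Meta handles this. `FlagStore` and `Session` are hypothetical names, and the sketch assumes that a single session must never switch flavors midway.

```python
from enum import Enum


class Flavor(Enum):
    LEGACY = "legacy"
    UPSTREAM = "upstream"


class FlagStore:
    """Stand-in for the experiment framework's mutable flag (hypothetical)."""

    def __init__(self, flavor: Flavor) -> None:
        self.flavor = flavor


class Session:
    """Reads the flavor once at construction and never re-reads it."""

    def __init__(self, flags: FlagStore) -> None:
        self._flavor = flags.flavor  # latched for the session's lifetime

    @property
    def flavor(self) -> Flavor:
        return self._flavor

    def send(self, payload: str) -> str:
        # Every operation on this session uses the latched flavor.
        return f"{self._flavor.value}:{payload}"


flags = FlagStore(Flavor.LEGACY)
old_session = Session(flags)

# The experiment flips the flag while old_session is still live.
flags.flavor = Flavor.UPSTREAM
new_session = Session(flags)

print(old_session.send("ping"))  # legacy:ping
print(new_session.send("ping"))  # upstream:ping
```

With this design a flag flip only affects sessions created afterwards, so no single session ever holds state from both stacks.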

Seen in
