PATTERN Cited by 1 source
Gradual API-surface rollout¶
When a rewrite affects behaviour that is consumed through multiple API surfaces (GraphQL, REST, first-party UI, third-party integrations, command-line / SDK clients), bound blast radius by limiting the set of surfaces the new code path serves, not just the fraction of traffic on any single surface. Add new surfaces only after the previous ones are clean.
Specialisation of patterns/staged-rollout. Staged rollout usually refers to percent-of-traffic and environment rollouts within a single surface; this pattern composes on top of it by also sequencing the rollout across surfaces.
Shape¶
- Enumerate the code paths that will call the new implementation.
- Rank them by user-visible risk of a regression:
  - UI surfaces where users can re-author their input on the fly are low-risk (people can work around).
  - GraphQL APIs consumed programmatically are medium-risk (integrators see regressions but usually have versioning tools).
  - REST APIs with stable documented contracts + long-lived client code are high-risk (the regression outlives the deploy and outlives the caller's awareness).
  - First-party dashboards / default-home-page-style surfaces are very high-risk (regressions visible to every logged-in user immediately).
- Integrate one surface at a time, with its own feature-flag + percent rollout inside that surface. Confirm metrics + diff harnesses (dark-ship, scientist) on each surface before advancing to the next.
- Fast-rollback per surface, not globally. Each surface owns its flag; a regression on one surface doesn't roll the others back.
- Stop advancing if an earlier surface resurfaces a bug. Depth-first: fix the current surface's regressions before extending.
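The steps above can be sketched as a small rollout controller. This is a minimal illustration, not GitHub's implementation: the surface names, their ordering, and the `SurfaceRollout` class are hypothetical. Each surface owns its own flag and percentage, a later surface can only be advanced once every earlier surface has been marked clean, and rolling one surface back leaves the others where they are.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical surface names, ordered lowest-risk first (the ranking step).
SURFACE_ORDER = ["repo_issues_ui", "graphql", "rest", "dashboard"]


@dataclass
class SurfaceFlag:
    percent: int = 0     # 0-100 share of this surface's requests on the new path
    clean: bool = False  # set once metrics and diff harnesses look good at 100%


@dataclass
class SurfaceRollout:
    flags: dict = field(
        default_factory=lambda: {s: SurfaceFlag() for s in SURFACE_ORDER}
    )

    def can_enable(self, surface):
        """A surface may advance only when every earlier surface is clean."""
        idx = SURFACE_ORDER.index(surface)
        return all(self.flags[s].clean for s in SURFACE_ORDER[:idx])

    def set_percent(self, surface, percent):
        """Raise or lower one surface's percentage; lowering is always allowed."""
        increasing = percent > self.flags[surface].percent
        if increasing and not self.can_enable(surface):
            raise RuntimeError(f"earlier surfaces are not clean; cannot advance {surface}")
        self.flags[surface].percent = percent

    def mark_clean(self, surface):
        """Record that a surface ran clean at full traffic."""
        if self.flags[surface].percent != 100:
            raise ValueError(f"{surface} must be at 100% before it is marked clean")
        self.flags[surface].clean = True

    def rollback(self, surface):
        """Per-surface rollback: only this surface's flag is touched."""
        self.flags[surface].percent = 0
        self.flags[surface].clean = False

    def use_new_path(self, surface, request_key):
        """Deterministic bucketing so a given caller stays on one side of the flag."""
        digest = hashlib.sha256(f"{surface}:{request_key}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return bucket < self.flags[surface].percent
```

In this sketch, rolling back an earlier surface clears its clean marker, which blocks later surfaces from advancing further without reverting the traffic they already serve.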
Canonical instance: GitHub Issues search¶
Rollout order, ~Mar–Apr 2025:
- GraphQL API + per-repo Issues tab UI — first. Users bookmark repo-scoped Issues URLs less often than account-wide dashboards; GraphQL has versioning affordances.
- Issues dashboard — extended after GraphQL + repo-UI ran clean (2025-04-02 changelog).
- REST API — extended after the dashboard ran clean (2025-03-06 changelog).
Note that the changelog dates do not follow the integration order described above. Each surface publishes its GA post on its own cadence, and on every surface the behaviour went live behind a flag before that surface's changelog entry appeared.
The post names the intent explicitly:
"To gradually build confidence, we only integrated the new system in the GraphQL API and the Issues tab for a repository in the UI to start. This gave us time to collect, respond to, and incorporate feedback without risking a degraded experience for all consumers. Once we were happy with its performance, we rolled it out to the Issues dashboard and the REST API."
(Source: sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean)
Why this is distinct from plain staged rollout¶
- Staged rollout inside one surface controls what fraction of requests to that surface see the new code.
- Surface-first rollout controls which code paths can invoke the new code at all.
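They compose. You can do 1% → 10% → 50% → 100% on GraphQL, then repeat on REST. Compared to rolling both in parallel at the same percent tiers, the composition bounds the blast radius at each tier to the traffic of the one surface currently advancing; how much smaller that is depends on how traffic is split between the surfaces.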
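The effect of composing the two rollouts can be worked through with a little arithmetic. The traffic figures below are invented purely for illustration, and the function names are hypothetical; the sketch compares how many requests per day reach the new code at each step of a sequential schedule versus a parallel one.

```python
TIERS = [1, 10, 50, 100]

# Hypothetical requests/day per surface; not real figures.
TRAFFIC = {"graphql": 400_000, "rest": 600_000}


def exposed(percent_by_surface):
    """Requests/day served by the new code for a given per-surface percentage."""
    return sum(TRAFFIC[s] * p // 100 for s, p in percent_by_surface.items())


def sequential_schedule(order):
    """Walk every tier on one surface before starting the next surface."""
    steps = []
    state = {s: 0 for s in order}
    for surface in order:
        for tier in TIERS:
            state = {**state, surface: tier}
            steps.append(state)
    return steps


def parallel_schedule(surfaces):
    """Advance every surface through the same tier at the same time."""
    return [{s: tier for s in surfaces} for tier in TIERS]


if __name__ == "__main__":
    for step in sequential_schedule(["graphql", "rest"]):
        print("sequential", step, exposed(step))
    for step in parallel_schedule(["graphql", "rest"]):
        print("parallel  ", step, exposed(step))
```

With these made-up numbers, the first tier exposes 4,000 requests/day when GraphQL goes alone and 10,000 when both surfaces start together. Both schedules end at the same full exposure; the sequential one takes eight steps instead of four to get there.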
Trade-offs¶
- Temporary API-surface behaviour skew. While one surface has the new code and another doesn't, a caller observing both will see different behaviours for the same logical input. For searches this is usually acceptable (the query language is the same; only the set of surfaces that support the new syntax differs). For bug-fix rewrites it is not acceptable: fix everywhere at once or announce the skew.
- Multiplies the rollout timeline. Sequential surface integration is slower than parallel.
- Feature-flag count grows with surface count. An enable_conditional_issues_query flag per surface is usually fine; avoid a single global flag that flips all surfaces at once.
- Observability must be per-surface. If your metrics are global (all-surface combined), you can't tell which surface's rollout caused a regression. Tag metrics by surface before starting.
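The per-surface observability point can be made concrete with a small in-memory counter. This is a sketch under assumptions: `SurfaceMetrics` and its labels are hypothetical stand-ins for whatever metrics system is in use. The idea it shows is only that every measurement carries a surface tag and a code-path tag, so a regression can be attributed to one surface's rollout.

```python
from collections import Counter


class SurfaceMetrics:
    """Counts outcomes keyed by (surface, code_path, outcome)."""

    def __init__(self):
        self._counts = Counter()

    def record(self, surface, code_path, ok):
        """Record one request; code_path is e.g. "old" or "new"."""
        outcome = "ok" if ok else "error"
        self._counts[(surface, code_path, outcome)] += 1

    def error_rate(self, surface, code_path):
        """Error rate for one surface on one code path (0.0 if no data)."""
        ok = self._counts[(surface, code_path, "ok")]
        err = self._counts[(surface, code_path, "error")]
        total = ok + err
        return err / total if total else 0.0

    def global_error_rate(self, code_path):
        """All-surface combined rate, which hides where the errors come from."""
        ok = err = 0
        for (_, path, outcome), n in self._counts.items():
            if path != code_path:
                continue
            if outcome == "ok":
                ok += n
            else:
                err += n
        total = ok + err
        return err / total if total else 0.0
```

If one surface fails a fifth of its requests on the new path while another is healthy, the combined rate only shows a diluted number; the per-surface rate points at the surface whose flag should be rolled back.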
When this applies vs plain staged rollout¶
- Staged rollout is about which users you expose to the new code. Use it when there is a single surface, or when the surfaces are uniform in caller population and risk.
- Surface-first rollout is about integration points that can trigger the new code. Use when the surfaces have distinct caller populations + risk profiles and the code path is shared.
- In practice, large rewrites use both — compose them.
Seen in¶
- sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean — canonical in-wiki instance. GitHub's Issues-search rewrite shipped to GraphQL + per-repo UI first, extended to Issues dashboard, then REST API, with independent confidence gates at each surface.
Related¶
- patterns/staged-rollout — the per-surface inner loop.
- patterns/cohort-percentage-rollout — specialisation when per-user risk varies structurally within one surface.
- patterns/fast-rollback — each surface owns its own rollback lever.
- concepts/blast-radius — the thing this pattern bounds.