PATTERN Cited by 1 source

Config separated from code via pub/sub

Config separated from code via pub/sub is the pattern of publishing operational configuration (routing rules, A/B test definitions, traffic splits, feature flags) as versioned artifacts on a pub/sub substrate, with multiple consumers subscribing independently. Code deploys and config changes have independent release cycles; versioning + dynamic loading + rollback are inherited from the pub/sub layer.

Canonicalised on the wiki by Netflix's 2026-05-01 Model-Serving routing post, where Switchboard Rules (JavaScript → JSON, describing Objective → model bindings, A/B cell rules, traffic splits) are published via Gutenberg and subscribed independently by both the routing service (Switchboard, then Lightbulb) and the serving cluster hosts.

The shape

[researcher authors]       [config artifact]          [consumers]
   JavaScript       -->     Published JSON      --+-->  Routing service
   rule fn          -->     (versioned,           |      (Switchboard/
                            dynamically           |       Lightbulb)
                            loadable,             |
                            rollback-able)        +-->  Serving cluster
                             via Gutenberg               hosts

Quoting the post:

"Netflix's Gutenberg system provides an excellent ecosystem that enables a flexible pub-sub architecture, facilitating proper versioning, dynamic loading, easy rollbacks, and more. Both Switchboard and the Serving Cluster Host subscribe to the same Switchboard Rules configuration."

Load-bearing benefits

  1. Independent release cycle for config. Research teams roll out A/B experiments, change cell-to-model mappings, shift canary traffic — without waiting for a platform code deploy.
  2. Dynamic loading. Consumers (routing service + serving hosts) load new rule sets in-process; no restart, no cold-start.
  3. Rollback is free. Bad rule set → revert to the last-known-good version via the pub/sub substrate's versioning; every consumer picks up the revert on its next subscription tick.
  4. Multiple consumers, one rule source. Both the routing service and the serving hosts subscribe to the same config stream, eliminating drift between "what the router thinks should run" and "what the host is ready to run".
  5. Natural audit trail. Pub/sub substrates typically record producer identity + timestamp + version per publish, giving operators an audit log of config evolution at no additional cost.
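Benefits 2 and 3 can be sketched as a consumer-side handler that dynamically swaps in validated rule sets and retains a last-known-good copy. This is a minimal sketch, not Gutenberg's API (which the post does not expose); `onPublish`, `validate`, and the rollback bookkeeping are illustrative names.

```javascript
// Hypothetical consumer of a published rule-set stream. The pub/sub
// client would call onPublish(version, ruleSet) on each new publish.
function makeRuleConsumer(validate) {
  let active = null;          // currently active {version, ruleSet}
  let lastKnownGood = null;   // previous active set, kept for rollback
  return {
    onPublish(version, ruleSet) {
      if (!validate(ruleSet)) {
        // Bad publish: keep serving the current set untouched.
        return { applied: false, active };
      }
      lastKnownGood = active;           // dynamic load, no restart
      active = { version, ruleSet };
      return { applied: true, active };
    },
    rollback() {                        // revert to last-known-good
      active = lastKnownGood;
      return active;
    },
    current() { return active; }
  };
}
```

A bad publish is simply never activated, so "rollback" in the common case is doing nothing; explicit `rollback()` covers a rule set that validated but misbehaved in production.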

Researcher-facing authoring

Netflix's implementation uses JavaScript for rule authoring; example:

function defineAB12345Rule() {
  const abTestId = 12345;
  const objectives = Objectives.ContinueWatchingRanking;
  // Map each A/B test cell to the model variant that serves it.
  const abTestCellToModel = {
    1: {name: "netflix-continue-watching-model-default"},
    2: {name: "netflix-continue-watching-model-cell-2"},
    3: {name: "netflix-continue-watching-model-cell-3"}
  };
  return {
    cellToModel: abTestCellToModel,
    abTestId: abTestId,
    targetObjectives: [objectives],
    modelInputType: constants.TITLE_INPUT_TYPE,
    modelType: "SCORER"
  };
}

The JavaScript compiles to the JSON rule set that Gutenberg publishes. JavaScript + JSON is a Netflix choice; the pattern is agnostic to the authoring language (YAML-declared rules, Open Policy Agent/Rego bundles, HCL, etc. all work).
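The compile step can be sketched as evaluating the rule function and wrapping the result in a versioned envelope for publishing. The `{version, rule}` payload shape is an assumption for illustration, not Gutenberg's actual schema, and the trimmed rule below mirrors the example above.

```javascript
// Sketch of the "JavaScript -> published JSON" step. compileRule runs
// the researcher's function and serializes the result as a versioned
// artifact; the envelope shape is hypothetical.
function compileRule(ruleFn, version) {
  const rule = ruleFn();                    // execute the rule function
  return JSON.stringify({ version, rule }); // artifact handed to pub/sub
}

// Trimmed version of the rule above, without the wiki's external
// Objectives/constants references, so it runs standalone.
function defineAB12345Rule() {
  return {
    cellToModel: { 1: { name: "netflix-continue-watching-model-default" } },
    abTestId: 12345,
    targetObjectives: ["ContinueWatchingRanking"],
    modelType: "SCORER"
  };
}

const artifact = compileRule(defineAB12345Rule, 7);
```

Every publish gets a new version number, which is what makes the rollback and audit-trail benefits above fall out of the substrate for free.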

Sync discipline

One hazard: multiple consumers receiving rule updates at different times can create transient inconsistencies ("router routes to model X, host doesn't have X loaded yet"). Netflix names this explicitly:

"To prevent race conditions and ensure proper sync of the dynamic Switchboard Rules configuration, the following flow is considered" (diagram in the post, not textually reproduced).

The pattern's substrate must therefore provide ordered delivery or consumers must implement a two-phase acknowledgement of rule versions (load, validate, acknowledge, activate). Gutenberg's native ordering provides the former.
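The two-phase acknowledgement can be sketched as follows. `loadAndValidate` and `activate` are hypothetical consumer hooks, not Netflix's actual flow (the post only shows it as a diagram); the point is that no consumer activates a version until every consumer has acknowledged it.

```javascript
// Two-phase apply across all consumers of a rule-set version.
// Phase 1: everyone loads and validates, without activating.
// Phase 2: only if all acknowledged, everyone activates.
function twoPhaseApply(consumers, version, ruleSet) {
  const acks = consumers.map(c => c.loadAndValidate(version, ruleSet));
  if (!acks.every(ok => ok)) {
    return false; // any failure: nobody activates, no transient skew
  }
  // The router never routes to a model a host has not loaded,
  // because activation happens only after universal acknowledgement.
  consumers.forEach(c => c.activate(version));
  return true;
}
```

With ordered delivery (as with Gutenberg) the substrate enforces the "same version everywhere eventually" half; the two-phase step closes the window where consumers briefly run different versions.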

When this pattern applies

  • Routing / experiment / feature-flag config. Changes frequently (multiple times per day), changes are read by many runtime consumers, needs versioning + rollback.
  • Policy bundles. See patterns/s3-as-policy-bundle-source-for-availability for a sibling pattern that uses object-storage as the substrate rather than pub/sub.
  • Model-lifecycle state. Which models are promoted, which are shadowed, which are retired. Needs broad dissemination.
  • Multi-stack config. Config that spans multiple services (router + serving hosts; sidecar + app; edge + origin) where drift would cause correctness bugs.

When this pattern does not apply

  • Small deployments. One service + one config file + one restart is simpler than a pub/sub substrate.
  • Very high-frequency config (sub-second). Pub/sub dissemination latency may be too slow; local feature flags in a shared-memory map are faster.
  • Config that needs strict transactional semantics across consumers. Pub/sub is eventually consistent across subscribers unless layered with a commit-coordinator; if your correctness requires "all consumers apply the new rule at the same instant", this pattern alone is insufficient.

Seen in

  • sources/2026-05-01-netflix-state-of-routing-in-model-serving — canonical wiki instance. Switchboard Rules (JavaScript → JSON) published via Gutenberg, subscribed by both the routing service and the serving hosts. Load-bearing for Netflix's ability to roll out A/B experiments without code deploys.