SYSTEM Cited by 1 source

scientist (github/scientist)

scientist (https://github.com/github/scientist) is GitHub's open-source Ruby library for carefully refactoring critical paths by running experimental code alongside trusted code. The canonical idiom:

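require "scientist"

# Call from inside a class or module that does `include Scientist`.
science "widget-permissions" do |e|
  e.use { original_implementation }   # "control": its result is returned
  e.try { new_implementation }        # "candidate": observed, never returned
end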

The control branch runs and its result is returned to the caller. The candidate branch runs in the same request; scientist randomizes the execution order to avoid ordering bias, and captures the candidate's result, timing, and any raised exception. It then publishes a comparison (observed values, durations, and whether the results matched) to a publisher the caller configures, such as logs or a metrics backend. Candidate exceptions are rescued and recorded rather than raised, so a failing candidate cannot surface an error to the user.
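To make those mechanics concrete, here is a minimal plain-Ruby sketch of the control/candidate/publish flow. It is an illustrative stand-in, not scientist's implementation: `Observation`, `observe`, and `run_experiment` are hypothetical names, and the match rule is deliberately simplified (any exception counts as a mismatch).

```ruby
# Illustrative stand-in for the control/candidate mechanics.
# NOT scientist's code; every name below is hypothetical.
Observation = Struct.new(:name, :value, :exception, :duration)

# Run one branch, capturing its value, any exception, and its wall time.
def observe(name, block)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  value = nil
  exception = nil
  begin
    value = block.call
  rescue StandardError => e
    exception = e # captured for the report, not raised here
  end
  duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  Observation.new(name, value, exception, duration)
end

def run_experiment(name, control:, candidate:, publisher:, rng: Random.new)
  branches = { control: control, candidate: candidate }
  # Shuffle so neither branch always pays for (or benefits from) going first.
  order = branches.keys.shuffle(random: rng)
  observations = order.to_h { |key| [key, observe(key, branches[key])] }

  ctrl = observations[:control]
  cand = observations[:candidate]
  # Simplified rule: a match needs two clean runs with equal values.
  matched = ctrl.exception.nil? && cand.exception.nil? && ctrl.value == cand.value
  publisher.call(name: name, control: ctrl, candidate: cand, matched: matched)

  # The caller only ever sees the control's outcome, including its errors.
  raise ctrl.exception if ctrl.exception
  ctrl.value
end

published = []
result = run_experiment(
  "widget-permissions",
  control:   -> { :allowed },
  candidate: -> { raise ArgumentError, "new path is broken" },
  publisher: ->(**report) { published << report }
)
```

Here the candidate raises, yet `result` is still the control's value, and the exception is only visible in the published report.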

GitHub has used scientist on refactors of its core permissions system, Git backends, PR merge code, and other load-bearing paths. The library has ports to many languages (Python, Node, Go, Java, etc.), all following the same control/candidate/publish shape.

Role in the wiki

GitHub Issues search performance comparison (2025-05-13)

The 2025 Issues-search rewrite used scientist to compare performance of equivalent queries between the old flat-parser path and the new PEG-AST-recursive path on 1% of production traffic:

"For 1% of Issue searches, we ran equivalent queries against both the existing and the new search systems. We used scientist, GitHub's open source Ruby library, for carefully refactoring critical paths, to compare the performance of equivalent queries to ensure that there was no regression."

This complemented the dark-ship diff harness, which ran on a different 1% slice of traffic and logged result-count differences (see patterns/dark-ship-for-behavior-parity):

dark-ship → behaviour parity (did the result set change?)
scientist → performance parity (did latency regress? did the new path raise?)

Either harness alone misses one half; GitHub ran both. (Source: sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean)

Why scientist vs a hand-rolled A/B

The post names scientist specifically — out of all the possible ways to compare two code paths — because it bakes in defaults that otherwise have to be reinvented on every refactor:

Candidate-exception isolation: if the new path raises, the user sees the control result; the experiment records the exception rather than breaking production.
Random execution order: eliminates systematic bias from warm-cache / cold-cache effects on whichever branch runs first.
Pluggable publisher: comparison data goes wherever the team already sends metrics; no new dashboard needed.
Sampling hook: e.run_if { rand < 0.01 } makes the 1% sample a single-line concern, not a custom harness.
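As a rough illustration of the sampling hook and pluggable publisher working together, the plain-Ruby sketch below gates a simulated request stream at 1% and forwards timings to an in-memory sink. Everything here is an assumption made for the example: `MetricsSink`, `sampled?`, `publish_timings`, the `issues-search` experiment name, the metric keys, and the millisecond values are placeholders, not scientist's API and not measurements.

```ruby
# Hypothetical in-memory sink standing in for a team's metrics backend.
class MetricsSink
  attr_reader :timings, :counters

  def initialize
    @timings = Hash.new { |hash, key| hash[key] = [] }
    @counters = Hash.new(0)
  end

  def timing(key, milliseconds)
    @timings[key] << milliseconds
  end

  def increment(key)
    @counters[key] += 1
  end
end

SAMPLE_RATE = 0.01 # same gate as `rand < 0.01`, with an injectable RNG

def sampled?(rng, rate = SAMPLE_RATE)
  rng.rand < rate
end

# Forward one experiment's observations to wherever metrics already go.
def publish_timings(sink, experiment, control_ms:, candidate_ms:, candidate_error: nil)
  sink.timing("science.#{experiment}.control", control_ms)
  sink.timing("science.#{experiment}.candidate", candidate_ms)
  sink.increment("science.#{experiment}.candidate_raised") if candidate_error
end

rng = Random.new(42) # seeded only so the sketch is repeatable
sink = MetricsSink.new
requests = 100_000
experiments_run = 0

requests.times do
  next unless sampled?(rng) # the other ~99% of requests skip the candidate

  experiments_run += 1
  # Placeholder durations; a real publisher would read them from the result.
  publish_timings(sink, "issues-search", control_ms: 12.0, candidate_ms: 15.0)
end

sampled_fraction = experiments_run.to_f / requests
```

Injecting the random source keeps the gate testable; `sampled_fraction` should land close to `SAMPLE_RATE`.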

Caveats

scientist runs both branches in-process and synchronously, so each sampled request pays for the candidate on top of the control; when the two paths cost about the same, that roughly doubles the critical-path work for the sampled fraction. Expect to sample around 1–5% of traffic (GitHub used 1% here); running at 100% would likely exceed latency budgets.
scientist is for read-path refactors, where running both branches is side-effect-free. For write paths, use patterns/dual-write-migration or a shadow queue instead — scientist doesn't ship write-path isolation.
scientist surfaces observations, not causal root-cause analysis. Someone still has to triage the diff stream.
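The latency caveat can be put in numbers with a back-of-the-envelope model. The sketch below assumes both branches run sequentially in the request and ignores the harness's own overhead; the 100 ms figures are made-up inputs, not measurements from the source.

```ruby
# A sampled request pays for both branches back to back.
def sampled_request_ms(control_ms:, candidate_ms:)
  control_ms + candidate_ms
end

# Averaged over all traffic, only the sampled fraction pays the extra cost.
def mean_request_ms(control_ms:, candidate_ms:, sample_rate:)
  control_ms + sample_rate * candidate_ms
end

control_ms = 100.0   # hypothetical cost of the trusted path
candidate_ms = 100.0 # hypothetical cost of the new path

worst_case = sampled_request_ms(control_ms: control_ms, candidate_ms: candidate_ms)
at_one_percent = mean_request_ms(control_ms: control_ms, candidate_ms: candidate_ms, sample_rate: 0.01)
at_full_traffic = mean_request_ms(control_ms: control_ms, candidate_ms: candidate_ms, sample_rate: 1.0)

puts format("sampled request: %.1f ms", worst_case)
puts format("mean at 1%% sampling: %.1f ms", at_one_percent)
puts format("mean at 100%% sampling: %.1f ms", at_full_traffic)
```

Under these assumptions the mean barely moves at 1% sampling, but each sampled request still takes about twice as long, which is why tail latency is the number to watch when raising the sample rate.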

Seen in

sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean — GitHub's Issues-search rewrite uses scientist to compare old vs new query latency on 1% of production search traffic, paired with a separate dark-ship diff harness on a different 1% for behaviour parity.

patterns/performance-comparison-with-scientist — the generalized pattern.
patterns/dark-ship-for-behavior-parity — companion harness focused on result diffs rather than perf.
systems/github-issues — canonical wiki-ingested production consumer.