CONCEPT

Dynamic sampling rate tuning

Dynamic sampling rate tuning is a feedback-control loop that adjusts a sampling profiler's run probability (or rate), per service, per host, or per profiler, on a regular cadence, so that a desired sample-count target (e.g. "40,000 CPU-cycles samples per hour for this service") is met without starving large fleets or over-sampling small ones.

It is the mechanism that makes default continuous profiling tractable at hyperscale: the profiler owner states "I want N samples/hour for service X", and the platform auto-tunes how often it runs on each host so the target is met regardless of how many hosts X runs on or how CPU-intensive X actually is.

The mechanism (Strobelight canonical form)

From Meta's Strobelight post (Source: sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology):

  1. Config expresses a target. "A service, named Soft Server, runs on 1,000 hosts and let's say we want profiler A to gather 40,000 CPU-cycles samples per hour for this service."
  2. Start conservative. Strobelight knows host count but not workload intensity, so it starts with a low "run probability" — a cold-start safety default.
  3. Observe. "The next day Strobelight will look at how many samples it was able to gather for this service."
  4. Re-tune. "Automatically tune the run probability (with some very simple math) to try to hit 40,000 samples per hour."
  5. Iterate. Runs daily, per-service.
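The daily re-tune step can be sketched as a small proportional controller. This is a hedged illustration: the function and variable names below are invented for this sketch, and Meta has not published Strobelight's actual formula, only that it uses "some very simple math" to chase the target.

```python
# Illustrative sketch of the daily re-tune loop; names are assumptions,
# not Strobelight's real API.

COLD_START_PROBABILITY = 0.01  # conservative default before any observation


def retune(run_probability: float,
           observed_samples_per_hour: float,
           target_samples_per_hour: float) -> float:
    """Scale the run probability by the ratio of target to observed samples."""
    if observed_samples_per_hour <= 0:
        # No data yet (or the service was idle): probe a little harder.
        return min(1.0, run_probability * 2)
    ratio = target_samples_per_hour / observed_samples_per_hour
    # Clamp to a valid probability.
    return max(0.0, min(1.0, run_probability * ratio))


# Day 1: cold start at 1% yielded only 8,000 samples/hour; target is 40,000.
p = retune(COLD_START_PROBABILITY, 8_000, 40_000)
print(p)  # 0.05: run 5x as often tomorrow
```

Because the loop re-runs every day, a one-shot correction like this converges even when the first adjustment overshoots or the service's host count changes.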

Weight-based aggregation (the load-bearing companion)

Samples from a naive variable-rate profiler can't be aggregated meaningfully: if Service A was sampled 2× as often on host H1 as on host H2, then summing raw samples across H1 + H2 over-represents H1's stacks and yields a biased estimate.

Strobelight attaches a per-sample weight equal to the inverse of the run probability at sample time:

"Since Strobelight is aware of all these different knobs for profile tuning, it adjusts the 'weight' of a profile sample when it's logged. A sample's weight is used to normalize the data and prevent bias when analyzing or viewing this data in aggregate. So even if Strobelight is profiling Soft Server less often on one host than on another, the samples can be accurately compared and grouped. This also works for comparing two different services."

(Source: sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology)

Weight-based aggregation is what makes cross-service "horizontal wins" analytically feasible — a performance engineer can query "which std::vector copies are hottest fleet-wide" and get a valid answer even though the vector appears in different services that Strobelight samples at different rates.
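The inverse-probability weighting the quote describes can be shown in a few lines. This is a toy sketch: the sample tuples and field names are illustrative, not Strobelight's real log schema.

```python
# Hedged sketch of weight-based aggregation: each sample carries
# weight = 1 / run_probability at the time it was taken, so differing
# sampling rates cancel out in any aggregate.

samples = [
    # (host, stack, run probability at sample time)
    ("H1", "std::vector::copy", 0.10),  # H1 is sampled 2x as often as H2
    ("H1", "std::vector::copy", 0.10),
    ("H2", "std::vector::copy", 0.05),
]

weighted: dict[str, float] = {}
for host, stack, p in samples:
    # One observed sample stands in for 1/p real occurrences.
    weighted[stack] = weighted.get(stack, 0.0) + 1.0 / p

print(weighted)  # {'std::vector::copy': 40.0}: 20 from H1 + 20 from H2
```

With equal true workloads, H1 contributes twice as many raw samples as H2, but the weights equalize their contributions (20 each), so the fleet-wide ranking of hot stacks is unbiased even across services sampled at different rates.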

Why the naive alternatives fail

  • Fixed rate on every host — either expensive on fat services or statistically thin on small ones.
  • Constant-time-budget per host — OK for per-host profiles, biased at fleet scale.
  • Always-on 100% — unacceptable overhead; defeats the point of statistical profiling.

The desired-count + observation + daily re-tune + weight-at-emit loop is the stable resolution.
