CONCEPT

Shadow-mode metric collection

Definition

Shadow-mode metric collection is the practice of running a candidate integration (new API, new downstream, new code path) in parallel to the existing production path — on a separate thread pool, with mirrored traffic — without affecting production responses. The candidate's latency, error, and success metrics are collected under realistic load, giving the integrator a trustworthy p50 / p99 / p99.9 distribution before production cut-over.

The Zalando timeouts post names it as the prerequisite for sizing request timeouts:

"If possible, run an integration with the new API in shadow mode and collect metrics. This code should run parallel to the existing production integration, but without affecting the production system (run it in a separate thread-pool, mirror traffic, etc). After collecting latency metrics such as p50, p99, p99.9 you can define the so-called acceptable rate of false timeouts." (Source: )
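The last step of that quote, turning shadow percentiles into a timeout, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Zalando's code: it assumes the shadow path has already recorded per-call latencies in milliseconds, uses the nearest-rank percentile definition (interpolating definitions give slightly different values), and the function names are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples_ms:
        raise ValueError("no shadow samples collected yet")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


def timeout_for_false_timeout_rate(samples_ms: Sequence[float], rate: float) -> float:
    """Smallest observed latency that at most `rate` of the shadow samples exceed.

    With rate=0.001 (0.1% acceptable false timeouts) this lands on roughly p99.9.
    """
    if not samples_ms:
        raise ValueError("no shadow samples collected yet")
    if not 0 <= rate < 1:
        raise ValueError("rate must be in [0, 1)")
    ordered = sorted(samples_ms)
    allowed_over = math.floor(rate * len(ordered))  # samples permitted to exceed the timeout
    return ordered[len(ordered) - allowed_over - 1]
```

Tail percentiles are only as trustworthy as the sample count behind them: with fewer than about a thousand shadow calls, both p99.9 and a 0.1% false-timeout target simply return the maximum observed latency.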

Why shadow mode rather than the SLA document

Downstream SLAs are a starting point but not a trustworthy timeout value — the Zalando post is explicit on this:

"Not all services provide SLAs and even if they do you should not trust blindly. The SLA value is good enough only for starting to test real latency."

Reasons the SLA drifts from reality:

SLAs are contractual bounds, not descriptions of the latency distribution; the downstream may routinely beat its SLA at p50 and still tail past it at p99.9.
SLAs describe behavior aggregated across all callers; your own traffic shape (keys, geography, payload size) may produce very different percentiles.
SLAs are often slow to be updated when the downstream's architecture changes.

Shadow-mode collection replaces inference-from-SLA with measurement-under-realistic-load.

The three isolations

A correct shadow deployment isolates the candidate from prod on three axes:

Thread / resource isolation — separate thread pool so a slow candidate never starves the production path's capacity.
Response isolation — candidate results are discarded; production responses come from the existing path.
Failure isolation — candidate exceptions do not propagate to the request or the user.

Missing any one of the three turns "shadow mode" into an accidental production cut-over.
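The three isolations can be sketched in Python with a dedicated `ThreadPoolExecutor`. `ShadowRunner`, `max_shadow_workers`, and the drop-when-saturated policy are hypothetical choices for illustration, not part of the Zalando post; the sketch also assumes the mirrored call is a side-effect-free read.

```python
import concurrent.futures
import threading
import time
from typing import Any, Callable, List, TypeVar

T = TypeVar("T")


class ShadowRunner:
    """Hypothetical helper: serve every request from prod, mirror it to a candidate."""

    def __init__(self, max_shadow_workers: int = 4) -> None:
        # Thread/resource isolation: the candidate gets its own small pool ...
        self._pool = concurrent.futures.ThreadPoolExecutor(
            max_workers=max_shadow_workers, thread_name_prefix="shadow"
        )
        # ... and a cap on in-flight mirrors, so a slow candidate cannot build a backlog.
        self._slots = threading.BoundedSemaphore(max_shadow_workers)
        self._lock = threading.Lock()
        self.latencies_ms: List[float] = []
        self.errors = 0
        self.dropped = 0

    def call(self, request: Any, prod: Callable[[Any], T], candidate: Callable[[Any], Any]) -> T:
        if self._slots.acquire(blocking=False):
            try:
                self._pool.submit(self._run_candidate, candidate, request)
            except RuntimeError:  # pool already shut down
                self._slots.release()
                self._count_drop()
        else:
            self._count_drop()  # shadow capacity exhausted: skip this mirror
        # Response isolation: the caller only ever sees the production result.
        return prod(request)

    def _count_drop(self) -> None:
        with self._lock:
            self.dropped += 1

    def _run_candidate(self, candidate: Callable[[Any], Any], request: Any) -> None:
        start = time.perf_counter()
        try:
            candidate(request)  # candidate result is deliberately discarded
        except Exception:
            # Failure isolation: candidate errors are counted, never re-raised.
            with self._lock:
                self.errors += 1
        else:
            elapsed_ms = (time.perf_counter() - start) * 1000
            with self._lock:
                self.latencies_ms.append(elapsed_ms)
        finally:
            self._slots.release()

    def close(self) -> None:
        # Wait for in-flight mirrors so the collected metrics are complete.
        self._pool.shutdown(wait=True)
```

Dropping mirrors when the shadow pool is full means the production thread never blocks on, or queues behind, a slow candidate. The `dropped` counter is worth watching: mirrors are dropped precisely when the candidate is slow, so heavy dropping biases the collected latencies toward the candidate's fast periods.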

Relationship to the parallel-run pattern

Shadow-mode metric collection is the measurement-only variant of parallel run. Parallel run is the broader pattern family for validating a replacement in production: shadow mode collects metrics only; compare mode also verifies that candidate responses match the production path's; live mode serves from the candidate but keeps the legacy path as a fallback.
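Compare mode differs from pure shadow mode only in what happens when the candidate finishes: its result is checked against the production result instead of being thrown away. A minimal Python sketch, assuming responses can be compared with `==` (real comparisons usually normalize volatile fields such as timestamps first); `CompareStats` and `compare_mode_call` are hypothetical names.

```python
import concurrent.futures
import threading
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class CompareStats:
    matches: int = 0
    mismatches: int = 0
    candidate_errors: int = 0
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)


def compare_mode_call(
    request: Any,
    prod: Callable[[Any], Any],
    candidate: Callable[[Any], Any],
    shadow_pool: concurrent.futures.Executor,
    stats: CompareStats,
) -> Any:
    # Start the candidate first so it runs in parallel with the production call.
    shadow = shadow_pool.submit(candidate, request)
    prod_result = prod(request)

    def record(fut: concurrent.futures.Future) -> None:
        # Runs when the candidate finishes; exceptions become counters, never user errors.
        with stats.lock:
            if fut.cancelled() or fut.exception() is not None:
                stats.candidate_errors += 1
            elif fut.result() == prod_result:
                stats.matches += 1
            else:
                stats.mismatches += 1

    shadow.add_done_callback(record)
    # Response isolation: only the production result is returned.
    return prod_result
```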

Operational caveats

Request doubling — every mirrored production request becomes two backend calls (one to prod, one to the candidate). When the candidate uses the same shared state (caches, databases), load on that state roughly doubles unless mirrored calls are deduplicated or gated. See concepts/parallel-run-request-doubling.
Non-idempotent side effects — mirrored reads are safe; mirrored writes can create duplicates, corrupt state, or double-charge users. Shadow mode is intended for reads only unless the candidate has an explicit idempotency contract.
Timing sensitivity — the candidate runs concurrently with prod, so shadow traffic competes with production traffic for downstream resources (connections, CPU, database capacity). This contention can inflate the candidate's measured latency, and production's as well; read shadow percentiles with that in mind.
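The first two caveats can be mitigated by a mirroring gate in front of the shadow path. The sketch below is a hypothetical policy, not one described in the source: it never mirrors write methods and samples reads at a configurable rate to cap the extra load. It assumes HTTP-style method names, with GET, HEAD and OPTIONS treated as side-effect free.

```python
import random
from typing import Optional

# Hypothetical policy: HTTP methods treated as free of side effects.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


def should_mirror(method: str, sample_rate: float, rng: Optional[random.Random] = None) -> bool:
    """Decide whether a production request is also sent to the shadow candidate."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be in [0, 1]")
    # Non-idempotent side effects: never mirror writes.
    if method.upper() not in SAFE_METHODS:
        return False
    # Request doubling: mirror only a fraction of reads, so shadow load is
    # about sample_rate times the production read load rather than 1x.
    draw = rng.random() if rng is not None else random.random()
    return draw < sample_rate
```

Sampling also eases the contention described under timing sensitivity, at the cost of slower data collection: at a 10% sample rate, gathering 10,000 shadow samples takes about 100,000 production reads.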

Seen in

— canonical Zalando framing as the metric-collection discipline behind principled request-timeout sizing.

concepts/request-timeout — the downstream decision that shadow-mode measurement informs.
concepts/false-timeout-rate — the tunable whose target dictates which percentile to read off the shadow distribution.
concepts/parallel-run-request-doubling — the load-multiplication caveat.
patterns/parallel-run-pattern — the broader pattern family (shadow mode is its measurement-only subset).
patterns/explicit-timeout-on-remote-calls — shadow mode is how you discover the right number to set.