Digital-twin backtesting¶
Digital-twin backtesting is the technique of running counterfactual or evaluation workloads against a high-fidelity simulated replica of a production system, seeded with real production state. The twin "replicates the state of real-world clusters" (or databases, networks, fleets) closely enough that answers produced inside the twin are treated as authoritative for the question being asked — even though no production traffic is touched.
Typical uses¶
- Training-data generation for ML models that predict production behaviour. Run the authoritative solver / simulator inside the twin across many scenarios; harvest (input, output) pairs for supervised learning.
- Counterfactual policy evaluation. Before rolling out a scheduler / placement / pricing change, replay historical state inside the twin under the new policy; compare outcomes to the actual history.
- Pre-deployment validation. Stress-test a proposed change against the richest available production-like state without exposing real customers.
- Fallback ground truth. Pair with a cheap ML approximator; when the approximator is uncertain, invoke the twin's authoritative solver instead.
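The training-data-generation use above can be sketched in a few lines. This is a toy illustration, not anything from the source: `twin_solver` stands in for the twin's authoritative solver, and the scenario shape (`demands`, `capacity`) is invented for the example.

```python
import random

def twin_solver(scenario):
    # Stand-in for the twin's authoritative solver (hypothetical).
    # In practice this would run the real scheduler / cost model
    # against replicated production state.
    return sum(scenario["demands"]) / scenario["capacity"]

def harvest_training_pairs(scenarios):
    # Run the authoritative solver across many scenarios and collect
    # (input, output) pairs for supervised learning.
    return [(s, twin_solver(s)) for s in scenarios]

# Generate a few synthetic scenarios and harvest labelled pairs.
random.seed(0)
scenarios = [
    {"demands": [random.random() for _ in range(4)], "capacity": 2.0}
    for _ in range(3)
]
pairs = harvest_training_pairs(scenarios)
```

The point of the pattern is that the labels come from the same solver production trusts, so the supervised model is learning to approximate ground truth rather than a modeller's abstraction.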
Why it's distinct from "simulation"¶
- A general-purpose simulator can be too abstract to be trusted as ground truth — it encodes the modeller's assumptions, not production reality.
- A digital twin is specifically seeded with real state — the same inputs the production system saw — and the twin's fidelity is validated against production outcomes. It's the replication of production state that makes outputs trustworthy.
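The counterfactual-evaluation use listed above has a simple shape: replay recorded production state inside the twin under a candidate policy and compare outcomes to what actually happened. A minimal sketch, with an invented record format (`state`, `actual_outcome`) and a toy policy:

```python
def replay_counterfactual(history, candidate_policy):
    # Replay recorded production states under a candidate policy
    # (run inside the twin in practice) and report the mean outcome
    # delta versus the actual history.
    deltas = [
        candidate_policy(record["state"]) - record["actual_outcome"]
        for record in history
    ]
    return sum(deltas) / len(deltas)

# Toy history: each record pairs a recorded state with its observed outcome.
history = [
    {"state": 1.0, "actual_outcome": 2.0},
    {"state": 2.0, "actual_outcome": 4.0},
]
mean_delta = replay_counterfactual(history, lambda s: s * 3.0)
```

Because the inputs are the states production actually saw, the comparison isolates the policy change rather than confounding it with synthetic workload assumptions.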
Canonical wiki instance¶
The 2025-07-29 Google Research post names Google's Borg digital twin as "a sophisticated backtesting framework to replicate the state of real-world clusters." The twin is used to:
- Run Borg's specialised bin-packing algorithm against real cluster state to produce the ground-truth MIPS per GCU values that serve as RLM training targets.
- Sit on the expensive-fallback side of the fast-path / slow-path deployment — when the RLM is uncertain, the slow bin-packing run inside the twin is the authoritative answer (Source: sources/2025-07-29-google-simulating-large-systems-with-regression-language-models).
The post does not describe the twin's internals — it's used as a black box whose outputs are accepted as ground truth.
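The fast-path / slow-path arrangement can be sketched as an uncertainty-gated dispatch. All names and numbers here are illustrative, not from the post: `rlm_predict`, `twin_backtest`, and the 0.1 threshold are stand-ins for the real RLM, the Borg twin's bin-packing run, and whatever uncertainty criterion Google actually uses.

```python
def rlm_predict(state):
    # Toy fast-path approximator: cheap estimate plus a
    # self-reported uncertainty score.
    estimate = state["load"] * 0.9
    uncertainty = 0.5 if state.get("novel") else 0.05
    return estimate, uncertainty

def twin_backtest(state):
    # Toy slow path: the authoritative (and in reality expensive)
    # solver run inside the digital twin.
    return state["load"] * 0.87

def predict(state, threshold=0.1):
    estimate, uncertainty = rlm_predict(state)
    if uncertainty > threshold:
        return twin_backtest(state)  # slow path: authoritative answer
    return estimate                  # fast path: cheap approximation
```

The design keeps the twin off the hot path: most queries are served by the approximator, and the twin is only invoked for the minority of inputs where the model admits it doesn't know.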
Prerequisites¶
- Reproducible production state. The twin has to ingest real cluster / database / network state at sufficient detail that the authoritative solver's output matches production's.
- Authoritative solver / policy. The twin is only useful if it runs the same scheduler / cost model / pricing engine as production. A twin that runs a different algorithm answers a different question.
- Scale discipline. At Google cluster scale, backtesting at production fidelity is itself an expensive compute workload — not free.
Contrast with¶
- Deterministic simulation is a testing discipline that replaces the scheduler / network / time with a seeded PRNG for reproducibility — the point is deterministic-under-adversarial-schedules, not production-fidelity.
- Shadow traffic mirrors live requests to a candidate implementation. Digital-twin backtesting uses historical state, not live traffic, and runs the authoritative solver, not the candidate.
- Replay testing replays a recorded workload against a candidate — same shape at a smaller scale.
Seen in¶
- sources/2025-07-29-google-simulating-large-systems-with-regression-language-models — Google's Borg digital twin used as training-data source for the RLM and implied fallback for high-uncertainty predictions.