PATTERN
Forecast → Estimate → Plan (three-component capacity planner)¶
Decompose predictive-capacity-planning into three independent components: a Forecaster that predicts exogenous customer demand, an Estimator that maps forecasted demand and a candidate instance size to expected resource utilisation, and a Planner that selects the cheapest instance size meeting the capacity target at the expected utilisation. Each component is independently trainable and replaceable.
The canonical wiki instance is MongoDB's 2023 predictive auto-scaling prototype (Source: sources/2026-04-07-mongodb-predictive-auto-scaling-an-experiment):
           ┌────────────┐      ┌──────────────┐      ┌───────────┐
customer → │ Forecaster │ QPS  │  Estimator   │ CPU% │  Planner  │ → new tier
demand     │            │─────>│  (demand ×   │─────>│ (cheapest │
history    │ (MSTL +    │      │  size → CPU) │      │   fit)    │
           │  ARIMA)    │      │              │      │           │
           └────────────┘      └──────────────┘      └───────────┘
Intent¶
Predictive capacity planning must answer one question:
"What size should we run this workload on in the next T minutes so that it's not overloaded and not over-provisioned?"
The temptation is to train a single model directly on (state_now) → optimal_size. This fails for three reasons:
- Self-invalidation — if the output depends on a metric (CPU) that changes when the action is taken, the training data lies.
- Joint training over two variables (demand ahead + size choice) requires labels for every size at every time point, which don't exist unless you simulate.
- Non-independent retraining — the cadence for re-fitting the demand model (minutes, new observations constantly) is different from the cadence for re-fitting the size↔CPU model (rare, only when hardware / server version changes). One model can't have two retraining cadences.
The three-component decomposition resolves all three. Each component is independently trained, independently validated, and independently retrainable.
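The decomposition can be wired as three plain callables. The sketch below is hypothetical (the prototype's actual interfaces are not published); the tier names and the choice to size for the peak of the forecast window are illustrative, not from the source:

```python
def plan(forecaster, estimator, planner, history, horizon, sizes):
    """Wire the three components: each is a plain callable and can be
    swapped or retrained independently of the other two."""
    forecast = forecaster(history, horizon)            # 1. exogenous demand over the next T
    peak = max(forecast)                               # size for the worst moment in the window
    cpu_grid = {s: estimator(peak, s) for s in sizes}  # 2. demand × size -> expected CPU
    return planner(cpu_grid)                           # 3. cheapest size meeting the ceiling
```

Sizing to the window's peak is one policy; a high percentile of the forecast would be another.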
Components¶
Forecaster¶
- Input: history of customer-driven metrics for this workload (QPS, connection count, scanned-objects rate) — exogenous, so forecast-safe.
- Output: predicted values over the next T minutes.
- Training data: per-workload history (several weeks).
- Retraining cadence: "every few minutes, as new samples arrive" (MongoDB).
- Typical model: time-series decomposition + residual noise model. MongoDB uses MSTL + ARIMA for long-term and trend interpolation for short-term — see patterns/short-plus-long-term-forecaster.
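The forecast contract (exogenous per-workload history in, next-T predictions out) can be sketched with a seasonal-naive stand-in; this is a deliberate simplification of the MSTL + ARIMA stack, kept only to show the component's shape:

```python
def seasonal_naive_forecast(history, period, horizon):
    """Minimal stand-in for the decomposition-based Forecaster: repeat the
    last full seasonal cycle forward. MSTL + ARIMA would instead split the
    series into trend and seasonal components and model the residual."""
    if len(history) < period:
        raise ValueError("need at least one full seasonal cycle of history")
    last_cycle = history[-period:]
    return [last_cycle[i % period] for i in range(horizon)]
```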
Estimator¶
- Input: forecasted demand + candidate instance size.
- Output: expected CPU utilisation (or the bottleneck resource for this workload).
- Training data: (demand, size, CPU) triples across the fleet. MongoDB used 25 million samples across a random subset of replica sets.
- Retraining cadence: rare — only when hardware or server version changes.
- Typical model: boosted decision trees (MongoDB); linear or quantile regression also appropriate. Per-server-version estimator is the stated roadmap direction.
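The Estimator contract can be sketched with per-size linear least squares in place of boosted trees. The function name and the model form cpu ≈ k_size · demand are illustrative assumptions, not MongoDB's model; only the contract (fleet-wide triples in, demand × size → CPU out) matches the source:

```python
def fit_estimator(samples):
    """Fit one linear coefficient per instance size from (demand, size, cpu)
    triples: cpu ≈ k_size * demand, via least squares (k = Σ d*cpu / Σ d²).
    Returns a callable matching the Estimator contract."""
    sums = {}  # size -> (sum(demand * cpu), sum(demand * demand))
    for demand, size, cpu in samples:
        num, den = sums.get(size, (0.0, 0.0))
        sums[size] = (num + demand * cpu, den + demand * demand)
    coeffs = {size: num / den for size, (num, den) in sums.items()}
    return lambda demand, size: coeffs[size] * demand
```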
Planner¶
- Input: the Estimator's (forecasted_demand × every candidate size) → CPU grid, plus cost per size, plus CPU ceiling.
- Output: the cheapest instance size whose projected CPU stays below the ceiling.
- Decision logic: constrained min-cost — "cheapest instance size that we guess can handle the next 15 minutes of customer demand without exceeding 75% CPU."
- Retraining: none required (logic, not learned).
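The constrained min-cost logic is small enough to write out directly. One caveat: the fallback when no candidate fits is an assumption (the source doesn't specify it); here the sketch returns the largest size:

```python
def choose_size(cpu_by_size, cost_by_size, ceiling=0.75):
    """Planner: cheapest candidate whose projected CPU stays at or under
    the ceiling. Pure logic, nothing learned, nothing to retrain."""
    feasible = [s for s, cpu in cpu_by_size.items() if cpu <= ceiling]
    if not feasible:
        # Assumed policy: if nothing fits, take the most expensive (largest) size.
        return max(cpu_by_size, key=lambda s: cost_by_size[s])
    return min(feasible, key=lambda s: cost_by_size[s])
```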
Why the three-way split pays¶
The split makes each piece:
- Independently debuggable — Forecaster wrong? Look at held-out demand prediction error. Estimator wrong? Look at (demand, size) → CPU residuals on a log of real workload moments. Planner wrong? Look at constraint satisfaction + cost minimisation.
- Independently replaceable — swap MSTL for an RNN forecaster without touching the Estimator. Swap boosted trees for a neural Estimator without touching the Forecaster. Change the CPU ceiling from 75% to 70% in the Planner alone.
- Independently retrainable — fast-cadence Forecaster updates don't force slow-cadence Estimator retraining.
- Independently gated — MongoDB's self-censoring gate applies at the Forecaster stage only; the Estimator always runs.
Planner objective tuning¶
The Planner is where architectural risk preferences surface:
- CPU ceiling — MongoDB uses 75%. Lower ceiling = more headroom, higher cost, more conservative. Higher ceiling = tighter packing, cheaper, more risk of overrun.
- Horizon — MongoDB uses 15 minutes. Shorter horizon = less risk of forecast drift, more scaling churn. Longer = less churn, more drift risk.
- Cost metric — unit price per hour is the standard; customers paying usage-metric pricing might minimise (cost × time) ~ provisioned-capacity × utilisation instead.
- Asymmetric risk — scale-up penalty < scale-down penalty often leads to scale-up-only variants (MongoDB's 2025 production rollout).
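The asymmetric-risk variant can be sketched in a few lines; the function name and tier ordering are hypothetical, and the source only says such variants exist:

```python
def scale_up_only(current, recommended, size_order):
    """Asymmetric-risk Planner wrapper: act on the recommendation only when
    it means moving to a larger tier; never scale down automatically."""
    rank = {size: i for i, size in enumerate(size_order)}  # smallest to largest
    return recommended if rank[recommended] > rank[current] else current
```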
When the pattern fits¶
- Predictable demand (seasonality or short-term trend).
- Substantial scaling latency (minutes to tens-of-minutes) that makes reactive scaling structurally too slow.
- Discrete, ordered instance-size catalog (the tier abstraction).
- Exogenous inputs available — customer-driven metrics decoupled from the control action.
- Enough per-workload history to train a forecaster (cold-start workloads use reactive until history accumulates).
When it doesn't fit¶
- Unpredictable demand — no seasonality, no trend; the Forecaster is always wrong; reactive dominates.
- Short-spike pattern shorter than scaling op — forecasting gains you nothing at the scaling horizon; pattern calls for in-place absorption (batching, caching) instead.
- Workloads the Estimator can't learn — MongoDB excluded 13% of replica sets on Estimator error grounds.
- Zero-history workloads — newly-created clusters.
Relationship to other patterns¶
- patterns/short-plus-long-term-forecaster — a refinement of the Forecaster component.
- patterns/cheap-approximator-with-expensive-fallback — different variable: forecast-and-act uses the prediction always; cheap-approximator gates use vs fallback. Same calibration discipline; different control topology.
- patterns/learned-distribution-over-point-prediction — would be the natural upgrade path for the Estimator (emit distribution, Planner picks percentile matched to risk). MongoDB's prototype stays on point predictions.
Seen in¶
- sources/2026-04-07-mongodb-predictive-auto-scaling-an-experiment — MongoDB's 2023 prototype for Atlas replica-set auto-scaling; canonical wiki instance. Forecaster = MSTL + ARIMA. Estimator = boosted decision trees on 25M samples. Planner = cheapest tier with ≤75% CPU over 15-min horizon.
Related¶
- patterns/short-plus-long-term-forecaster — Forecaster refinement.
- patterns/cheap-approximator-with-expensive-fallback — sibling control-loop structure.
- concepts/predictive-autoscaling — the concept realised.
- concepts/customer-driven-metrics — the Forecaster's input.
- concepts/tier-based-instance-sizing — the Planner's output space.
- concepts/self-censoring-forecast — the gate at the Forecaster stage.
- systems/mongodb-atlas — canonical wiki deployment.