PATTERN Cited by 1 source

Short-term + long-term forecaster (two-forecaster architecture)

Deploy two forecasters on the same metric at different timescales, with the long-term forecaster self-gating so the short-term forecaster absorbs the non-seasonal case. The long-term forecaster handles workloads with detectable seasonality (daily or weekly) for hours-ahead prediction; the short-term forecaster handles non-seasonal workloads via trend interpolation on the last 1-2 hours. A selector chooses based on the long-term forecaster's self-censoring confidence gate.

Canonical wiki instance: MongoDB Atlas's 2023 predictive auto-scaling prototype (Source: sources/2026-04-07-mongodb-predictive-auto-scaling-an-experiment).

Intent

On a heterogeneous fleet, a single forecaster fails for one of two reasons:

  1. Long-only (MSTL + ARIMA) needs weeks of history and seasonality. Non-seasonal workloads get garbage predictions → untrusted output → fallback to pure reactive.
  2. Short-only (trend interpolation) misses known daily / weekly structure → over-scales on the Friday-batch-job climb instead of pre-scaling to the known pattern.

The two-forecaster shape keeps both signals alive and selects at runtime based on which is actually working for this workload right now.

Structure

                 ┌─────────────────────────┐
                 │  Long-Term Forecaster   │
                 │  (MSTL + ARIMA,         │
  demand         │   weeks of data,        │  hours-ahead
  history  ───>  │   hours ahead)          │  forecast
                 │                         │
                 │  Self-censor on         │ ───┐
                 │  recent accuracy        │    │
                 └─────────────────────────┘    │
                                           ┌─────────┐
                                           │ Selector│ ──> chosen forecast
                                           └─────────┘
                 ┌─────────────────────────┐    │
                 │  Short-Term Forecaster  │    │
  recent hours   │  (trend interpolation,  │ ───┘
  of demand ───> │   1–2 hours of data,    │
                 │   minutes ahead)        │
                 │                         │
                 └─────────────────────────┘

Components

Long-Term Forecaster

  • Algorithm: MSTL (Multiple Seasonal-Trend decomposition using LOESS) + ARIMA residual model. Decomposes the series into trend + daily-seasonal + weekly-seasonal + residual; forecasts each component; recomposes.
  • Training window: several weeks of per-workload history.
  • Forecast horizon: "a few hours ahead" (MongoDB).
  • Retraining: every few minutes as new samples arrive.
  • When it works: workloads with detectable seasonality (MongoDB: most Atlas replica sets have daily seasonality; about 25% also have weekly seasonality).
  • When it fails: non-seasonal workloads or short-history workloads. The self-censoring gate catches this.
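The source does not say how the self-censoring gate measures "recent accuracy". As a minimal sketch, assume the gate compares the long-term forecaster's recent predictions against what actually happened and clears only when the mean absolute percentage error stays under a threshold. The function name, `max_mape`, and `min_samples` are hypothetical and are not MongoDB's parameters.

```python
from typing import Sequence


def self_censor_clears(
    recent_forecasts: Sequence[float],
    recent_actuals: Sequence[float],
    max_mape: float = 0.15,  # assumed tolerance, not a MongoDB figure
    min_samples: int = 6,    # assumed minimum evidence before trusting the model
) -> bool:
    """Return True when recent long-term forecasts were accurate enough to trust."""
    if len(recent_forecasts) != len(recent_actuals):
        raise ValueError("forecasts and actuals must be paired")
    # Skip zero actuals so the percentage error is always defined.
    pairs = [(f, a) for f, a in zip(recent_forecasts, recent_actuals) if a != 0]
    if len(pairs) < min_samples:
        return False  # too little evidence: censor the forecast
    mape = sum(abs(f - a) / abs(a) for f, a in pairs) / len(pairs)
    return mape <= max_mape
```

A short-history workload fails the `min_samples` check and a non-seasonal one fails the error check, so both cases named in the last bullet end up censored.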

Short-Term Forecaster

  • Algorithm: trend interpolation over the last 1-2 hours of data. MongoDB reports that it beats a naïve last-observation baseline 68% of the time, with a 29% reduction in error.
  • Training window: 1-2 hours.
  • Forecast horizon: minutes (matches scaling operation horizon).
  • Retraining: continuous, on latest observations.
  • When it works: any non-seasonal workload with even weak short-term momentum.
  • When it fails: purely random workloads (but here even reactive has no better option).
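The source names the technique only as "trend interpolation". One plausible reading, sketched here as an assumption, is an ordinary least-squares line fitted to the last 1-2 hours of samples and projected a few minutes forward. `extrapolate_trend` and its parameters are illustrative names, not part of MongoDB's prototype.

```python
from typing import Sequence


def extrapolate_trend(
    timestamps_min: Sequence[float],
    values: Sequence[float],
    horizon_min: float,
) -> float:
    """Fit y = intercept + slope * t and project horizon_min past the last sample."""
    n = len(values)
    if n != len(timestamps_min) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_t = sum(timestamps_min) / n
    mean_y = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in timestamps_min)
    if var_t == 0:
        # All samples share one timestamp: fall back to the last observation.
        return values[-1]
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(timestamps_min, values))
    slope = cov / var_t
    intercept = mean_y - slope * mean_t
    return intercept + slope * (timestamps_min[-1] + horizon_min)


# Two hours of samples at 10-minute spacing, climbing 0.5 units per minute.
ts = list(range(0, 120, 10))
demand = [40 + 0.5 * t for t in ts]
forecast = extrapolate_trend(ts, demand, horizon_min=15)
```

On a flat series the slope is zero and the result equals the last observation, which matches the bullet above: with no momentum to exploit, the forecaster does no better than a reactive view.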

Selector

  • If long-term self-censor clears → use long-term. A trustworthy seasonal signal with an hours-ahead horizon beats extrapolating the last hour.
  • Else → use short-term. Falls back to local trend when seasonal signal is absent.
  • MongoDB's stated rationale: "we didn't want to fall back to purely-reactive scaling; we can still do better than that."
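The selection rule above is small enough to write out directly. This is a sketch under assumed names: `Forecast`, `select_forecast`, and the `long_term_trusted` flag (standing in for the self-censoring gate's output) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Forecast:
    source: str              # "long-term" or "short-term"
    predicted_demand: float


def select_forecast(
    long_term: Optional[float],
    long_term_trusted: bool,
    short_term: float,
) -> Forecast:
    """Prefer the long-term forecast only when its self-censor gate has cleared."""
    if long_term is not None and long_term_trusted:
        return Forecast("long-term", long_term)
    # Gate did not clear (or no long-term forecast exists): use the local trend.
    return Forecast("short-term", short_term)
```

The selector never returns "no forecast", which reflects the stated rationale: a short-term prediction is always available, so the system does not drop to purely reactive scaling just because the seasonal model is censored.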

Why both, not just one

The two-forecaster design is the minimal architecture that covers the full workload distribution MongoDB observed:

  Workload shape               % of replica sets   Best forecaster
  --------------------------   -----------------   -----------------
  Daily + weekly seasonality   ~25%                Long-term
  Daily seasonality only       ~50-60%             Long-term
  Non-seasonal, trended        ?                   Short-term
  Non-seasonal, untrended      ?                   Reactive backstop

The bottom tier is the reactive baseline. The two forecasters cover the top three tiers with different mechanisms, and the selector picks between them per workload at runtime via the self-censoring gate.

Anti-patterns

  • Single forecaster with implicit seasonality detection: harder to debug and can silently miss seasonality it should catch; no natural confidence signal.
  • Average the two forecasts: mixes a strong signal with a weak one, so the blend performs worse than each forecaster does on its own home turf. Selection beats ensembling in this context because the long-term forecaster is either confident or uselessly wrong, not noisily wrong.
  • Always use short-term: gives up hours-ahead seasonal predictions on every seasonal workload, which is most of the fleet (daily seasonality on most replica sets, weekly on ~25%).
  • Always use long-term: predicts garbage on non-seasonal workloads and trashes the downstream scaling action.
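The averaging anti-pattern can be illustrated with a little arithmetic. All numbers below are hypothetical, chosen only to show the shape of the argument; they are not measurements from MongoDB.

```python
def abs_error(forecast: float, actual: float) -> float:
    return abs(forecast - actual)


def compare(long_term: float, short_term: float, actual: float,
            long_term_trusted: bool) -> dict:
    """Compare the error of averaging both forecasts with gate-based selection."""
    averaged = (long_term + short_term) / 2
    selected = long_term if long_term_trusted else short_term
    return {
        "averaged_error": abs_error(averaged, actual),
        "selected_error": abs_error(selected, actual),
    }


# Hypothetical seasonal workload: long-term is close, short-term lags the climb.
seasonal = compare(long_term=98.0, short_term=80.0, actual=100.0,
                   long_term_trusted=True)
# Hypothetical non-seasonal workload: long-term is badly wrong and gets censored.
non_seasonal = compare(long_term=300.0, short_term=105.0, actual=100.0,
                       long_term_trusted=False)
```

With these made-up inputs the averaged error is 11 versus 2 for selection in the seasonal case, and 102.5 versus 5 in the non-seasonal case: the average inherits half of whichever forecaster is wrong.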

Generalisations

The two-timescale shape appears in adjacent forecasting contexts:

  • Capacity planning — daily seasonality + short-term burst (MongoDB's case).
  • Cache-eviction prediction — long-term popularity + short-term recency.
  • Ad-bid pacing — budget trajectory (long) + current auction context (short).

The generalisation: two forecasters with different training windows and different strengths, selected per-moment by a confidence gate.

Seen in
