
PATTERN Cited by 1 source

Multi-cadence incremental training

Pattern

Maintain model freshness by running two nested training loops at different cadences: infrequent full retraining passes on a broad data window, and frequent lightweight incremental updates on recent data mixed with sampled history.
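The two cadences can be sketched as a small scheduling decision. This is only an illustration: the source says the full pass is "periodic" without giving an interval, so `FULL_RETRAIN_EVERY_DAYS`, `plan_run`, and `plan_schedule` are hypothetical names and values chosen for the example.

```python
from datetime import date, timedelta

# Hypothetical interval; the source only describes the full pass as "periodic".
FULL_RETRAIN_EVERY_DAYS = 28


def plan_run(today: date, last_full_retrain: date,
             full_every_days: int = FULL_RETRAIN_EVERY_DAYS) -> str:
    """Return which loop should run today: 'full' or 'incremental'."""
    if (today - last_full_retrain).days >= full_every_days:
        return "full"
    return "incremental"


def plan_schedule(start: date, days: int,
                  full_every_days: int = FULL_RETRAIN_EVERY_DAYS) -> list[str]:
    """Plan one run per day, assuming the first day begins with a full retrain."""
    last_full = start
    plan = []
    for offset in range(days):
        today = start + timedelta(days=offset)
        kind = "full" if offset == 0 else plan_run(today, last_full, full_every_days)
        if kind == "full":
            last_full = today  # the incremental loop restarts from this checkpoint
        plan.append(kind)
    return plan
```

With a short interval such as `full_every_days=4`, the plan alternates one full pass with three incremental days, which makes the nesting of the two loops easy to see.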

Mechanism

  1. Low-frequency (periodic): Full pretraining + post-training on a wide historical window. Resets/refreshes base knowledge.
  2. High-frequency (daily): Continue post-training from yesterday's checkpoint on a blend of:
     - Latest day's data (captures trends, new catalog items)
     - Sampled subset of historical data (prevents catastrophic forgetting)
     - New vocabulary tokens (entities, rows) initialized via fallback tokens
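A minimal sketch of the daily step's data preparation, under stated assumptions: the source does not give a mixing ratio or say how fallback initialization works, so `history_fraction`, the `<unk>` fallback token, and the copy-the-fallback-embedding strategy are all hypothetical choices made for illustration. Embeddings are modeled as a plain dictionary of lists rather than any real framework's API.

```python
import random


def build_daily_blend(latest_day, history, history_fraction=0.25, seed=0):
    """Mix all of the latest day's examples with a random sample of history.

    history_fraction is a hypothetical knob: the number of sampled historical
    examples, expressed relative to the size of the latest day's data.
    """
    rng = random.Random(seed)
    n_hist = min(len(history), int(len(latest_day) * history_fraction))
    blend = list(latest_day) + rng.sample(history, n_hist)
    rng.shuffle(blend)
    return blend


def add_tokens_with_fallback(embeddings, new_tokens, fallback_token="<unk>"):
    """Give each unseen token its own copy of the fallback token's embedding.

    Assumed interpretation of "initialized via fallback tokens"; existing
    tokens keep the vectors they have already learned.
    """
    fallback_vec = embeddings[fallback_token]
    for tok in new_tokens:
        if tok not in embeddings:
            embeddings[tok] = list(fallback_vec)  # copy, so later updates diverge
    return embeddings
```

Starting new tokens from a copy of an existing vector gives them a reasonable position in the embedding space on day one, and the sampled history keeps older examples in every daily batch.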

Trade-offs

| Dimension | Benefit | Cost |
|---|---|---|
| Full retrain | Corrects drift, rebalances embeddings | Expensive (compute, time) |
| Daily incremental | Fast adaptation to trends | Risk of overfitting to recency |
| History mixing | Prevents forgetting | Increases daily training data volume |

Seen in
