PATTERN Cited by 1 source
Multi-cadence incremental training
Pattern
Maintain model freshness by running two nested training loops at different cadences: infrequent full retraining passes on a broad data window, and frequent lightweight incremental updates on recent data mixed with sampled history.
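As a rough illustration of the two cadences, the sketch below decides which loop to run on a given day. The 30-day interval and the names `FULL_RETRAIN_EVERY_DAYS` and `plan_run` are hypothetical; the source only describes the full pass as periodic and the incremental pass as daily.

```python
from datetime import date

# Hypothetical cadence: the source does not state how often the full pass runs.
FULL_RETRAIN_EVERY_DAYS = 30


def plan_run(today: date, last_full_retrain: date) -> str:
    """Return which loop to run today: 'full' or 'incremental'."""
    days_since_full = (today - last_full_retrain).days
    if days_since_full >= FULL_RETRAIN_EVERY_DAYS:
        # Outer loop: pretraining + post-training on a wide historical window.
        return "full"
    # Inner loop: continue post-training from the previous checkpoint.
    return "incremental"


if __name__ == "__main__":
    print(plan_run(date(2026, 1, 15), date(2026, 1, 1)))
    print(plan_run(date(2026, 1, 31), date(2026, 1, 1)))
```

In practice the trigger for a full pass could also be a drift metric rather than a fixed interval; a calendar rule is simply the easiest variant to show.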
Mechanism
- Low-frequency (periodic): Full pretraining + post-training on a wide historical window. Resets/refreshes base knowledge.
- High-frequency (daily): Continue post-training from yesterday's checkpoint on a blend of:
  - Latest day's data (captures trends, new catalog items)
  - Sampled subset of historical data (prevents catastrophic forgetting)
- New vocabulary tokens (entities, rows) initialized via fallback tokens
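The daily step above can be sketched in plain Python as two small helpers: one that builds the training blend and one that seeds embeddings for unseen tokens. Everything named here (`build_daily_blend`, `extend_embeddings`, `history_ratio`, the `<unk>` fallback token) is an assumption for illustration. In particular, the source does not say how fallback initialization works; copying the fallback token's vector is one plausible reading.

```python
import random


def build_daily_blend(latest_day, history, history_ratio=0.5, seed=0):
    """Mix all of the latest day's examples with a random sample of history.

    history_ratio is a hypothetical knob: the sampled history size expressed
    as a fraction of the latest day's size.
    """
    rng = random.Random(seed)
    n_hist = min(len(history), int(len(latest_day) * history_ratio))
    blend = list(latest_day) + rng.sample(list(history), n_hist)
    rng.shuffle(blend)  # interleave recent and historical examples
    return blend


def extend_embeddings(embeddings, new_tokens, fallback_token="<unk>"):
    """Start each unseen token from a copy of the fallback token's vector.

    Assumption: 'initialized via fallback tokens' is modeled as a plain copy.
    """
    for tok in new_tokens:
        if tok not in embeddings:
            embeddings[tok] = list(embeddings[fallback_token])
    return embeddings


if __name__ == "__main__":
    blend = build_daily_blend([1, 2, 3, 4], list(range(100, 200)))
    print(len(blend))
    emb = extend_embeddings({"<unk>": [0.1, 0.2]}, ["new_title"])
    print(sorted(emb))
```

The sampling ratio is the main lever here: a higher ratio gives more protection against forgetting but makes each daily run larger.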
Trade-offs
| Component | Benefit | Cost |
|---|---|---|
| Full retrain | Corrects drift, rebalances embeddings | Expensive (compute, time) |
| Daily incremental | Fast adaptation to trends | Risk of overfitting to recency |
| History mixing | Prevents forgetting | Increases daily training data volume |
Seen in
- sources/2026-06-29-netflix-genpage-generative-homepage-construction — Netflix GenPage uses this to keep a 200M+ parameter recommender fresh without daily from-scratch retraining