PATTERN
Active / Dormant User Training Split¶
Recommendation systems must serve two populations from the same model: active users with recent short-term signal (views, searches in the last week) and dormant users who haven't returned in weeks or months and thus have only long-term signal (past bookings). A single training example per positive outcome tends to over-fit the active case, producing a model that silently degrades on dormant users. The active/dormant training split fixes this by generating multiple training examples per positive outcome, each simulating a different stage of the user journey — some with full recent-history context, some with only long-term history.
Canonical recipe¶
For each positive outcome (booking, click, conversion) at date T:
- Generate N active-user examples at dates `T-1, T-2, ..., T-N`, each using full history up to that date (short-term + long-term). Mimics the late-stage user who has a rough intent and is comparing options.
- Generate M dormant-user examples by randomly sampling dates from `T-(N+1) ... T-Y_MAX` (e.g., up to 365 days before the booking), each using only long-term history (e.g., booking history only: no views, no searches). Mimics the early-stage user who hasn't yet visited the platform for this trip.
Total examples per positive outcome: N + M.
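The recipe above can be sketched as follows. The event format, function name, and defaults are illustrative assumptions, not taken from the Airbnb post:

```python
import random
from datetime import date, timedelta

def make_examples(booking_date, history, n=7, m=7, y_max=365):
    """Generate N active + M dormant training examples for one positive
    outcome. `history` is a hypothetical list of events like
    {"date": date(...), "type": "booking" | "view" | "search"}."""
    examples = []
    # Active examples: full short-term + long-term history up to T-k.
    for k in range(1, n + 1):
        cutoff = booking_date - timedelta(days=k)
        examples.append({
            "as_of": cutoff,
            "history": [e for e in history if e["date"] < cutoff],
        })
    # Dormant examples: random offsets in [N+1, Y_MAX], long-term
    # (booking-only) history, simulating a user with no recent activity.
    for _ in range(m):
        cutoff = booking_date - timedelta(days=random.randint(n + 1, y_max))
        examples.append({
            "as_of": cutoff,
            "history": [e for e in history
                        if e["date"] < cutoff and e["type"] == "booking"],
        })
    return examples
```

Every example keeps the same label (the booked destination); only the visible history changes between the two groups.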
Worked example — Airbnb destination recommendation¶
- `N = 7` active examples at `T-1 ... T-7` using full booking/view/search history.
- `M = 7` dormant examples sampled from `T-8 ... T-365` using booking data only.
- 14 training examples per booking.
- The training objective is the same for both groups (predict the booked destination); the examples differ solely in what history is visible to the model. (Source: sources/2026-03-12-airbnb-destination-recommendation-transformer)
Why it works¶
- Amortizes one label across many user states. The positive outcome (a booking) is a rare, expensive label; this pattern gets more training signal per label by replaying it at multiple simulated user states.
- Forces the model to learn a dormant-user prior. If dormant examples only contain booking history, the model must learn to predict destination from long-term signal alone — which is exactly what serving requires when a dormant user returns.
- Handles the distribution mismatch at serving. Without this, training data reflects the "moment before conversion" distribution, but serving traffic is heavily skewed toward the "browsing aimlessly" distribution. Active/dormant splitting brings training data closer to the serving mix.
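A minimal sketch of the serving-side counterpart of this rule, assuming a hypothetical event format of dicts with `date` and `type` fields and an illustrative 7-day active window:

```python
from datetime import date, timedelta

def visible_history(history, today, active_window_days=7):
    """Apply the same visibility rule at serving as in training:
    a user with short-term activity (views/searches) inside the active
    window gets full history; a dormant user falls back to long-term,
    booking-only history. Window size and event format are assumptions."""
    window_start = today - timedelta(days=active_window_days)
    has_recent = any(
        e["date"] >= window_start and e["type"] in ("view", "search")
        for e in history
    )
    if has_recent:
        return list(history)  # active user: full history
    return [e for e in history if e["type"] == "booking"]  # dormant user
```

Because dormant training examples saw booking-only history, the model has a sensible prior for exactly the input this function produces for a returning dormant user.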
Trade-offs¶
- Training-set size grows by a factor of N + M per label. This is not free; consider batch/replica scaling.
- Sampling distribution for dormant windows matters. Uniform over `[N+1, Y_MAX]` days may over-weight ancient behavior; log-uniform weights recent dormancy more. The Airbnb post doesn't specify which it uses.
- Feature-availability alignment at serving. Whatever features are stripped for dormant training examples must be reliably detectable / strippable at serving time; otherwise there's a training-serving skew. Airbnb's "booking-only" rule is easy; richer stripping (e.g., "only sessions older than 30 days") is harder.
- Doesn't address cold-start (users with zero history). For fully new users, fallback ranking or popularity priors remain necessary — this pattern handles the warm-but-stale middle of the history-depth spectrum.
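The uniform vs. log-uniform choice from the trade-offs above can be sketched as follows; log-uniform here is one reasonable option, not something the Airbnb post specifies:

```python
import math
import random

def sample_dormant_offset(n=7, y_max=365, log_uniform=True):
    """Sample the dormancy offset (days before the booking) for one
    dormant example. Uniform treats day 8 and day 364 alike;
    log-uniform concentrates mass just past the active window,
    which may better match how often stale-but-warm users return."""
    if log_uniform:
        u = random.uniform(math.log(n + 1), math.log(y_max))
        return min(y_max, max(n + 1, round(math.exp(u))))
    return random.randint(n + 1, y_max)
```

With `n=7, y_max=365`, the log-uniform variant pulls the typical offset down to a few weeks instead of the uniform midpoint of roughly half a year.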
Seen in¶
- systems/airbnb-destination-recommendation — 7 active + 7 dormant = 14 examples per booking; dormant examples use booking history only (Source: sources/2026-03-12-airbnb-destination-recommendation-transformer).
Related¶
- concepts/cold-start — the fully-new-user case this pattern doesn't directly address.
- concepts/user-action-as-token — the sequence encoding this pattern feeds.
- patterns/ab-test-rollout — how you verify the pattern helps in production rather than just on held-out data.