
PATTERN

Active / Dormant User Training Split

Recommendation systems must serve two populations from the same model: active users with recent short-term signal (views, searches in the last week) and dormant users who haven't returned in weeks or months and thus have only long-term signal (past bookings). A single training example per positive outcome tends to over-fit the active case, producing a model that silently degrades on dormant users. The active/dormant training split fixes this by generating multiple training examples per positive outcome, each simulating a different stage of the user journey — some with full recent-history context, some with only long-term history.

Canonical recipe

For each positive outcome (booking, click, conversion) at date T:

  1. Generate N active-user examples at dates T-1, T-2, ..., T-N, each using full history up to that date — short-term + long-term. Mimics the late-stage user who has a rough intent and is comparing.
  2. Generate M dormant-user examples by randomly sampling dates from T-(N+1) ... T-Y_MAX (e.g., up to 365 days before the booking), each using only long-term history (e.g., booking history only — no views, no searches). Mimics the early-stage user who hasn't yet visited the platform for this trip.

Total examples per positive outcome: N + M.
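The recipe above can be sketched in a few lines. This is a hypothetical helper (the Airbnb post publishes no code); `history_mode` stands in for whatever mechanism actually masks short-term features:

```python
import random
from datetime import date, timedelta

def generate_examples(booking_date, n_active=7, m_dormant=7, y_max=365, rng=None):
    """Replay one positive outcome at N + M simulated user states.

    Returns (as_of_date, history_mode) pairs: 'full' = short- + long-term
    history visible; 'long_term' = booking history only.
    """
    rng = rng or random.Random(0)
    examples = []
    # Active examples: the N days immediately before the booking, full history.
    for d in range(1, n_active + 1):
        examples.append((booking_date - timedelta(days=d), "full"))
    # Dormant examples: M dates sampled from the long tail, long-term history only.
    for _ in range(m_dormant):
        d = rng.randint(n_active + 1, y_max)
        examples.append((booking_date - timedelta(days=d), "long_term"))
    return examples

examples = generate_examples(date(2026, 3, 12))  # 7 active + 7 dormant = 14
```

Each pair would then be joined with the user's history as of `as_of_date` to build the actual feature vector; the label (the booked destination) is identical across all 14 examples.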

Worked example — Airbnb destination recommendation

  • N = 7 active examples at T-1 ... T-7 using full booking/view/search history.
  • M = 7 dormant examples sampled from T-8 ... T-365 using booking data only.
  • 14 training examples per booking.
  • Training-time objective is the same (predict the booked destination); the difference is solely in what history is visible in each example's input features. (Source: sources/2026-03-12-airbnb-destination-recommendation-transformer)

Why it works

  • Amortizes one label across many user states. The positive outcome (a booking) is a rare, expensive label; this pattern gets more training signal per label by replaying it at multiple simulated user states.
  • Forces the model to learn a dormant-user prior. If dormant examples only contain booking history, the model must learn to predict destination from long-term signal alone — which is exactly what serving requires when a dormant user returns.
  • Handles the distribution mismatch at serving. Without this, training data reflects the "moment before conversion" distribution, but serving traffic is heavily skewed toward the "browsing aimlessly" distribution. Active/dormant splitting brings training data closer to the serving mix.

Trade-offs

  • Training-set size grows (N+M)-fold — one example per label becomes N+M. That isn't free; budget for the extra compute (larger batches, more replicas, or fewer epochs).
  • Sampling distribution for dormant windows matters. Uniform over [N+1, Y_MAX] days may over-weight ancient behavior; log-uniform weights recent dormancy more heavily. The Airbnb post doesn't specify which it uses.
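A log-uniform sampler is a one-liner. This is an assumption about a reasonable alternative, not something the Airbnb post describes:

```python
import math
import random

def sample_dormant_offset(n_active=7, y_max=365, rng=random):
    """Sample a days-before-booking offset log-uniformly over
    [n_active + 1, y_max], so recent dormancy is drawn more often
    than ancient behavior (uniform in log-space, skewed in days)."""
    lo, hi = math.log(n_active + 1), math.log(y_max)
    d = int(round(math.exp(rng.uniform(lo, hi))))
    return max(n_active + 1, min(y_max, d))  # clamp rounding at the edges
```

Under this scheme the median offset is roughly sqrt(8 * 365) ≈ 54 days, versus ~186 days for uniform sampling over the same range.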
  • Feature-availability alignment at serving. Whatever features are stripped for dormant training examples must be reliably detectable / strippable at serving time; otherwise there's a training-serving skew. Airbnb's "booking-only" is easy; richer stripping (e.g., "only sessions older than 30 days") is harder.
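One way to avoid that skew (a sketch with hypothetical names, not Airbnb's implementation) is to route both the training-example generator and the serving path through a single feature-visibility rule, so the stripping logic cannot drift between the two:

```python
def visible_features(features, days_since_last_activity, dormant_after_days=7):
    """Single source of truth for which history the model may see.

    Called both when generating dormant training examples and at
    serving time, so the stripping rule stays identical in both paths.
    """
    if days_since_last_activity > dormant_after_days:
        # Dormant: long-term signal only (booking-derived features).
        return {k: v for k, v in features.items() if k.startswith("booking_")}
    return dict(features)  # Active: full short- + long-term history.
```

The `"booking_"` prefix convention is an illustrative assumption; the point is that one shared predicate decides visibility everywhere.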
  • Doesn't address cold-start (users with zero history). For fully new users, fallback ranking or popularity priors remain necessary — this pattern handles the warm-but-stale middle of the history-depth spectrum.
