
PATTERN

Active / Dormant User Training Split

Recommendation systems must serve two populations from the same model: active users with recent short-term signal (views, searches in the last week) and dormant users who haven't returned in weeks or months and thus have only long-term signal (past bookings). A single training example per positive outcome tends to over-fit the active case, producing a model that silently degrades on dormant users. The active/dormant training split fixes this by generating multiple training examples per positive outcome, each simulating a different stage of the user journey — some with full recent-history context, some with only long-term history.

Canonical recipe

For each positive outcome (booking, click, conversion) at date T:

  1. Generate N active-user examples at dates T-1, T-2, ..., T-N, each using full history up to that date — short-term + long-term. Mimics the late-stage user who has a rough intent and is comparing.
  2. Generate M dormant-user examples by randomly sampling dates from T-(N+1) ... T-Y_MAX (e.g., up to 365 days before the booking), each using only long-term history (e.g., booking history only — no views, no searches). Mimics the early-stage user who hasn't yet visited the platform for this trip.

Total examples per positive outcome: N + M.
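The recipe above can be sketched in a few lines. This is a hypothetical helper (the Airbnb post publishes no code); `history_mode` stands in for whatever mechanism actually masks short-term features:

```python
import random
from datetime import date, timedelta

def generate_examples(booking_date, n_active=7, m_dormant=7, y_max=365, rng=None):
    """Replay one positive outcome at N + M simulated user states.

    Returns (as_of_date, history_mode) pairs: 'full' = short- + long-term
    history visible; 'long_term' = booking history only.
    """
    rng = rng or random.Random(0)
    examples = []
    # Active examples: the N days immediately before the booking, full history.
    for d in range(1, n_active + 1):
        examples.append((booking_date - timedelta(days=d), "full"))
    # Dormant examples: M dates sampled from the long tail, long-term history only.
    for _ in range(m_dormant):
        d = rng.randint(n_active + 1, y_max)
        examples.append((booking_date - timedelta(days=d), "long_term"))
    return examples

examples = generate_examples(date(2026, 3, 12))  # 7 active + 7 dormant = 14
```

Each pair would then be joined with the user's history as of `as_of_date` to build the actual feature vector; the label (the booked destination) is identical across all 14 examples.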

Worked example — Airbnb destination recommendation

  • N = 7 active examples at T-1 ... T-7 using full booking/view/search history.
  • M = 7 dormant examples sampled from T-8 ... T-365 using booking data only.
  • 14 training examples per booking.
  • Training-time objective is the same (predict the booked destination); the difference is solely in what history is visible in each example's input features. (Source: sources/2026-03-12-airbnb-destination-recommendation-transformer)

Why it works

  • Amortizes one label across many user states. The positive outcome (a booking) is a rare, expensive label; this pattern gets more training signal per label by replaying it at multiple simulated user states.
  • Forces the model to learn a dormant-user prior. If dormant examples only contain booking history, the model must learn to predict destination from long-term signal alone — which is exactly what serving requires when a dormant user returns.
  • Handles the distribution mismatch at serving. Without this, training data reflects the "moment before conversion" distribution, but serving traffic is heavily skewed toward the "browsing aimlessly" distribution. Active/dormant splitting brings training data closer to the serving mix.

Trade-offs

  • Training-set size grows (N+M)-fold — one example per label becomes N+M. That isn't free; budget for the extra compute (larger batches, more replicas, or fewer epochs).
  • Sampling distribution for dormant windows matters. Uniform over [N+1, Y_MAX] days may over-weight ancient behavior; log-uniform weights recent dormancy more heavily. The Airbnb post doesn't specify which it uses.
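A log-uniform sampler is a one-liner. This is an assumption about a reasonable alternative, not something the Airbnb post describes:

```python
import math
import random

def sample_dormant_offset(n_active=7, y_max=365, rng=random):
    """Sample a days-before-booking offset log-uniformly over
    [n_active + 1, y_max], so recent dormancy is drawn more often
    than ancient behavior (uniform in log-space, skewed in days)."""
    lo, hi = math.log(n_active + 1), math.log(y_max)
    d = int(round(math.exp(rng.uniform(lo, hi))))
    return max(n_active + 1, min(y_max, d))  # clamp rounding at the edges
```

Under this scheme the median offset is roughly sqrt(8 * 365) ≈ 54 days, versus ~186 days for uniform sampling over the same range.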
  • Feature-availability alignment at serving. Whatever features are stripped for dormant training examples must be reliably detectable / strippable at serving time; otherwise there's a training-serving skew. Airbnb's "booking-only" is easy; richer stripping (e.g., "only sessions older than 30 days") is harder.
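One way to avoid that skew (a sketch with hypothetical names, not Airbnb's implementation) is to route both the training-example generator and the serving path through a single feature-visibility rule, so the stripping logic cannot drift between the two:

```python
def visible_features(features, days_since_last_activity, dormant_after_days=7):
    """Single source of truth for which history the model may see.

    Called both when generating dormant training examples and at
    serving time, so the stripping rule stays identical in both paths.
    """
    if days_since_last_activity > dormant_after_days:
        # Dormant: long-term signal only (booking-derived features).
        return {k: v for k, v in features.items() if k.startswith("booking_")}
    return dict(features)  # Active: full short- + long-term history.
```

The `"booking_"` prefix convention is an illustrative assumption; the point is that one shared predicate decides visibility everywhere.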
  • Doesn't address cold-start (users with zero history). For fully new users, fallback ranking or popularity priors remain necessary — this pattern handles the warm-but-stale middle of the history-depth spectrum.
