Airbnb: Recommending travel destinations to help users explore

Summary

Airbnb built a transformer-based destination recommendation model that predicts which city a user will want to travel to next. The architecture adapts language modeling: user actions (bookings / views / searches) become tokens in a sequence, each represented by summed embeddings of city + region + days-to-today, contextualized with seasonality signals. Two production applications ship on top of the model — autosuggest in the search bar and abandoned-search email notifications — with A/B tests showing booking gains, especially in regions where English is not the primary language. The paper's named contributions are three architectural patterns: (1) treating user actions as tokens in a transformer, (2) generating 14 training examples per booking to balance active-user and dormant-user prediction goals, and (3) multi-task learning with region + city prediction heads to teach the model richer geolocation representations.

Key takeaways

  • User-action-as-token framing. Each historical user action (booking / view / search) is treated like a language-model token, with the per-action embedding = sum of city + region + days-to-today embeddings. A transformer processes the sequence; the final layer predicts destination intent. The framing inherits the sequence-modeling vocabulary wholesale (tokens, embeddings, transformers, contextual signals) and applies it to recommendation. See concepts/user-action-as-token.
  • Short-term vs long-term interest from source signal, not architecture. Bookings carry long-term signal; views and searches carry short-term. The model uses all three sequence sources rather than separate models per horizon, letting the transformer attention layers weight signals per-user.
  • 14-examples-per-booking training-data design. For each booking B at date T, the training set generates 14 examples:
      • 7 "active user" examples at cutoffs T-1, T-2, ..., T-7, using the full booking / view / search history up to each cutoff — mimics the late booking stage, where the user already has a rough destination idea.
      • 7 "dormant user" examples at cutoffs sampled randomly from T-8 ... T-365, using only booking data — mimics the early planning stage, where the user hasn't yet come to Airbnb. Pattern named: patterns/active-dormant-user-training-split. Addresses the fact that recommendation systems must serve both recently-active users (where short-term signal dominates) and long-dormant users (where only long-term signal exists).
  • Multi-task learning injects geolocation hierarchy. The model has two prediction heads at the final layer: one for region (e.g. California Bay Area), one for city (e.g. San Francisco, San Jose). By jointly training both tasks and encouraging consistency between region and city predictions, the model learns that {San Francisco, San Jose, Oakland, ...} cluster under the Bay Area region — a structural prior that pure city-level training would miss. Pattern named: patterns/hierarchical-multitask-geo-prediction.
  • Two deployed applications, not one. Both are served by the same underlying model: (a) Autosuggest — when a user clicks the search bar, ranked city recommendations are shown; A/B tests showed significant booking gains in regions where English is not the primary language, and the benefits extend beyond undecided users to users open to booking in neighboring, cheaper cities. (b) Abandoned-search email notifications — when a user abandons a search, follow-up emails feature listings from model-predicted areas to re-engage them.
  • Contextual signals for seasonality. Current time (date-of-query) is included as a contextual feature so "summer" shifts predictions toward cooler destinations, etc. — distinct from the per-action days-to-today embedding.
  • User opt-out acknowledged. Post explicitly notes users can opt out of this personalization — a UX/privacy primitive worth logging but not architecturally deep here.
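The action-as-token embedding from the first takeaway can be sketched as below. This is an illustrative toy, not Airbnb's implementation: the table sizes, dimensionality, and names (`action_token`, `city_emb`, etc.) are all assumptions; only the structure — per-action embedding as the elementwise sum of city, region, and days-to-today lookups — comes from the post.

```python
# Toy sketch: one action = sum of three embedding lookups.
import random

EMB_DIM = 4  # toy dimensionality; real systems use far larger vectors

def make_table(n_rows, dim, seed):
    """A random embedding table standing in for learned parameters."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_rows)]

city_emb = make_table(100, EMB_DIM, seed=0)    # one row per city id
region_emb = make_table(20, EMB_DIM, seed=1)   # one row per region id
days_emb = make_table(366, EMB_DIM, seed=2)    # days-to-today, bucketed 0..365

def action_token(city_id, region_id, days_to_today):
    """Per-action embedding = elementwise sum of the three lookups."""
    return [c + r + d for c, r, d in zip(
        city_emb[city_id],
        region_emb[region_id],
        days_emb[min(days_to_today, 365)])]

# A user's history becomes a sequence of such tokens, which the
# transformer then contextualizes via self-attention.
sequence = [action_token(3, 1, 2), action_token(42, 7, 30)]
```

The sum (rather than concatenation) keeps the token dimensionality fixed regardless of how many id features contribute, mirroring how positional embeddings are added to word embeddings in language models.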
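The 14-examples-per-booking design can be made concrete with a short generator. The function name, the event-tuple layout, and the uniform sampling for dormant cutoffs are assumptions (the post does not specify the sampling distribution); the 7 + 7 split, the cutoff ranges, and the bookings-only restriction for dormant examples are from the post.

```python
import random
from datetime import date, timedelta

def examples_for_booking(booking_city, booking_date, history, rng=None):
    """Generate the 14 training examples for one booking.

    `history` is a list of (event_date, event_type, payload) tuples,
    where event_type is 'booking', 'view', or 'search'.
    """
    rng = rng or random.Random(0)
    examples = []
    # 7 "active user" examples: cutoffs at T-1 .. T-7, full history.
    for k in range(1, 8):
        cutoff = booking_date - timedelta(days=k)
        feats = [e for e in history if e[0] <= cutoff]
        examples.append({"cutoff": cutoff, "features": feats,
                         "label": booking_city, "kind": "active"})
    # 7 "dormant user" examples: cutoffs sampled from T-8 .. T-365,
    # bookings only. Uniform sampling is assumed here.
    for _ in range(7):
        cutoff = booking_date - timedelta(days=rng.randint(8, 365))
        feats = [e for e in history if e[0] <= cutoff and e[1] == "booking"]
        examples.append({"cutoff": cutoff, "features": feats,
                         "label": booking_city, "kind": "dormant"})
    return examples
```

Note how the two halves encode different serving scenarios: the active examples see fresh views and searches, while the dormant examples are forced to predict from long-term booking signal alone.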
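The hierarchical multi-task setup can be sketched as two prediction heads over a shared final representation. As the caveats below note, the post does not give the loss formulation, so the weighted sum of two cross-entropies here is an assumption, and all names (`multitask_loss`, `region_weight`, the head shapes) are illustrative.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    return -math.log(probs[target])

HIDDEN, N_CITIES, N_REGIONS = 8, 50, 10
_rng = random.Random(0)
# Two linear heads standing in for learned parameters.
city_head = [[_rng.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(N_CITIES)]
region_head = [[_rng.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(N_REGIONS)]

def multitask_loss(hidden, city_target, region_target, region_weight=1.0):
    """Joint loss over the city and region heads (assumed formulation)."""
    city_logits = [sum(w * h for w, h in zip(row, hidden)) for row in city_head]
    region_logits = [sum(w * h for w, h in zip(row, hidden)) for row in region_head]
    return (cross_entropy(softmax(city_logits), city_target)
            + region_weight * cross_entropy(softmax(region_logits), region_target))
```

Because both heads read the same hidden vector, gradients from the region task push cities in the same region toward similar representations — the mechanism by which {San Francisco, San Jose, Oakland} come to cluster under the Bay Area.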

Caveats / what's not covered

  • No serving-infra detail. The post describes the model architecture and training-data design but not the serving stack: no latency / QPS / model-size numbers, no batching strategy, no online feature-store layout, no model-refresh cadence, no cold-start handling for new users or new cities, no fallback behavior when the model can't produce a confident prediction.
  • No A/B result magnitudes. The post says "significant booking gains in regions where English is not the primary language" but provides no numeric win rate or confidence interval.
  • No comparison to a simpler baseline. Doesn't quantify the lift of transformer-based modeling vs. simpler approaches (matrix factorization, two-tower models, or rule-based recency/popularity baselines).
  • Multi-task-learning loss function not specified. Region-vs-city "consistency" is named as a goal but the loss formulation (joint cross-entropy? auxiliary loss weighting? hierarchical softmax?) isn't given.
  • Training-example sampling distribution unclear. The dormant-user examples are "randomly sampled from 8 to 365 days before the booking" — uniform? log-uniform? unclear how dormant depth is represented in aggregate.

Architectural takeaways for the wiki

  • First Airbnb post in the wiki covering ML serving — prior Airbnb ingests have been observability, dynamic config, and privacy identity models. Extends Airbnb company coverage into the recommendation-systems domain.
  • Canonical instance of user-action-as-token sequence modeling, a pattern shared by many industry recommendation systems; wiki can anchor future Airbnb / Pinterest / LinkedIn / Netflix recommendation ingests here.
  • Canonical instance of training-data design to bridge the active-user / dormant-user gap — a recurring problem whenever a recommender must serve both recently-engaged and long-dormant populations from the same model.
  • Canonical instance of multi-task learning to inject a hierarchical taxonomy (region ⊃ city) into embedding geometry — applicable beyond travel to any recommendation domain with product categories / content taxonomies.
