
PATTERN Cited by 1 source

Semantic embedding fusion for cold start

Pattern

Represent each entity as a fusion of its learned ID embedding (trained from user interactions) and a content-based embedding (derived from metadata like synopses, genres, cast). During training, randomly drop the ID embedding to force reliance on the content embedding alone. This gives new entities meaningful representations from day zero.

Mechanism

  1. Each entity has two embeddings:
     - ID embedding: learned from user-interaction data (plays, thumbs-up, etc.)
     - Content embedding: derived from semantic metadata (synopsis, genre, cast, transcripts, video content)
  2. The model's input embedding for an entity = fusion(ID_emb, content_emb)
  3. During training: with probability p, replace the entity's ID token with a generic [Entity_Fallback_Token], forcing the model to rely on the content embedding alone
  4. At serving time: new entities that lack interaction data still have a meaningful representation via their content embedding, in the same latent space as established entities
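
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the source's implementation: the fusion operator is taken to be an element-wise sum (the source does not name one), and the function names, the toy tables, and the default p_drop value are hypothetical.

```python
import random

FALLBACK = "[Entity_Fallback_Token]"  # token name taken from the text above


def fuse(id_emb, content_emb):
    """Assumed fusion operator: element-wise sum (the source does not specify one)."""
    return [a + b for a, b in zip(id_emb, content_emb)]


def resolve_id_token(entity_id, known_ids, p_drop, rng, training):
    """Pick which ID token to look up for this entity."""
    if entity_id not in known_ids:
        return FALLBACK  # cold start: no learned ID embedding exists yet
    if training and rng.random() < p_drop:
        return FALLBACK  # random ID dropout, applied only during training
    return entity_id


def input_embedding(entity_id, id_table, content_table,
                    p_drop=0.1, rng=random, training=True):
    """Fuse the (possibly dropped) ID embedding with the content embedding.

    p_drop=0.1 is an illustrative default, not a value from the source.
    """
    known_ids = id_table.keys() - {FALLBACK}
    token = resolve_id_token(entity_id, known_ids, p_drop, rng, training)
    return fuse(id_table[token], content_table[entity_id])


# Hypothetical toy tables: one established title, one brand-new title.
id_table = {FALLBACK: [0.0, 0.0], "title_a": [0.5, -0.5]}
content_table = {"title_a": [1.0, 2.0], "title_new": [3.0, 4.0]}
```

With these tables, calling input_embedding("title_new", id_table, content_table, training=False) goes through the fallback token, so the result is driven by the content embedding. In a trained model the fallback embedding would itself be learned rather than fixed at zero.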

Why it works

Content metadata (genres, cast, synopsis) is available at catalog ingestion time, before any user watches the title. Because random ID dropout forces the model to make recommendations from content embeddings alone during training, it can represent and rank unseen entities from their content embeddings at serving time, without a warm-up period spent collecting interactions.

(Source: sources/2026-06-29-netflix-genpage-generative-homepage-construction)
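
As a toy illustration of the "same latent space" argument (not taken from the source), the sketch below places a new title next to two established ones. All vectors are hand-picked, the titles are hypothetical, and the fusion is again assumed to be an element-wise sum. In a real system these embeddings would be learned.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def fuse(id_emb, content_emb):
    """Assumed fusion operator: element-wise sum (not specified by the source)."""
    return [a + b for a, b in zip(id_emb, content_emb)]


# Hand-picked toy values; a trained model would learn all of these.
fallback_id_emb = [0.0, 0.0, 0.0]
established = {
    "heist_thriller": {"id": [0.2, -0.1, 0.0], "content": [0.9, 0.1, 0.0]},
    "baking_show": {"id": [-0.1, 0.1, 0.2], "content": [0.0, 0.2, 0.9]},
}

# A new heist title with metadata but no interactions: fallback ID + content.
new_title_content = [0.8, 0.2, 0.1]
new_repr = fuse(fallback_id_emb, new_title_content)

# Compare the new title with each established title in the fused space.
scores = {
    name: cosine(new_repr, fuse(emb["id"], emb["content"]))
    for name, emb in established.items()
}
nearest = max(scores, key=scores.get)
```

Here nearest resolves to the established title whose content resembles the new one, which is the behavior ID dropout is meant to train into the model.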

