Skip to content

PATTERN Cited by 1 source

Domain-specific tokenization for serving efficiency

Pattern

Replace generic text tokenizers with a domain-specific vocabulary where each token maps 1:1 to a business entity (item, category, action, row). This dramatically reduces sequence length — and thus inference latency and cost — while enabling direct token-level business logic.

Mechanism

  1. Define a vocabulary of domain tokens: entity IDs, row types, action types, time/duration buckets, profile attributes
  2. Serialize structured inputs (user history, page layout) as compact domain-token sequences
  3. Update vocabulary daily as the catalog evolves (new entities get new token IDs)
  4. Use fallback tokens for items not yet in the vocabulary

Example (Netflix GenPage)

A user watch event: - Text tokenizer (GPT-5): "User watched Orange Is the New Black for 50 minutes 30 days ago" → 16 tokens - Domain tokenizer: [Entity_ID, Action_Type, Action_Time_Bucket, Action_Duration_Bucket]4 tokens

This ~4× compression directly translates to lower inference latency and enables the model to decode full homepages within real-time serving budgets (Source: sources/2026-06-29-netflix-genpage-generative-homepage-construction).

Seen in

Last updated · 560 distilled / 1,653 read