PATTERN Cited by 1 source
Domain-specific tokenization for serving efficiency¶
Pattern¶
Replace generic text tokenizers with a domain-specific vocabulary where each token maps 1:1 to a business entity (item, category, action, row). This dramatically reduces sequence length — and thus inference latency and cost — while enabling direct token-level business logic.
Mechanism¶
- Define a vocabulary of domain tokens: entity IDs, row types, action types, time/duration buckets, profile attributes
- Serialize structured inputs (user history, page layout) as compact domain-token sequences
- Update vocabulary daily as the catalog evolves (new entities get new token IDs)
- Use fallback tokens for items not yet in the vocabulary
Example (Netflix GenPage)¶
A user watch event:
- Text tokenizer (GPT-5): "User watched Orange Is the New Black for 50 minutes 30 days ago" → 16 tokens
- Domain tokenizer: [Entity_ID, Action_Type, Action_Time_Bucket, Action_Duration_Bucket] → 4 tokens
This ~4× compression directly translates to lower inference latency and enables the model to decode full homepages within real-time serving budgets (Source: sources/2026-06-29-netflix-genpage-generative-homepage-construction).