PATTERN Cited by 1 source

Domain-specific tokenization for serving efficiency¶

Pattern¶

Replace generic text tokenizers with a domain-specific vocabulary where each token maps 1:1 to a business entity (item, category, action, row). This dramatically reduces sequence length — and thus inference latency and cost — while enabling direct token-level business logic.

Mechanism¶

Define a vocabulary of domain tokens: entity IDs, row types, action types, time/duration buckets, profile attributes
Serialize structured inputs (user history, page layout) as compact domain-token sequences
Update vocabulary daily as the catalog evolves (new entities get new token IDs)
Use fallback tokens for items not yet in the vocabulary

Example (Netflix GenPage)¶

A user watch event: - Text tokenizer (GPT-5): "User watched Orange Is the New Black for 50 minutes 30 days ago" → 16 tokens - Domain tokenizer: [Entity_ID, Action_Type, Action_Time_Bucket, Action_Duration_Bucket] → 4 tokens

This ~4× compression directly translates to lower inference latency and enables the model to decode full homepages within real-time serving budgets (Source: sources/2026-06-29-netflix-genpage-generative-homepage-construction).

Seen in¶

sources/2026-06-29-netflix-genpage-generative-homepage-construction