PATTERN Cited by 1 source

Context template prompt with special tokens

Pattern

For a generative-retrieval recsys decoder, structure the input prompt as a fixed-shape template with special tokens delimiting named segments — typically (1) an environment/context token, (2) a long-term-history segment, and (3) a real-time-intent segment. New context signals join as new segments, not as architectural changes.

Quote (Source: sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart):

*"Each segment of this prompt serves a distinct role, and they are separated by special tokens.

A retailer type token tells the model which catalog and shopping context the user is shopping in.

User history SIDs from past purchases capture long-term preferences. By taking the top N previously purchased SIDs and expressing them in the same token format the model generates in, we seamlessly connect past behavior to future predictions.

Cart SIDs capture the real-time intent of the current session. While user history tells the model what someone typically likes, the cart SIDs tells it what they are building today, adapting as new items are added."*

Three segments at Instacart

[retailer_type_token]
  ↓ (special token: <RETAILER>)
[user_history_SID_1] [user_history_SID_2] ... [user_history_SID_N]
  ↓ (special token: <HISTORY_END>)
[cart_SID_1] [cart_SID_2] ... [cart_SID_M]
  ↓ (special token: <CART_END>)
→ decoder predicts next item SID via beam search

| Segment | Role | Cardinality | Time horizon |
| --- | --- | --- | --- |
| Retailer type token | Catalog scope (grocery / pet / beauty / home goods / ...) | 1 token | Static per retailer |
| User history SIDs | Long-term preferences | Top-N past purchases | Months / years |
| Cart SIDs | Real-time intent | M items in current cart | Seconds / minutes |
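
A minimal sketch of how such a prompt might be assembled, following the diagram above. The delimiter names are the illustrative ones used on this page, not Instacart's disclosed tokens. The `build_prompt` function, the `<retailer:...>` token format, the codeword strings, and the assumption that history arrives ranked best-first are all hypothetical.

```python
from typing import List, Sequence

# Illustrative delimiter names from the diagram; the real tokens are not disclosed.
RETAILER = "<RETAILER>"
HISTORY_END = "<HISTORY_END>"
CART_END = "<CART_END>"


def build_prompt(
    retailer_type: str,
    history_sids: Sequence[Sequence[str]],
    cart_sids: Sequence[Sequence[str]],
    top_n: int = 3,
) -> List[str]:
    """Flatten the three segments into one token sequence.

    Each SID is a sequence of codeword tokens. `history_sids` is assumed to be
    ordered most-relevant first, so slicing keeps the top N.
    """
    # Segment 1: a single retailer type token (hypothetical format).
    tokens: List[str] = [f"<retailer:{retailer_type}>", RETAILER]
    # Segment 2: long-term history, truncated to the top N SIDs.
    for sid in history_sids[:top_n]:
        tokens.extend(sid)
    tokens.append(HISTORY_END)
    # Segment 3: every SID currently in the cart.
    for sid in cart_sids:
        tokens.extend(sid)
    tokens.append(CART_END)
    return tokens


prompt = build_prompt(
    retailer_type="grocery",
    history_sids=[("a12", "b7"), ("a3", "b40"), ("a12", "b9"), ("a88", "b1")],
    cart_sids=[("a5", "b2")],
    top_n=3,
)
print(prompt)
```

In this sketch an empty history or cart still emits its delimiter, so the template keeps the same shape for new users and empty carts. Whether Instacart handles missing segments this way is not disclosed.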

Why three segments specifically

Each segment captures a different time horizon of intent, and the template keeps them cleanly separated:

  • Retailer type "helps us capture the distinction" between "grocery, pet, beauty, home goods, and more": different catalog scopes have different intent priors.
  • User history captures "long-term preferences": what the user typically likes, independent of today's task.
  • Cart contents capture "the real-time intent of the current session": what the user is building today. "While user history tells the model what someone typically likes, the cart SIDs tells it what they are building today, adapting as new items are added."

The segments compose: retailer scopes the catalog, history scopes the user's general preferences, cart scopes the current task. The decoder learns to weight them appropriately at training time.

The architectural property: extensibility without architectural changes

Verbatim:

"The template structure also gives us a clean interface for future signals (such as occasion awareness, search queries, page type) without architectural changes. Each new signal is simply a new segment in the prompt."

This is the load-bearing benefit that makes the pattern more than a prompt-engineering choice: it is a structural decision that defers future architecture changes. New signals (search queries, detected shopping occasions, page type, time of day, weather) can be added as new segments delimited by new special tokens. Only the prompt assembler and the model's vocabulary change, after which the model is trained on the extended template. There is no different architecture, no new model surface, and no new serving stack.
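
One way to read "each new signal is simply a new segment" is as a data change in the prompt assembler rather than a code change. The sketch below is an assumption about how that could look, not Instacart's implementation: `SegmentSpec`, `assemble`, the `<QUERY_END>` token, the `q_*` tokens, and the choice to append the new segment at the end of the template are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class SegmentSpec:
    name: str       # key used to look up this segment's tokens
    end_token: str  # special token that closes the segment


def assemble(specs: Sequence[SegmentSpec], signals: Dict[str, Sequence[str]]) -> List[str]:
    """Emit each segment's tokens followed by its delimiter, in template order."""
    tokens: List[str] = []
    for spec in specs:
        # A missing signal yields an empty segment; the delimiter is still emitted.
        tokens.extend(signals.get(spec.name, ()))
        tokens.append(spec.end_token)
    return tokens


BASE_TEMPLATE = [
    SegmentSpec("retailer", "<RETAILER>"),
    SegmentSpec("history", "<HISTORY_END>"),
    SegmentSpec("cart", "<CART_END>"),
]

# Adding a signal is one more entry in the template; assemble() is unchanged.
EXTENDED_TEMPLATE = BASE_TEMPLATE + [SegmentSpec("query", "<QUERY_END>")]

signals = {
    "retailer": ["<retailer:grocery>"],
    "history": ["a12", "b7"],
    "cart": ["a5", "b2"],
    "query": ["q_oat", "q_milk"],  # hypothetical tokenised search query
}

base_prompt = assemble(BASE_TEMPLATE, signals)
extended_prompt = assemble(EXTENDED_TEMPLATE, signals)
print(base_prompt)
print(extended_prompt)
```

Because the new segment is appended here, the extended prompt begins with exactly the old prompt. Where a new segment should actually sit relative to the cart segment is a modelling decision the source does not discuss.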

Why this matters for generative-retrieval specifically

In scoring retrieval, each additional signal becomes an additional feature to score against, which means feature engineering, schema changes, and re-deployments through the feature-serving infrastructure.

In generative retrieval over a codebook, signals are tokens in the prompt. Adding a new signal takes three steps:

  • Add a new special token to the vocabulary.
  • Tokenise the new signal into the existing codeword space (or extend the vocabulary with new tokens for it).
  • Train on the extended template.

The architecture, the serving stack, and the beam search stay the same; only the prompt shape changes slightly.
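
The vocabulary side of adding a signal can be sketched as an append-only token map, so that IDs the model has already learned never shift. This is an illustrative assumption rather than a disclosed detail: the `Vocab` class, the `<QUERY_END>` token, and the `q_*` tokens are hypothetical, and the matching growth of the model's embedding table is not shown.

```python
from typing import Dict, Iterable, List


class Vocab:
    """Append-only token-to-ID map: new tokens get fresh IDs, old IDs never move."""

    def __init__(self, tokens: Iterable[str]) -> None:
        self._ids: Dict[str, int] = {}
        for tok in tokens:
            self.add(tok)

    def add(self, token: str) -> int:
        # Idempotent: re-adding an existing token returns its existing ID.
        if token not in self._ids:
            self._ids[token] = len(self._ids)
        return self._ids[token]

    def encode(self, tokens: Iterable[str]) -> List[int]:
        return [self._ids[t] for t in tokens]

    def __len__(self) -> int:
        return len(self._ids)


# Existing vocabulary: delimiters plus a couple of SID codewords.
vocab = Vocab(["<RETAILER>", "<HISTORY_END>", "<CART_END>", "a5", "b2"])
old_ids = vocab.encode(["a5", "b2", "<CART_END>"])
old_size = len(vocab)

# Step 1: new special token. Step 2: new tokens for the signal itself.
query_end_id = vocab.add("<QUERY_END>")
for tok in ["q_oat", "q_milk"]:
    vocab.add(tok)

# Existing tokens encode exactly as before the extension.
new_ids = vocab.encode(["a5", "b2", "<CART_END>"])
print(old_ids, new_ids, query_end_id, len(vocab))
```

The third step, training on the extended template, is where the model learns embeddings for the new IDs; the sketch only covers the bookkeeping that precedes it.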

Sibling patterns elsewhere on the wiki

  • patterns/context-encoded-prompt-handoff (Deutsche Börse Zeppelin migration, 2026-05-19) — same shape applied to code-migration prompts: structured prompt with named segments passed between LLM stages.
  • concepts/context-encoded-llm-prompt — sibling concept at the prompt-engineering altitude.
  • patterns/specialized-workflow-router-with-llm-intent-detection (Yelp CS chatbot, 2026-05-27) — sibling pattern at the request-routing altitude (different special tokens trigger different downstream routes).
  • concepts/real-time-context-feature (Pinterest contextual sequential CG, 2026-05-08) — Pinterest's subject-Pin context feature is structurally similar (real-time intent injected into a sequence model) but lives in a scoring model, not a generative one. The Instacart pattern is the generative evolution of the same intent: real-time context as a structured input, not a feature.

Caveats

  • Specific tokenisation of segments (vocabulary IDs of <RETAILER>, <HISTORY_END>, <CART_END> etc.) not disclosed by Instacart.
  • Per-segment positional-embedding scheme not disclosed (whether segments share or have separate positional encodings).
  • Per-segment dropout / robustness training (handle missing user history, missing cart, missing retailer type) not disclosed.
  • Top-N history depth and M cart-size limits not disclosed.
  • The pattern is most natural when the consuming model is Transformer-decoder-based; alternative architectures (state-space models, retrieval-augmented sequence models) may need different structuring.

Seen in

  • sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart
