Request-oriented sequence scaling¶
Definition¶
Request-oriented sequence scaling is the storage-and-compute architecture that unlocks the use of long-form user behaviour sequences in a ranking model by (1) processing the sequence once per request and sharing results across ad candidates, and (2) replacing redundant per-training-row data replication with a centralised, high-efficiency KV store of user logs joined with training data on the fly (Source: sources/2026-03-31-meta-adaptive-ranking-model-bending-the-inference-scaling-curve).
The problem it addresses¶
Long user-behaviour sequences (impressions, clicks, watch events, conversions — potentially thousands of tokens per user per time window) are high-signal inputs for ranking. They are also expensive on two axes:
- Compute — a sequence transformer running once per candidate per request is intractable at LLM scale.
- Storage — replicating the full sequence into every training row for every ad candidate explodes training-data size and degrades ingestion.
Both costs scale with sequence length × candidate count × request count — the worst of all possible product explosions.
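To make the product explosion concrete, here is a back-of-the-envelope comparison. All numbers are illustrative placeholders, not figures from the post:

```python
# Illustrative numbers only -- none of these are from the post.
requests_per_day = 1_000_000_000      # ranking requests
candidates_per_request = 500          # ad candidates scored per request
seq_len_tokens = 2_000                # user-behaviour events per sequence
users = 100_000_000                   # distinct users

# Replicated layout: sequence copied into every (request, candidate) row.
replicated = requests_per_day * candidates_per_request * seq_len_tokens

# Centralised layout: one copy of the sequence per user.
centralised = users * seq_len_tokens

ratio = replicated / centralised
print(f"replicated:  {replicated:.1e} tokens/day")
print(f"centralised: {centralised:.1e} tokens")
print(f"reduction:   {ratio:,.0f}x")
```

Even with conservative inputs, the replicated layout pays the full three-way product every day, while the centralised layout pays only users × sequence length once.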
The two-axis resolution¶
Compute: process once per request¶
Following request-oriented computation sharing, the heavy sequence model runs once per request, producing a compressed user-sequence embedding that is broadcast to every ad candidate. The per-candidate sequence work collapses to zero.
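A minimal sketch of that sharing pattern; the function names (`encode_sequence`, `score_candidate`, `rank_request`) are illustrative stand-ins, not the post's API:

```python
def encode_sequence(user_logs: list[str]) -> list[float]:
    """Stand-in for the heavy sequence model: runs ONCE per request,
    regardless of how many candidates the request carries."""
    # Toy "embedding": a single feature derived from the log.
    return [float(len(user_logs))]

def score_candidate(user_emb: list[float], candidate: dict) -> float:
    """Cheap per-candidate head that consumes the shared embedding."""
    return user_emb[0] * candidate["bid"]

def rank_request(user_logs: list[str], candidates: list[dict]) -> list[float]:
    user_emb = encode_sequence(user_logs)     # heavy work: once per request
    return [score_candidate(user_emb, c)      # cheap work: per candidate
            for c in candidates]
```

The key property is that `encode_sequence` sits outside the per-candidate loop, so its cost is amortised across every candidate in the request.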
Storage: centralised KV store + on-the-fly join¶
Instead of replicating user-log sequences into each training row:
# Anti-pattern: replicate the sequence into every row
for each training_row (request, candidate):
    row.user_sequence = copy_of_user_logs_at_request_time
Meta uses a centralised KV store:
# Request-oriented storage
kv_store[user_id] = user_logs  # one copy per user

# At training time: join on the fly
row.user_sequence = kv_store.lookup(row.user_id, row.request_ts)
The centralised KV store is the authoritative source; training data references the user+timestamp and joins against the store at training time. Storage footprint drops from O(requests × candidates × sequence_length) to O(users × sequence_length).
Why this is the "serving-and-training" sibling of request-oriented sharing¶
Request-oriented computation sharing is the serving-time compute optimisation. Request-oriented sequence scaling extends the same principle to:
- Training-time data layout (centralised KV store, not replicated rows).
- Serving-time storage access (one fetch per request, shared).
The two together define the "Request-Oriented Optimization" pillar of Meta Adaptive Ranking Model.
Relationship to feature stores¶
A feature store is the generic category this sits in — centralised feature storage, consistent across training and serving. The Adaptive Ranking Model post's innovation is specifically the joined-on-the-fly-with-training-data pattern for long-form sequences, not just point-in-time features.
Seen in¶
- 2026-03-31 Meta — Meta Adaptive Ranking Model — canonical wiki source; names the centralised KV store + on-the-fly training data join as the mechanism unlocking long-form user behaviour sequences at LLM scale (sources/2026-03-31-meta-adaptive-ranking-model-bending-the-inference-scaling-curve).
Caveats¶
- Post does not disclose the KV store implementation (RocksDB? HDFS-backed? custom?), sequence-length ceiling, or retention window.
- The "join on the fly" mechanism during training is named but not detailed — whether this is a training-time fetch from the KV store per batch, or a pre-join step, is not specified.