

Request-oriented sequence scaling

Definition

Request-oriented sequence scaling is the storage-and-compute architecture that unlocks the use of long-form user behaviour sequences in a ranking model by (1) processing the sequence once per request and sharing results across ad candidates, and (2) replacing redundant per-training-row data replication with a centralised, high-efficiency KV store of user logs joined with training data on the fly (Source: sources/2026-03-31-meta-adaptive-ranking-model-bending-the-inference-scaling-curve).

The problem it addresses

Long user-behaviour sequences (impressions, clicks, watch events, conversions — potentially thousands of tokens per user per time window) are high-signal inputs for ranking. They are also expensive on two axes:

  • Compute — a sequence transformer running once per candidate per request is intractable at LLM scale.
  • Storage — replicating the full sequence into every training row for every ad candidate explodes training-data size and degrades ingestion.

Both costs scale with sequence length × candidate count × request count — the worst of all possible product explosions.
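Back-of-envelope arithmetic makes the gap concrete. All figures below are illustrative assumptions, not numbers from the source post:

```python
# Illustrative scale assumptions (NOT figures from the post).
requests_per_day = 10_000_000
candidates_per_request = 500
users = 1_000_000
sequence_length = 2_000        # events per user sequence
bytes_per_event = 16

# Replicated layout: the full sequence is copied into every
# (request, candidate) training row.
replicated = (requests_per_day * candidates_per_request
              * sequence_length * bytes_per_event)

# Centralised layout: one copy per user in the KV store.
centralised = users * sequence_length * bytes_per_event

print(f"replicated:  {replicated / 1e12:.0f} TB/day")   # 160 TB/day
print(f"centralised: {centralised / 1e9:.0f} GB total") # 32 GB
```

Under these toy numbers the replicated layout writes four orders of magnitude more sequence bytes than the centralised one, and every extra candidate or longer sequence multiplies the gap further.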

The two-axis resolution

Compute: process once per request

Following request-oriented computation sharing, the heavy sequence model runs once per request, producing a compressed user-sequence embedding that is broadcast to every ad candidate. The per-candidate sequence work collapses to zero.
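The compute pattern can be sketched as follows, with mean pooling standing in for the heavy sequence model (the post does not detail the encoder; everything here is a toy stand-in):

```python
import numpy as np

def encode_sequence(user_events: np.ndarray) -> np.ndarray:
    """Stand-in for the heavy sequence model (hypothetical: mean
    pooling instead of a transformer). Runs ONCE per request."""
    return user_events.mean(axis=0)

def score_candidate(user_emb: np.ndarray, cand_features: np.ndarray) -> float:
    """Cheap per-candidate head that reuses the shared embedding."""
    return float(user_emb @ cand_features)

# One request: a 2,000-event user sequence, 500 ad candidates.
rng = np.random.default_rng(0)
user_events = rng.normal(size=(2_000, 64))
candidates = rng.normal(size=(500, 64))

user_embedding = encode_sequence(user_events)       # 1 heavy call per request
scores = [score_candidate(user_embedding, c) for c in candidates]  # 500 cheap calls
```

The heavy call count is 1 per request rather than 1 per candidate; only the cheap scoring head scales with candidate count.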

Storage: centralised KV store + on-the-fly join

Instead of replicating user-log sequences into each training row:

# Anti-pattern: replicate
for each training_row (request, candidate):
    row.user_sequence = copy_of_user_logs_at_request_time

Meta uses a centralised KV store:

# Request-oriented storage
kv_store[user_id] = user_logs         # one copy per user
# At training time:
row.user_sequence = kv_store.lookup(row.user_id, row.request_ts)  # join on the fly

The centralised KV store is the authoritative source; training data references the user+timestamp and joins against the store at training time. Storage footprint drops from O(requests × candidates × sequence_length) to O(users × sequence_length).
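The lookup above amounts to a point-in-time join: given a user and a request timestamp, return only events logged before that timestamp, so training rows never see future behaviour. The post does not disclose the actual store, so this is a toy in-memory sketch of the pattern:

```python
import bisect
from collections import defaultdict

class UserLogStore:
    """Toy centralised KV store keyed by user_id. Each user's events are
    kept time-ordered so a training row can be joined point-in-time."""

    def __init__(self):
        self._logs = defaultdict(list)  # user_id -> [(ts, event), ...]

    def append(self, user_id, ts, event):
        bisect.insort(self._logs[user_id], (ts, event))

    def lookup(self, user_id, request_ts):
        """Return events strictly before request_ts (no future leakage)."""
        events = self._logs[user_id]
        cut = bisect.bisect_left(events, (request_ts,))
        return [e for _, e in events[:cut]]

store = UserLogStore()
store.append("u1", 10, "click")
store.append("u1", 20, "watch")
store.append("u1", 30, "convert")

# Training-time join for a row stamped request_ts=25:
sequence = store.lookup("u1", 25)   # ["click", "watch"]
```

One copy of the logs serves every training row that references the user, and the timestamp cut keeps the reconstructed sequence consistent with what serving would have seen at request time.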

Why this is the "serving-and-training" sibling of request-oriented sharing

Request-oriented computation sharing is the serving-time compute optimisation. Request-oriented sequence scaling extends the same principle to:

  • Training-time data layout (centralised KV store, not replicated rows).
  • Serving-time storage access (one fetch per request, shared).

The two together define the "Request-Oriented Optimization" pillar of Meta Adaptive Ranking Model.

Relationship to feature stores

A feature store is the generic category this sits in — centralised feature storage, consistent across training and serving. The Adaptive Ranking Model post's innovation is specifically the joined-on-the-fly-with-training-data pattern for long-form sequences, not just point-in-time features.

Caveats

  • Post does not disclose the KV store implementation (RocksDB? HDFS-backed? custom?), sequence-length ceiling, or retention window.
  • The "join on the fly" mechanism during training is named but not detailed — whether this is a training-time fetch from the KV store per batch, or a pre-join step, is not specified.