SYSTEM Cited by 1 source

Pinterest L1 Ranking

Pinterest L1 Ranking is the middle stage of Pinterest's ads funnel: retrieval → L1 ranking → L2 ranking → auction. L1 filters and prioritizes ad candidates under a tight latency budget so that downstream L2 ranking and the auction see a manageable candidate set. It is built as a two-tower model: a query tower encodes the user / request, a Pin tower encodes the candidate ad Pin, and the dot product of the two embeddings gives the score used to narrow and order candidates.
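A minimal sketch of the scoring step described above, assuming embeddings are plain float vectors of equal dimension. The function names, the `keep` parameter, and the toy vectors are hypothetical; the source does not disclose the tower architectures, the embedding dimension, or the cut-off size.

```python
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("embedding dimensions differ")
    return sum(a * b for a, b in zip(u, v))


def l1_rank(
    query_emb: Sequence[float],
    pin_embs: Dict[str, Sequence[float]],
    keep: int,
) -> List[Tuple[str, float]]:
    """Score every candidate Pin against the query and keep the top `keep`."""
    scored = [(pin_id, dot(query_emb, emb)) for pin_id, emb in pin_embs.items()]
    # Sort by score descending; break ties by pin_id so the order is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:keep]


# Toy data (hypothetical): one query embedding and three candidate Pin embeddings.
query = [1.0, 0.0, 2.0]
candidates = {
    "pin_a": [0.5, 1.0, 0.0],
    "pin_b": [0.0, 0.0, 1.0],
    "pin_c": [1.0, 1.0, 1.0],
}
survivors = l1_rank(query, candidates, keep=2)
```

Because the Pin side depends only on the Pin, its embeddings can be computed ahead of time, which is what makes the pre-computed index in the next paragraph possible.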

L1 Pin embeddings are pre-computed into an ANN (approximate nearest neighbor) index that both retrieval and L1 ranking consume. Embeddings are refreshed via hourly indexing snapshots driven by realtime enrichment; the index-build-plus-deploy cycle for large tiers "can span days".

Role in the ads funnel

Stage	What it does	Cost
Retrieval	Candidate generation from the ad index	Cheap; must have high recall
L1 Ranking	Narrows + orders candidates under tight latency	Moderate; two-tower dot product
L2 Ranking	Expensive feature-rich scoring over the L1 output	High; rich feature access via Feature Store
Auction	Bid / budget / pacing / eligibility → winner	Business-policy-shaped

L1's job is not to produce the final score; it is to ensure that the right candidates survive into L2. L1 quality is therefore judged by retrieval / ranking recall at the L1 → L2 boundary, not by absolute ranking quality.
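One way to make recall at the boundary concrete is sketched below. It assumes a reference set is available: the candidates a more expensive ranker would have chosen had it scored everything. That reference set, and the function name `boundary_recall`, are assumptions for illustration; the source does not define the exact metric.

```python
from typing import Iterable, Set


def boundary_recall(l1_survivors: Iterable[str], reference_top: Iterable[str]) -> float:
    """Fraction of the reference set that survives the L1 cut."""
    reference: Set[str] = set(reference_top)
    if not reference:
        raise ValueError("reference set is empty")
    return len(reference & set(l1_survivors)) / len(reference)


# Toy data (hypothetical): L1 passes four candidates; the reference ranker wanted three.
recall = boundary_recall(["a", "b", "c", "d"], ["a", "c", "e"])
```

Here two of the three reference candidates survive, so candidate "e" can never win downstream no matter how well L2 would have scored it.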

Feature pipeline — the L1 embedding path

The Pin tower consumes features from indexing snapshots, not the L2 Feature Store. The two pipelines are separate:

L2 Feature Store feeds the L2 ranking stage — rich feature access at request time.
L1 embedding path takes a subset of features, runs them through the Pin tower, and writes the resulting embedding into the ANN index.

A feature must be explicitly onboarded into the L1 embedding path (via UFR configs) to be visible at L1 serving time, even if it is already in the Feature Store for L2. This separation is the root of the feature-related online-offline (O/O) discrepancy documented in sources/2026-02-27-pinterest-bridging-the-gap-online-offline-discrepancy-l1-cvr.
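The visibility rule above amounts to a set difference between what the model trained on and what the L1 embedding path has onboarded. A minimal sketch, assuming both sides can be listed as feature names; the names below are hypothetical and do not reflect actual UFR config contents.

```python
from typing import Iterable, List


def missing_at_l1(training_features: Iterable[str], l1_onboarded: Iterable[str]) -> List[str]:
    """Features the model trained on that the L1 embedding path never sees."""
    return sorted(set(training_features) - set(l1_onboarded))


# Hypothetical feature names, for illustration only.
training_features = ["pin_category", "targeting_spec_flag", "offsite_visits_7d"]
l1_onboarded = ["pin_category"]

gaps = missing_at_l1(training_features, l1_onboarded)
```

Any name in `gaps` is a feature the model can lean on offline but that is absent when the Pin tower runs at serving time.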

Known O/O hazards

Per the 2026-02-27 Pinterest Engineering post, L1 has two structural production hazards that can cause online-offline discrepancy:

Feature parity gap between training and the L1 embedding path. Pinterest found whole feature families (targeting-spec flags, offsite conversion visit counts at 1/7/30/90 days, annotations + MediaSage image embeddings) that existed in training logs but had never been onboarded into L1 embedding usage — so the model learned to use them offline, but at serving time they were absent.
Embedding version skew. Query and Pin towers can diverge in checkpoint because Pin embeddings are written into hourly snapshots while query models roll separately and index-deploy cycles can span days. Multiple embedding versions coexist in a single retrieval index. DHEN-family models are more sensitive to this than simpler variants.
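To make the version-skew hazard concrete, the sketch below measures how mixed an index is. It assumes each index entry carries a tag naming the Pin-tower checkpoint that produced it; that tag, the `(pin_id, version)` layout, and the version strings are hypothetical, since the source does not describe the index format.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def version_mix(index_entries: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Share of index entries per Pin-tower embedding version."""
    counts = Counter(version for _pin_id, version in index_entries)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("index is empty")
    return {version: n / total for version, n in counts.items()}


def skewed_share(mix: Dict[str, float], query_version: str) -> float:
    """Share of entries whose version differs from the query tower's version."""
    return 1.0 - mix.get(query_version, 0.0)


# Toy index (hypothetical): three Pins embedded by checkpoint v7, one by v8.
entries = [("p1", "v7"), ("p2", "v7"), ("p3", "v8"), ("p4", "v7")]
mix = version_mix(entries)
stale = skewed_share(mix, query_version="v8")
```

In this toy case the query tower has already rolled to v8 while three quarters of the index still holds v7 embeddings, so most dot products pair vectors from different checkpoints.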

Mitigations documented

patterns/feature-parity-audit — cross-reference offline feature-insertion tables against online feature-coverage dashboards; onboard missing feature families into L1 embedding via UFR; change UFR default so features onboarded for L2 are automatically considered for L1 embedding usage.
patterns/batch-embedding-for-index-consistency — for large tiers, favor batch embedding inference so each ANN build uses a single consistent embedding version.
patterns/version-skew-sensitivity-check — every new model family must pass a version-skew sensitivity sweep as part of model readiness.
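A version-skew sensitivity sweep could look roughly like the sketch below. It assumes two sets of Pin embeddings are available for the same Pins (one per checkpoint) and reports how much of the consistent top-k survives as a growing fraction of the index is swapped to the older version. The overlap metric, the function names, and the deterministic choice of which Pins go stale are simplifying assumptions, not the procedure Pinterest describes.

```python
from typing import Dict, List, Sequence

Vec = Sequence[float]


def top_k(query: Vec, pins: Dict[str, Vec], k: int) -> List[str]:
    """Ids of the k highest dot-product scores; ties broken by id."""
    scores = {pid: sum(q * p for q, p in zip(query, emb)) for pid, emb in pins.items()}
    return sorted(scores, key=lambda pid: (-scores[pid], pid))[:k]


def skew_sweep(
    query: Vec,
    new_pins: Dict[str, Vec],
    old_pins: Dict[str, Vec],
    fractions: Sequence[float],
    k: int,
) -> Dict[float, float]:
    """For each stale fraction, overlap@k against the fully consistent index."""
    baseline = set(top_k(query, new_pins, k))
    pin_ids = sorted(new_pins)
    results = {}
    for frac in fractions:
        # Simplification: the first n ids (sorted) are the stale ones.
        n_stale = round(frac * len(pin_ids))
        stale = set(pin_ids[:n_stale])
        mixed = {pid: (old_pins[pid] if pid in stale else new_pins[pid]) for pid in pin_ids}
        results[frac] = len(baseline & set(top_k(query, mixed, k))) / k
    return results


# Toy embeddings (hypothetical): the query matches the new checkpoint's space.
query = [1.0, 0.0]
new_pins = {"a": [0.9, 0.0], "b": [0.8, 0.0], "c": [0.2, 0.0], "d": [0.1, 0.0]}
old_pins = {"a": [0.1, 0.9], "b": [0.1, 0.8], "c": [0.3, 0.2], "d": [0.4, 0.1]}
sweep = skew_sweep(query, new_pins, old_pins, fractions=[0.0, 0.5, 1.0], k=2)
```

A model family whose overlap drops steeply at small stale fractions is a poor fit for an index where versions coexist; a readiness gate could put a threshold on this curve.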

Experiment methodology

Budget-Split A/B experiments — the online evaluation primitive for L1 changes.
Multi-source offline evaluation — loss + calibration computed across three log sources: auction-winner samples, full-request auction-candidate samples, partial-request auction-candidate samples.
pCVR-percentile breakdowns — segment offline metrics by predicted-CVR bucket to confirm gains are not concentrated in easy regions.
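The pCVR-percentile breakdown can be sketched as follows, assuming binary conversion labels and equal-count buckets ordered by predicted CVR. Calibration is taken here as the sum of predictions divided by the sum of labels, a common convention that the source does not confirm; the function name and toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def bucket_metrics(
    preds: Sequence[float], labels: Sequence[int], n_buckets: int
) -> List[Dict[str, float]]:
    """Per-bucket log loss and calibration, with buckets ordered by predicted CVR."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels differ in length")
    order = sorted(range(len(preds)), key=lambda i: preds[i])
    eps = 1e-12  # guards against log(0)
    report = []
    for b in range(n_buckets):
        idx = order[b * len(order) // n_buckets:(b + 1) * len(order) // n_buckets]
        if not idx:
            continue
        loss = -sum(
            labels[i] * math.log(max(preds[i], eps))
            + (1 - labels[i]) * math.log(max(1.0 - preds[i], eps))
            for i in idx
        ) / len(idx)
        positives = sum(labels[i] for i in idx)
        # Calibration is undefined when a bucket has no conversions.
        calibration = sum(preds[i] for i in idx) / positives if positives else float("nan")
        report.append({"bucket": b, "n": len(idx), "log_loss": loss, "calibration": calibration})
    return report


# Toy data (hypothetical): four samples split into a low and a high pCVR bucket.
report = bucket_metrics([0.1, 0.2, 0.8, 0.9], [0, 1, 1, 1], n_buckets=2)
```

Comparing these per-bucket rows between a control and a candidate model shows whether an aggregate gain comes from every pCVR region or only from the easy ones.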

Caveats + unknowns

The 2026-02-27 post documents diagnosis methodology + two concrete failure modes, not the full architecture. Not disclosed:

The specific Pin-tower + query-tower architectures (including DHEN).
Feature count / embedding dimension / ANN-index implementation.
Index-build cadence numbers beyond "hourly" + "can span days".
L2 Feature Store internals (to be covered as a separate system page on the wiki if published later).
Absolute latency budget for L1.
Downstream L2 + auction architecture.

Seen in

sources/2026-02-27-pinterest-bridging-the-gap-online-offline-discrepancy-l1-cvr — O/O discrepancy diagnosis on L1 CVR models; feature-parity + version-skew as the production causes; three-layer diagnosis methodology; funnel recall as ceiling; UFR default change.

Infrastructure: systems/pinterest-ufr
Architecture: concepts/two-tower-architecture, concepts/ann-index, concepts/retrieval-ranking-funnel
Hazards: concepts/online-offline-discrepancy, concepts/embedding-version-skew
Mitigations: patterns/feature-parity-audit, patterns/batch-embedding-for-index-consistency, patterns/version-skew-sensitivity-check