PATTERN
Feature parity audit¶
Intent¶
Cross-reference the features a model sees at training time against the features actually populated at serving time, measured at the specific serving artifact the model reads from, to surface silent gaps that cause online-offline discrepancy. A feature a model uses offline is not a feature the model has online unless the serving artifact (ANN index, embedding cache, ranker input table, feature store) actually contains it.
Pinterest's 2026-02-27 L1 CVR retrospective (sources/2026-02-27-pinterest-bridging-the-gap-online-offline-discrepancy-l1-cvr) documents this audit as the diagnostic that found entire feature families (targeting-spec flags, offsite conversion visit counts, annotations, MediaSage image embeddings) missing from the L1 embedding path despite being present in training logs.
The audit recipe¶
- Enumerate the model's expected features from the offline training job's feature-insertion tables + model registry. This is the "what the model thinks it uses" view.
- Measure actual coverage at the serving artifact via feature coverage dashboards reading from the specific artifact the model consumes (ANN index, L1 embedding, not L2 feature store). This is the "what the model actually gets" view.
- Diff the two. Feature families on the offline list with low coverage on the online dashboard are the gaps.
- Segment the diff. A feature can be 100% covered on one slice (e.g., non-shopping traffic) and 0% on another (e.g., shopping traffic). Global aggregates hide the gap.
- Triage by impact. Not every low-coverage feature matters; prioritize feature families the model has large learned dependence on (e.g., via feature-importance attribution).
- Fix by onboarding. Route missing feature families through the feature registry that controls the serving artifact's build pipeline (in Pinterest's case, UFR).
- Verify via dashboard recovery. Watch online coverage climb post-fix; the dashboard is the ground truth.
- Close the bug class at the tooling layer. Change the feature-registry default so that newly-onboarded features for one serving path are automatically considered for related serving paths — preventing recurrence.
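The first four steps of the recipe can be sketched as a simple segmented diff. Everything here is a hypothetical stand-in: `load_expected_features` represents the offline insertion tables plus model registry, `load_online_coverage` represents the coverage dashboard reading from the serving artifact, and the feature names and numbers are illustrative, not Pinterest's.

```python
def load_expected_features():
    # "What the model thinks it uses": offline insertion tables + registry,
    # with a learned-importance score per family for triage.
    return {
        "targeting_spec_flags": {"importance": 0.31},
        "offsite_visit_counts_7d": {"importance": 0.22},
        "mediasage_image_embedding": {"importance": 0.18},
        "pin_age_days": {"importance": 0.02},
    }

def load_online_coverage():
    # "What the model actually gets": coverage fraction per (family, slice),
    # read from the serving artifact itself, not an upstream store.
    return {
        ("targeting_spec_flags", "shopping"): 0.00,
        ("targeting_spec_flags", "non_shopping"): 0.97,
        ("offsite_visit_counts_7d", "shopping"): 0.01,
        ("offsite_visit_counts_7d", "non_shopping"): 0.02,
        ("mediasage_image_embedding", "shopping"): 0.00,
        ("mediasage_image_embedding", "non_shopping"): 0.00,
        ("pin_age_days", "shopping"): 0.99,
        ("pin_age_days", "non_shopping"): 0.99,
    }

def parity_gaps(expected, coverage, threshold=0.5):
    """Diff the two views per slice; rank gaps by learned dependence."""
    gaps = []
    for (family, slice_), frac in coverage.items():
        if family in expected and frac < threshold:
            gaps.append((family, slice_, frac, expected[family]["importance"]))
    gaps.sort(key=lambda g: g[3], reverse=True)  # highest importance first
    return gaps

for family, slice_, frac, imp in parity_gaps(load_expected_features(),
                                             load_online_coverage()):
    print(f"{family:28s} slice={slice_:12s} coverage={frac:.2f} importance={imp:.2f}")
```

Note how the per-slice keys capture the segmentation point: `targeting_spec_flags` is 97% covered on non-shopping traffic yet 0% on shopping traffic, a gap a global aggregate would hide.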
Pinterest's concrete application¶
From the post:
"When we put the two side by side (offline insertion tables vs. online feature-coverage dashboards), it turned out several high-impact Pin feature families had never made it into the L1 embedding path at all, including:
- Targeting spec flags (interest targeting, search-term modes, auto-targeting)
- Offsite conversion visit counts (1/7/30/90 days)
- Annotations and MediaSage image embeddings
These signals existed in training logs, so the model quite reasonably learned to lean on them. But at serving time, they were missing from the embeddings, which meant that for many oCPM and performance-sensitive ads, the online model was effectively running on a much thinner feature set than the one it was evaluated on offline."
The fix was a two-level change:
- Immediate: onboard the missing features into L1 embedding usage via UFR; coverage recovered on online dashboards; online loss moved in the right direction for CVR + engagement models, especially on shopping traffic.
- Systemic: change UFR's default so that "features onboarded for L2 are automatically considered for L1 embedding usage" — closing the class of bug at the tooling default.
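The systemic half of the fix is a tooling-default change, which can be sketched as follows. This is purely illustrative: the post says only that UFR's default was changed so L2-onboarded features are automatically considered for L1, so `FeatureRegistry` and `RELATED_PATHS` are hypothetical names, not UFR's actual API.

```python
# Which serving paths should automatically consider a feature onboarded
# elsewhere. (Assumed structure; the post does not describe UFR internals.)
RELATED_PATHS = {"l2_feature_store": ["l1_embedding"]}

class FeatureRegistry:
    def __init__(self):
        self.onboarded = {}        # path -> set of feature families
        self.pending_review = {}   # path -> auto-proposed families

    def onboard(self, family, path):
        self.onboarded.setdefault(path, set()).add(family)
        # The changed default: propagate the feature to related serving
        # paths for consideration, instead of requiring a separate,
        # easy-to-forget onboarding request per path.
        for related in RELATED_PATHS.get(path, []):
            if family not in self.onboarded.get(related, set()):
                self.pending_review.setdefault(related, set()).add(family)

registry = FeatureRegistry()
registry.onboard("offsite_visit_counts_7d", "l2_feature_store")
print(registry.pending_review)  # → {'l1_embedding': {'offsite_visit_counts_7d'}}
```

The design point is that the safe behavior becomes the default: a team onboarding a feature for the ranker no longer has to remember that the L1 embedding path exists.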
Why the audit is necessary¶
A production ML system commonly has three places a feature can live, which can all diverge:
- Training logs (the offline ML view).
- Request-time feature store (e.g., L2 Feature Store; used by ranker stages).
- Model-specific serving artifact (ANN index, embedding cache, ranker input).
Without a parity audit, each pipeline can be "correct" by its own local measure while the joint system has silent feature gaps. The audit is the only cheap way to surface these gaps without running full A/B experiments.
When to run it¶
- Before every launch of a new model family at production scale. Pinterest's post effectively promotes the feature parity audit from an ad hoc investigation tool to a pre-launch readiness check.
- On every new feature onboarding — verify the feature reached the serving artifact, not just the feature store.
- On any unexpected O/O discrepancy — it's a layer-2 investigation in the three-layer O/O diagnosis framework.
- On pipeline changes that touch the serving artifact's build path — a config change can silently drop a feature.
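The first two checklist items above can be automated as a launch gate. A minimal sketch, with hypothetical accessors; the essential choice is that `launch_ready` queries the serving artifact the model consumes, because the upstream feature store can look healthy while the artifact is empty.

```python
def serving_artifact_coverage(family):
    # Stand-in for querying the ANN index / embedding cache directly.
    return {"offsite_visit_counts_7d": 0.96, "new_annotation_v2": 0.03}[family]

def feature_store_coverage(family):
    # The upstream store reports near-full coverage for both families,
    # which is exactly why it must not be the gate's data source.
    return {"offsite_visit_counts_7d": 0.99, "new_annotation_v2": 0.98}[family]

def launch_ready(families, min_coverage=0.9):
    """Gate a launch on coverage measured at the serving artifact."""
    failures = [f for f in families
                if serving_artifact_coverage(f) < min_coverage]
    return (len(failures) == 0, failures)

ok, failures = launch_ready(["offsite_visit_counts_7d", "new_annotation_v2"])
# new_annotation_v2 is 98% covered in the feature store but only 3% covered
# in the artifact, so the gate fails and names the gap.
print(ok, failures)
```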
Related concepts¶
- Feature coverage dashboard — the measurement instrument the audit reads from.
- Training / serving boundary — the structural divide the audit is fighting.
- Online-offline discrepancy — the production hazard the audit diagnoses.
- ANN index / Feature store — the typical serving artifacts feature parity audits measure coverage on.
Applications beyond ads ranking¶
The pattern applies to any ML system where features reach the model through different pipelines at training time and at serving time:
- Recommendation systems — item-side features in an embedding index vs. training-time features.
- Search ranking — document features in the search index vs. query-time enrichment.
- Fraud detection — rule-engine features vs. model-feature-store features.
- Content moderation — classifier feature pipeline vs. training data pipeline.
Any system with separate training + serving feature pipelines is vulnerable to silent parity gaps and should run feature parity audits at launch + on a cadence.
Seen in¶
- sources/2026-02-27-pinterest-bridging-the-gap-online-offline-discrepancy-l1-cvr — canonical wiki instance. Pinterest diffed offline feature-insertion tables against online feature-coverage dashboards, found entire feature families missing from L1 embedding path, fixed via UFR onboarding + default change.