PATTERN Cited by 1 source
Per-partner feature trimming for auction latency¶
Pattern¶
In a multi-tenant ML serving system that runs real-time auctions under tight latency budgets (typically a retail-media or ad platform), each tenant arrives with different feature availability and different feature importance. To keep the serving model lightweight per tenant and to honor the auction-latency budget, analyze feature importance in each tenant's target domain and prune inputs that don't contribute to prediction accuracy for that specific tenant. Each tenant ends up with its own trimmed feature set; shared layers stay shared, but the per-tenant input surface is reduced.
The pattern is the data-level companion to patterns/cross-domain-warm-start-via-shared-embeddings in Domain Adaptive Learning deployments — together they form the two-layer DAL recipe.
Canonical wiki instance — Instacart Carrot Ads¶
(Source: sources/2026-05-04-instacart-empowering-carrot-ads-with-domain-adaptive-learning)
Carrot Ads runs real-time ad auctions on each retailer partner's e-commerce site, scored by a wide-and-deep pCTR model trained with Domain Adaptive Learning. Quote:
"To meet real-time auction latency requirements and be flexible to various feature availability for the partners, we apply feature trimming technique to balance performance and speed. We analyze feature importance in the target domain and prune inputs that do not contribute to prediction accuracy for that specific partner, ensuring the model remains lightweight."
Two requirements drive the pattern:
- Real-time auction latency: partner ad auctions must resolve quickly, and every additional feature in the model inflates feature-fetch + serialisation + scoring time.
- Variable per-partner feature availability: different partners surface different features at request time. The model must accommodate this variation without retraining the shared embeddings.
Mechanism: feature-importance analysis per partner decides which features survive the trim; pruned features are removed from that partner's serving topology.
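The mechanism can be illustrated with permutation importance, one of several possible importance signals. The source does not say which method Instacart uses, so this is a minimal sketch under that assumption. The names `permutation_importance`, `surviving_features`, and `predict` are hypothetical, and the model is assumed to be any callable that maps a feature dict to a click probability.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def log_loss(labels: Sequence[int], preds: Sequence[float]) -> float:
    """Mean binary cross-entropy; predictions are clipped away from 0 and 1."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(labels, preds):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def permutation_importance(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    labels: Sequence[int],
    features: Sequence[str],
    seed: int = 0,
) -> Dict[str, float]:
    """Loss increase on one partner's data when each feature is shuffled."""
    rng = random.Random(seed)
    baseline = log_loss(labels, [predict(r) for r in rows])
    scores: Dict[str, float] = {}
    for name in features:
        column = [r[name] for r in rows]
        rng.shuffle(column)  # break the link between this feature and the label
        permuted = [{**r, name: v} for r, v in zip(rows, column)]
        scores[name] = log_loss(labels, [predict(r) for r in permuted]) - baseline
    return scores


def surviving_features(scores: Dict[str, float], threshold: float) -> List[str]:
    """Features whose importance meets the (hypothetical) pruning threshold."""
    return sorted(name for name, s in scores.items() if s >= threshold)
```

Running `permutation_importance` once per partner on that partner's own evaluation data yields a separate score table, and therefore a separate surviving set, for each partner.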
When to apply¶
- Multi-tenant ML serving where each tenant has different available features.
- Real-time auction or latency-bounded serving — every feature has a per-request cost, and the model needs to fit within a tight budget.
- A central / shared model with per-tenant variation — trimming gives you per-tenant variation without forking the whole training pipeline.
- A measurable feature-importance signal per tenant — SHAP, permutation importance, ablation studies, or model-internal attention weights.
When not to apply¶
- Tenants share feature availability and latency budgets — one trimmed feature set for all is simpler and avoids the per-tenant trim management overhead.
- Tenants are too data-scarce to estimate feature importance reliably per tenant — the trim itself becomes noisy and may prune useful features.
- Latency budgets aren't binding — feature trimming is optimization-by-removal; if you have headroom, the additional features may improve accuracy at low cost.
- Model architecture doesn't support input ablation — feature trimming presumes the model degrades gracefully when inputs are absent (zero-imputation, masking, learned defaults).
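The last condition above, graceful degradation when inputs are absent, can be sketched as a mask-plus-defaults step in front of the model. This is an illustrative assumption about how such a step might look, not a description of any production system. `apply_feature_mask` and the `defaults` table are hypothetical names.

```python
from typing import Dict, FrozenSet, Mapping


def apply_feature_mask(
    request_features: Mapping[str, float],
    kept: FrozenSet[str],
    defaults: Mapping[str, float],
) -> Dict[str, float]:
    """Build a dense model input for one partner.

    `defaults` lists every feature the shared model expects, with the value
    to use when the feature is trimmed or missing (zero-imputation or a
    learned default). Only features in `kept` may pass through from the
    request; everything else falls back to its default.
    """
    dense: Dict[str, float] = {}
    for name, default in defaults.items():
        if name in kept and name in request_features:
            dense[name] = request_features[name]
        else:
            dense[name] = default
    return dense
```

If the model's accuracy collapses when fed defaults for trimmed features, the architecture does not support this kind of ablation and the pattern is a poor fit.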
Steps¶
- Define the latency budget. What's the auction p99 you're targeting? What's the per-feature cost (fetch + serialise + score)?
- Train the full-feature shared model on aligned source-domain and target-domain data via the cross-domain warm-start pattern.
- Per partner, measure feature importance on partner-domain data. Methods:
  - SHAP / permutation importance: gold standard but expensive.
  - Ablation: drop the feature, retrain the head, measure the delta.
  - Model attention / gating signals: cheap if the architecture supports it.
- Set a pruning threshold — features below it are dropped for this partner. Typically tuned to match the latency budget.
- Trim features per partner in the serving topology:
  - Option A: per-partner model variants. One trimmed model per partner, routed by partner ID.
  - Option B: shared model with per-partner feature mask. One model, but the request payload is filtered per partner.
  - Option C: feature-trimmer module like Pinterest's (systems/pinterest-feature-trimmer) that trims features at the root → leaf RPC fan-out boundary. Instacart's specific topology choice is not disclosed.
- Evaluate the trim: does the trimmed model meet the latency budget? Did accuracy drop? If either check fails, re-tune the threshold.
- Maintain over time — as partner traffic and feature availability change, re-measure importance and re-trim.
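The threshold-setting step can be sketched as a walk down the importance ranking that stops when the latency budget is exhausted. This is a minimal sketch under loud assumptions: per-feature costs are treated as additive (real fetches often run in parallel), and `select_under_budget`, `cost_us`, and `budget_us` are hypothetical names not taken from the source.

```python
from typing import Dict, List, Tuple


def select_under_budget(
    importance: Dict[str, float],
    cost_us: Dict[str, float],
    budget_us: float,
) -> Tuple[List[str], float]:
    """Keep the most important features until the budget would be exceeded.

    Stops at the first feature that does not fit, so the result behaves like
    a single importance threshold: everything ranked above the cut is kept.
    Features with non-positive importance are never kept.
    """
    kept: List[str] = []
    spent = 0.0
    ranked = sorted(importance, key=lambda name: importance[name], reverse=True)
    for name in ranked:
        if importance[name] <= 0.0:
            break
        if spent + cost_us[name] > budget_us:
            break
        kept.append(name)
        spent += cost_us[name]
    return kept, spent
```

The importance of the last kept feature is the effective pruning threshold for that partner. Re-running the selection after each re-measurement keeps the trim aligned with the budget.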
Adjacent system: Pinterest Feature Trimmer¶
Pinterest's Feature Trimmer is an adjacent system at a different altitude that addresses a related (but distinct) problem:
| Property | Pinterest Feature Trimmer | Instacart per-partner feature trim |
|---|---|---|
| Domain count | One (Pinterest's own ads ranking) | Many (one per Carrot Ads partner) |
| What's trimmed | Per-leaf-model: features unused by that model version | Per-partner: features unimportant for that partner's pCTR |
| Trigger | Model signature / module_info.json | Feature-importance analysis on partner data |
| Scale | Trims root → leaf RPC payload | Trims serving model input surface |
| Goal | Eliminate network bandwidth for unused features (concepts/feature-fanout-network-bottleneck) | Hit auction latency budget per partner |
| Refresh | File-watcher on module_info.json | Re-measured per partner over time |
The two complement each other: Pinterest's pattern would trim features that the model signature doesn't reference; Instacart's pattern goes further and trims features by importance even when the model could consume them.
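The difference between the two trims can be made concrete with sets. This is only an illustration of the distinction drawn above, not either company's implementation. Both function names and the feature names in the usage note are hypothetical.

```python
from typing import Dict, Set


def signature_trim(available: Set[str], model_signature: Set[str]) -> Set[str]:
    """Drop features the model version never reads."""
    return available & model_signature


def importance_trim(
    candidates: Set[str], importance: Dict[str, float], threshold: float
) -> Set[str]:
    """Drop features the model could read but that matter little here."""
    return {f for f in candidates if importance.get(f, 0.0) >= threshold}
```

For example, with available features `{"a", "b", "c", "d"}` and a signature of `{"a", "b", "c"}`, the signature trim removes `d`. An importance trim on top of that can additionally remove `b` if its score for this partner falls below the threshold, even though the model is able to consume it.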
Operating constraints¶
- Trim-quality observability — you need monitoring on per-partner accuracy after trimming. A bad trim shows up as a per-partner regression that average metrics can mask.
- Trim refresh cadence — feature importance changes over time; static trims become stale. Plan for re-measurement.
- Negative transfer interaction — if taxonomy alignment is off, feature-importance signals are misleading and the trim can prune features that would matter under correct alignment. Order matters: align first, then measure importance, then trim.
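The observability constraint can be sketched as a per-partner comparison of an accuracy metric before and after the trim. The metric (AUC here), the tolerance `max_drop`, and the function names are assumptions for illustration. The source does not describe Instacart's monitoring.

```python
from typing import Dict, List


def regressed_partners(
    before: Dict[str, float], after: Dict[str, float], max_drop: float
) -> List[str]:
    """Partners whose metric fell by more than `max_drop` after trimming."""
    return sorted(
        partner
        for partner in before
        if partner in after and before[partner] - after[partner] > max_drop
    )


def mean_drop(before: Dict[str, float], after: Dict[str, float]) -> float:
    """Average metric drop across partners present in both snapshots."""
    shared = [p for p in before if p in after]
    return sum(before[p] - after[p] for p in shared) / len(shared)
```

A single partner can regress well past the tolerance while the mean across partners stays inside it, which is why the check needs to run per partner rather than on the aggregate.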
Caveats¶
- Instacart doesn't disclose the production topology. Are partners served by separate model variants (Option A above) or one model with feature masking (Option B)? Both are valid; the post is silent.
- No latency or trim numbers. Real-time auction latency is named as the constraint without a target p99. The trim's feature-importance threshold and per-partner residual feature counts are also undisclosed.
- The pattern presumes per-partner scale to measure importance reliably. A brand-new partner with no traffic has no feature-importance signal; trimming may need to wait for some traffic to accumulate before becoming reliable.
- Negative transfer can cascade. If taxonomy alignment is off, feature-importance signals are misleading and the trim prunes the wrong features. Align first, trim second.
Generalisation¶
The pattern generalises to any multi-tenant latency-bounded ML serving stack:
- Recsys with tenant-specific features — feature trim per tenant.
- Multi-region search — feature trim per region as availability + latency vary.
- Edge inference — feature trim per edge node based on bandwidth and feature availability.
- Serverless ML inference — feature trim per request to fit cold-start budgets.
Seen in¶
- sources/2026-05-04-instacart-empowering-carrot-ads-with-domain-adaptive-learning — first wiki canonicalisation. Per-partner feature-importance analysis drives the trim; auction latency + partner feature availability are the two constraints; pattern composes with patterns/cross-domain-warm-start-via-shared-embeddings to form the two-layer Domain Adaptive Learning recipe.
Related¶
- patterns/cross-domain-warm-start-via-shared-embeddings — the neural-network-level half of the DAL recipe; this pattern is the data-level half.
- concepts/transfer-learning / concepts/domain-adaptive-learning / concepts/source-and-target-domain
- concepts/feature-taxonomy-alignment — upstream pre-condition for reliable feature-importance signals.
- concepts/ctr-prediction / concepts/wide-and-deep-architecture
- concepts/feature-fanout-network-bottleneck — the analogous problem at the network-fanout altitude.
- patterns/feature-allowlist-over-blocklist — a related stance on which features are passed through.
- systems/pinterest-feature-trimmer — adjacent system at a different altitude.
- systems/instacart-carrot-ads / systems/instacart-carrot-ads-pctr-model / companies/instacart