Instacart Carrot Ads pCTR model¶
Definition¶
The Carrot Ads pCTR model is the wide-and-deep predicted-click-through-rate model that scores each candidate ad in a Carrot Ads auction. It is trained with Domain Adaptive Learning (DAL), which treats Instacart Marketplace as the source domain and the partner site as the target domain, so that each partner gets a performant pCTR model from day one despite having little partner-specific historical data.
(Source: sources/2026-05-04-instacart-empowering-carrot-ads-with-domain-adaptive-learning)
Architecture¶
    Raw inputs (user IDs, product text, context)
                          │
                          ▼
              Dense feature embeddings  ◄── Shared, pre-trained
                          │                 on Instacart Marketplace
                          │                 shopping contexts
                          ▼
                Concatenate features
                          │
              ┌───────────┴───────────┐
              ▼                       ▼
      Interaction layer        Deep MLP tower
      (wide, explicit          (deep, generalization;
      feature crosses;         pre-trained dense
      memorization)            representations
                               fine-tuned on partner
                               data)
              │                       │
              └───────────┬───────────┘
                          ▼
                      Final MLP
                          │
                          ▼
                       Sigmoid
                          │
                          ▼
                    pCTR ∈ [0, 1]
Quote on the architecture:
"This model predicts CTR by first transforming raw inputs, like user IDs and product text, into dense feature embeddings. These features are concatenated and processed through two parallel paths: an interaction layer for learning explicit feature interactions and a deep Multi-layer Perceptron (MLP) tower for learning complex, hidden patterns. The outputs are then merged and passed through a final MLP to synthesize the findings. Finally, a Sigmoid activation squashes the result into a probability score (pCTR) between 0 and 1. This architecture combines a linear 'wide' model (for memorization of specific feature interactions) with a 'deep' neural network (for generalization)."
For more architectural detail, the post links to a prior Instacart blog post, "One Model To Serve Them All" (not separately canonicalised here yet).
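The quoted data flow can be sketched in a few lines of plain Python. This is a minimal illustration, not Instacart's implementation: the layer sizes, the random weights, the single-layer deep tower, and the use of pairwise products as the "interaction layer" are all assumptions made for the example, and `predict_ctr`, `pairwise_crosses`, and `init_layer` are hypothetical names.

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dense(x, weights, bias, relu=True):
    """Fully connected layer; `weights` holds one row per output unit."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, o) for o in out] if relu else out

def pairwise_crosses(x):
    """Wide path (assumed form): explicit crosses x_i * x_j for i < j."""
    return [x[i] * x[j] for i in range(len(x)) for j in range(i + 1, len(x))]

def init_layer(rng, n_in, n_out):
    """Random toy weights; a real model would load trained parameters."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def predict_ctr(user_emb, product_emb, context, params):
    x = user_emb + product_emb + context        # concatenate features
    wide = pairwise_crosses(x)                  # interaction layer
    deep = dense(x, *params["deep"])            # deep MLP tower (one layer here)
    merged = wide + deep                        # merge the two paths
    hidden = dense(merged, *params["final_hidden"])
    logit = dense(hidden, *params["final_out"], relu=False)[0]
    return sigmoid(logit)                       # squash to a probability

rng = random.Random(0)
n_features = 2 + 2 + 1                          # toy embedding and context sizes
n_crosses = n_features * (n_features - 1) // 2
params = {
    "deep": init_layer(rng, n_features, 4),
    "final_hidden": init_layer(rng, n_crosses + 4, 3),
    "final_out": init_layer(rng, 3, 1),
}
pctr = predict_ctr([0.2, -0.1], [0.7, 0.3], [1.0], params)
print(f"pCTR = {pctr:.4f}")
```

The point of the sketch is the shape of the computation: both paths read the same concatenated vector, and only the final MLP sees their combined output.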
Domain adaptation in this model¶
Two layers of adaptation operate together:
Neural-network-level adaptation¶
| Mechanism | Detail |
|---|---|
| Shared embedding layers | Pre-trained on Instacart Marketplace shopping contexts; embeddings capture "fundamental signals that are transferable" and are reused across partners. |
| Feature transfer | Wide arm consumes target-domain explicit features (e.g., historical CTR per product category); deep arm adapts pre-trained dense representations to the partner. |
| Selective fine-tuning | Shared layers reused without major alterations; subsequent layers fine-tuned on limited partner-specific training data. |
| Generalization via reuse | Avoids retraining the deep ranker from scratch for each new partner. |
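Selective fine-tuning can be illustrated with a toy example in which the shared embeddings are read but never updated, and only a small partner-specific head is trained. This is a sketch under simplifying assumptions: the embedding values, the category names, the logistic head, and the plain SGD loop are all hypothetical, and the shared layers are frozen outright here, whereas the post only says they are reused "without major alterations".

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical shared embeddings, standing in for vectors pre-trained on the source domain.
shared_embeddings = {
    "dairy": [0.9, 0.1],
    "snacks": [0.2, 0.8],
}

def fine_tune_head(embeddings, examples, lr=0.5, epochs=200):
    """Fit a logistic head on partner clicks; embeddings are only read, never written."""
    weights = [0.0] * len(next(iter(embeddings.values())))
    bias = 0.0
    for _ in range(epochs):
        for category, clicked in examples:
            x = embeddings[category]
            p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            grad = p - clicked                  # d(log loss) / d(logit)
            weights = [w - lr * grad * v for w, v in zip(weights, x)]
            bias -= lr * grad
    return weights, bias

# A handful of (category, clicked) pairs: the "limited partner-specific training data".
partner_examples = [("dairy", 1), ("dairy", 1), ("snacks", 0), ("snacks", 0), ("dairy", 0)]
before = {k: list(v) for k, v in shared_embeddings.items()}
weights, bias = fine_tune_head(shared_embeddings, partner_examples)

def score(category):
    x = shared_embeddings[category]
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

print(f"dairy={score('dairy'):.3f} snacks={score('snacks'):.3f}")
```

Because only the head's few parameters are fit, a small number of partner examples is enough to separate the two categories, which is the practical benefit the table attributes to reuse.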
Training-data-level adaptation¶
| Mechanism | Detail |
|---|---|
| Source data | Marketplace user-behavior data is leveraged as the source-domain corpus; embeddings are pre-trained on it. |
| Feature matching | Common contextual + catalog-level features between Marketplace catalog and the partner catalog are aligned (e.g., same product-category taxonomy) — see concepts/feature-taxonomy-alignment. |
| Per-partner feature trimming | Feature importance is analyzed in the target domain; inputs that don't contribute to prediction accuracy for that specific partner are pruned, keeping the model lightweight to fit real-time auction latency budgets. See patterns/per-partner-feature-trimming-for-auction-latency. |
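Per-partner feature trimming can be sketched as an ablation study on target-domain data: zero out one input at a time, measure how much the log loss worsens, and keep only the inputs whose removal hurts. The post does not say which importance method Instacart uses, so ablation is an assumption here, as are the feature names, the toy logistic scorer, and the `min_gain` threshold.

```python
import math

def log_loss(labels, probs, eps=1e-12):
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def predict(row, weights, bias):
    z = bias + sum(weights[name] * value for name, value in row.items() if name in weights)
    return 1.0 / (1.0 + math.exp(-z))

def ablation_importance(rows, labels, weights, bias):
    """Increase in log loss on partner data when each feature is zeroed out."""
    base = log_loss(labels, [predict(r, weights, bias) for r in rows])
    importance = {}
    for name in weights:
        ablated = [{k: (0.0 if k == name else v) for k, v in r.items()} for r in rows]
        loss = log_loss(labels, [predict(r, weights, bias) for r in ablated])
        importance[name] = loss - base
    return importance

def trim_features(importance, min_gain):
    """Keep features whose removal costs at least `min_gain` in log loss."""
    return sorted(name for name, gain in importance.items() if gain >= min_gain)

# Hypothetical partner model and held-out rows; store_region barely affects the score.
weights = {"category_ctr": 4.0, "query_match": 4.0, "store_region": 0.01}
bias = -2.0
rows = [
    {"category_ctr": 1.0, "query_match": 0.0, "store_region": 1.0},
    {"category_ctr": 0.0, "query_match": 1.0, "store_region": 0.0},
    {"category_ctr": 0.0, "query_match": 0.0, "store_region": 1.0},
    {"category_ctr": 0.0, "query_match": 0.0, "store_region": 0.0},
]
labels = [1, 1, 0, 0]

importance = ablation_importance(rows, labels, weights, bias)
kept = trim_features(importance, min_gain=0.01)
print(kept)
```

In this toy data the region feature contributes almost nothing, so it is the one pruned; fewer inputs mean fewer lookups and less compute per auction request.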
Why wide + deep, in this context¶
The wide-and-deep split maps directly onto what DAL must relearn per partner and what it can carry over from the source domain:
- The wide arm carries explicit, partner-domain-specific feature interactions (e.g., historical CTR per product category) where memorization is desirable — exactly the features that vary the most between partners.
- The deep arm carries pre-trained dense representations inherited from the source domain — exactly the representations whose generalization Instacart wants to preserve across partners.
The two arms therefore have different transferability profiles: the deep tower transfers cleanly via shared embeddings + fine-tuning; the wide arm necessarily consumes partner-specific features and is less reusable. Wide-and-Deep as an architecture is a clean fit for cross-domain transfer because it already separates memorization from generalization at the layer level.
Operating constraints¶
- Real-time auction latency: pCTR scoring is on the hot path of partner ad auctions. Latency budgets are not numerically disclosed, but they motivate the per-partner feature trimming.
- Per-partner feature availability: different partners surface different features at request time. The model topology must accommodate per-partner pruned feature sets without retraining the shared embeddings.
- Negative-transfer guard: model alignment must be verified per partner, because alignment mistakes degrade target-domain performance. The post explicitly names "the complexity of mapping data schemas and verifying model alignment" as requiring human-in-the-loop (HITL) verification.
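One way to picture a negative-transfer guard is an automated pre-check that compares the adapted model against a partner-only baseline on held-out partner data and routes regressions to a human reviewer. The post describes the verification only as human-in-the-loop, so this check, the `needs_human_review` name, the `tolerance` parameter, and the toy predictions are all assumptions for illustration.

```python
import math

def log_loss(labels, probs, eps=1e-12):
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def needs_human_review(labels, adapted_probs, baseline_probs, tolerance=0.0):
    """Flag a partner when the adapted model is worse than the baseline on held-out data."""
    adapted = log_loss(labels, adapted_probs)
    baseline = log_loss(labels, baseline_probs)
    return adapted > baseline + tolerance

# Held-out partner clicks and a naive baseline that always predicts the mean CTR.
labels = [1, 0, 0, 1, 0]
baseline_probs = [0.4] * len(labels)

# Hypothetical outputs: one well-aligned model, one with a mis-mapped feature schema.
aligned_probs = [0.8, 0.2, 0.1, 0.7, 0.3]
misaligned_probs = [0.2, 0.8, 0.7, 0.3, 0.6]

flag_aligned = needs_human_review(labels, aligned_probs, baseline_probs)
flag_misaligned = needs_human_review(labels, misaligned_probs, baseline_probs)
print(flag_aligned, flag_misaligned)
```

A check like this cannot replace the schema review itself; it only narrows down which partners a reviewer should look at first.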
Reported outcomes¶
The post reports lift without specific numbers: higher CTR, more total clicks per user, and higher ads revenue across both search ads and product category ads.
A particularly notable claim: DAL outperforms direct target-domain training even when partner data is sufficient, because Instacart's first-party Marketplace data carries signal that the target domain does not have on its own.
Caveats¶
- Architectural details are textbook-level. Embedding dimensions, layer widths, parameter counts, training compute, hyperparameters, and benchmark suites are not disclosed.
- The pre-training corpus and objective behind the "shared embedding layers pre-trained on shopping contexts" are unspecified. Whether these are item-text embeddings, user-behavior embeddings, or both is not stated.
- Per-partner serving topology is undocumented. Whether each partner runs a distinct trimmed model variant or a single shared model with partner-aware feature routing is not specified.
- No public training cadence. How often the shared embeddings are re-trained, and how partner fine-tuning composes with that cadence, are not described.
Seen in¶
- sources/2026-05-04-instacart-empowering-carrot-ads-with-domain-adaptive-learning: first wiki canonicalisation. Wide-and-deep pCTR backbone + two-layer domain adaptation (NN level + data level) + per-partner feature trimming for real-time auction latency.
Related¶
- systems/instacart-carrot-ads
- companies/instacart
- concepts/wide-and-deep-architecture
- concepts/ctr-prediction
- concepts/transfer-learning
- concepts/domain-adaptive-learning
- concepts/source-and-target-domain
- concepts/cold-start
- concepts/feature-taxonomy-alignment
- concepts/negative-transfer
- patterns/cross-domain-warm-start-via-shared-embeddings
- patterns/per-partner-feature-trimming-for-auction-latency
- systems/pinterest-feature-trimmer — adjacent system at a different altitude.
- concepts/parallel-cross-and-deep-network — adjacent architecture (DCNv2 + MLP, not Wide + Deep — distinct lineage).
- concepts/two-tower-architecture — adjacent architecture used at the retrieval stage, where Wide-and-Deep is typically used at ranking.