
Instacart Carrot Ads pCTR model

Definition

The Carrot Ads pCTR model is the wide-and-deep predicted click-through rate (pCTR) model that scores each candidate ad in a Carrot Ads auction. It is trained with Domain Adaptive Learning (DAL), treating Instacart Marketplace as the source domain and the partner site as the target domain, so that each partner gets a performant pCTR model from day one despite having little partner-specific historical data.

(Source: sources/2026-05-04-instacart-empowering-carrot-ads-with-domain-adaptive-learning)

Architecture

Raw inputs (user IDs, product text, context)
                │
                ▼
       Dense feature embeddings    ◄── Shared, pre-trained
                │                       on Instacart Marketplace
                │                       shopping contexts
                ▼
       Concatenate features
                │
        ┌───────┴────────┐
        ▼                ▼
    Interaction         Deep MLP tower
    layer (wide,        (deep, generalization;
    explicit feature    pre-trained dense
    crosses;            representations
    memorization)       fine-tuned on partner
                        data)
        │                │
        └───────┬────────┘
                ▼
            Final MLP
                │
                ▼
            Sigmoid
                │
                ▼
        pCTR ∈ [0, 1]

The post describes the architecture as follows:

"This model predicts CTR by first transforming raw inputs, like user IDs and product text, into dense feature embeddings. These features are concatenated and processed through two parallel paths: an interaction layer for learning explicit feature interactions and a deep Multi-layer Perceptron (MLP) tower for learning complex, hidden patterns. The outputs are then merged and passed through a final MLP to synthesize the findings. Finally, a Sigmoid activation squashes the result into a probability score (pCTR) between 0 and 1. This architecture combines a linear 'wide' model (for memorization of specific feature interactions) with a 'deep' neural network (for generalization)."
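The data flow in the quote can be sketched as a toy forward pass in plain Python. This is a minimal sketch under stated assumptions, not Instacart's implementation: the post does not name the interaction operator, so the interaction layer here is assumed to be DLRM/FM-style pairwise dot products between field embeddings; layer sizes, field vocabularies, and every class and function name are hypothetical; and the weights are random, so the output is an untrained score.

```python
import math
import random

def dense(x, weights, bias):
    """Fully connected layer: W·x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def pairwise_interactions(fields):
    """Assumed interaction layer: dot product of every pair of field embeddings."""
    return [
        sum(a * b for a, b in zip(fields[i], fields[j]))
        for i in range(len(fields))
        for j in range(i + 1, len(fields))
    ]

def init_layer(n_out, n_in, rng):
    """Small random weights and zero bias for one dense layer."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

class WideAndDeepPCTR:
    """Hypothetical toy model; sizes are illustrative only."""

    def __init__(self, vocab_sizes, emb_dim=4, hidden=8, seed=0):
        rng = random.Random(seed)
        # One embedding table per categorical field (e.g. user ID, product ID, context bucket).
        self.tables = [
            [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)] for _ in range(v)]
            for v in vocab_sizes
        ]
        n_fields = len(vocab_sizes)
        n_cross = n_fields * (n_fields - 1) // 2
        self.deep1 = init_layer(hidden, n_fields * emb_dim, rng)
        self.deep2 = init_layer(hidden, hidden, rng)
        self.final1 = init_layer(hidden, n_cross + hidden, rng)
        self.final2 = init_layer(1, hidden, rng)

    def predict(self, ids):
        fields = [table[i] for table, i in zip(self.tables, ids)]         # dense embeddings
        concat = [a for f in fields for a in f]                            # concatenated features
        wide = pairwise_interactions(fields)                               # interaction path
        deep = relu(dense(relu(dense(concat, *self.deep1)), *self.deep2))  # deep MLP tower
        merged = wide + deep                                               # merge both paths
        logit = dense(relu(dense(merged, *self.final1)), *self.final2)[0]  # final MLP
        return sigmoid(logit)                                              # pCTR in (0, 1)

model = WideAndDeepPCTR(vocab_sizes=[1000, 500, 20])  # e.g. users, products, contexts
p = model.predict([42, 7, 3])
```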

For more architectural detail, the post links to a prior Instacart blog post, "One Model To Serve Them All" (not separately canonicalised here yet).

Domain adaptation in this model

Two layers of adaptation operate together:

Neural-network-level adaptation

Mechanism	Detail
Shared embedding layers	Pre-trained on Instacart Marketplace shopping contexts; embeddings capture "fundamental signals that are transferable" and are reused across partners.
Feature transfer	Wide arm consumes target-domain explicit features (e.g., historical CTR per product category); deep arm adapts pre-trained dense representations to the partner.
Selective fine-tuning	Shared layers reused without major alterations; subsequent layers fine-tuned on limited partner-specific training data.
Generalization via reuse	Avoids retraining the deep ranker from scratch for each new partner.
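Selective fine-tuning can be sketched as training only the layers above a frozen shared embedding layer. In this minimal sketch, assumed rather than taken from the post, the shared layer is a set of embedding tables (random values stand in for Marketplace-pre-trained ones), the partner-specific part is a single logistic-regression head, and the partner dataset is synthetic; `fine_tune_head` and the other names are hypothetical.

```python
import copy
import math
import random

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def embed(shared_tables, ids):
    """Look up and concatenate embeddings from the shared (frozen) tables."""
    return [a for table, i in zip(shared_tables, ids) for a in table[i]]

def mean_log_loss(shared_tables, head, data, eps=1e-12):
    w, b = head
    total = 0.0
    for ids, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, embed(shared_tables, ids))) + b)
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)

def fine_tune_head(shared_tables, head, data, lr=0.1, epochs=200):
    """SGD on the partner-specific head only; the shared tables are read, never written."""
    w, b = list(head[0]), head[1]
    for _ in range(epochs):
        for ids, y in data:
            x = embed(shared_tables, ids)
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y  # dLoss/dlogit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Stand-in for Marketplace-pre-trained embeddings: 2 fields, 20 ids each, dim 4.
rng = random.Random(0)
shared = [[[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(20)] for _ in range(2)]
frozen_copy = copy.deepcopy(shared)

# Small synthetic partner dataset; labels derive from the embeddings so the task is learnable.
partner_data = [((u, (3 * u) % 20), 1 if shared[0][u][0] > 0 else 0) for u in range(20)]

initial_head = ([0.0] * 8, 0.0)
tuned_head = fine_tune_head(shared, initial_head, partner_data)
```

The property worth checking in a sketch like this is that `shared` still equals `frozen_copy` after training: partner data moves only the head, so the shared representations stay intact for every other partner.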

Training-data-level adaptation

Mechanism	Detail
Source data	Marketplace user-behavior data is leveraged as the source-domain corpus; embeddings are pre-trained on it.
Feature matching	Contextual and catalog-level features common to the Marketplace catalog and the partner catalog are aligned (e.g., onto the same product-category taxonomy); see concepts/feature-taxonomy-alignment.
Per-partner feature trimming	Feature importance is analyzed in the target domain; inputs that don't contribute to prediction accuracy for that specific partner are pruned, keeping the model lightweight enough to fit real-time auction latency budgets. See patterns/per-partner-feature-trimming-for-auction-latency.
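The post does not say how target-domain feature importance is computed. One common model-agnostic choice is permutation importance: shuffle one input column on held-out partner data and measure how much a metric drops. The sketch below swaps in that technique with accuracy as the metric; the feature names, toy data, stand-in predictor, and zero threshold are all hypothetical, and a production trimmer would presumably also weigh each feature's serving cost against the latency budget.

```python
import random

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def permutation_importance(predict, rows, labels, metric, seed=0):
    """Importance of feature j = metric drop when column j is shuffled on target-domain data."""
    rng = random.Random(seed)
    baseline = metric([predict(r) for r in rows], labels)
    importances = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(baseline - metric([predict(r) for r in permuted], labels))
    return importances

def trim_features(names, importances, min_importance=0.0):
    """Keep only features whose importance exceeds the threshold; prune the rest."""
    return [n for n, imp in zip(names, importances) if imp > min_importance]

# Toy partner data: the stand-in model only ever looks at feature 0.
names = ["category_ctr_bucket", "feature_b", "feature_c"]
rng = random.Random(1)
rows = [[float(i % 2), rng.random(), rng.random()] for i in range(20)]
labels = [int(r[0]) for r in rows]

def predict(row):
    return int(row[0] > 0.5)

importances = permutation_importance(predict, rows, labels, accuracy)
kept = trim_features(names, importances)
```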

Why wide + deep, in this context

The Wide-and-Deep architectural choice is doing real work in the DAL context:

The wide arm carries explicit, partner-domain-specific feature interactions (e.g., historical CTR per product category) where memorization is desirable — exactly the features that vary the most between partners.
The deep arm carries pre-trained dense representations inherited from the source domain — exactly the representations whose generalization Instacart wants to preserve across partners.

The two arms therefore have different transferability profiles: the deep tower transfers cleanly via shared embeddings + fine-tuning; the wide arm necessarily consumes partner-specific features and is less reusable. Wide-and-Deep as an architecture is a clean fit for cross-domain transfer because it already separates memorization from generalization at the layer level.
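To make the memorization side concrete: in the original Wide & Deep formulation (Cheng et al., 2016), the wide arm is a linear model over cross-product features, each of which fires for one conjunction of field values. The sketch below hashes such conjunctions into buckets and scores them linearly. It assumes a hashed-cross encoding and uses hypothetical field names and bucket count; the post does not describe how Instacart's wide features are encoded.

```python
import zlib

NUM_BUCKETS = 2 ** 20  # hypothetical size of the wide arm's hashed feature space

def cross_product_buckets(example, crosses, num_buckets=NUM_BUCKETS):
    """Map each conjunction of field values (one cross-product feature) to a bucket id."""
    buckets = []
    for fields in crosses:
        # Sort field names so ("a", "b") and ("b", "a") describe the same cross.
        key = "&".join(f"{name}={example[name]}" for name in sorted(fields))
        # crc32 is stable across processes, unlike Python's salted built-in str hash.
        buckets.append(zlib.crc32(key.encode("utf-8")) % num_buckets)
    return buckets

def wide_logit(weights, bias, buckets):
    """Linear wide arm: one learned weight per active crossed feature; unseen crosses score 0."""
    return bias + sum(weights.get(b, 0.0) for b in buckets)

crosses = [("product_category", "query_term"), ("product_category", "placement")]
example = {"product_category": "dairy", "query_term": "oat milk", "placement": "search"}
buckets = cross_product_buckets(example, crosses)
```

Each learned weight belongs to one specific conjunction, such as a product category paired with a query term, so it records that pair's click history directly. Parameters like these are tied to the traffic that trained them, in contrast to the dense embeddings, which transfer via sharing and fine-tuning.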

Operating constraints

Real-time auction latency: pCTR scoring is on the hot path of partner ad auctions. Latency budgets are not numerically disclosed, but they motivate the per-partner feature trimming.
Per-partner feature availability: different partners surface different features at request time. The model topology must accommodate per-partner pruned feature sets without retraining the shared embeddings.
Negative-transfer guard: model alignment must be verified per partner, because misalignment degrades target-domain performance (negative transfer). The post explicitly names "the complexity of mapping data schemas and verifying model alignment" as requiring human-in-the-loop (HITL) verification.
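The post relies on human-in-the-loop verification for model alignment and does not describe an automated check. As a hypothetical complement, a per-partner gate could compare the domain-adapted model against a target-only baseline on held-out partner data and block rollout when the adapted model is not better, which is how negative transfer would show up operationally. Log loss as the metric, the zero margin, the toy predictions, and the function names are all assumptions.

```python
import math

def log_loss(probs, labels, eps=1e-12):
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def passes_negative_transfer_guard(adapted_probs, baseline_probs, labels, min_gain=0.0):
    """Hypothetical gate: the adapted model must beat a target-only baseline on held-out partner data."""
    gain = log_loss(baseline_probs, labels) - log_loss(adapted_probs, labels)
    return gain > min_gain, gain

# Toy held-out partner clicks and two models' predicted CTRs.
labels = [1, 0, 0, 1, 0]
adapted = [0.8, 0.2, 0.1, 0.7, 0.3]   # domain-adapted model
baseline = [0.6, 0.4, 0.3, 0.5, 0.4]  # trained on partner data only
ok, gain = passes_negative_transfer_guard(adapted, baseline, labels)
```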

Reported outcomes

Lift signals (no specific numbers disclosed): higher CTR, higher total clicks per user, and higher ads revenue across search ads and product-category ads.

A particularly notable claim: DAL outperforms direct target-domain training even when partner data is sufficient, because Instacart's first-party Marketplace data carries signal the target domain doesn't have on its own.

Caveats

Architectural details are described only at a textbook level. Embedding dimensions, layer widths, parameter counts, training compute, hyperparameters, and benchmark suites are not disclosed.
The pre-training corpus and objective for the "shared embedding layers pre-trained on shopping contexts" are unspecified. Are these item-text embeddings? User-behavior embeddings? Both? The post does not say.
Per-partner serving topology is undocumented. Whether each partner runs a distinct trimmed model variant or a single shared model with partner-aware feature routing is not specified.
No public training cadence. How often the shared embeddings are re-trained, and how partner fine-tuning composes with that cadence, are not described.

Seen in

sources/2026-05-04-instacart-empowering-carrot-ads-with-domain-adaptive-learning: first wiki canonicalisation. Wide-and-deep pCTR backbone + two-layer domain adaptation (NN level + data level) + per-partner feature trimming for real-time auction latency.

systems/instacart-carrot-ads
companies/instacart
concepts/wide-and-deep-architecture
concepts/ctr-prediction
concepts/transfer-learning
concepts/domain-adaptive-learning
concepts/source-and-target-domain
concepts/cold-start
concepts/feature-taxonomy-alignment
concepts/negative-transfer
patterns/cross-domain-warm-start-via-shared-embeddings
patterns/per-partner-feature-trimming-for-auction-latency
systems/pinterest-feature-trimmer — adjacent system at a different altitude.
concepts/parallel-cross-and-deep-network — adjacent architecture (DCNv2 + MLP, not Wide + Deep — distinct lineage).
concepts/two-tower-architecture — adjacent architecture used at the retrieval stage, where Wide-and-Deep is typically used at ranking.