Instacart Carrot Ads pCTR model

Definition

The Carrot Ads pCTR model is the wide-and-deep predicted-click-through-rate model that scores each candidate ad in a Carrot Ads auction. It is trained with Domain Adaptive Learning (DAL), treating Instacart Marketplace as the source domain and the partner site as the target domain, so that each partner gets a performant pCTR model from day one despite having little partner-specific historical data.

(Source: sources/2026-05-04-instacart-empowering-carrot-ads-with-domain-adaptive-learning)

Architecture

Raw inputs (user IDs, product text, context)
                │
                ▼
       Dense feature embeddings    ◄── Shared, pre-trained
                │                       on Instacart Marketplace
                │                       shopping contexts
                ▼
       Concatenate features
        ┌───────┴────────┐
        ▼                ▼
    Interaction         Deep MLP tower
    layer (wide,        (deep, generalization;
    explicit feature    pre-trained dense
    crosses;            representations
    memorization)       fine-tuned on partner
                        data)
        │                │
        └───────┬────────┘
                ▼
            Final MLP
                │
                ▼
             Sigmoid
                │
                ▼
          pCTR ∈ [0, 1]

Quote on the architecture:

"This model predicts CTR by first transforming raw inputs, like user IDs and product text, into dense feature embeddings. These features are concatenated and processed through two parallel paths: an interaction layer for learning explicit feature interactions and a deep Multi-layer Perceptron (MLP) tower for learning complex, hidden patterns. The outputs are then merged and passed through a final MLP to synthesize the findings. Finally, a Sigmoid activation squashes the result into a probability score (pCTR) between 0 and 1. This architecture combines a linear 'wide' model (for memorization of specific feature interactions) with a 'deep' neural network (for generalization)."

For more architectural detail, the post links to a prior Instacart blog "One Model To Serve Them All" (not separately canonicalised here yet).
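
The data flow described above can be sketched in a few lines of plain Python. This is a minimal illustration, not Instacart's implementation: the class name `WideAndDeepPCTR`, the layer sizes, the random initialisation, and the use of pairwise products as the interaction layer are all assumptions chosen for readability, since the post discloses none of them.

```python
import math
import random


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (initialisation is an assumption)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, bias, relu=True):
    """One fully connected layer: W x + b, optionally followed by ReLU."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, o) for o in out] if relu else out


def pairwise_crosses(x):
    """Wide path (assumed form): explicit products x_i * x_j for i < j."""
    return [x[i] * x[j] for i in range(len(x)) for j in range(i + 1, len(x))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class WideAndDeepPCTR:
    """Hypothetical toy model: embeddings -> (wide || deep) -> final MLP -> sigmoid."""

    def __init__(self, vocab_sizes, emb_dim=4, hidden=8, seed=0):
        rng = random.Random(seed)
        # One embedding table per categorical input (e.g. user ID, product ID).
        self.tables = [
            [[rng.uniform(-0.1, 0.1) for _ in range(emb_dim)] for _ in range(size)]
            for size in vocab_sizes
        ]
        n = emb_dim * len(vocab_sizes)      # width of the concatenated features
        n_cross = n * (n - 1) // 2          # number of pairwise crosses
        self.deep1 = init_layer(rng, n, hidden)
        self.deep2 = init_layer(rng, hidden, hidden)
        self.final1 = init_layer(rng, n_cross + hidden, hidden)
        self.final2 = init_layer(rng, hidden, 1)

    def predict(self, ids):
        # Look up and concatenate the embeddings for each input ID.
        x = [v for table, i in zip(self.tables, ids) for v in table[i]]
        wide = pairwise_crosses(x)                        # interaction layer
        deep = dense(dense(x, *self.deep1), *self.deep2)  # deep MLP tower
        merged = dense(wide + deep, *self.final1)         # final MLP on both paths
        logit = dense(merged, *self.final2, relu=False)[0]
        return sigmoid(logit)                             # pCTR in (0, 1)


model = WideAndDeepPCTR(vocab_sizes=[10, 20, 5])
pctr = model.predict([3, 7, 1])
```

With untrained weights the score sits close to 0.5; the point of the sketch is only the shape of the computation, with both paths reading the same concatenated embeddings and meeting again before the sigmoid.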

Domain adaptation in this model

Two layers of adaptation operate together:

Neural-network-level adaptation

| Mechanism | Detail |
| --- | --- |
| Shared embedding layers | Pre-trained on Instacart Marketplace shopping contexts; embeddings capture "fundamental signals that are transferable" and are reused across partners. |
| Feature transfer | Wide arm consumes target-domain explicit features (e.g., historical CTR per product category); deep arm adapts pre-trained dense representations to the partner. |
| Selective fine-tuning | Shared layers reused without major alterations; subsequent layers fine-tuned on limited partner-specific training data. |
| Generalization via reuse | Avoids retraining the deep ranker from scratch for each new partner. |
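
Selective fine-tuning can be illustrated with a toy example in which the shared embeddings are read but never updated and only a small head is trained on partner data. Everything here is hypothetical: the embedding values, the six-row partner dataset, the logistic head, and the plain SGD loop stand in for details the post does not describe.

```python
import copy
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(head, embeddings, item_id):
    """Logistic head on top of a (frozen) embedding lookup."""
    emb = embeddings[item_id]
    return sigmoid(sum(w * e for w, e in zip(head["w"], emb)) + head["b"])


def log_loss(head, embeddings, data):
    """Mean binary cross-entropy over (item_id, clicked) pairs."""
    total = 0.0
    for item_id, clicked in data:
        p = predict(head, embeddings, item_id)
        total -= math.log(p) if clicked else math.log(1.0 - p)
    return total / len(data)


def fine_tune_head(head, embeddings, data, lr=0.5, epochs=200):
    """SGD on the head only; `embeddings` is never written to."""
    head = copy.deepcopy(head)  # leave the source-domain head untouched
    for _ in range(epochs):
        for item_id, clicked in data:
            emb = embeddings[item_id]
            err = predict(head, embeddings, item_id) - clicked  # dLoss/dlogit
            head["w"] = [w - lr * err * e for w, e in zip(head["w"], emb)]
            head["b"] -= lr * err
    return head


# Hypothetical embeddings pre-trained on the source domain.
shared_embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.7, 0.7]}
# Hypothetical, deliberately tiny partner dataset: (item_id, clicked).
partner_data = [(0, 1), (0, 1), (1, 0), (1, 0), (2, 1), (2, 0)]

source_head = {"w": [0.0, 0.0], "b": 0.0}
frozen_snapshot = copy.deepcopy(shared_embeddings)
partner_head = fine_tune_head(source_head, shared_embeddings, partner_data)
```

After the call, `shared_embeddings` still equals `frozen_snapshot`, while the partner head reaches a lower log loss on the partner data than the untouched source head. In a real deep-learning framework the same effect would typically be achieved by excluding the shared parameters from the optimizer.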

Training-data-level adaptation

| Mechanism | Detail |
| --- | --- |
| Source data | Marketplace user-behavior data is leveraged as the source-domain corpus; embeddings are pre-trained on it. |
| Feature matching | Common contextual + catalog-level features between Marketplace catalog and the partner catalog are aligned (e.g., same product-category taxonomy); see concepts/feature-taxonomy-alignment. |
| Per-partner feature trimming | Feature importance is analyzed in the target domain; inputs that don't contribute to prediction accuracy for that specific partner are pruned, keeping the model lightweight to fit real-time auction latency budgets. See patterns/per-partner-feature-trimming-for-auction-latency. |
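
One way to picture per-partner feature trimming is permutation importance: shuffle one feature column in partner hold-out data and keep the feature only if the loss gets noticeably worse. The post does not say which importance method Instacart uses, so permutation importance, the `min_gain` threshold, the feature names, and the toy scoring function below are all assumptions.

```python
import math
import random


def log_loss(predict, rows, labels, eps=1e-12):
    """Mean binary cross-entropy of `predict` over feature-dict rows."""
    total = 0.0
    for row, y in zip(rows, labels):
        p = min(max(predict(row), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(rows)


def permutation_importance(predict, rows, labels, feature, rng):
    """Increase in log loss after shuffling one feature column."""
    baseline = log_loss(predict, rows, labels)
    shuffled = [row[feature] for row in rows]
    rng.shuffle(shuffled)
    permuted = [{**row, feature: value} for row, value in zip(rows, shuffled)]
    return log_loss(predict, permuted, labels) - baseline


def trim_features(predict, rows, labels, features, min_gain=1e-3, seed=0):
    """Keep only features whose shuffling hurts the loss by more than `min_gain`."""
    rng = random.Random(seed)
    return [
        f for f in features
        if permutation_importance(predict, rows, labels, f, rng) > min_gain
    ]


def toy_predict(row):
    """Hypothetical partner model that only looks at category-level CTR."""
    return row["category_ctr"]


# Hypothetical partner hold-out set with one useful and one unused feature.
rows = [
    {"category_ctr": 0.9 if i % 2 == 0 else 0.1, "store_region": i % 3}
    for i in range(20)
]
labels = [1 if row["category_ctr"] > 0.5 else 0 for row in rows]

kept = trim_features(toy_predict, rows, labels, ["category_ctr", "store_region"])
```

Here `store_region` is dropped because shuffling it leaves the loss unchanged, while `category_ctr` is kept. Fewer inputs mean fewer lookups and less computation per request, which is the latency motivation the table row describes.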

Why wide + deep, in this context

The wide-and-deep split lines up with what does and does not transfer in the DAL setting:

  • The wide arm carries explicit, partner-domain-specific feature interactions (e.g., historical CTR per product category) where memorization is desirable — exactly the features that vary the most between partners.
  • The deep arm carries pre-trained dense representations inherited from the source domain — exactly the representations whose generalization Instacart wants to preserve across partners.

The two arms therefore have different transferability profiles: the deep tower transfers cleanly via shared embeddings and fine-tuning, whereas the wide arm necessarily consumes partner-specific features and is less reusable. Wide-and-deep is a clean fit for cross-domain transfer because it already separates memorization from generalization at the layer level.

Operating constraints

  • Real-time auction latency: pCTR scoring is on the hot path of partner ad auctions. Latency budgets are not numerically disclosed, but they motivate the per-partner feature trimming.
  • Per-partner feature availability: different partners surface different features at request time. The model topology must accommodate per-partner pruned feature sets without retraining the shared embeddings.
  • Negative-transfer guard: model alignment must be verified per partner, because misalignment degrades target-domain performance. The post explicitly names "the complexity of mapping data schemas and verifying model alignment" as requiring human-in-the-loop (HITL) verification.
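
A negative-transfer guard could be supported by a simple automated comparison that flags a partner for human review when the adapted model does worse than a baseline on partner hold-out data. The post only says that verification needs a human in the loop; the function name, the choice of log loss as the metric, the `tolerance` parameter, and the sample probabilities below are assumptions for illustration.

```python
import math


def log_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def negative_transfer_check(adapted_probs, baseline_probs, labels, tolerance=0.0):
    """Compare the adapted model with a baseline on the same hold-out labels."""
    adapted = log_loss(adapted_probs, labels)
    baseline = log_loss(baseline_probs, labels)
    return {
        "adapted_loss": adapted,
        "baseline_loss": baseline,
        # True means the adapted model is worse and a human should look at it.
        "needs_review": adapted > baseline + tolerance,
    }


# Hypothetical partner hold-out clicks and model scores.
labels = [1, 0, 1, 0]
adapted_probs = [0.8, 0.2, 0.7, 0.3]    # model adapted from the source domain
baseline_probs = [0.6, 0.4, 0.6, 0.4]   # e.g. a model trained on partner data only

report = negative_transfer_check(adapted_probs, baseline_probs, labels)
```

In this toy case the adapted model has the lower loss, so `report["needs_review"]` is `False`; swapping the two score lists flips it to `True`. Such a check only narrows down where reviewers look and does not replace the schema-mapping verification the post describes.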

Reported outcomes

The post reports lifts without specific numbers: higher CTR, more total clicks per user, and higher ads revenue across both search ads and product category ads.

A particularly notable claim: DAL outperforms direct target-domain training even when partner data is sufficient, because Instacart's first-party Marketplace data carries signal that the target domain does not have on its own.

Caveats

  • Architectural details are textbook-level. Embedding dimensions, layer widths, parameter counts, training compute, hyperparameters, and benchmark suites are not disclosed.
  • The pre-training corpus and objective behind the "shared embedding layers pre-trained on shopping contexts" are unspecified. Are these item-text embeddings? User-behavior embeddings? Both? Not stated.
  • Per-partner serving topology is undocumented. Whether each partner runs a distinct trimmed model variant or a single shared model with partner-aware feature routing is not specified.
  • No public training cadence. How often the shared embeddings are re-trained, and how partner fine-tuning composes with that cadence, are not described.
