
INSTACART 2026-05-04 Tier 2


Instacart — Empowering Carrot Ads with Domain Adaptive Learning

One-paragraph summary

Instacart's Carrot Ads is an omnichannel retail-media platform that lets retailer partners run their own ad businesses on either their owned-and-operated (O&O) e-commerce sites or Instacart-hosted white-label Storefront properties, drawing demand from retailer-sourced advertisers, Instacart's pool of 7,500+ CPG brands, or both. Because partner ad inventory is monetized through real-time auctions ranked on predicted click-through rate (pCTR), every new partner raises a cold-start problem: the partner site has little or no historical interaction data, so a model trained on the partner alone cannot converge, while a model trained purely on Instacart Marketplace fails to capture the partner's specific catalog and user base. Instacart addresses this with Domain Adaptive Learning (DAL), a transfer-learning approach that treats Instacart Marketplace as the source domain and the partner site as the target domain and transfers knowledge at two levels: (1) the neural-network level, through shared embedding layers pre-trained on shopping contexts, feature transfer with fine-tuning of partner-specific layers, and reuse of dense representations from the wide-and-deep pCTR backbone; and (2) the training-data level, through aligning catalog taxonomies between domains, using large-scale Marketplace user-behavior data as source-domain signal, and applying partner-specific feature trimming to honor real-time auction latency budgets while staying flexible to per-partner feature availability. Instacart reports that DAL outperforms both training from scratch and direct deployment of the Marketplace model, even when the target partner has enough data to train independently, because of the additional signal from Instacart's first-party data. The current production flow still requires human-in-the-loop verification to prevent negative transfer, and Instacart is building an automated Domain Adaptation Platform to detect domain shifts and streamline onboarding.

Key takeaways

  1. The new-partner cold-start problem is structural to retail-media platforms. Each new O&O partner site is effectively a new domain with its own catalog, taxonomy, and user-behavior distribution. Quote: "onboarding a new partner onto Carrot Ads introduces a key challenge: the 'cold start' problem, where limited historical interactions make it difficult to predict user behavior accurately." The two failure modes are explicitly named: training a partner-only model is data-hungry and under-converges; deploying the Marketplace model directly "often fails to capture the nuances of the partner's specific inventory and user base." (Source: sources/2026-05-04-instacart-empowering-carrot-ads-with-domain-adaptive-learning)

  2. Domain Adaptive Learning is positioned as a subset of transfer learning. Instacart's framing: "At a high level, Domain Adaptive Learning is a subset of transfer learning. It focuses on transferring knowledge gained from solving a problem in a data-rich environment (source domain) to improve performance in a related, often data-scarce environment (target domain)." The mechanism is warm-start via reused representations, not random-init: "Instead of initializing a new model with random weights for every partner, we reuse representations and relationship signals learned from Instacart marketplace data to 'warm start' the model. This saves labeled data and computational power, but more importantly, it allows us to deploy performant models in scenarios where the target domain lacks sufficient history to converge on its own."

  3. DAL outperforms both direct-deploy and from-scratch training — even when target-domain data is sufficient. Counter-intuitive property of the approach: Instacart explicitly reports that "even when there is enough data to train a model directly on the target domain, the domain adaptive model still performs better because of the benefits from Instacart's first party data." The first-party-data advantage is the structural asymmetry that justifies DAL as a default, not just a cold-start mitigation.

  4. The pCTR backbone is a Wide-and-Deep model. Architecture: "The Domain Adaptive Learning method is based on a wide and deep Predicted Click-Through-Rate (pCTR) model architecture commonly used in large-scale recommendation systems. This model predicts CTR by first transforming raw inputs, like user IDs and product text, into dense feature embeddings. These features are concatenated and processed through two parallel paths: an interaction layer for learning explicit feature interactions and a deep Multi-layer Perceptron (MLP) tower for learning complex, hidden patterns. The outputs are then merged and passed through a final MLP to synthesize the findings. Finally, a Sigmoid activation squashes the result into a probability score (pCTR) between 0 and 1." The wide arm provides memorization of explicit feature interactions; the deep arm provides generalization to unseen feature combinations.

  5. Domain adaptation is applied at two distinct layers in the stack. This is the core architectural insight worth canonicalising:

     a. Neural-network-level adaptation (four mechanisms):

       • Shared embedding layers pre-trained on shopping contexts, encoding fundamental signals that transfer.
       • Feature transfer: pre-trained embeddings integrate with partner-domain-specific input features. The wide arm consumes explicit features (e.g., historical CTR per product category) sampled from the target domain; the deep arm adapts the pre-trained dense representations.
       • Selective fine-tuning: shared layers are frozen or lightly updated; subsequent layers are fine-tuned on the limited partner-specific training data.
       • Generalization: reusing the transferred representations avoids training the deep ranker entirely from scratch.

     b. Training-data-level adaptation (three mechanisms):

       • Source-domain selection: Instacart Marketplace interaction data is the source signal; embeddings are pre-trained on it; the foundational model is built from it.
       • Feature matching across domains so the source-domain knowledge is transferable: "Common contextual and catalog-level features between the Instacart Marketplace's catalog data and the Carrot Ads Partner's catalog are aligned (e.g. ensuring product category uses the same taxonomy)."
       • Per-partner feature trimming for latency: "To meet real-time auction latency requirements and be flexible to various feature availability for the partners, we apply feature trimming technique to balance performance and speed. We analyze feature importance in the target domain and prune inputs that do not contribute to prediction accuracy for that specific partner, ensuring the model remains lightweight."

  6. The two-layer split has different operating cadences. Neural-network adaptation is a per-model exercise tied to the training pipeline; data-level adaptation (taxonomy alignment and feature trimming) is an integration and onboarding exercise tied to each new partner. Both must succeed for DAL to work: a misaligned taxonomy invalidates the shared-embedding assumption, and an untrimmed model can violate auction latency budgets.

  7. First-party data is the structural moat. The reason DAL beats from-scratch training even when target data is plentiful is that "Instacart's first party data" contributes signal the target domain doesn't have on its own. This is the asymmetric advantage that retail-media platforms (Instacart, Amazon, Walmart Connect) have over standalone ad networks: years of proprietary purchase and behavior data that can be reused as source-domain signal across all partner deployments.

  8. Reported business outcomes (no specific numbers disclosed). "By leveraging the 'source' knowledge of the Instacart Marketplace, we achieved higher CTR, total clicks per user and ads revenue across search ads and product category ads. This approach enables us to launch high-performing ad networks for partners immediately, eliminate the traditional data ramp-up period and converge to a better stable state." The post does not disclose lift percentages, partner counts, or absolute metrics (see Caveats).

  9. Negative transfer is the explicit failure mode the platform guards against. "However, this process is not yet fully autonomous. The complexity of mapping data schemas and verifying model alignment currently requires human-in-the-loop verification to prevent negative transfer." When source and target domains differ in subtle ways (taxonomy mismatches, distribution shifts, conflicting label semantics), transferred knowledge can push performance below that of from-scratch training instead of improving it. Schema mapping and alignment verification are the gating activities.

  10. The roadmap is automation: a Domain Adaptation Platform is being built. "Looking ahead, we are building an automated Domain Adaptation Platform that can detect domain shifts and fundamentally streamline the workflow. This allows us to onboard new retail partners faster and in a more scalable way, while continuing to deliver performant ad systems from day one." The platform-thinking move turns DAL from a one-off training recipe into an institutional capability, the same arc as Instacart's PIXEL / PARSE / Maple ML platforms.
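
The warm-start and selective fine-tuning ideas in the takeaways above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Instacart's implementation: the embedding values, the logistic head, the learning rate, and the names `warm_start` and `fine_tune` are all hypothetical. It only shows the mechanics of copying source-domain weights and then updating the partner-specific head while the shared embeddings stay frozen.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical source-domain weights, standing in for parameters
# pre-trained on Marketplace data (values are made up).
source_embeddings = {
    "dairy": [0.9, 0.1],
    "produce": [0.2, 0.8],
    "snacks": [-0.5, 0.4],
}
source_head = {"w": [0.3, -0.2], "b": 0.0}

def warm_start(embeddings, head):
    """Copy source weights instead of initializing randomly."""
    emb_copy = {k: list(v) for k, v in embeddings.items()}
    head_copy = {"w": list(head["w"]), "b": head["b"]}
    return emb_copy, head_copy

def fine_tune(embeddings, head, examples, lr=0.5, epochs=200,
              freeze_embeddings=True):
    """SGD on log loss; the embedding table is skipped when frozen."""
    for _ in range(epochs):
        for category, clicked in examples:
            x = embeddings[category]
            logit = sum(w * xi for w, xi in zip(head["w"], x)) + head["b"]
            err = sigmoid(logit) - clicked  # d(log loss)/d(logit)
            # Gradient w.r.t. the embedding, taken before the head moves.
            grad_x = [err * w for w in head["w"]]
            head["w"] = [w - lr * err * xi for w, xi in zip(head["w"], x)]
            head["b"] -= lr * err
            if not freeze_embeddings:
                embeddings[category] = [xi - lr * g
                                        for xi, g in zip(x, grad_x)]
    return embeddings, head

# A handful of (category, clicked) pairs standing in for scarce partner data.
partner_examples = [("dairy", 1), ("dairy", 1), ("produce", 0),
                    ("snacks", 0), ("produce", 0), ("dairy", 0)]

emb, head = warm_start(source_embeddings, source_head)
emb, head = fine_tune(emb, head, partner_examples)
print("embeddings unchanged:", emb == source_embeddings)
print("head updated:", head["w"] != source_head["w"])
```

Passing `freeze_embeddings=False` would correspond to the "lightly updated" variant the post mentions, in which case a smaller learning rate for the shared layers is a common choice.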

Architecture

High-level pCTR model with domain adaptation

Raw inputs (user IDs, product text, context)
                 │
                 ▼
       Dense feature embeddings   ◄── Shared, pre-trained on
                 │                    Instacart Marketplace
                 │                    shopping contexts
                 ▼
       Concatenate features
        ┌────────┴─────────┐
        ▼                  ▼
  Wide / interaction      Deep MLP tower
  layer (explicit         (hidden patterns,
  feature crosses,        generalization;
  memorization;           pre-trained dense
  partner-domain          representations
  explicit features       fine-tuned on
  e.g. historical CTR     partner data)
  per product category)
        │                  │
        └────────┬─────────┘
                 ▼
            Final MLP
                 │
                 ▼
             Sigmoid
                 │
                 ▼
         pCTR ∈ [0, 1]
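
The data flow in the diagram can be traced with a small forward pass. The sketch below is illustrative only: the post does not specify the interaction layer, so pairwise dot products between feature embeddings are assumed here, and all dimensions, weights, and function names (`predict_ctr`, `random_layer`) are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [dot(row, x) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def random_layer(rng, n_in, n_out):
    """Randomly initialized (weights, bias) pair, for illustration only."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def predict_ctr(embeddings, params):
    # Concatenate the dense feature embeddings.
    concat = [x for e in embeddings for x in e]
    # Interaction path (assumed form): explicit pairwise dot products.
    interactions = [dot(embeddings[i], embeddings[j])
                    for i in range(len(embeddings))
                    for j in range(i + 1, len(embeddings))]
    # Deep path: MLP tower over the concatenated embeddings.
    hidden = relu(linear(concat, *params["tower"]))
    # Merge both paths, run the final MLP, squash with a sigmoid.
    merged = interactions + hidden
    h = relu(linear(merged, *params["final_hidden"]))
    logit = linear(h, *params["final_out"])[0]
    return sigmoid(logit)

rng = random.Random(0)
EMB_DIM, N_FEATURES, TOWER_WIDTH, FINAL_WIDTH = 4, 3, 8, 4
# Stand-ins for the user, product, and context embeddings.
embeddings = [[rng.uniform(-1, 1) for _ in range(EMB_DIM)]
              for _ in range(N_FEATURES)]
n_inter = N_FEATURES * (N_FEATURES - 1) // 2
params = {
    "tower": random_layer(rng, EMB_DIM * N_FEATURES, TOWER_WIDTH),
    "final_hidden": random_layer(rng, n_inter + TOWER_WIDTH, FINAL_WIDTH),
    "final_out": random_layer(rng, FINAL_WIDTH, 1),
}
pctr = predict_ctr(embeddings, params)
print(f"pCTR = {pctr:.4f}")
```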

The two adaptation layers

| Layer | Mechanism | Operating cadence |
|---|---|---|
| Neural-network level | Shared pre-trained embeddings; feature transfer (wide ⇄ explicit, deep ⇄ pre-trained dense); selective fine-tuning of partner-specific layers; reuse for generalization | Per model retraining cycle |
| Training-data level | Source-domain Marketplace data; taxonomy alignment between source and target catalogs; per-partner feature trimming based on target-domain feature importance | Per partner onboarding (taxonomy alignment) + ongoing (feature trimming) |
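
The two training-data-level steps, taxonomy alignment and feature trimming, can be sketched as simple data transformations. Everything here is an assumption for illustration: the category mapping, the feature names, the importance scores, and the 0.01 threshold are invented, and the post does not say how feature importance is computed.

```python
# Hypothetical mapping from a partner's category labels to the shared
# (Marketplace) taxonomy; real mappings would be far larger.
PARTNER_TO_SHARED = {
    "Milk & Cream": "dairy",
    "Fresh Veg": "produce",
    "Crisps": "snacks",
}

def align_category(partner_category, mapping):
    """Return the shared-taxonomy label, or None if unmapped."""
    return mapping.get(partner_category)

def trim_features(importance, available, min_importance):
    """Keep features the partner can supply whose target-domain
    importance meets the threshold, most important first."""
    kept = [name for name, score in importance.items()
            if name in available and score >= min_importance]
    return sorted(kept, key=lambda name: importance[name], reverse=True)

# Unmapped categories are surfaced for review, not silently guessed.
partner_rows = ["Milk & Cream", "Crisps", "Pet Toys"]
unmapped = [c for c in partner_rows
            if align_category(c, PARTNER_TO_SHARED) is None]

# Made-up importance scores measured on the target domain.
importance = {
    "category_ctr": 0.42,
    "query_text_emb": 0.31,
    "loyalty_tier": 0.12,
    "hour_of_day": 0.03,
    "user_tenure": 0.004,
}
# This hypothetical partner does not provide `loyalty_tier`.
available = {"category_ctr", "query_text_emb", "hour_of_day", "user_tenure"}

kept = trim_features(importance, available, min_importance=0.01)
print("unmapped categories:", unmapped)
print("kept features:", kept)
```

In this sketch a feature is dropped for either of the two reasons the post gives: the partner does not supply it, or it contributes too little in the target domain to justify its cost at serving time.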

Operational numbers and properties (as disclosed)

| Item | Value | Source quote / context |
|---|---|---|
| CPG brand count (Instacart-sourced demand) | 7,500+ | "flexibility to choose from retailer-sourced Ads demand, Instacart-sourced demand from 7,500+ CPG brands, or both" |
| Surface set (where DAL is deployed) | Search ads + product category ads | "higher CTR, total clicks per user and ads revenue across search ads and product category ads" |
| Wide-and-deep architecture | Two parallel paths (interaction + deep MLP) → final MLP → Sigmoid → pCTR ∈ [0, 1] | Architecture section |
| Domain pair | Source = Instacart Marketplace; Target = partner O&O site or Storefront | "By treating the Instacart Marketplace as a source domain and the partner's website as a target domain, we can transfer knowledge to bootstrap performance with a relatively smaller amount of data." |
| Online outcomes (CTR, clicks per user, ads revenue) | Higher (no number disclosed) | "achieved higher CTR, total clicks per user and ads revenue" |
| DAL > from-scratch training | Yes, even with sufficient data | "the domain adaptive model still performs better because of the benefits from Instacart's first party data" |
| Auction-latency requirement | Real-time (no specific p99 disclosed) | "To meet real-time auction latency requirements" |
| Human-in-the-loop verification | Required, gating | "currently requires human-in-the-loop verification to prevent negative transfer" |
| Domain Adaptation Platform | In development, not yet shipped | "Looking ahead, we are building an automated Domain Adaptation Platform" |
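
The post does not say how the planned platform would detect domain shifts. As one hypothetical example of such a check, the sketch below computes the Population Stability Index (PSI), a widely used drift statistic, between a source-domain and a target-domain histogram of the same feature. The bucket counts are invented, and nothing here is claimed to be Instacart's method.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two histograms that share
    the same buckets. Zero means identical bucket proportions."""
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor the proportions so empty buckets do not hit log(0).
        e_frac = max(e / e_total, eps)
        a_frac = max(a / a_total, eps)
        score += (a_frac - e_frac) * math.log(a_frac / e_frac)
    return score

# Invented counts of impressions per product-category bucket.
marketplace = [50, 30, 20]      # source domain
similar_partner = [50, 30, 20]  # same proportions as the source
shifted_partner = [20, 30, 50]  # mix skewed toward the last bucket

print("similar:", psi(marketplace, similar_partner))
print("shifted:", round(psi(marketplace, shifted_partner), 4))
```

A common rule of thumb treats PSI above roughly 0.25 as a substantial shift, but any threshold used in practice would need to be calibrated per feature and per partner.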

Caveats

  1. No quantitative lift numbers. The post reports "higher" CTR, clicks, and revenue without percent lift, baseline, partner count, sample size, or A/B configuration. Other Instacart ML-platform posts (e.g. PARSE / PIXEL) generally do disclose offline-eval and online-A/B numbers, so this post is notably less specific. Treat "DAL works" as the team's in-house claim, not externally verified.

  2. No latency or auction-budget numbers. The post invokes "real-time auction latency requirements" as the rationale for feature trimming, but never names a target (e.g., p99 < 50 ms) or how often that budget is missed. The size of the trim, the feature-importance threshold, and the per-partner residual feature counts are also undisclosed.

  3. No partner count or onboarding-time disclosure. The text does not say how many partners run on Carrot Ads, nor how long typical onboarding takes today (with HITL) versus the target the Domain Adaptation Platform will deliver.

  4. Architecture details are at textbook level. The "wide and deep pCTR" description is generic; the post links to a prior Instacart blog (One Model To Serve Them All) for "more details about this architecture". There are no embedding dimensions, layer widths, parameter counts, training compute, or evaluation benchmarks specific to the DAL variant. The diagrams in the original post survive only as image placeholders ("Press enter or click to view image in full size") and cannot be transcribed from the markdown.

  5. The "shared embedding layers pre-trained on shopping contexts" phrase is left under-specified. Are these item-text embeddings (e.g., a foundation model fine-tuned on Instacart catalog text)? User-behavior embeddings? Both? The precise pre-training objective and corpus are not disclosed.

  6. Negative-transfer mitigation is asserted, not measured. The post names the failure mode and the gating activity (HITL schema mapping + alignment verification) but provides no measurement of negative-transfer incidence, no detection metric, and no rollback story. The claim that the Domain Adaptation Platform will "detect domain shifts" is forward work, not shipped behavior.

  7. No comparison to alternatives. The post does not contrast DAL against alternatives like (a) per-partner LoRA-style adapters layered onto a frozen shared base, (b) multi-task learning across partners, (c) continued pretraining of the shared embedder per partner, or (d) meta-learning / MAML-style fast-adaptation training. "Domain adaptive learning" as named here is the umbrella; the specific recipe inside it is what would distinguish it from alternatives.

  8. Feature trimming is described but not architecturally specified. Pinterest's Feature Trimmer (systems/pinterest-feature-trimmer) is a fully architecturally-specified online-serving module with model-signature ground truth, file-watcher-driven refresh, atomic-swap consolidated maps, and skip-on-miss safety. Instacart's per-partner feature trimming is described only at the rationale level; the production-runtime topology is not disclosed. Are trimmed feature sets compiled into per-partner model variants? Routed at request time via a model-signature lookup? Both? Not stated.
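
Because the post gives no detection metric for negative transfer, the following is only a hypothetical example of what a minimal gate could look like: compare the adapted model's log loss with a from-scratch baseline on a target-domain holdout set, and flag the adapted model if it is worse. The labels, predictions, and the `tolerance` parameter are invented for illustration.

```python
import math

def log_loss(labels, probs, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to (0, 1)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def is_negative_transfer(labels, adapted_probs, baseline_probs,
                         tolerance=0.0):
    """True when the adapted model is worse than the from-scratch
    baseline on the target holdout by more than `tolerance`."""
    adapted = log_loss(labels, adapted_probs)
    baseline = log_loss(labels, baseline_probs)
    return adapted > baseline + tolerance

# Invented target-domain holdout clicks and model predictions.
holdout_clicks = [1, 0, 0, 1, 0]
from_scratch = [0.6, 0.3, 0.4, 0.5, 0.3]
adapted_ok = [0.8, 0.1, 0.2, 0.7, 0.1]
adapted_bad = [0.3, 0.6, 0.5, 0.2, 0.7]

print(is_negative_transfer(holdout_clicks, adapted_ok, from_scratch))
print(is_negative_transfer(holdout_clicks, adapted_bad, from_scratch))
```

A check like this covers only offline accuracy; it says nothing about rollback, online metrics, or schema-mapping errors, which the caveats above note are also undisclosed.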
