CONCEPT Cited by 1 source

Source and target domain

Definition

In transfer learning and domain-adaptive learning, source domain and target domain name the two settings between which knowledge is transferred:

  • Source domain — the data-rich distribution where a model has been trained (or a corpus has been observed) and from which knowledge is being borrowed.
  • Target domain — the data-scarce distribution to which the borrowed knowledge is being applied, often where labeled data is too limited to train a strong model from scratch.

Quote (Source: sources/2026-05-04-instacart-empowering-carrot-ads-with-domain-adaptive-learning):

"transferring knowledge gained from solving a problem in a data-rich environment (source domain) to improve performance in a related, often data-scarce environment (target domain)."

What makes a good source/target pair

For transfer to work, the two domains must be related: there must be enough shared structure that representations learned on the source generalise to the target. The Instacart DAL example is canonical. Both source (Instacart Marketplace) and target (Carrot Ads partner site) are e-commerce shopping contexts, both have user / product / interaction primitives, and both share the same prediction task (pCTR). The differences (catalog, user base, taxonomy quirks) sit at a layer above what the shared embeddings encode.

If the two domains are too different, transfer doesn't help and may actively hurt — see negative transfer.
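
How different is "too different" has to be measured per feature and per use case. As a rough illustration, the sketch below compares the category mix of one feature across source and target using the population stability index (PSI). PSI is a swapped-in, generic drift measure, not something the cited source prescribes, and the function names, smoothing constant, and sample data are all hypothetical.

```python
import math
from collections import Counter

def category_shares(values, categories, smoothing=1e-6):
    """Share of each category, lightly smoothed so no share is exactly zero."""
    counts = Counter(values)
    total = len(values) + smoothing * len(categories)
    return {c: (counts[c] + smoothing) / total for c in categories}

def psi(source_values, target_values):
    """Population stability index between two categorical samples.

    0 means identical category mixes; larger values mean a larger shift.
    """
    categories = set(source_values) | set(target_values)
    s = category_shares(source_values, categories)
    t = category_shares(target_values, categories)
    return sum((t[c] - s[c]) * math.log(t[c] / s[c]) for c in categories)

# Hypothetical product-category samples from each domain.
source_sample = ["produce"] * 5 + ["dairy"] * 5
target_sample = ["produce"] * 9 + ["dairy"] * 1
shift = psi(source_sample, target_sample)
```

A score like this only flags that the input mix differs. It says nothing about whether labels mean the same thing in both domains, so it complements rather than replaces a manual alignment review.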

The asymmetry

Source and target are not interchangeable — they have an asymmetric relationship that reflects the data and signal imbalance:

Property          | Source domain                             | Target domain
Data volume       | Plentiful (billions of signals)           | Sparse (limited interactions)
Signal depth      | Years of accumulated history              | Often newly launched
Pre-training role | Provides foundational embeddings          | Consumes pre-trained embeddings
Fine-tuning role  | Frozen or lightly updated                 | Where partner-specific layers live
Reuse cadence     | Refreshed rarely (cross-target amortised) | Refreshed per-target
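
The pre-training and fine-tuning roles above can be sketched in a few lines. This is a minimal toy, assuming a frozen embedding table learned on the source and a small logistic head trained on target interactions. The embedding values, item names, and hyperparameters are made up for illustration and do not describe Instacart's actual model.

```python
import math

# Hypothetical embeddings pre-trained on the source domain. They are frozen:
# stored as tuples and never updated during target fine-tuning.
SOURCE_EMBEDDINGS = {
    "apple": (0.9, 0.1),
    "banana": (0.8, 0.2),
    "detergent": (0.1, 0.9),
}

def predict(weights, bias, item):
    """pCTR estimate: a logistic head on top of a frozen embedding lookup."""
    emb = SOURCE_EMBEDDINGS[item]
    z = bias + sum(w * x for w, x in zip(weights, emb))
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune(target_logs, epochs=200, lr=0.5):
    """Train only the target-specific head with plain SGD on log loss."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for item, clicked in target_logs:
            err = predict(weights, bias, item) - clicked  # d(log loss)/dz
            emb = SOURCE_EMBEDDINGS[item]
            weights = [w - lr * err * x for w, x in zip(weights, emb)]
            bias -= lr * err
    return weights, bias

# A handful of hypothetical target-domain interactions: (item, clicked).
target_logs = [("apple", 1), ("banana", 1), ("detergent", 0)]
weights, bias = fine_tune(target_logs)
```

Only `weights` and `bias` belong to the target. The embedding table is shared, which is what lets a sparse target reuse structure learned from the source's much larger history.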

Multi-target deployments amortise source costs

A particularly powerful property of source / target framing is that a single source can serve many targets — and the cost of pre-training on the source amortises across all of them:

  • Instacart Marketplace data is the source for every Carrot Ads partner — one pre-training, many partners.
  • A foundation model that has undergone continued pre-training on a domain corpus (e.g., eBay's e-Llama) serves all downstream e-commerce tasks.
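
The amortisation argument is simple arithmetic. The sketch below assumes three hypothetical cost figures in arbitrary units: a one-off source pre-training cost, a per-target fine-tuning cost, and a per-target from-scratch training cost. It compares cost only and says nothing about model quality.

```python
import math

def total_cost(pretrain_cost, finetune_cost, scratch_cost, n_targets):
    """Return (shared-source cost, independent from-scratch cost) for n targets."""
    shared = pretrain_cost + n_targets * finetune_cost
    independent = n_targets * scratch_cost
    return shared, independent

def break_even_targets(pretrain_cost, finetune_cost, scratch_cost):
    """Smallest number of targets at which sharing one source is strictly cheaper.

    Returns None if fine-tuning is not cheaper than training from scratch,
    in which case the pre-training cost is never recovered.
    """
    saving_per_target = scratch_cost - finetune_cost
    if saving_per_target <= 0:
        return None
    return math.floor(pretrain_cost / saving_per_target) + 1

# Illustrative numbers only: pre-training costs 100 units, each fine-tune 2,
# each from-scratch model 12.
shared, independent = total_cost(100, 2, 12, n_targets=11)
first_profitable_n = break_even_targets(100, 2, 12)
```

Under these assumptions the shared approach wins once the number of targets exceeds pretrain_cost / (scratch_cost - finetune_cost), and every additional target widens the gap.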

This is the structural moat for retail-media platforms, foundation-model providers, and any organisation with first-party data that's broadly applicable.

First-party data as the structural source-domain advantage

The reason DAL outperforms from-scratch training even when the target has plentiful data is that the source domain carries signal the target structurally cannot replicate. For retail-media platforms, that signal is years of proprietary purchase and behavior data: first-party data that the target partner has no way to obtain. The source / target asymmetry isn't temporary; it's the moat.

Risk: negative transfer when the relationship is overstated

The source / target framing presumes a usable relationship between the two domains. Common ways that presumption fails:

  • Schema mismatches: source product taxonomy differs from target taxonomy in subtle ways; embeddings carry meaning the target doesn't share.
  • Convention reversals: e.g., source-domain users click-then-buy, while target-domain users buy-without-clicking due to a different UX, so the source-trained CTR signal is misleading.
  • Label semantics differ: a "click" in the source is a consideration signal, whereas a "click" in the target is a purchase-intent signal. Same label name, different meaning.

Mitigations:

  • Human-in-the-loop schema mapping and alignment verification (Instacart's current Carrot Ads stance).
  • Feature taxonomy alignment as an explicit upstream step.
  • Automated domain-shift detection (Instacart's planned Domain Adaptation Platform).
  • Side-by-side eval against from-scratch training on the target as a sanity check.
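
The side-by-side sanity check in the last bullet can be sketched as a comparison of held-out log loss between the transferred model and a from-scratch baseline. The function names, the `margin` parameter, and the sample predictions are hypothetical; a real evaluation would use a properly sized held-out target set and whatever metric the team already tracks for pCTR.

```python
import math

def log_loss(labels, probs, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def transfer_verdict(labels, transfer_probs, scratch_probs, margin=0.0):
    """Flag suspected negative transfer when the transferred model is worse
    than the from-scratch baseline by more than `margin` in log loss."""
    transfer_loss = log_loss(labels, transfer_probs)
    scratch_loss = log_loss(labels, scratch_probs)
    if transfer_loss > scratch_loss + margin:
        return "negative_transfer_suspected"
    return "transfer_ok"

# Hypothetical held-out target clicks and each model's predicted pCTR.
labels = [1, 0, 1, 0]
transfer_probs = [0.9, 0.1, 0.8, 0.2]
scratch_probs = [0.6, 0.4, 0.6, 0.4]
verdict = transfer_verdict(labels, transfer_probs, scratch_probs)
```

A non-zero `margin` keeps small, noisy differences from triggering the flag. How large it should be depends on the size of the held-out set.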

Generalisation across the wiki

The source / target framing shows up in many transfer-learning and adaptation contexts:

  • Pre-training corpus → fine-tuned task — the LLM case; source = web corpus, target = downstream task.
  • Source language → target language — multilingual NLP; source = high-resource language, target = low-resource language.
  • Source platform → target platform — Instacart Marketplace → partner O&O site (the Carrot Ads case).
  • Source environment → target environment — sim-to-real in robotics; source = simulator, target = physical robot.

The vocabulary is portable; the alignment work is not.

Seen in
