CONCEPT · Cited by 1 source

Pre-launch Market Validation

Definition

Pre-launch market validation is the problem class of assessing product quality for a market (country, locale, audience segment) that is not yet live, i.e. where no observational user signal exists. Classical click-based / CTR-based / dwell-time-based quality metrics are by definition unavailable — the whole point of pre-launch validation is to decide whether to launch based on offline evidence.

(Source: sources/2026-03-16-zalando-search-quality-assurance-with-ai-as-a-judge.)

The shape of the problem

"Before using LLM-as-a-judge, the search quality assurance process was heavily reliant on human experts and a manual process… Not only is this process not scalable, but it is also reactive by nature, meaning that issues are only identified after features are launched and users have already experienced them, since we rely on signals coming from real users such as low CTR. For an entirely new country, these signals are by definition not there yet. We need a more proactive approach that ensures quality before launch." (Source: sources/2026-03-16-zalando-search-quality-assurance-with-ai-as-a-judge.)

Two structural properties define the problem:

  1. No target-market behavioural data. The market hasn't launched. Click logs, CTR, abandonment, dwell time all return zero rows.
  2. Cross-market scenario reuse. Some subset of intent can be generalised from existing markets — shared brands, shared product taxonomy, shared semantic product model — modulo the language and cultural differences the new market introduces.

The combination is what makes this an LLM-as-judge-shaped problem: the judge scores a static (input, output) pair against a rubric without requiring observational behaviour data, and the rubric generalises across languages the same way the underlying product model does.
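
A minimal sketch of that shape, assuming nothing beyond the structure described above (the rubric wording, the 0-2 scale, and the call_llm placeholder are illustrative assumptions, not Zalando's published implementation):

    from dataclasses import dataclass

    # Rubric and scale are illustrative, not Zalando's prompt.
    RUBRIC = (
        "You are a search relevance judge for a fashion marketplace.\n"
        "Given a query and the top results, rate relevance:\n"
        "  2 = results satisfy the query intent\n"
        "  1 = partially relevant (right category, wrong attribute, ...)\n"
        "  0 = irrelevant\n"
        "Reply with the digit only."
    )

    @dataclass
    class TestCase:
        locale: str          # e.g. "pl-PL" for a not-yet-launched market
        query: str
        results: list[str]   # top-k result titles from the offline index

    def call_llm(prompt: str) -> str:
        """Placeholder: wire this to whatever model client you use."""
        raise NotImplementedError

    def judge(case: TestCase) -> int:
        """Score one static (query, results) pair; no user traffic required."""
        prompt = (
            f"{RUBRIC}\n\nLocale: {case.locale}\nQuery: {case.query}\nResults:\n"
            + "\n".join(f"- {r}" for r in case.results)
        )
        return int(call_llm(prompt).strip()[0])

Nothing in judge() touches clicks, dwell time, or any live signal: the pair comes from an offline index, which is exactly what makes the approach usable pre-launch.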

Design principles (Zalando's enumeration)

Zalando's framework was built against four principles:

  • High test coverage. Wide range of tests covering different search scenarios (product categories, brands, attributes, popular searches, seasonal or trending products).
  • Avoid handcrafted test cases. Handcrafted tests don't scale and can be biased; automate test generation while retaining the option to add or customise (a generation sketch appears below).
  • Multi-language support. New markets operate in different languages with different linguistic characteristics.
  • Reproducibility. Re-evaluation after fixes must be a routine operation, not a re-do-the-whole-study one.

The principles directly rule out pure manual QA (fails scalability + reproducibility + coverage) and pure click-based-bucket-test staging (fails in a no-user-data market by definition).
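
A sketch of the automated-generation principle (the asset lists, templates, and locale codes are all invented for illustration; Zalando's actual generation method isn't specified here): test queries are derived combinatorially from cross-market assets plus per-locale templates, so adding a market means adding templates and translations rather than handwriting queries.

    from itertools import product

    # Shared cross-market assets plus per-locale surface forms;
    # all values are invented examples.
    BRANDS = ["Nike", "Adidas"]                      # shared across markets
    CATEGORY = {"en-GB": "trainers", "pl-PL": "sneakersy"}
    ATTRIBUTES = {"en-GB": ["white", "waterproof"],
                  "pl-PL": ["białe", "wodoodporne"]}
    TEMPLATE = {"en-GB": "{attr} {brand} {cat}",     # word order differs
                "pl-PL": "{attr} {cat} {brand}"}     # per language

    def generate_queries(locale: str) -> list[str]:
        """Derive test queries combinatorially instead of handcrafting them."""
        return [
            TEMPLATE[locale].format(attr=a, brand=b, cat=CATEGORY[locale])
            for b, a in product(BRANDS, ATTRIBUTES[locale])
        ]

    # generate_queries("pl-PL") -> ["białe sneakersy Nike", ...]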

Adjacent problems with the same shape

  • Cold-start evaluation of newly deployed ML models. Same structural property: the model has no production behavioural signal yet. LLM-as-judge is often used here too, scoring model outputs against a rubric before gating rollout (a minimal gate sketch follows this list).
  • Regression detection on low-volume segments. The market is already live, but a given segment carries too little traffic to distinguish signal from noise; the same offline LLM-judge approach applies. Zalando names this as a second-order benefit of the same pipeline.
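
A minimal gate sketch, reusing the illustrative 0-2 judge scale from above (both thresholds are placeholders, not tuned values):

    def launch_gate(scores: list[int],
                    min_mean: float = 1.6,
                    max_fail_rate: float = 0.05) -> bool:
        """Pass only if average relevance is high and hard failures are rare.

        scores: per-test-case judge scores on the illustrative 0-2 scale.
        Both thresholds are placeholder values.
        """
        mean = sum(scores) / len(scores)
        fail_rate = scores.count(0) / len(scores)
        return mean >= min_mean and fail_rate <= max_fail_rate

Reproducibility falls out of the same structure: re-running the fixed test suite and comparing gate inputs before and after a fix is a routine operation, not a new study.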

Contrast with launch-phase canary A/B testing

Canary / phased-launch traffic shifting (e.g. via Market Groups) requires real users; it accepts behavioural risk and bounds it with a small blast radius, making it reactive within the market rather than pre-launch. Pre-launch validation completes before the first real user sees anything; canary runs after, constraining how many users see a regression before it is caught.

The two are complementary: pre-launch catches structural defects (NER vocabulary gaps, product-data quality, ranker misbehaviour on the new corpus); canary catches live-traffic pathologies (load, interaction patterns, edge cases the offline sample missed).
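
A sketch of how the two phases compose (the step fractions, the guardrails_ok callback, and the set_traffic_fraction hook are all hypothetical; the real mechanism is whatever the traffic-shifting infrastructure exposes):

    from typing import Callable

    CANARY_STEPS = [0.01, 0.05, 0.25, 1.0]  # fractions of market traffic; illustrative

    def set_traffic_fraction(fraction: float) -> None:
        """Hypothetical hook into a Market Groups-style traffic shifter."""
        ...

    def rollout(offline_gate_passed: bool,
                guardrails_ok: Callable[[float], bool]) -> None:
        # Pre-launch: offline judge gate, zero real users exposed.
        if not offline_gate_passed:
            raise RuntimeError("offline judge gate failed; do not launch")
        # Launch phase: reactive, but with a bounded blast radius per step.
        for fraction in CANARY_STEPS:
            set_traffic_fraction(fraction)
            if not guardrails_ok(fraction):
                set_traffic_fraction(0.0)  # roll back on a live-traffic pathology
                raise RuntimeError(f"canary failed at {fraction:.0%} traffic")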

Seen in

  • sources/2026-03-16-zalando-search-quality-assurance-with-ai-as-a-judge