CONCEPT Cited by 1 source
NER-tag Parity Across Languages
Definition
NER-tag parity across languages is the operational realisation of the translated-query-parity invariant: run the same NER engine against both the source-language query and its target-language translation, and diff the extracted NER-tag sets. Matching tag sets = the translation preserved intent and the target-language NER covered it. Diverging tag sets = at least one of the two (the translation or the target-language NER) is broken.
(Source: sources/2026-03-16-zalando-search-quality-assurance-with-ai-as-a-judge.)
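A minimal sketch of the check. It assumes the NER engine emits language-independent tags, so the two sets can be compared directly. The lexicons, tag names and the `extract_tags` / `check_parity` helpers are hypothetical stand-ins for a real NER engine, not Zalando's implementation:

```python
from dataclasses import dataclass

# Hypothetical toy lexicons: surface token -> language-independent tag.
# The Portuguese lexicon is deliberately incomplete to provoke a mismatch.
LEXICONS = {
    "en": {"girls": "gender:female", "tracksuit": "category:tracksuit"},
    "pt": {"fato": "category:suit"},
}


def extract_tags(query: str, lang: str) -> set[str]:
    """Stand-in for the shared NER engine: tag each known whitespace token."""
    lexicon = LEXICONS[lang]
    return {lexicon[tok] for tok in query.lower().split() if tok in lexicon}


@dataclass
class ParityResult:
    missing_in_target: set[str]  # tags present in the source query only
    extra_in_target: set[str]    # tags present in the translation only

    @property
    def parity(self) -> bool:
        return not self.missing_in_target and not self.extra_in_target


def check_parity(source_query: str, source_lang: str,
                 target_query: str, target_lang: str) -> ParityResult:
    """Run the same extractor on both queries and diff the tag sets."""
    src = extract_tags(source_query, source_lang)
    tgt = extract_tags(target_query, target_lang)
    return ParityResult(missing_in_target=src - tgt, extra_in_target=tgt - src)


result = check_parity("girls tracksuit", "en", "fato de treino menina", "pt")
print(result.parity)                     # False
print(sorted(result.missing_in_target))  # ['category:tracksuit', 'gender:female']
print(sorted(result.extra_in_target))    # ['category:suit']
```

With this toy lexicon the diff surfaces two defects at once. `gender:female` is lost because "menina" is unknown. `category:suit` appears because "fato" is matched on its own instead of as part of "fato de treino".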
The sidecar task
In Zalando's pipeline this is a distinct Airflow task — an "NER analyzer task in the Airflow DAG" — that runs in parallel with relevance scoring rather than as a prerequisite. Its output is a per-scenario tag-diff report, consumed as a first-class diagnostic signal alongside the LLM-judge segment-level relevance aggregates.
Separating the parity check from the relevance check is deliberate: low relevance with tag parity = ranker / product-data issue; low relevance with tag mismatch = NER-vocabulary issue.
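The decision rule can be written as a small function over the two per-scenario signals. This is a sketch under stated assumptions. The relevance scale (a per-scenario aggregate in [0, 1]), the 0.5 threshold, the `localise` name and the scenario tuples are all hypothetical. The fourth case (a tag mismatch with acceptable relevance) is an extrapolation from the point that a mismatch alone does not prove relevance collapsed:

```python
from enum import Enum


class Diagnosis(Enum):
    HEALTHY = "healthy"
    RANKER_OR_PRODUCT_DATA = "ranker / product-data issue"
    NER_VOCABULARY = "NER-vocabulary issue"
    NER_MASKED_BY_RANKER = "tag mismatch, relevance held up"


def localise(relevance: float, tags_match: bool, threshold: float = 0.5) -> Diagnosis:
    """Combine the LLM-judge relevance aggregate with the NER parity flag.

    `relevance` is assumed to be in [0, 1]; `threshold` is a hypothetical
    cut-off below which a scenario counts as low relevance.
    """
    low = relevance < threshold
    if low and tags_match:
        return Diagnosis.RANKER_OR_PRODUCT_DATA
    if low:
        return Diagnosis.NER_VOCABULARY
    if not tags_match:
        return Diagnosis.NER_MASKED_BY_RANKER
    return Diagnosis.HEALTHY


# Hypothetical per-scenario inputs: (scenario id, relevance aggregate, tags match?)
scenarios = [("s1", 0.2, False), ("s2", 0.3, True), ("s3", 0.9, False)]
for scenario_id, relevance, tags_match in scenarios:
    print(scenario_id, localise(relevance, tags_match).value)
# s1 NER-vocabulary issue
# s2 ranker / product-data issue
# s3 tag mismatch, relevance held up
```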
Disclosed violation shapes
| Shape | Example (PT) | Effect |
|---|---|---|
| Lemmatisation drift | "desporto", "desportivo", "desportiva" each mapped to a different tag | Inconsistent filters across paraphrased queries |
| Ambiguous-term collision | "tenis" / "ténis" (sneaker) vs. tennis (the sport) | Term unrecognised; sport-shoes scenario degrades |
| Missing vocabulary | "menina", "meninas" (girl, girls) | Mixed-gender result sets |
| Multi-word term unrecognised | "fato de treino" (tracksuit) | Zero sport/tracksuit results |
The remediation path
The parity signal does not point directly at the fix; it points at the NER layer that is incomplete:
- Lemmatisation drift → update lemmatisation rules / stemmer dictionary for the target language.
- Ambiguous-term collision → add disambiguation logic / context-aware tagging.
- Missing vocabulary → determine "whether to index missing terms for searchability"; terms that are indexed must be added to both the NER dictionary and the searchable catalogue.
- Multi-word term unrecognised → multi-word-entity recognition needs extending for the target language.
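Two of these remediations, form folding and multi-word-entity recognition, can be illustrated with a toy gazetteer tagger. It normalises tokens first and then prefers the longest matching phrase. Greedy longest-match is one common technique and is not necessarily what Zalando uses. The gazetteer, the normalisation table (a stand-in for lemmatisation rules / a stemmer dictionary) and the tag names are hypothetical, and disambiguation of terms like "ténis" is not modelled:

```python
# Hypothetical PT gazetteer: phrase (tuple of normalised tokens) -> language-independent tag.
GAZETTEER = {
    ("fato", "de", "treino"): "category:tracksuit",
    ("fato",): "category:suit",
    ("menina",): "gender:female",
    ("desporto",): "style:sport",
}

# Hypothetical normalisation table folding related forms onto one key,
# standing in for lemmatisation rules / a stemmer dictionary.
NORMALISE = {"meninas": "menina", "desportivo": "desporto", "desportiva": "desporto"}

MAX_PHRASE_LEN = max(len(phrase) for phrase in GAZETTEER)


def tag(query: str) -> list[str]:
    """Normalise tokens, then greedily match the longest gazetteer phrase at each position."""
    tokens = [NORMALISE.get(tok, tok) for tok in query.lower().split()]
    tags: list[str] = []
    i = 0
    while i < len(tokens):
        # Longest candidate first, so "fato de treino" wins over "fato".
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in GAZETTEER:
                tags.append(GAZETTEER[phrase])
                i += n
                break
        else:
            i += 1  # no phrase starts at this token; skip it
    return tags


print(tag("fato de treino desportivo para meninas"))
# ['category:tracksuit', 'style:sport', 'gender:female']
```

Adding an entry to `NORMALISE` addresses lemmatisation drift. Adding a multi-token key to `GAZETTEER` addresses an unrecognised multi-word term. Adding a single-token key addresses missing vocabulary.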
Complementary to (not redundant with) relevance scoring
Both the NER-analyser and the LLM-as-judge are needed. A tag mismatch doesn't by itself prove relevance collapsed (the ranker might recover); a relevance collapse doesn't by itself prove NER is the cause (the catalogue might be missing products). The two signals together localise the defect.
Seen in
- sources/2026-03-16-zalando-search-quality-assurance-with-ai-as-a-judge — canonical wiki instance; Zalando runs an NER-analyser task producing cross-language tag diffs in parallel with the LLM-judge relevance evaluation.
Related
- concepts/translated-query-parity — the invariant this operationalises.
- concepts/ner-clustered-query-sampling — the upstream clustering that produces the scenario identity.
- systems/zalando-ner-query-builder
- patterns/translated-query-ner-parity-check
- companies/zalando