CONCEPT
LLM segmentation over traditional NER¶
Definition¶
LLM segmentation over traditional NER is the framing that segmentation tasks (assigning labels to consecutive token-runs of a query) are structurally better served by a prompted large language model than by a classical Named-Entity-Recognition model, for four joint reasons:
- Flexible schema — adding / removing / renaming a label is a prompt change, not a re-training.
- Low internal-taxonomy leakage — the label schema can match downstream consumers, not the internal training data.
- World knowledge — the LLM already knows common entities ("Chase Center is in San Francisco", "Italian parsley is a synonym for flat parsley") without domain-specific training.
- Task fusion — a sufficiently powerful LLM can combine segmentation with spell-correction / canonicalisation / intent classification in one prompt.
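The "flexible schema" point above can be made concrete with a minimal sketch: the label schema lives in data, so adding, removing, or renaming a class is a config edit rather than a re-training run. All names here (`SCHEMA`, `build_prompt`, `parse_tagged`) and the bracketed output convention are illustrative assumptions, not Yelp's actual prompt.

```python
import re

# Illustrative schema matching the six classes Yelp settled on; swapping a
# label in or out only changes this list and therefore the prompt text.
SCHEMA = ["topic", "name", "location", "time", "question", "none"]

def build_prompt(query: str, schema=SCHEMA) -> str:
    """Assemble a few-shot segmentation prompt for an arbitrary schema."""
    labels = ", ".join(schema)
    return (
        f"Segment the query into consecutive spans, each labeled with one of: {labels}.\n"
        "Example: 'pizza near oakland' -> [topic: pizza] [location: near oakland]\n"
        f"Query: '{query}'\nOutput:"
    )

def parse_tagged(output: str) -> list[tuple[str, str]]:
    """Parse '[label: span] [label: span]' model output into (label, span) pairs."""
    return re.findall(r"\[(\w+):\s*([^\]]+)\]", output)

# Schema changes need no labeled data or re-training; only the prompt moves:
prompt = build_prompt("best tacos in austin")
segments = parse_tagged("[topic: best tacos] [location: in austin]")
```

The design choice worth noting: because the schema is a plain list, it can be aligned with downstream consumers directly, which is exactly the taxonomy-decoupling Yelp describes below.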
Canonical wiki instance: Yelp¶
Canonical reference: Yelp's 2025-02-04 post (sources/2025-02-04-yelp-search-query-understanding-with-llms). Direct disclosure of the framing:
"Compared to traditional Named Entity Recognition techniques, LLMs excel at segmentation tasks and are flexible enough to allow for easy customization of the individual classes. After several iterations, we settled on six classes for query segmentation: topic, name, location, time, question, and none. This involved a number of small but important decisions: 1. Our legacy models had several subclasses all akin to 'topic,' but this would have required the LLM to understand intricate details of our internal taxonomy that are both unintuitive and subject to change. 2. We introduced a new 'question' tag for searches that want an answer beyond just 'a list of businesses.' 3. We aligned the model outputs with potential downstream applications that can benefit from a more intelligent labeling of these tags, such as implicit location rewrite, improved name intent detection, and more accurate auto-enabled filters."
The decision not to expose internal-taxonomy subclasses of topic to the LLM is the canonical example: the LLM produces whatever schema is cleanest for downstream consumers; the mapping to the internal taxonomy is a separate downstream step.
Task fusion as a second unlock¶
The same post fuses spell-correction with segmentation into a single prompt:
"Throughout the process we learned that spell-correction and segmentation can be done together by a sufficiently powerful model, so we added a meta tag to mark spell corrected sections and decided to combine these two tasks into a single prompt."
Traditional NLP would implement these as two cascaded pre-processing models, each with its own training data, serving stack, and maintenance cost. A single LLM prompt collapses both into one call.
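A hedged sketch of what that fused output might look like on the consuming side. The output convention here (a `*` suffix marking spell-corrected spans) is an assumption for illustration; Yelp's actual meta-tag format is not disclosed in the post.

```python
import re

def parse_fused(output: str) -> list[dict]:
    """Parse '[label: span]' / '[label*: span]' pairs; '*' marks a
    span the model spell-corrected while segmenting (assumed format)."""
    segments = []
    for label, star, span in re.findall(r"\[(\w+)(\*?):\s*([^\]]+)\]", output):
        segments.append({"label": label, "span": span, "corrected": star == "*"})
    return segments

# One LLM call replaces two cascaded models (speller -> segmenter);
# e.g. the model corrected 'italain' to 'italian' in the topic span:
fused = parse_fused("[topic*: italian parsley] [location: near me]")
```

The point is not the parsing itself but the serving shape: downstream code receives both the correction signal and the segmentation from a single response.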
Relationship to NER¶
The output shape of LLM segmentation and traditional NER is similar — both assign a label to each token-run. The engineering economics differ:
| Axis | Traditional NER | LLM segmentation |
|---|---|---|
| Schema change | Re-train | Edit prompt |
| Multi-label output | Multi-task architecture | List in prompt |
| Task fusion | Separate models, separate training | Single prompt |
| Few-shot adaptation | Labeled dataset | Few-shot examples |
| Cost at scale | Low per call, high per model (train + maintain) | High per call, low per model (set-up is a prompt) |
| World knowledge | Explicit features | Implicit in pre-training |
The trade-off: NER wins on per-call cost at scale; LLM segmentation wins on engineering velocity, with power-law caching covering the head queries. The combined shape (an LLM labels a curated dataset, which is distilled into a smaller serving model) is offline-teacher-online-student, and it effectively gets both.
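The combined shape can be sketched in a few lines. `teacher_segment` and `student_segment` are stand-ins for a prompted LLM and a distilled serving model respectively; the canned return values exist only to make the sketch runnable.

```python
def teacher_segment(query: str) -> list[tuple[str, str]]:
    # Placeholder for a prompted-LLM call: expensive, high quality.
    return [("topic", query)]

def student_segment(query: str) -> list[tuple[str, str]]:
    # Placeholder for a small distilled model: cheap, good enough for the tail.
    return [("none", query)]

def build_head_cache(head_queries):
    """Offline: label the high-traffic head with the teacher, once."""
    return {q: teacher_segment(q) for q in head_queries}

def serve(query: str, cache):
    """Online: head queries hit the cache; tail queries fall to the student."""
    return cache.get(query) or student_segment(query)

cache = build_head_cache(["pizza", "sushi near me"])
head_result = serve("pizza", cache)            # cached teacher label
tail_result = serve("rare tail query", cache)  # student fallback, no LLM call
```

Because query traffic is power-law distributed, the cache alone covers a large share of calls at teacher quality while the per-call LLM cost is paid only offline.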
Related concepts¶
- concepts/semantic-role-labeling — the e-commerce / grocery variant; same pattern applied to grocery queries (product / brand / attribute / size / quantity slots).
- concepts/query-understanding — the parent task family; segmentation is one canonical sub-task.
- concepts/retrieval-augmented-generation — Yelp enriches the segmentation prompt with RAG (businesses viewed for the query) to disambiguate; classical NER has no direct analogue of this.
Caveats¶
- Not a universal win. For high-throughput, mature NER pipelines where the schema is stable and the per-call LLM cost is prohibitive, a dedicated NER model can still be better. The win applies primarily to workloads where the schema evolves, the taxonomy is bespoke, and the head-caching economics hold.
- LLM hallucination risk. LLMs can over-segment, under-segment, or invent labels not in the schema. Few-shot examples and constrained decoding are the canonical mitigations.
- Internal taxonomy leakage is a design choice, not a default. If the LLM's output schema leaks internal subclasses, the productionisation win collapses — every taxonomy change ripples back into the prompt.
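A minimal guard against the label-hallucination caveat above: reject any segment whose label falls outside the declared schema. This post-hoc check is the weaker of the two mitigations (constrained decoding at generation time prevents invented labels outright); the names here are illustrative.

```python
# Six-class schema as in the Yelp example; any label outside this set
# was invented by the model and should be rejected or retried.
SCHEMA = {"topic", "name", "location", "time", "question", "none"}

def validate_segments(segments, schema=SCHEMA):
    """Return (valid, invented_labels) for a list of (label, span) pairs."""
    invented = [label for label, _ in segments if label not in schema]
    return (not invented, invented)

ok, bad = validate_segments([("topic", "pizza"), ("cuisine", "italian")])
# ok == False; bad == ["cuisine"]  ('cuisine' is not in the schema)
```

In practice a failed check would trigger a retry with the few-shot examples re-emphasised, or a fallback to the legacy model.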
Seen in¶
- sources/2025-02-04-yelp-search-query-understanding-with-llms — canonical wiki reference; Yelp's explicit rejection of legacy-subclass NER output in favour of simpler LLM-prompted segmentation.