
LLM segmentation over traditional NER

Definition

LLM segmentation over traditional NER is the framing that query segmentation (assigning a label to each consecutive run of tokens in a query) is structurally better served by a prompted large language model than by a classical Named-Entity-Recognition model, for four joint reasons:

  1. Flexible schema — adding / removing / renaming a label is a prompt change, not a re-training.
  2. Low internal-taxonomy leakage — the label schema can match downstream consumers, not the internal training data.
  3. World knowledge — the LLM already knows common entities ("Chase Center is in San Francisco", "Italian parsley is a synonym for flat parsley") without domain-specific training.
  4. Task fusion — a sufficiently powerful LLM can combine segmentation with spell-correction / canonicalisation / intent classification in one prompt.
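The flexible-schema point can be made concrete with a minimal sketch. The prompt template, output format, and `cuisine` label below are illustrative assumptions, not Yelp's actual prompt; only the six class names come from the post.

```python
# Sketch: the schema lives in data, not in model weights.
# Label list and output format are illustrative assumptions.

LABELS = ["topic", "name", "location", "time", "question", "none"]

def build_prompt(query: str, labels: list[str]) -> str:
    """Adding, removing, or renaming a label is an edit to `labels`,
    not a model retrain."""
    label_list = ", ".join(labels)
    return (
        f"Segment the query into consecutive spans and tag each span "
        f"with one of: {label_list}.\n"
        f"Output one `label: span` pair per line.\n"
        f"Query: {query}"
    )

def parse_segments(response: str) -> list[tuple[str, str]]:
    """Parse the model's `label: span` lines back into pairs."""
    segments = []
    for line in response.splitlines():
        label, _, span = line.partition(":")
        segments.append((label.strip(), span.strip()))
    return segments

# Schema change = prompt change: one list edit adds a hypothetical
# `cuisine` label, with no retraining step.
prompt_v2 = build_prompt("best pizza near me", LABELS + ["cuisine"])
```

The entire "multi-task architecture" of a classical multi-label NER stack collapses into the contents of one Python list.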

Canonical wiki instance: Yelp

Canonical reference: Yelp's 2025-02-04 post (sources/2025-02-04-yelp-search-query-understanding-with-llms). Direct disclosure of the framing:

"Compared to traditional Named Entity Recognition techniques, LLMs excel at segmentation tasks and are flexible enough to allow for easy customization of the individual classes. After several iterations, we settled on six classes for query segmentation: topic, name, location, time, question, and none. This involved a number of small but important decisions: 1. Our legacy models had several subclasses all akin to 'topic,' but this would have required the LLM to understand intricate details of our internal taxonomy that are both unintuitive and subject to change. 2. We introduced a new 'question' tag for searches that want an answer beyond just 'a list of businesses.' 3. We aligned the model outputs with potential downstream applications that can benefit from a more intelligent labeling of these tags, such as implicit location rewrite, improved name intent detection, and more accurate auto-enabled filters."

The decision not to expose internal-taxonomy subclasses of topic to the LLM is the canonical example: the LLM produces whatever schema is cleanest for downstream consumers; the internal taxonomy mapping is a separate downstream step.

Task fusion as a second unlock

The same post fuses spell-correction with segmentation into a single prompt:

"Throughout the process we learned that spell-correction and segmentation can be done together by a sufficiently powerful model, so we added a meta tag to mark spell corrected sections and decided to combine these two tasks into a single prompt."

Traditional NLP would implement these as two cascaded pre-processing models, each with its own training data, serving stack, and maintenance cost. One LLM prompt collapses both.
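A sketch of what consuming the fused output might look like. The `[fixed]` meta-tag syntax below is an assumed format: the post confirms a spell-correction meta tag exists but does not disclose its exact shape.

```python
# Sketch: parse a fused segmentation + spell-correction response.
# The `[fixed]` meta-tag syntax is an assumed format; the Yelp post
# describes a meta tag for corrected sections but not its exact shape.
import re

LINE = re.compile(r"^(?P<label>\w+)(?P<fixed>\[fixed\])?:\s*(?P<span>.+)$")

def parse_fused(response: str) -> list[dict]:
    """One LLM response carries both tasks: each line is either
    `label: span` or `label[fixed]: corrected-span`."""
    out = []
    for line in response.splitlines():
        m = LINE.match(line.strip())
        if m:
            out.append({
                "label": m["label"],
                "span": m["span"],
                "spell_corrected": bool(m["fixed"]),
            })
    return out

parsed = parse_fused("name[fixed]: Chase Center\nlocation: San Francisco")
```

Downstream consumers get segmentation and the spell-correction signal from one call, instead of from two cascaded models.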

Relationship to NER

The output shape of LLM segmentation and traditional NER is similar — both assign a label to each token-run. The engineering economics differ:

| Axis | Traditional NER | LLM segmentation |
|---|---|---|
| Schema change | Re-train | Edit prompt |
| Multi-label output | Multi-task architecture | List in prompt |
| Task fusion | Separate models, separate training | Single prompt |
| Few-shot adaptation | Labeled dataset | Few-shot examples |
| Cost at scale | Low per call, high per model | High per call, low per model-setup |
| World knowledge | Explicit features | Implicit in pre-training |

The trade-off: NER wins on per-call cost at scale; LLM segmentation wins on engineering velocity and on head-query coverage via power-law caching. The combined shape (use the LLM to label a curated dataset, then distill into a smaller serving model) is offline-teacher-online-student, and it effectively captures both.
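The power-law caching economics can be sketched with a Zipf model of query frequency. The exponent and query counts below are illustrative assumptions, not measured traffic figures:

```python
# Sketch: why caching LLM labels for head queries covers most traffic.
# The Zipf exponent and query counts are illustrative assumptions.

def zipf_coverage(num_cached: int, num_unique: int, s: float = 1.0) -> float:
    """Fraction of traffic served by caching the `num_cached` most
    frequent of `num_unique` unique queries, when the k-th most
    frequent query has frequency proportional to 1 / k**s."""
    head = sum(1 / k**s for k in range(1, num_cached + 1))
    total = sum(1 / k**s for k in range(1, num_unique + 1))
    return head / total

# Under this model, caching labels for 1% of unique queries
# covers well over half the traffic.
coverage = zipf_coverage(num_cached=10_000, num_unique=1_000_000)
```

This is why "high per call" in the table above is survivable: the expensive LLM call runs once per head query, not once per search.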

Caveats

  • Not a universal win. For high-throughput, mature NER pipelines where the schema is stable and per-call LLM cost is prohibitive, a dedicated NER model can still be the better choice. The win applies primarily to workloads where the schema evolves, the taxonomy is bespoke, and the head-caching economics hold.
  • LLM hallucination risk. LLMs can over-segment, under-segment, or invent labels outside the schema. Few-shot examples and constrained decoding are the canonical mitigations.
  • Internal taxonomy leakage is a design choice, not a default. If the LLM's output schema leaks internal subclasses, the productionisation win collapses — every taxonomy change ripples back into the prompt.
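The label-invention risk above can be sketched as a post-hoc validation pass. True constrained decoding happens inside the decoder; the check below is the cheaper fallback, and the fallback-to-`none` policy is an illustrative assumption:

```python
# Sketch: post-hoc validation as a cheap stand-in for constrained
# decoding. Any label outside the schema is mapped to `none`
# instead of leaking to downstream consumers.
SCHEMA = {"topic", "name", "location", "time", "question", "none"}

def validate(segments: list[tuple[str, str]],
             schema: set[str] = SCHEMA) -> list[tuple[str, str]]:
    """Replace hallucinated labels with the `none` class; an
    alternative policy would be to drop or re-query the span."""
    return [(label if label in schema else "none", span)
            for label, span in segments]

# A hallucinated `cuisine` label is neutralised before serving.
checked = validate([("topic", "pizza"), ("cuisine", "italian")])
```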
