
BERT

Definition

BERT ("Bidirectional Encoder Representations from Transformers") is Google's 2018 transformer-encoder pre-training architecture. It is the canonical encoder-only model family for NLP classification, extraction, and span-prediction tasks (as opposed to decoder-only generative models like GPT).

Wiki anchor

The wiki's canonical anchor for BERT is its role as a realtime tail-query serving model in Yelp's query-understanding cascade (2025-02-04 post — sources/2025-02-04-yelp-search-query-understanding-with-llms):

"at Yelp, we have used BERT and T5 to serve as our real time LLM model. These models are optimized for speed and efficiency, allowing us to process user queries rapidly and accurately during the complete rollout phase."

BERT serves the 5% of Yelp's production query traffic that misses both the pre-computed head cache and the offline fine-tuned GPT-4o-mini batch; this realtime tier is required because never-before-seen tail queries cannot be pre-computed.
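The fall-through logic of that cascade can be sketched as a minimal router. This is an illustrative sketch, not Yelp's implementation: the function and data-structure names (`route_query`, `realtime_bert_infer`, the cache dicts) are assumptions, and the realtime tier is stubbed out where a BERT forward pass would run.

```python
def realtime_bert_infer(query: str) -> dict:
    """Placeholder for the realtime BERT tier (an actual model forward pass
    would go here); returns a toy segmentation."""
    return {"segments": query.split()}


def route_query(query: str, head_cache: dict, batch_results: dict):
    """Route a query through the three cascade tiers:
    pre-computed head cache -> offline batch results -> realtime model.
    Returns (result, tier_name)."""
    if query in head_cache:
        # Head queries: answered from the pre-computed cache.
        return head_cache[query], "head_cache"
    if query in batch_results:
        # Queries covered by the offline fine-tuned batch job.
        return batch_results[query], "offline_batch"
    # Never-before-seen tail query: only the realtime tier can serve it.
    return realtime_bert_infer(query), "realtime_bert"
```

The key property is that the realtime tier is the catch-all: everything that can be pre-computed is, and only genuinely novel tail queries pay the cost of an online inference.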

Why BERT for realtime tail

  • Small + fast relative to decoder-only LLMs — inference runs on CPU or small GPU, appropriate for realtime serving.
  • Encoder-only architecture is well-matched to classification / extraction tasks like query segmentation (which is structurally a per-token labelling task).
  • Mature tooling for fine-tuning on the teacher-generated golden dataset.
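To make the per-token-labelling framing concrete: a token-classification head over a BERT encoder emits one label per query token (commonly BIO tags), and segmentation reduces to merging contiguous tags into spans. A minimal decoding sketch, with hypothetical tag names (the label set is an assumption, not Yelp's):

```python
def decode_segments(tokens: list[str], bio_tags: list[str]) -> list[tuple[str, str]]:
    """Merge per-token BIO tags (e.g. "B-TOPIC", "I-TOPIC") into
    (label, text) segments. Assumes every token carries a B-/I- tag."""
    segments: list[dict] = []
    for tok, tag in zip(tokens, bio_tags):
        if tag.startswith("B-") or not segments:
            # "B-" opens a new segment; label is the part after the prefix.
            segments.append({"label": tag.split("-", 1)[-1], "text": [tok]})
        else:
            # "I-" continues the current segment.
            segments[-1]["text"].append(tok)
    return [(s["label"], " ".join(s["text"])) for s in segments]
```

Because the model's job is just one label per token, the encoder-only architecture fits directly: no autoregressive decoding loop, so latency stays within a realtime serving budget.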

Stub

Minimal anchor. Deeper coverage (BERT-vs-variants taxonomy, pre-training objectives, model-size tradeoffs) left for future ingests.
