GPT-4o-mini¶
Definition¶
GPT-4o-mini is OpenAI's compact, cost-optimised variant of GPT-4o, released 2024-07-18 with fine-tuning support. It is the smaller, cheaper sibling to GPT-4o, positioned for high-volume production workloads.
Wiki anchor¶
The wiki's canonical anchor for GPT-4o-mini is its role as the offline-batch serving student in production LLM pipelines, canonicalised by the 2025-02-04 Yelp post (sources/2025-02-04-yelp-search-query-understanding-with-llms).
Yelp's canonical disclosure: "Fine tune a smaller model (GPT4o-mini) that we can run offline at the scale of tens of millions, and utilize this as a pre-computed cache to support that vast bulk of all traffic. Because fine-tuned query understanding models only require very short inputs and outputs, we have seen up to a 100x savings in cost, compared to using a complex GPT-4 prompt directly."
The operational datum — up to ~100× cost reduction vs. a direct GPT-4 prompt at equivalent quality on query-understanding tasks — is the wiki's load-bearing number for GPT-4o-mini fine-tuning at production scale.
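The arithmetic behind that number can be sketched with back-of-the-envelope token accounting. The per-million-token prices and token counts below are illustrative placeholders chosen to reproduce the ~100× ratio, not current OpenAI pricing:

```python
# Illustrative cost comparison: a complex GPT-4 prompt (long few-shot
# instructions, verbose output) vs. a fine-tuned GPT-4o-mini whose
# "instructions" live in the weights, so input is just the short query.
# All prices are hypothetical $/1M tokens, for illustration only.
def request_cost(in_tokens, out_tokens, in_price, out_price):
    """Dollar cost of one request at the given per-1M-token prices."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# Complex GPT-4 prompt: ~2000 input tokens of instructions + examples.
gpt4_cost = request_cost(in_tokens=2000, out_tokens=200,
                         in_price=30.0, out_price=60.0)

# Fine-tuned GPT-4o-mini: short raw query in, compact label out.
mini_cost = request_cost(in_tokens=120, out_tokens=60,
                         in_price=3.0, out_price=6.0)

ratio = gpt4_cost / mini_cost  # ≈ 100x with these illustrative numbers
```

The dominant terms are the input-token counts: once the instruction preamble is distilled into the fine-tuned weights, the paid-for input shrinks by an order of magnitude, and the cheaper per-token price supplies the rest of the gap.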
Production patterns¶
- Teacher-student distillation: trained on the GPT-4-generated + human-curated golden dataset. See patterns/offline-teacher-online-student-distillation.
- Offline batch serving via OpenAI batch API: Yelp pre-computes query-understanding responses at tens-of-millions scale via batch; live traffic serves from the resulting cache. See patterns/head-cache-plus-tail-finetuned-model.
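The serving pattern above can be sketched as follows. The batch request shape follows the OpenAI Batch API's `/v1/chat/completions` JSONL format; the helper names, prompt, and cache-key normalization are illustrative, not Yelp's implementation:

```python
import json

def build_batch_file(queries, path="batch_input.jsonl"):
    """Write a Batch-API-style JSONL file for offline pre-computation.

    In practice the model field would be a fine-tuned model ID
    (e.g. ft:gpt-4o-mini-...); "gpt-4o-mini" here is a placeholder.
    """
    with open(path, "w") as f:
        for i, q in enumerate(queries):
            req = {
                "custom_id": f"query-{i}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": "gpt-4o-mini",
                    "messages": [{"role": "user", "content": q}],
                    "max_tokens": 32,  # short outputs keep batch cost low
                },
            }
            f.write(json.dumps(req) + "\n")
    return path

def serve(query, cache, live_model_call):
    """Serve head traffic from the pre-computed cache.

    Only unseen (tail) queries fall through to a live model call.
    """
    key = query.strip().lower()  # illustrative normalization
    if key in cache:
        return cache[key]
    return live_model_call(query)
```

The design point is that the cache absorbs the vast bulk of traffic, so the live fallback path only needs to handle the long tail of novel queries.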
Tradeoffs¶
- Fine-tuning requires a high-quality golden dataset; quality curation (isolating + re-labeling mislabeled inputs) is load-bearing.
- Short input + short output is a pre-condition for the 100× cost-reduction figure — longer contexts narrow the gap vs. the full GPT-4 model.
Seen in¶
- sources/2025-02-04-yelp-search-query-understanding-with-llms — canonical wiki instance; offline-batch fine-tuned student for query understanding.
Related¶
- systems/gpt-4 — teacher
- concepts/llm-cascade — cost-routing pattern
- patterns/offline-teacher-online-student-distillation — the training-pipeline shape
- patterns/head-cache-plus-tail-finetuned-model — the serving-architecture shape