
GPT-4o-mini

Definition

GPT-4o-mini is OpenAI's compact, cost-optimised variant of GPT-4o, released 2024-07-18 with fine-tuning support. It is the smaller, cheaper sibling to GPT-4o, positioned for high-volume production workloads.

Wiki anchor

The wiki's canonical anchor for GPT-4o-mini is its role as the offline-batch serving student in production LLM pipelines, canonicalised by the 2025-02-04 Yelp post (sources/2025-02-04-yelp-search-query-understanding-with-llms).

Yelp's canonical disclosure: "Fine tune a smaller model (GPT4o-mini) that we can run offline at the scale of tens of millions, and utilize this as a pre-computed cache to support that vast bulk of all traffic. Because fine-tuned query understanding models only require very short inputs and outputs, we have seen up to a 100x savings in cost, compared to using a complex GPT-4 prompt directly."
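The pattern Yelp describes can be sketched as a pre-computed cache with an online fallback: the fine-tuned small model runs offline over the head of the query distribution, and only cache misses hit a live model. A minimal sketch; the function names and stub models below are hypothetical illustration, not Yelp's code.

```python
# Offline-batch serving pattern: a fine-tuned small model pre-computes
# outputs for high-frequency queries; online traffic reads the cache
# and falls back to an expensive live call only on a miss.
# All names and stubs here are hypothetical.

def finetuned_mini_model(query: str) -> str:
    """Stand-in for a batch call to the fine-tuned GPT-4o-mini."""
    return f"intent({query})"

def expensive_live_model(query: str) -> str:
    """Stand-in for the complex GPT-4 prompt used as online fallback."""
    return f"intent({query})"

def build_offline_cache(head_queries: list[str]) -> dict[str, str]:
    # Run offline, at the scale of tens of millions of queries.
    return {q: finetuned_mini_model(q) for q in head_queries}

def serve(query: str, cache: dict[str, str]) -> str:
    # The cache covers the vast bulk of traffic; misses go to the live model.
    hit = cache.get(query)
    return hit if hit is not None else expensive_live_model(query)

cache = build_offline_cache(["pizza near me", "cheap sushi"])
print(serve("pizza near me", cache))  # served from the pre-computed cache
print(serve("vegan ramen", cache))    # cache miss -> live fallback
```

The design point is that cache hit rate, not model latency, determines how much traffic the cheap path absorbs.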

The operational datum, a ~100× cost reduction vs. a direct GPT-4 prompt at equivalent quality on query-understanding tasks, is the wiki's load-bearing number for GPT-4o-mini fine-tuning at production scale.
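A back-of-the-envelope check on why the savings compound: a fine-tuned model needs no few-shot scaffolding, so both prompt and completion shrink, and the per-token price is far lower. All token counts and prices below are illustrative assumptions, not Yelp's numbers; the exact ratio is sensitive to them (Yelp reports "up to a 100x savings" for their workload).

```python
# Illustrative arithmetic: savings multiply across (shorter input) x
# (shorter output) x (cheaper per-token price). Numbers are assumptions.

def cost(tokens_in: int, tokens_out: int, price_in: float, price_out: float) -> float:
    # prices are USD per 1M tokens
    return (tokens_in * price_in + tokens_out * price_out) / 1e6

# Complex GPT-4 prompt: long instructions plus few-shot examples.
gpt4_per_query = cost(2000, 100, price_in=30.0, price_out=60.0)

# Fine-tuned GPT-4o-mini: behaviour baked into weights, very short I/O.
mini_per_query = cost(50, 20, price_in=0.30, price_out=1.20)

print(f"cost ratio: ~{gpt4_per_query / mini_per_query:.0f}x")
```

Under these assumed numbers the gap exceeds two orders of magnitude; the structural point is that short inputs and outputs, not price alone, do most of the work.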

Production patterns

Tradeoffs

  • Fine-tuning requires a high-quality golden dataset; quality curation (isolating + re-labeling mislabeled inputs) is load-bearing.
  • Short input + short output is a pre-condition for the 100× cost-reduction figure — longer contexts narrow the gap vs. the full GPT-4 model.
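The curation tradeoff above can be sketched as a small pipeline: flagged (suspect) rows are isolated and re-labeled before export to the chat-format JSONL that OpenAI fine-tuning consumes. The data, labels, and `relabel` helper are hypothetical; only the JSONL `messages` shape matches the fine-tuning format.

```python
# Sketch of golden-dataset curation: isolate mislabeled rows, re-label
# them, then export chat-format JSONL for fine-tuning. Data and the
# relabel stand-in are hypothetical illustration.
import json

raw = [
    {"query": "cheap sushi open now", "label": "find_food",    "flagged": False},
    {"query": "emergency plumber",    "label": "find_food",    "flagged": True},
]

def relabel(row: dict) -> dict:
    # Stand-in for human or LLM-assisted re-labeling of flagged rows.
    return {**row, "label": "find_service", "flagged": False}

golden = [relabel(r) if r["flagged"] else r for r in raw]

with open("golden.jsonl", "w") as f:
    for r in golden:
        f.write(json.dumps({"messages": [
            {"role": "system", "content": "Classify the search query intent."},
            {"role": "user", "content": r["query"]},
            {"role": "assistant", "content": r["label"]},
        ]}) + "\n")
```

Because the fine-tuned model's quality ceiling is the golden set, the re-label step is where the engineering effort concentrates.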

Seen in
