
GPT-4o

Definition

GPT-4o ("omni") is OpenAI's multi-modal flagship model, announced 2024-05-13. It natively accepts text and image inputs (with audio support added later) and produces text output. Positioned as a simultaneous latency, cost, and quality improvement over GPT-4 Turbo for general-purpose multi-modal workloads, and as the teacher model for its smaller fine-tunable sibling, GPT-4o-mini.

Wiki anchor

The wiki's canonical anchor for GPT-4o is its role as the production VLM backend behind a catalog-attribute copilot, documented in the 2024-09-17 Zalando post (sources/2024-09-17-zalando-content-creation-copilot-ai-assisted-product-onboarding).

Zalando's Content Creation Copilot (systems/zalando-content-creation-copilot) launched on GPT-4 Turbo and migrated to GPT-4o during development. The swap was reported as a net improvement across three axes simultaneously: "The new model not only provided better results but also delivered faster response times and proved to be more cost-effective." Because the copilot was designed as an aggregator with stable contracts on either side, the swap did not require changes to the Content Creation Tool or Article Masterdata.
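The stable-contract idea can be sketched in a few lines. This is an illustrative shape, not Zalando's code: the names (`AttributeBackend`, `onboard_article`) and the attribute payload are hypothetical, and the point is only that callers depend on an interface, so the GPT-4 Turbo → GPT-4o swap touches a single backend class.

```python
from typing import Protocol


class AttributeBackend(Protocol):
    """Stable contract: product media in, attribute suggestions out."""
    def suggest(self, image_url: str, description: str) -> dict: ...


class GPT4oBackend:
    """One interchangeable backend. Swapping gpt-4-turbo for gpt-4o
    changes only this class; its callers are unaffected."""
    def __init__(self, model: str = "gpt-4o"):
        self.model = model

    def suggest(self, image_url: str, description: str) -> dict:
        # A real implementation would call a vision-capable chat API here.
        raise NotImplementedError


class StubBackend:
    """Test double: demonstrates that callers see only the contract."""
    def suggest(self, image_url: str, description: str) -> dict:
        return {"neckline": "V-neck"}


def onboard_article(backend: AttributeBackend, image_url: str, description: str) -> dict:
    # The Content Creation Tool and Article Masterdata interact only with
    # this signature, so a model swap behind `backend` is invisible to them.
    return backend.suggest(image_url, description)
```

The same seam is what makes the planned complementary backends (brand data, fine-tuned models) pluggable later.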

The post also names GPT-4o's empirical weakness: fine-grained fashion vocabulary. "GPT-4o model tends to suggest general attributes like 'V-necks' or 'round necks' for 'necklines' correctly, but can be less precise when it comes to more fashion-specific ones, like 'deep scoop necks'." This is characteristic of general-purpose VLMs on long-tail, domain-specific vocabulary.
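One common mitigation for this failure mode (not described in the post; shown here only as an assumption-laden sketch) is to snap free-text model output onto the catalog's controlled vocabulary, so a near-miss like "scoop neck" resolves to the canonical attribute value rather than entering masterdata as-is. The vocabulary below is a toy example:

```python
from difflib import get_close_matches

# Hypothetical controlled vocabulary for one attribute.
NECKLINE_VOCAB = ["V-neck", "round neck", "deep scoop neck", "boat neck"]


def snap_to_vocab(suggestion: str, vocab: list, cutoff: float = 0.6):
    """Map a model suggestion to the closest canonical value,
    or None if nothing in the vocabulary is close enough."""
    matches = get_close_matches(suggestion, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this, `snap_to_vocab("scoop neck", NECKLINE_VOCAB)` resolves to `"deep scoop neck"`, while an unrelated string falls through to `None` for human review.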

Tradeoffs

  • Multi-modal inputs cost more per call than text-only. See concepts/multi-modal-attribute-extraction — image tokens are more expensive than text tokens, so the multi-modal path is reserved for inputs where the signal is actually in the image.
  • Long-tail domain vocabulary underperforms. For fashion-specific terminology (specific neckline variants, niche assortment classes), accuracy drops. Zalando's response was not to fine-tune GPT-4o but to plan for complementary backends (brand data dumps, fine-tuned models, partner contributions) behind the same copilot contract.
  • Balanced vs. unbalanced eval sets give different headline numbers. Zalando explicitly notes that the fine-grained weakness is more visible on balanced eval sets than on the real (unbalanced) production distribution — a trap when comparing model quality across benchmarks.
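The eval-set trap in the last bullet can be made concrete with hypothetical numbers: a model that is weak only on a rare attribute value looks much worse on a balanced eval set than on the production distribution, because balancing up-weights exactly the long-tail classes it struggles with. All accuracies and frequencies below are invented for illustration:

```python
# Hypothetical per-class accuracy and production frequency.
acc = {"V-neck": 0.95, "round neck": 0.93, "deep scoop neck": 0.60}
prod_share = {"V-neck": 0.55, "round neck": 0.40, "deep scoop neck": 0.05}

# Balanced eval set: every class weighted equally.
balanced_acc = sum(acc.values()) / len(acc)           # ~0.83

# Production eval set: classes weighted by real frequency.
production_acc = sum(acc[k] * prod_share[k] for k in acc)  # ~0.92
```

Same model, a roughly ten-point headline gap — which is why comparing "accuracy" across benchmarks without knowing the class distribution is misleading.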
