
GPT-4o

Definition

GPT-4o ("omni") is OpenAI's multi-modal flagship model, announced 2024-05-13. It natively accepts text and image inputs (with audio support added later) and produces text output. Positioned as a simultaneous latency, cost, and quality improvement over GPT-4 Turbo for general-purpose multi-modal workloads, and as the teacher model for its smaller fine-tunable sibling, GPT-4o-mini.

Wiki anchor

The wiki's canonical anchor for GPT-4o is its role as the production VLM backend behind a catalog-attribute copilot, documented in the 2024-09-17 Zalando post (sources/2024-09-17-zalando-content-creation-copilot-ai-assisted-product-onboarding).

Zalando's Content Creation Copilot (systems/zalando-content-creation-copilot) launched on GPT-4 Turbo and migrated to GPT-4o during development. The swap was reported as a net improvement across three axes simultaneously: "The new model not only provided better results but also delivered faster response times and proved to be more cost-effective." Because the copilot was designed as an aggregator with stable contracts on either side, the swap did not require changes to the Content Creation Tool or Article Masterdata.
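The stable-contract idea can be sketched in a few lines. This is an illustrative shape, not Zalando's code: the names (`AttributeBackend`, `onboard_article`) and the attribute payload are hypothetical, and the point is only that callers depend on an interface, so the GPT-4 Turbo → GPT-4o swap touches a single backend class.

```python
from typing import Protocol


class AttributeBackend(Protocol):
    """Stable contract: product media in, attribute suggestions out."""
    def suggest(self, image_url: str, description: str) -> dict: ...


class GPT4oBackend:
    """One interchangeable backend. Swapping gpt-4-turbo for gpt-4o
    changes only this class; its callers are unaffected."""
    def __init__(self, model: str = "gpt-4o"):
        self.model = model

    def suggest(self, image_url: str, description: str) -> dict:
        # A real implementation would call a vision-capable chat API here.
        raise NotImplementedError


class StubBackend:
    """Test double: demonstrates that callers see only the contract."""
    def suggest(self, image_url: str, description: str) -> dict:
        return {"neckline": "V-neck"}


def onboard_article(backend: AttributeBackend, image_url: str, description: str) -> dict:
    # The Content Creation Tool and Article Masterdata interact only with
    # this signature, so a model swap behind `backend` is invisible to them.
    return backend.suggest(image_url, description)
```

The same seam is what makes the planned complementary backends (brand data, fine-tuned models) pluggable later.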

The post also names GPT-4o's empirical weakness: fine-grained fashion vocabulary. "GPT-4o model tends to suggest general attributes like 'V-necks' or 'round necks' for 'necklines' correctly, but can be less precise when it comes to more fashion-specific ones, like 'deep scoop necks'." This is characteristic of general-purpose VLMs on long-tail, domain-specific vocabulary.
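One common mitigation for this failure mode (not described in the post; shown here only as an assumption-laden sketch) is to snap free-text model output onto the catalog's controlled vocabulary, so a near-miss like "scoop neck" resolves to the canonical attribute value rather than entering masterdata as-is. The vocabulary below is a toy example:

```python
from difflib import get_close_matches

# Hypothetical controlled vocabulary for one attribute.
NECKLINE_VOCAB = ["V-neck", "round neck", "deep scoop neck", "boat neck"]


def snap_to_vocab(suggestion: str, vocab: list, cutoff: float = 0.6):
    """Map a model suggestion to the closest canonical value,
    or None if nothing in the vocabulary is close enough."""
    matches = get_close_matches(suggestion, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this, `snap_to_vocab("scoop neck", NECKLINE_VOCAB)` resolves to `"deep scoop neck"`, while an unrelated string falls through to `None` for human review.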

Tradeoffs

  • Multi-modal inputs cost more per call than text-only. See concepts/multi-modal-attribute-extraction — image tokens are more expensive than text tokens, so the multi-modal path is reserved for inputs where the signal is actually in the image.
  • Long-tail domain vocabulary underperforms. For fashion-specific terminology (specific neckline variants, niche assortment classes), accuracy drops. Zalando's response was not to fine-tune GPT-4o but to plan for complementary backends (brand data dumps, fine-tuned models, partner contributions) behind the same copilot contract.
  • Balanced vs. unbalanced eval sets give different headline numbers. Zalando explicitly notes that the fine-grained weakness is more visible on balanced eval sets than on the real (unbalanced) production distribution — a trap when comparing model quality across benchmarks.
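The eval-set trap in the last bullet can be made concrete with hypothetical numbers: a model that is weak only on a rare attribute value looks much worse on a balanced eval set than on the production distribution, because balancing up-weights exactly the long-tail classes it struggles with. All accuracies and frequencies below are invented for illustration:

```python
# Hypothetical per-class accuracy and production frequency.
acc = {"V-neck": 0.95, "round neck": 0.93, "deep scoop neck": 0.60}
prod_share = {"V-neck": 0.55, "round neck": 0.40, "deep scoop neck": 0.05}

# Balanced eval set: every class weighted equally.
balanced_acc = sum(acc.values()) / len(acc)           # ~0.83

# Production eval set: classes weighted by real frequency.
production_acc = sum(acc[k] * prod_share[k] for k in acc)  # ~0.92
```

Same model, a roughly ten-point headline gap — which is why comparing "accuracy" across benchmarks without knowing the class distribution is misleading.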
