
Zalando Content Creation Copilot

What it is

Content Creation Copilot is Zalando's internal AI-assisted product-onboarding system, launched in 2024. It auto-generates structured product-attribute suggestions (e.g. neckline type, assortment class, colour, fit) from product photos and pre-fills them inside the Content Creation Tool for copywriter QA, collapsing the former enrich-then-QA workflow into a single QA-only step (Source: sources/2024-09-17-zalando-content-creation-copilot-ai-assisted-product-onboarding).

The architectural stance is deliberately copilot-shaped (per the IDE analogy in the post): the human remains the final decision-maker on every attribute, but the default path is accept-the-AI and the interface marks AI-suggested values with a purple dot (concepts/ai-provenance-ui-indicator).

Architecture

Four named services compose the end-to-end flow:

| Service | Role |
| --- | --- |
| Content Creation Tool | Copywriter-facing UI — uploads images, receives suggestions, pre-selects them with the purple-dot marker. |
| Article Masterdata | System-of-record for Zalando attribute codes (e.g. `assortment_type_7312`) and per-article-type attribute sets (which attributes are mandatory, optional, or N/A). |
| Prompt Generator | Orchestration layer — materialises the LLM prompt from Masterdata + image URLs, runs the code→English translation, filters attributes via the category-relevance map, calls the LLM, runs the English→code reverse translation, returns structured suggestions. |
| OpenAI GPT-4 Turbo → GPT-4o | Backend LLM providing the attribute suggestions, called via the OpenAI API. |

The Prompt Generator is the load-bearing service. It owns three concerns the LLM cannot own:

  1. Vocabulary translation: codes ↔ English, in both directions. The LLM speaks English; the catalog speaks identifier codes. The Prompt Generator is the translator on both ends of the call.
  2. Scope filtering: category → attributes. Attributes that should not be filled for a given article type are removed from the prompt entirely, both because their suggestions were empirically inaccurate and because they'd confuse the copywriter.
  3. Image selection: product-only front images are preferred as input over model-worn images or alternate angles.
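The three concerns can be sketched as one round-trip. Everything below is an illustrative reconstruction, not Zalando's actual code: the mappings, attribute codes, and function names are all hypothetical.

```python
# Hypothetical code <-> English vocabulary; the real map lives in Article Masterdata.
CODE_TO_ENGLISH = {
    "assortment_type_7312": "assortment type",
    "neckline_type_101": "neckline type",
    "heel_height_55": "heel height",
}
ENGLISH_TO_CODE = {v: k for k, v in CODE_TO_ENGLISH.items()}

# Category-relevance map: attributes that should not be filled for an
# article type never reach the prompt at all (scope filtering).
CATEGORY_RELEVANCE = {
    "t-shirt": {"assortment_type_7312", "neckline_type_101"},
}


def select_image(images: list[dict]) -> str:
    """Prefer product-only front shots over model-worn or alternate angles."""
    ranked = sorted(images, key=lambda i: (i["view"] != "front", i["worn"]))
    return ranked[0]["url"]


def build_prompt(article_type: str, image_url: str) -> str:
    """Code -> English translation plus scope filtering, on the way in."""
    names = [CODE_TO_ENGLISH[c] for c in sorted(CATEGORY_RELEVANCE[article_type])]
    return f"Image: {image_url}\nSuggest values for: {', '.join(names)}"


def parse_response(english_suggestions: dict[str, str]) -> dict[str, str]:
    """English -> code reverse translation, on the way out."""
    return {ENGLISH_TO_CODE[n]: v for n, v in english_suggestions.items()}
```

Note that in this framing the LLM never sees an identifier code: `heel_height_55` is filtered out for a t-shirt before the prompt is built, and the response is mapped back to codes before it reaches the Content Creation Tool.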

Aggregator framing (future backends)

The post explicitly frames the copilot as an aggregator service, not a GPT wrapper:

"we created an aggregator service - to integrate multiple AI services, leveraging a wider variety of data sources, such as brand data dumps, partner contributions, and images, to improve the accuracy and completeness of the results."

This is the patterns/model-agnostic-suggestion-aggregator pattern: one copilot API, multiple interchangeable backends. The pattern paid off immediately — the GPT-4 Turbo → GPT-4o swap during development was a net win on latency + cost + accuracy without requiring changes to the Content Creation Tool contract.
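A minimal sketch of that contract, under stated assumptions — the interface, class names, and merge policy here are illustrative, not disclosed by the post:

```python
from typing import Protocol


class SuggestionBackend(Protocol):
    """One copilot API; any backend that implements suggest() is interchangeable."""

    def suggest(self, image_url: str, attribute_codes: list[str]) -> dict[str, str]: ...


class OpenAIBackend:
    def __init__(self, model: str = "gpt-4o"):
        # The GPT-4 Turbo -> GPT-4o swap is a constructor argument,
        # invisible to the Content Creation Tool contract.
        self.model = model

    def suggest(self, image_url, attribute_codes):
        return {}  # call the OpenAI API here


class BrandDataBackend:
    """Brand data dumps: authoritative for attributes like material composition."""

    def __init__(self, dump: dict[str, str]):
        self.dump = dump

    def suggest(self, image_url, attribute_codes):
        return {c: self.dump[c] for c in attribute_codes if c in self.dump}


class Aggregator:
    def __init__(self, backends: list[SuggestionBackend]):
        self.backends = backends

    def suggest(self, image_url, attribute_codes):
        merged: dict[str, str] = {}
        for backend in self.backends:  # earlier backends take priority
            for code, value in backend.suggest(image_url, attribute_codes).items():
                merged.setdefault(code, value)
        return merged
```

The merge policy (first backend wins) is one plausible choice; the post does not disclose how conflicting suggestions from multiple sources would be reconciled.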

Named future backends (not yet live as of the post):

  • Brand data dumps — attribute data supplied directly by fashion brands (authoritative for attributes like material composition).
  • Partner contributions — third-party catalog enrichment providers.
  • Additional image-derived signals — e.g. fine-tuned vision classifiers for specific attributes where a general-purpose VLM underperforms.

Operational disclosure

| Metric | Value |
| --- | --- |
| Production accuracy | ~75% |
| Attributes enriched / week | ~50,000 |
| Markets served | 25 |
| Manual-enrichment share displaced | ~25% of pipeline |
| Launch backend | OpenAI GPT-4 Turbo |
| Migrated backend | OpenAI GPT-4o |
| Best input image | product-only front |

Design trade-offs

  • Pre-select-with-disclosure over suggestion-on-tap. Default-accept + purple dot shifts cognitive load onto QA (which is the copywriter's existing mental altitude anyway); a hover-for-suggestion UI would keep the old write-then-QA muscle memory. (patterns/pre-select-ai-suggestions-with-visual-disclosure)
  • No disclosed confidence primitive. Unlike Instacart PARSE's self-verification confidence score, Zalando's copilot pre-selects every suggestion uniformly and relies on the human QA step as the only quality gate. No low-confidence-to-human-review routing is disclosed.
  • Scope reduction beats smarter prompting on long-tail attributes. Rather than invest in model sophistication for attributes with empirically poor accuracy on specific article types, the system removes those attributes from the prompt entirely via the category-relevance map.
  • Cost optimisation was model choice, not prompt engineering alone. The two disclosed cost wins were (1) dropping suggestions for unsupported attribute sets (scope reduction) and (2) migrating GPT-4 Turbo → GPT-4o (model choice). Prompt batching, caching, and model cascades are not disclosed as active levers.
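The first trade-off above implies a data shape: each suggestion carries provenance so the UI can render the purple dot, and a copywriter override clears it. The field names below are hypothetical, sketching the concepts/ai-provenance-ui-indicator idea rather than Zalando's actual schema:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AttributeValue:
    code: str
    value: str
    source: str = "ai"        # drives the purple-dot provenance marker
    preselected: bool = True  # default path is accept-the-AI


def apply_qa_edit(attr: AttributeValue, new_value: str) -> AttributeValue:
    """A copywriter override replaces the value and clears the AI provenance."""
    return replace(attr, value=new_value, source="human", preselected=False)
```

Untouched attributes keep `source="ai"`, so the accept-the-AI default costs the copywriter nothing while the disclosure survives into the record.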

Sibling systems

  • systems/instacart-parse — Instacart's catalog attribute-extraction platform. Same problem domain (catalog-scale attribute enrichment), broader platform scope (self-serve UI, confidence scores, HITL queues, LLM cascade). Zalando's copilot is the thinner, human-in-the-copywriting-loop production sibling; PARSE is the fully-platformised self-serve one.
