
PATTERN

LLM attribute extraction platform

Intent

Consolidate structured attribute extraction — turning unstructured product / document / entity data into typed fields — into a single self-serve internal platform driven by LLMs, so that onboarding a new attribute becomes a versioned configuration change, not a new labeled dataset + new ML model + new serving pipeline per attribute.

The pattern replaces two common prior approaches (SQL rules per attribute, or ML-classifier-per-attribute) with a unified LLM-based path: prompt template + few-shot examples + LLM choice + input-data SQL + confidence score → structured value.
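As a sketch, the unified path reads as a single function from (attribute config, input row) to a typed value plus a confidence score. All names below are illustrative, not the platform's actual API; the LLM and the self-verification scoring are stubbed:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AttributeConfig:
    name: str
    prompt_template: str              # natural-language task description
    few_shot: list[tuple[str, str]]   # (input, expected output) examples
    input_sql: str                    # selects the rows to extract from
    llm: str                          # which model to call

def extract(cfg: AttributeConfig, row: str,
            call_llm: Callable[[str, str], str]) -> tuple[str, float]:
    """Materialise the prompt, run the LLM, self-verify for confidence."""
    shots = "\n".join(f"Input: {i}\nOutput: {o}" for i, o in cfg.few_shot)
    prompt = f"{cfg.prompt_template}\n{shots}\nInput: {row}\nOutput:"
    value = call_llm(cfg.llm, prompt)
    # Self-verification: ask the model to confirm its own answer; the
    # yes-probability becomes the confidence score (crudely stubbed here).
    verify = call_llm(cfg.llm, f"Is '{value}' correct for: {row}? yes/no")
    confidence = 0.9 if verify == "yes" else 0.3
    return value, confidence

# Stub LLM so the sketch runs without a provider.
fake_llm = lambda model, prompt: "yes" if "correct" in prompt else "true"
cfg = AttributeConfig("organic", "Is this product organic?",
                      [("USDA Organic milk", "true")],
                      "SELECT description FROM products", "cheap-model")
print(extract(cfg, "Organic spinach", fake_llm))  # ('true', 0.9)
```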

When to use

  • Catalog / knowledge-graph / entity systems with many attributes (hundreds to thousands) each of which would traditionally need its own ML pipeline.
  • Attributes whose definitions evolve faster than datasets can be re-labeled.
  • Data whose signal spans multiple modalities (text + image + nutrition panel photos), where a single text ML model is blind to part of the signal.
  • Teams that want non-ML-engineers (catalog managers, category teams) to onboard or iterate on attributes without writing code.

Platform shape (four components)

  1. Declarative config UI — per attribute, users set: name, type, natural-language description, prompt template, few-shot examples, input-data SQL, LLM choice, extraction algorithm. All configs are versioned. See concepts/self-serve-generative-ai.
  2. ML extraction endpoint — materialises the prompt, runs the LLM (optionally in a cascade), runs self-verification to emit a confidence score.
  3. Quality screening — dev mode (human-label UI + LLM-as-judge auto-eval on a small sample) + production mode (periodic random sampling for drift + low-confidence HITL triage).
  4. Ingestion — write extracted values into the downstream catalog / data pipeline.
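The declarative config in component 1 might look like the record below; field names and the versioning helper are assumptions for illustration, not the platform's schema. The key property is that every edit produces a new version with an author, so changes can be audited and rolled back like code:

```python
import copy

# Illustrative config record; every edit bumps the version.
attribute_config = {
    "version": 1,
    "author": "catalog-team",
    "name": "low_sugar",
    "type": "boolean",
    "description": "True if the product qualifies as low sugar.",
    "prompt_template": "Given the product data below, is it low sugar? true/false.",
    "few_shot": [{"input": "Sugar: 2g per serving", "output": "true"}],
    "input_sql": "SELECT product_id, nutrition_text FROM products",
    "llm": "cheap-model-v1",
    "extraction_algorithm": "single_pass",
}

def new_version(cfg: dict, author: str, **changes) -> dict:
    """Onboarding or iterating on an attribute = a new config version."""
    nxt = copy.deepcopy(cfg)
    nxt.update(changes)
    nxt["version"] = cfg["version"] + 1
    nxt["author"] = author
    return nxt

v2 = new_version(attribute_config, "category-team", llm="strong-model-v2")
print(v2["version"], v2["llm"])  # 2 strong-model-v2
```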

Why it beats per-attribute ML pipelines

  • No per-attribute labeled dataset. Zero-shot or few-shot prompts replace dataset collection — a full new attribute ships in days (PARSE reports 1 day for "organic", 3 days for "low sugar") vs. weeks for a new trained ML model.
  • No per-attribute model training. Prompt change + click deploy replaces re-training.
  • One pipeline to maintain. Serving infra, eval harness, HITL review queue — built once, reused per attribute.
  • Multi-modal for free. Swap in a VLM; the same platform now handles attributes whose signal is in images. See concepts/multi-modal-attribute-extraction.
  • Per-attribute cost/quality tuning. Simple attributes run on cheap LLMs at large cost savings (PARSE: −70% cost on "organic"). Hard attributes run on stronger LLMs (PARSE: accuracy drops 60% when a hard attribute is forced onto the cheap LLM, so this tuning is load-bearing).
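The per-attribute tuning above amounts to a confidence-gated cascade: try the cheap model first, escalate only when self-verification is unsure. A minimal sketch (thresholds, model stubs, and return shapes are made up):

```python
def cascade(row, cheap, strong, threshold=0.8):
    """Try the cheap LLM first; escalate to the strong LLM only when
    the self-verification confidence falls below the threshold."""
    value, conf = cheap(row)
    if conf >= threshold:
        return value, conf, "cheap"
    value, conf = strong(row)
    return value, conf, "strong"

# Stub extractors returning (value, confidence).
cheap = lambda row: ("organic", 0.95) if "USDA" in row else ("unknown", 0.4)
strong = lambda row: ("organic", 0.9)

print(cascade("USDA Organic milk", cheap, strong))  # easy: cheap model wins
print(cascade("milk, grass-fed", cheap, strong))    # hard: escalates
```

The escalation rate determines the realised cost saving: if most rows clear the threshold on the cheap model, the strong model's price is paid only for the hard tail.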

Consequences / implementation notes

  • Versioned configs matter more than in most platforms. Prompts are the program; treating them as code-level artifacts (change history, author, rollback) is not optional.
  • Confidence score is a platform primitive. The self-verification score is what enables the cascade, the HITL routing, and drift detection — all downstream policies key off this scalar.
  • Two orthogonal HITL loops. patterns/low-confidence-to-human-review catches outputs the model already knows are uncertain; patterns/human-in-the-loop-quality-sampling catches systematic drift the confidence score would miss. Ship both, not just one.
  • LLM-as-judge in dev mode unblocks iteration. Humans are the ground truth for calibration but too slow for every prompt revision — auto-eval lets developers iterate 10× faster between human-labeling cycles.
  • Sibling pattern: patterns/unified-image-generation-platform applies the same architectural stance (one platform, multi-model, defaults-with-overrides, LLM-evaluator in the loop) to image generation instead of structured extraction.
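The two HITL loops above can be sketched as two independent routing rules over extraction results; thresholds and the sampling rate below are illustrative, not the platform's values:

```python
import random

def route_for_review(results, conf_threshold=0.7, sample_rate=0.05, rng=None):
    """Loop 1: everything below the confidence threshold goes to humans.
    Loop 2: a small random sample of *confident* outputs also goes to
    humans, to catch systematic drift the score itself would miss."""
    rng = rng or random.Random(0)  # seeded for a reproducible sketch
    low_conf, drift_sample, auto = [], [], []
    for item_id, value, conf in results:
        if conf < conf_threshold:
            low_conf.append(item_id)
        elif rng.random() < sample_rate:
            drift_sample.append(item_id)
        else:
            auto.append(item_id)
    return low_conf, drift_sample, auto

# 100 fake results; every 10th one is low-confidence.
results = [(i, "v", 0.5 if i % 10 == 0 else 0.95) for i in range(100)]
low, drift, auto = route_for_review(results)
print(len(low), len(drift), len(auto))
```

Note that the drift sample is drawn from the *confident* outputs — sampling only the low-confidence queue would reproduce exactly the blind spot the second loop exists to cover.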

Tradeoffs / anti-patterns

  • Don't adopt for one attribute. The platform pays off across many attributes; for a one-off extraction a handwritten SQL rule or fine-tuned classifier is cheaper.
  • Prompt sprawl. Without a prompt-template library + review discipline, each attribute becomes a bespoke prompt, eroding the platform's consolidation value. Pair with patterns/prompt-template-library.
  • Hidden provider lock-in via features. Using logit access (for self-verification) or tool-use formats that only one provider supports undermines model portability. Budget for a portable verification path.
  • Cost explodes at full-catalog scale without cascade + cache. Instacart explicitly flags cost-reduction (prompt batching, extraction cache by similarity) as the next frontier precisely because per-SKU LLM-on-every-attribute is eye-wateringly expensive at millions of SKUs × many attributes.
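A simplification of the extraction-cache idea: reuse prior extractions for near-duplicate inputs so repeated SKUs avoid repeat LLM calls. A production version would key on embedding similarity; the normalised-text key below is a stand-in (an assumption, not Instacart's design) that already catches exact and whitespace/case variants:

```python
import re

class ExtractionCache:
    """Reuse prior extractions for near-duplicate (attribute, input) pairs."""
    def __init__(self):
        self.store: dict[tuple[str, str], str] = {}
        self.llm_calls = 0

    def _key(self, attribute: str, text: str) -> tuple[str, str]:
        # Collapse whitespace and case; an embedding-similarity lookup
        # would widen the net to paraphrased descriptions.
        return attribute, re.sub(r"\s+", " ", text.strip().lower())

    def extract(self, attribute: str, text: str, call_llm) -> str:
        key = self._key(attribute, text)
        if key not in self.store:
            self.llm_calls += 1
            self.store[key] = call_llm(attribute, text)
        return self.store[key]

cache = ExtractionCache()
llm = lambda attr, text: "true"  # stub extractor
cache.extract("organic", "USDA Organic  Milk", llm)
cache.extract("organic", "usda organic milk", llm)  # cache hit, no LLM call
print(cache.llm_calls)  # 1
```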

Seen in
