LLM attribute extraction platform¶
Intent¶
Consolidate structured attribute extraction — turning unstructured product / document / entity data into typed fields — into a single self-serve internal platform driven by LLMs, so that onboarding a new attribute becomes a versioned configuration change, not a new labeled dataset + new ML model + new serving pipeline per attribute.
The pattern replaces two common prior approaches (SQL rules per attribute, or ML-classifier-per-attribute) with a unified LLM-based path: prompt template + few-shot examples + LLM choice + input-data SQL + confidence score → structured value.
When to use¶
- Catalog / knowledge-graph / entity systems with many attributes (hundreds to thousands) each of which would traditionally need its own ML pipeline.
- Attributes whose definitions evolve faster than datasets can be re-labeled.
- Data whose signal spans multiple modalities (text + image + nutrition panel photos), where a single text ML model is blind to part of the signal.
- Teams that want non-ML-engineers (catalog managers, category teams) to onboard or iterate on attributes without writing code.
Platform shape (four components)¶
- Declarative config UI — per attribute, users set: name, type, natural-language description, prompt template, few-shot examples, input-data SQL, LLM choice, extraction algorithm. All configs are versioned. See concepts/self-serve-generative-ai.
- ML extraction endpoint — materialises the prompt, runs the LLM (optionally in a cascade), and runs self-verification to emit a confidence score.
- Quality screening — dev mode (human-label UI + LLM-as-judge auto-eval on a small sample) + production mode (periodic random sampling for drift + low-confidence HITL triage).
- Ingestion — write extracted values into the downstream catalog / data pipeline.
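As a sketch, the declarative config in the first component could be held as a small versioned record; the field names below are illustrative, not PARSE's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AttributeConfig:
    """One versioned extraction attribute, roughly what a user sets in the config UI."""
    name: str
    value_type: str           # e.g. "boolean", "enum", "number"
    description: str          # natural-language definition of the attribute
    prompt_template: str      # expects {description}, {examples}, {item}
    few_shot_examples: tuple  # (input_text, expected_value) pairs
    input_sql: str            # query that materialises the item's raw fields
    llm: str                  # model chosen for this attribute
    version: int = 1

    def render_prompt(self, item_text: str) -> str:
        """Materialise the prompt for one item."""
        examples = "\n".join(f"{i} -> {v}" for i, v in self.few_shot_examples)
        return self.prompt_template.format(
            description=self.description, examples=examples, item=item_text
        )

organic = AttributeConfig(
    name="organic",
    value_type="boolean",
    description="True if the product is certified organic.",
    prompt_template="{description}\nExamples:\n{examples}\nItem: {item}\nAnswer:",
    few_shot_examples=(("USDA Organic Gala Apples", "true"),),
    input_sql="SELECT title, description FROM products WHERE id = :id",
    llm="cheap-llm-v1",
)
prompt = organic.render_prompt("Organic Whole Milk, 1 gal")
```

Onboarding a new attribute is then a new record (with a bumped version on every change), not a new pipeline.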
Why it beats per-attribute ML pipelines¶
- No per-attribute labeled dataset. Zero-shot or few-shot prompts replace dataset collection — a full new attribute ships in days (PARSE reports 1 day for "organic", 3 days for "low sugar") vs. weeks for a new trained ML model.
- No per-attribute model training. Prompt change + click deploy replaces re-training.
- One pipeline to maintain. Serving infra, eval harness, HITL review queue — built once, reused per attribute.
- Multi-modal for free. Swap in a VLM; the same platform now handles attributes whose signal is in images. See concepts/multi-modal-attribute-extraction.
- Per-attribute cost/quality tuning. Simple attributes run on cheap LLMs at large cost savings (PARSE: -70% cost on "organic"). Hard attributes run on stronger LLMs (PARSE saw a ~60% accuracy drop when a hard attribute was forced onto the cheap LLM, so this routing is load-bearing).
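The cheap-first routing behind that last point can be sketched as a two-step cascade: keep the cheap model's answer when self-verification is confident, otherwise escalate. The threshold and the stub models here are assumptions, not PARSE's actual values:

```python
from typing import Callable, Tuple

LLMFn = Callable[[str], Tuple[str, float]]  # prompt -> (value, confidence)

def cascade_extract(prompt: str, cheap: LLMFn, strong: LLMFn,
                    threshold: float = 0.9) -> Tuple[str, float, str]:
    """Try the cheap model first; escalate only low-confidence extractions."""
    value, conf = cheap(prompt)
    if conf >= threshold:
        return value, conf, "cheap"
    value, conf = strong(prompt)
    return value, conf, "strong"

# Stub models standing in for real LLM + self-verification calls.
cheap = lambda p: ("true", 0.95) if "organic" in p.lower() else ("unknown", 0.3)
strong = lambda p: ("false", 0.88)

easy = cascade_extract("Is it organic? USDA Organic Apples", cheap, strong)  # stays on cheap
hard = cascade_extract("Is it low sugar? Cola, 12 oz", cheap, strong)        # escalates
```

Most volume never touches the strong model, which is where the per-attribute savings come from.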
Consequences / implementation notes¶
- Versioned configs matter more than in most platforms. Prompts are the program; treating them as code-level artifacts (change history, author, rollback) is not optional.
- Confidence score is a platform primitive. The self-verification score is what enables the cascade, the HITL routing, and drift detection — all downstream policies key off this scalar.
- Two orthogonal HITL loops. patterns/low-confidence-to-human-review catches known uncertain outputs; patterns/human-in-the-loop-quality-sampling catches systematic drift the confidence score would miss. Ship both, not either.
- LLM-as-judge in dev mode unblocks iteration. Humans are the ground truth for calibration but too slow for every prompt revision — auto-eval lets developers iterate 10× faster between human-labeling cycles.
- Sibling pattern: patterns/unified-image-generation-platform applies the same architectural stance (one platform, multi-model, defaults-with-overrides, LLM-evaluator in the loop) to image generation instead of structured extraction.
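The confidence-as-primitive and dual-HITL points above can be sketched as one routing decision per extracted value, keyed off the confidence scalar plus an independent random sample; the thresholds and labels are illustrative:

```python
import random

def route(confidence: float, *, low_threshold: float = 0.7,
          sample_rate: float = 0.01, rng: random.Random) -> str:
    """Decide what happens to one extracted value.

    Two independent paths to a human: low-confidence triage catches
    known-uncertain outputs; random sampling catches systematic drift
    the confidence score would never flag.
    """
    if confidence < low_threshold:
        return "human_review"    # low-confidence-to-human-review
    if rng.random() < sample_rate:
        return "quality_sample"  # human-in-the-loop quality sampling
    return "ingest"

rng = random.Random(0)
decisions = [route(c, rng=rng) for c in (0.95, 0.40, 0.99, 0.65)]
```

Note the sampling branch is unconditional on confidence — that independence is what makes the two loops orthogonal.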
Tradeoffs / anti-patterns¶
- Don't adopt for one attribute. The platform pays off across many attributes; for a one-off extraction a handwritten SQL rule or fine-tuned classifier is cheaper.
- Prompt sprawl. Without a prompt-template library + review discipline, each attribute becomes a bespoke prompt, eroding the platform's consolidation value. Pair with patterns/prompt-template-library.
- Hidden provider lock-in via features. Using logit access (for self-verification) or tool-use formats that only one provider supports undermines model portability. Budget for a portable verification path.
- Cost explodes at full-catalog scale without cascade + cache. Instacart explicitly flags cost-reduction (prompt batching, extraction cache by similarity) as the next frontier precisely because per-SKU LLM-on-every-attribute is eye-wateringly expensive at millions of SKUs × many attributes.
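A minimal sketch of an extraction cache, assuming exact-match on a normalized key as the trivial base case; the similarity-based keying Instacart points toward is out of scope here:

```python
import hashlib

class ExtractionCache:
    """Cache extracted values keyed by (attribute, normalized input text)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(attribute: str, text: str) -> str:
        norm = " ".join(text.lower().split())  # crude normalization; a real system
        return hashlib.sha256(f"{attribute}|{norm}".encode()).hexdigest()  # might embed + ANN

    def get_or_extract(self, attribute: str, text: str, extract_fn):
        key = self._key(attribute, text)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = extract_fn(text)  # the expensive LLM call
        self._store[key] = value
        return value

cache = ExtractionCache()
calls = []

def fn(text):
    calls.append(text)  # record each real "LLM call"
    return "true"

cache.get_or_extract("organic", "USDA Organic  Apples", fn)
cache.get_or_extract("organic", "usda organic apples", fn)  # normalized hit, no LLM call
```

At millions of SKUs × many attributes, even this exact-match variant removes the worst of the duplicate spend; similarity keys extend the hit rate further.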
Seen in¶
- sources/2025-08-01-instacart-scaling-catalog-attribute-extraction-with-multi-modal-llms — canonical wiki instance. PARSE is the reference platform; all four components (UI + extraction endpoint + quality screening + ingestion) named and described in the post.
Related¶
- patterns/unified-image-generation-platform — the sibling image-generation platform pattern at Instacart (PIXEL); same architectural stance in a different modality.
- patterns/low-confidence-to-human-review — the HITL routing inside the quality-screening component.
- patterns/human-in-the-loop-quality-sampling — the drift-detection HITL loop.
- patterns/prompt-template-library — the defaults-with-overrides prompt-config layer.
- concepts/multi-modal-attribute-extraction — what the platform's VLM path enables.
- concepts/llm-self-verification — the confidence-score primitive.
- concepts/llm-cascade — cost-routing inside the extraction endpoint.
- concepts/self-serve-generative-ai — the UX posture.
- concepts/model-agnostic-ml-platform — the platform stance on model choice.
- systems/instacart-parse — canonical production instance.