
PATTERN

LLM attribute extraction platform

Intent

Consolidate structured attribute extraction — turning unstructured product / document / entity data into typed fields — into a single self-serve internal platform driven by LLMs, so that onboarding a new attribute becomes a versioned configuration change, not a new labeled dataset + new ML model + new serving pipeline per attribute.

The pattern replaces two common prior approaches (SQL rules per attribute, or ML-classifier-per-attribute) with a unified LLM-based path: prompt template + few-shot examples + LLM choice + input-data SQL + confidence score → structured value.
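As a sketch, the unified path reads as a single function from (attribute config, input row) to a typed value plus a confidence score. All names below are illustrative, not the platform's actual API; the LLM and the self-verification scoring are stubbed:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AttributeConfig:
    name: str
    prompt_template: str              # natural-language task description
    few_shot: list[tuple[str, str]]   # (input, expected output) examples
    input_sql: str                    # selects the rows to extract from
    llm: str                          # which model to call

def extract(cfg: AttributeConfig, row: str,
            call_llm: Callable[[str, str], str]) -> tuple[str, float]:
    """Materialise the prompt, run the LLM, self-verify for confidence."""
    shots = "\n".join(f"Input: {i}\nOutput: {o}" for i, o in cfg.few_shot)
    prompt = f"{cfg.prompt_template}\n{shots}\nInput: {row}\nOutput:"
    value = call_llm(cfg.llm, prompt)
    # Self-verification: ask the model to confirm its own answer; the
    # yes-probability becomes the confidence score (crudely stubbed here).
    verify = call_llm(cfg.llm, f"Is '{value}' correct for: {row}? yes/no")
    confidence = 0.9 if verify == "yes" else 0.3
    return value, confidence

# Stub LLM so the sketch runs without a provider.
fake_llm = lambda model, prompt: "yes" if "correct" in prompt else "true"
cfg = AttributeConfig("organic", "Is this product organic?",
                      [("USDA Organic milk", "true")],
                      "SELECT description FROM products", "cheap-model")
print(extract(cfg, "Organic spinach", fake_llm))  # ('true', 0.9)
```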

When to use

  • Catalog / knowledge-graph / entity systems with many attributes (hundreds to thousands) each of which would traditionally need its own ML pipeline.
  • Attributes whose definitions evolve faster than datasets can be re-labeled.
  • Data whose signal spans multiple modalities (text + image + nutrition panel photos), where a single text ML model is blind to part of the signal.
  • Teams that want non-ML-engineers (catalog managers, category teams) to onboard or iterate on attributes without writing code.

Platform shape (four components)

  1. Declarative config UI — per attribute, users set: name, type, natural-language description, prompt template, few-shot examples, input-data SQL, LLM choice, extraction algorithm. All configs are versioned. See concepts/self-serve-generative-ai.
  2. ML extraction endpoint — materialises the prompt, runs the LLM (optionally in a cascade), runs self-verification to emit a confidence score.
  3. Quality screening — dev mode (human-label UI + LLM-as-judge auto-eval on a small sample) + production mode (periodic random sampling for drift + low-confidence HITL triage).
  4. Ingestion — write extracted values into the downstream catalog / data pipeline.
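The declarative config in component 1 might look like the record below; field names and the versioning helper are assumptions for illustration, not the platform's schema. The key property is that every edit produces a new version with an author, so changes can be audited and rolled back like code:

```python
import copy

# Illustrative config record; every edit bumps the version.
attribute_config = {
    "version": 1,
    "author": "catalog-team",
    "name": "low_sugar",
    "type": "boolean",
    "description": "True if the product qualifies as low sugar.",
    "prompt_template": "Given the product data below, is it low sugar? true/false.",
    "few_shot": [{"input": "Sugar: 2g per serving", "output": "true"}],
    "input_sql": "SELECT product_id, nutrition_text FROM products",
    "llm": "cheap-model-v1",
    "extraction_algorithm": "single_pass",
}

def new_version(cfg: dict, author: str, **changes) -> dict:
    """Onboarding or iterating on an attribute = a new config version."""
    nxt = copy.deepcopy(cfg)
    nxt.update(changes)
    nxt["version"] = cfg["version"] + 1
    nxt["author"] = author
    return nxt

v2 = new_version(attribute_config, "category-team", llm="strong-model-v2")
print(v2["version"], v2["llm"])  # 2 strong-model-v2
```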

Why it beats per-attribute ML pipelines

  • No per-attribute labeled dataset. Zero-shot or few-shot prompts replace dataset collection — a full new attribute ships in days (PARSE reports 1 day for "organic", 3 days for "low sugar") vs. weeks for a new trained ML model.
  • No per-attribute model training. Prompt change + click deploy replaces re-training.
  • One pipeline to maintain. Serving infra, eval harness, HITL review queue — built once, reused per attribute.
  • Multi-modal for free. Swap in a VLM; the same platform now handles attributes whose signal is in images. See concepts/multi-modal-attribute-extraction.
  • Per-attribute cost/quality tuning. Simple attributes run on cheap LLMs at large cost savings (PARSE: −70% cost on "organic"). Hard attributes run on stronger LLMs (PARSE: accuracy drops 60% when a hard attribute is forced onto the cheap LLM, so this tuning is load-bearing).
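The per-attribute tuning above amounts to a confidence-gated cascade: try the cheap model first, escalate only when self-verification is unsure. A minimal sketch (thresholds, model stubs, and return shapes are made up):

```python
def cascade(row, cheap, strong, threshold=0.8):
    """Try the cheap LLM first; escalate to the strong LLM only when
    the self-verification confidence falls below the threshold."""
    value, conf = cheap(row)
    if conf >= threshold:
        return value, conf, "cheap"
    value, conf = strong(row)
    return value, conf, "strong"

# Stub extractors returning (value, confidence).
cheap = lambda row: ("organic", 0.95) if "USDA" in row else ("unknown", 0.4)
strong = lambda row: ("organic", 0.9)

print(cascade("USDA Organic milk", cheap, strong))  # easy: cheap model wins
print(cascade("milk, grass-fed", cheap, strong))    # hard: escalates
```

The escalation rate determines the realised cost saving: if most rows clear the threshold on the cheap model, the strong model's price is paid only for the hard tail.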

Consequences / implementation notes

  • Versioned configs matter more than in most platforms. Prompts are the program; treating them as code-level artifacts (change history, author, rollback) is not optional.
  • Confidence score is a platform primitive. The self-verification score is what enables the cascade, the HITL routing, and drift detection — all downstream policies key off this scalar.
  • Two orthogonal HITL loops. patterns/low-confidence-to-human-review catches outputs the model already knows are uncertain; patterns/human-in-the-loop-quality-sampling catches systematic drift the confidence score would miss. Ship both, not just one.
  • LLM-as-judge in dev mode unblocks iteration. Humans are the ground truth for calibration but too slow for every prompt revision — auto-eval lets developers iterate 10× faster between human-labeling cycles.
  • Sibling pattern: patterns/unified-image-generation-platform applies the same architectural stance (one platform, multi-model, defaults-with-overrides, LLM-evaluator in the loop) to image generation instead of structured extraction.
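The two HITL loops above can be sketched as two independent routing rules over extraction results; thresholds and the sampling rate below are illustrative, not the platform's values:

```python
import random

def route_for_review(results, conf_threshold=0.7, sample_rate=0.05, rng=None):
    """Loop 1: everything below the confidence threshold goes to humans.
    Loop 2: a small random sample of *confident* outputs also goes to
    humans, to catch systematic drift the score itself would miss."""
    rng = rng or random.Random(0)  # seeded for a reproducible sketch
    low_conf, drift_sample, auto = [], [], []
    for item_id, value, conf in results:
        if conf < conf_threshold:
            low_conf.append(item_id)
        elif rng.random() < sample_rate:
            drift_sample.append(item_id)
        else:
            auto.append(item_id)
    return low_conf, drift_sample, auto

# 100 fake results; every 10th one is low-confidence.
results = [(i, "v", 0.5 if i % 10 == 0 else 0.95) for i in range(100)]
low, drift, auto = route_for_review(results)
print(len(low), len(drift), len(auto))
```

Note that the drift sample is drawn from the *confident* outputs — sampling only the low-confidence queue would reproduce exactly the blind spot the second loop exists to cover.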

Tradeoffs / anti-patterns

  • Don't adopt for one attribute. The platform pays off across many attributes; for a one-off extraction a handwritten SQL rule or fine-tuned classifier is cheaper.
  • Prompt sprawl. Without a prompt-template library + review discipline, each attribute becomes a bespoke prompt, eroding the platform's consolidation value. Pair with patterns/prompt-template-library.
  • Hidden provider lock-in via features. Using logit access (for self-verification) or tool-use formats that only one provider supports undermines model portability. Budget for a portable verification path.
  • Cost explodes at full-catalog scale without cascade + cache. Instacart explicitly flags cost-reduction (prompt batching, extraction cache by similarity) as the next frontier precisely because per-SKU LLM-on-every-attribute is eye-wateringly expensive at millions of SKUs × many attributes.
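A simplification of the extraction-cache idea: reuse prior extractions for near-duplicate inputs so repeated SKUs avoid repeat LLM calls. A production version would key on embedding similarity; the normalised-text key below is a stand-in (an assumption, not Instacart's design) that already catches exact and whitespace/case variants:

```python
import re

class ExtractionCache:
    """Reuse prior extractions for near-duplicate (attribute, input) pairs."""
    def __init__(self):
        self.store: dict[tuple[str, str], str] = {}
        self.llm_calls = 0

    def _key(self, attribute: str, text: str) -> tuple[str, str]:
        # Collapse whitespace and case; an embedding-similarity lookup
        # would widen the net to paraphrased descriptions.
        return attribute, re.sub(r"\s+", " ", text.strip().lower())

    def extract(self, attribute: str, text: str, call_llm) -> str:
        key = self._key(attribute, text)
        if key not in self.store:
            self.llm_calls += 1
            self.store[key] = call_llm(attribute, text)
        return self.store[key]

cache = ExtractionCache()
llm = lambda attr, text: "true"  # stub extractor
cache.extract("organic", "USDA Organic  Milk", llm)
cache.extract("organic", "usda organic milk", llm)  # cache hit, no LLM call
print(cache.llm_calls)  # 1
```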

Seen in
