PATTERN

Multi-attribute / multi-product prompt batching

Intent

Amortise shared-context token cost across many LLM extractions. Either batch multiple attributes per product into a single prompt (send the product's features once, extract N attributes in one call), or batch multiple products per attribute into a single prompt (send the attribute definition once, extract the same attribute across M products in one call).

Per-call token cost is dominated by the repeated-context portion (product description that gets re-sent for every attribute, or attribute definition that gets re-sent for every product). Batching collapses the duplicate-context tokens and pays the context cost once per batch rather than once per (product × attribute) pair.

When to use

  • High-volume LLM extraction with an expensive per-SKU or per-attribute context that is structurally the same across many calls.
  • The LLM's context window fits the combined batch without quality-damaging compression.
  • Output quality per extraction doesn't degrade when batched — empirically test this per task before rolling out.

Two batching axes

Multi-attribute-per-product

Instead of:

extract(product P, attribute A1) → LLM call
extract(product P, attribute A2) → LLM call
extract(product P, attribute A3) → LLM call
...

Send one call:

extract(product P, [A1, A2, ..., An]) → LLM call returning {A1: v1, ..., An: vn}

Product features (title, description, image, nutrition panel) are sent once, not N times. This is the default batching direction for a catalog where each SKU has many attributes.
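A minimal sketch of the multi-attribute direction. The function name, prompt wording, and product fields are illustrative assumptions, not from the source; the point is that the product context appears once while the attribute list grows.

```python
# Hypothetical prompt builder: product context is sent once,
# all N attributes are requested in a single JSON response.

def build_multi_attribute_prompt(product_features: str, attributes: list[str]) -> str:
    """One prompt covering extract(product P, [A1..An])."""
    attr_lines = "\n".join(f"- {a}" for a in attributes)
    return (
        "Product features:\n"
        f"{product_features}\n\n"
        "Extract the following attributes. Reply with a JSON object "
        "mapping attribute name to extracted value:\n"
        f"{attr_lines}"
    )

prompt = build_multi_attribute_prompt(
    "Title: Oat Milk 1L\nDescription: Unsweetened oat beverage.",
    ["brand", "volume", "is_dairy_free"],
)
```

Adding an attribute grows the prompt by one line of attribute name (plus output tokens), not by another full copy of the product description.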

Multi-product-per-attribute

Instead of:

extract(attribute A, product P1) → LLM call
extract(attribute A, product P2) → LLM call
...

Send one call:

extract(attribute A, [P1, P2, ..., Pm]) → LLM call returning {P1: v1, ..., Pm: vm}

Attribute definition + extraction guidelines are sent once, not M times. Useful when the attribute definition is large (complex rules, many examples) and product entries are small (just a title + one field).
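The symmetric sketch for the multi-product direction, under the same caveat that the names and prompt wording are hypothetical: the (potentially large) attribute definition is sent once, and each product contributes only a short entry.

```python
# Hypothetical prompt builder: attribute definition is sent once,
# the same attribute is extracted across M products in one call.

def build_multi_product_prompt(attribute_definition: str, products: dict[str, str]) -> str:
    """One prompt covering extract(attribute A, [P1..Pm])."""
    entries = "\n".join(f"{pid}: {title}" for pid, title in products.items())
    return (
        "Attribute definition and extraction guidelines:\n"
        f"{attribute_definition}\n\n"
        "For each product below, extract this attribute. Reply with a "
        "JSON object mapping product id to extracted value:\n"
        f"{entries}"
    )

prompt = build_multi_product_prompt(
    "organic: true only if the title or description contains an organic claim.",
    {"P1": "Organic Baby Spinach 5oz", "P2": "Iceberg Lettuce Head"},
)
```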

Both together (tensor batching)

The two axes compose: one prompt covering [product × attribute] in a matrix. Cost approaches the sum of unique contexts rather than the product of everything. Quality risk also compounds — test carefully.

Why cost wins are large

For a catalog of M products × N attributes with cost decomposed as k_p (per-product context) + k_a (per-attribute definition) + k_o (per-output token):

  • No batching: M × N × (k_p + k_a + k_o)
  • Multi-attribute per product: M × (k_p + N × k_a + N × k_o)
  • Multi-product per attribute: N × (k_a + M × k_p + M × k_o)

When k_p and k_a dominate (common, since product descriptions and attribute rubrics are verbose), batching can cut token cost by an order of magnitude at millions-of-SKUs scale.

Tradeoffs / gotchas

  • Per-output quality can drop. Batched prompts have more distracting context; the LLM may confuse attribute A with attribute B on the same product, or conflate two products' features. Task-specific A/B testing is required. Some attributes batch cleanly, others don't.
  • Context window ceiling. Batching more items per call eventually hits the model's context-window limit. With multi-modal inputs (each image uses many tokens), the ceiling is tighter.
  • Structured-output parsing is harder. Single-item prompts emit one value; batched prompts must emit a structured map. Parsing failures mean the whole batch is lost — add JSON-schema or function-calling constraints.
  • Per-item confidence scores are harder to read. The self-verification technique of reading the first-token logit works for one output; batched outputs need per-item self-verification (more follow-up calls) or a different calibration approach.
  • Retry granularity is lumpier. If one item in a batch fails validation, you re-run the whole batch or split and re-run — either wastes prior work.
  • Latency shifts. Fewer, larger calls vs. many small calls — p95 latency per output may improve (fewer serial calls) or worsen (one large call takes longer than many parallel small ones) depending on your concurrency model.
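The parsing and retry-granularity gotchas can be softened by validating per item and retrying only the failures. A sketch, with an assumed trivial validation rule (non-empty string) standing in for whatever task-specific checks apply:

```python
import json

def parse_batch(raw: str, expected_ids: list[str]) -> tuple[dict, list[str]]:
    """Parse a batched JSON response; return (valid results, ids to retry).

    Hypothetical helper: if the whole response fails to parse, every id
    goes back into the retry queue; otherwise only per-item failures do.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}, list(expected_ids)  # whole batch lost
    good, retry = {}, []
    for pid in expected_ids:
        value = data.get(pid)
        if isinstance(value, str) and value.strip():  # stand-in validation
            good[pid] = value
        else:
            retry.append(pid)
    return good, retry
```

Items in the retry list can be re-sent as a smaller batch (or singly), so one malformed entry no longer discards the whole batch's valid outputs.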

Seen in

  • sources/2025-08-01-instacart-scaling-catalog-attribute-extraction-with-multi-modal-llms — named as future work for PARSE's cost-reduction roadmap. The post explicitly distinguishes both axes ("batch multiple attributes in a single prompt" + "batch multiple products into the same prompt") and names the motivating waste: "we can avoid sending the same product information to the LLM APIs for different attributes to save the cost" + "we can also batch multiple products into the same prompt, and ask LLM to output extraction results per product. This will help avoid sending the same attribute extraction guideline to LLM APIs for every product."