

PIXEL (Instacart's Unified Image Generation Platform)

Definition

PIXEL is Instacart's internal, unified image-generation platform — an RPC service that sits in front of multiple image-generation models and exposes them behind a single API, shared prompt-template library, standardised parameter protocol, and automated VLM-based quality evaluation. PIXEL consolidates what was previously siloed per-team generative-AI experimentation into a single platform anyone at Instacart can use, regardless of technical background. (Source: sources/2025-07-17-instacart-introducing-pixel-instacarts-unified-image-generation-platform)

Key contributor: Shishir Kumar Prasad. Announced 2025-07-17 on tech.instacart.com.

Why it exists

Before PIXEL, Instacart teams building with generative image AI each independently:

  • Picked their own model from the provider landscape
  • Invented their own prompting strategies per model
  • Integrated with each provider's API separately
  • Defined their own evaluation criteria

"This created duplication of effort and inconsistent results. Each team faced its own steep learning curve — figuring out what prompt worked best for a food image, which model produced the most realistic outputs, and how to measure quality."

PIXEL collapses all of this into one shared service. The architectural claim is not that PIXEL has a better model — it's that centralising model access + prompt defaults + evaluation is worth more than any one model's raw quality.

Five architectural components

1. Unified parameter protocol

"Behind the scenes lies a unified parameter protocol that standardizes working across multiple image generation models to set image style, size, and cfg_scale which determine how closely the image follows the prompt. This means teams can switch between models from various providers by changing just the model name — PIXEL handles all the parameter translation automatically."

This is the portability primitive that makes the rest of the platform work: per-application teams can defer the "which model?" question until they've validated a use case, because switching is cheap.
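A minimal sketch of what such a protocol could look like, assuming a canonical parameter set translated by per-provider adapters. The provider names, parameter spellings, and request shapes below are illustrative, not PIXEL's actual ones.

```python
# Hypothetical unified parameter protocol: callers pass one canonical
# parameter set; a per-model adapter translates it into that provider's
# request shape. Switching models means changing only the model name.
from dataclasses import dataclass


@dataclass
class GenerationParams:
    style: str = "photorealistic"
    width: int = 1024
    height: int = 1024
    cfg_scale: float = 7.5  # how closely the image follows the prompt


# Each adapter maps canonical params onto an (imaginary) provider schema.
ADAPTERS = {
    "provider_a": lambda p: {"preset": p.style,
                             "size": f"{p.width}x{p.height}",
                             "guidance": p.cfg_scale},
    "provider_b": lambda p: {"style": p.style, "w": p.width,
                             "h": p.height, "cfg": p.cfg_scale},
}


def build_request(model: str, params: GenerationParams) -> dict:
    """Translate canonical params into the named provider's request."""
    return ADAPTERS[model](params)
```

With this shape, `build_request("provider_b", params)` and `build_request("provider_a", params)` differ only in the model name, which is the switching behaviour the post describes.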

2. Prompt templates + few-shot library

"Prompt templates define characteristics about lighting, backgrounds, and the image context are injected as few shot examples for each application. Teams can follow practical guidelines to create effective prompts across different models, reducing trial and error in the process."

Templates ship defaults with edit access — not constraints. Teams inherit working baselines and can override per-project. See patterns/prompt-template-library.
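A sketch of the defaults-with-edit-access idea, assuming a template keyed by application that carries a base prompt plus few-shot examples. Template names, fields, and example text are hypothetical.

```python
# Hypothetical prompt-template library: each application inherits a
# working baseline (lighting, background, context) plus few-shot
# examples, and may override any field per project.
TEMPLATES = {
    "food_hero": {
        "base": ("A {subject}, soft natural lighting, warm neutral "
                 "background, shot at a 45-degree angle."),
        "few_shot": [
            "A bowl of fresh strawberries, soft natural lighting, "
            "warm neutral background.",
            "A sliced sourdough loaf, soft natural lighting, "
            "warm neutral background.",
        ],
    },
}


def render_prompt(template_name: str, subject: str, **overrides) -> str:
    """Build a prompt from template defaults; overrides win per project."""
    tmpl = dict(TEMPLATES[template_name])
    tmpl.update(overrides)  # edit access: teams can replace any field
    examples = "\n".join(f"Example: {e}" for e in tmpl["few_shot"])
    return f"{examples}\n{tmpl['base'].format(subject=subject)}"
```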

3. Fine-tuned models via DreamBooth

Instacart trains DreamBooth fine-tunes on top of Stable Diffusion per product category. "This technique was highly useful to generate images of products in different backgrounds based on the retailer requirements and other characteristics such as packaging and quantity. This could be used for unbranded products like produce or meat items to get custom images trained on top of photographed resources. It can also be used for advertising to display the same product across different backgrounds."

Unbranded produce + meat is the obvious use case — every item looks category-distinct, manual photography is uneconomical, and DreamBooth's "few reference images → realistic generation in new contexts" shape fits exactly. See patterns/fine-tuned-model-per-product-category.
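One way the per-category fine-tunes might be routed, sketched under the assumption of a simple registry mapping product category to a DreamBooth checkpoint, with the Stable Diffusion base as fallback. Checkpoint names are invented for illustration.

```python
# Hypothetical per-category model registry: requests for a category
# with a DreamBooth fine-tune get that checkpoint; everything else
# falls back to the base Stable Diffusion model.
CATEGORY_CHECKPOINTS = {
    "produce": "sd-dreambooth-produce-v2",
    "meat": "sd-dreambooth-meat-v1",
}
BASE_MODEL = "stable-diffusion-base"


def resolve_model(category: str) -> str:
    """Pick the fine-tuned checkpoint for a category, if one exists."""
    return CATEGORY_CHECKPOINTS.get(category, BASE_MODEL)
```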

4. Automated VLM-based quality evaluation

The four-step refinement loop:

  1. LLM generates a first-pass prompt
  2. LLM generates curated evaluation questions based on project needs
  3. VLM scores the generated image against the questions
  4. On fail → failed-question-text feeds back into the prompt-generator LLM → revised prompt → goto 1

Example VLM questions: "does the given image contain ?", "does the given image contain a warm neutral background?", "does the given image contain non food content?"

Quantified impact: human-judge approval rate went from 20% to 85% after the VLM-loop shipped.

See concepts/vlm-as-image-judge + concepts/iterative-prompt-refinement + patterns/vlm-evaluator-quality-gate.
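The four-step loop can be sketched as follows. The LLM, image model, and VLM are stand-ins injected as callables (`generate`, `judge`); only the loop structure reflects the source, everything else is an assumption.

```python
# Hypothetical skeleton of the VLM-gated refinement loop:
# prompt -> image -> score against curated yes/no questions ->
# feed failed question text back into the prompt generator -> retry.
def refine(brief, questions, generate, judge, max_rounds=3):
    feedback = []
    image, prompt = None, None
    for _ in range(max_rounds):
        prompt = generate(brief, feedback)   # step 1: LLM writes a prompt
        image = f"image for: {prompt}"       # image-model call stubbed out
        failed = [q for q in questions       # step 3: VLM scores the image
                  if not judge(image, q)]
        if not failed:
            return image, prompt             # passed the quality gate
        feedback = failed                    # step 4: failures refine the prompt
    return image, prompt                     # best effort after max_rounds
```

Step 2 (the LLM generating the curated questions) is assumed to have happened upstream; `questions` is its output.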

5. Infra integration

"We built PIXEL on top of Instacart's existing service infrastructure which creates an RPC service, giving teams access to PIXEL for their workflows through an API call. We also let users store the generated images and easily access their URLs through an unique ID stored in Snowflake."

  • RPC service — reuses Instacart's existing service substrate
  • S3 — generated-image blob storage
  • Snowflake — image URLs addressable by unique ID

The plumbing is deliberately unremarkable. The value is that it is shared plumbing.
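A sketch of the storage path under those three bullets, with S3 and the Snowflake table mocked as in-memory dicts; key formats and the CDN hostname are invented.

```python
# Hypothetical storage path: image blob -> object storage (S3 in the
# post), its URL registered under a unique ID (Snowflake in the post).
# Both backends are mocked as dicts for illustration.
import uuid

blob_store = {}  # stand-in for S3
url_index = {}   # stand-in for the Snowflake URL-by-ID table


def store_image(image_bytes: bytes) -> str:
    """Persist a generated image and return its unique ID."""
    image_id = str(uuid.uuid4())
    key = f"pixel/{image_id}.png"
    blob_store[key] = image_bytes                       # upload blob
    url_index[image_id] = f"https://cdn.example/{key}"  # register URL
    return image_id


def url_for(image_id: str) -> str:
    """Look up a generated image's URL by its unique ID."""
    return url_index[image_id]
```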

Disclosed production applications

  • Butcher Cuts — different meat cuts; visual search instead of all-text navigation — navigation + add-to-cart time ↓ >25%
  • Lifestyle Imagery — category images for product carousels (e.g. cheese platter composed from cheese + crackers + meats + pickled-item recommendations) — personalised-carousel cart conversion ↑ 15%
  • FoodStorm — prepared foods / catering; retailer-facing image generation for prepared-food offerings and ingredients in order-management systems — no numeric outcome disclosed

A non-obvious architectural observation from the launches: "An interesting outcome we realized from launching various applications was that the best performing model varied project by project." PIXEL let project leads start from pre-configured defaults and rapidly test other models against sample datasets before scaling to production. This is the core reason the unified parameter protocol earns its place — no single model would satisfy all projects, so the platform optimises for cheap experimentation rather than model lock-in.
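That workflow can be sketched as a small selection loop, assuming a scoring function (e.g. the VLM judge) and an image generator injected as callables; all names here are hypothetical.

```python
# Hypothetical per-project model selection: because switching models is
# just a name change, each candidate can be scored on a small sample
# set before committing to one for production.
def pick_model(candidates, sample_briefs, generate, score):
    """Return the candidate model with the highest mean sample score."""
    def mean_score(model):
        total = sum(score(generate(model, brief)) for brief in sample_briefs)
        return total / len(sample_briefs)
    return max(candidates, key=mean_score)
```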

Impact headline numbers

  • 10× reduction in time-to-image for teams using PIXEL
  • 20% → 85% human-judge approval rate on VLM-gated output
  • >25% reduction in Butcher Cuts navigation + add-to-cart time
  • 15% uplift in Lifestyle Imagery personalised-carousel cart conversion

Roadmap disclosed

"We're actively investing in the next phase of the platform. We're integrating newer models to expand the creative range and quality of output. For teams seeking more expressive control, PIXEL will soon offer fine-tuned knobs for adjusting image composition, lighting, and background. Finally we will offer easier access control to fine tune image models and serve them through the PIXEL platform."

Three open investments:

  1. Integrate newer image-generation models
  2. Expose fine-tuned control knobs (composition / lighting / background) to teams
  3. Access-control + self-serve model-fine-tune + model-serve directly through PIXEL

Caveats

  • Announcement-voice post; implementation details gestured at rather than specified.
  • Specific model names not disclosed (Stable Diffusion named as the DreamBooth base; other models in the PIXEL catalog unspecified).
  • VLM / LLM identities not disclosed.
  • No throughput / latency / cost / availability numbers.
  • VLM-judge calibration with human judgement not characterised beyond the 20% → 85% approval-rate headline; no cross-application breakdown, no drift monitoring disclosed.
  • DreamBooth dataset size / training cost / identifier scheme not disclosed.
  • No cross-vendor comparison with external image-generation platforms.
