

PIXEL (Instacart's Unified Image Generation Platform)

Definition

PIXEL is Instacart's internal, unified image-generation platform — an RPC service that sits in front of multiple image-generation models and exposes them behind a single API, shared prompt-template library, standardised parameter protocol, and automated VLM-based quality evaluation. PIXEL consolidates what was previously siloed per-team generative-AI experimentation into a single platform anyone at Instacart can use, regardless of technical background. (Source: sources/2025-07-17-instacart-introducing-pixel-instacarts-unified-image-generation-platform)

Key contributor: Shishir Kumar Prasad. Announced 2025-07-17 on tech.instacart.com.

Why it exists

Before PIXEL, Instacart teams building with generative image AI each independently:

  • Picked their own model from the provider landscape
  • Invented their own prompting strategies per model
  • Integrated with each provider's API separately
  • Defined their own evaluation criteria

"This created duplication of effort and inconsistent results. Each team faced its own steep learning curve — figuring out what prompt worked best for a food image, which model produced the most realistic outputs, and how to measure quality."

PIXEL collapses all of this into one shared service. The architectural claim is not that PIXEL has a better model — it's that centralising model access + prompt defaults + evaluation is worth more than any one model's raw quality.

Five architectural components

1. Unified parameter protocol

"Behind the scenes lies a unified parameter protocol that standardizes working across multiple image generation models to set image style, size, and cfg_scale which determine how closely the image follows the prompt. This means teams can switch between models from various providers by changing just the model name — PIXEL handles all the parameter translation automatically."

This is the portability primitive that makes the rest of the platform work: per-application teams can defer the "which model?" question until they've validated a use case, because switching is cheap.
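A minimal sketch of what such a protocol could look like, assuming a canonical parameter set translated by per-provider adapters. The provider names, parameter spellings, and request shapes below are illustrative, not PIXEL's actual ones.

```python
# Hypothetical unified parameter protocol: callers pass one canonical
# parameter set; a per-model adapter translates it into that provider's
# request shape. Switching models means changing only the model name.
from dataclasses import dataclass


@dataclass
class GenerationParams:
    style: str = "photorealistic"
    width: int = 1024
    height: int = 1024
    cfg_scale: float = 7.5  # how closely the image follows the prompt


# Each adapter maps canonical params onto an (imaginary) provider schema.
ADAPTERS = {
    "provider_a": lambda p: {"preset": p.style,
                             "size": f"{p.width}x{p.height}",
                             "guidance": p.cfg_scale},
    "provider_b": lambda p: {"style": p.style, "w": p.width,
                             "h": p.height, "cfg": p.cfg_scale},
}


def build_request(model: str, params: GenerationParams) -> dict:
    """Translate canonical params into the named provider's request."""
    return ADAPTERS[model](params)
```

With this shape, `build_request("provider_b", params)` and `build_request("provider_a", params)` differ only in the model name, which is the switching behaviour the post describes.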

2. Prompt templates + few-shot library

"Prompt templates define characteristics about lighting, backgrounds, and the image context are injected as few shot examples for each application. Teams can follow practical guidelines to create effective prompts across different models, reducing trial and error in the process."

Templates ship defaults with edit access — not constraints. Teams inherit working baselines and can override per-project. See patterns/prompt-template-library.
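A sketch of the defaults-with-edit-access idea, assuming a template keyed by application that carries a base prompt plus few-shot examples. Template names, fields, and example text are hypothetical.

```python
# Hypothetical prompt-template library: each application inherits a
# working baseline (lighting, background, context) plus few-shot
# examples, and may override any field per project.
TEMPLATES = {
    "food_hero": {
        "base": ("A {subject}, soft natural lighting, warm neutral "
                 "background, shot at a 45-degree angle."),
        "few_shot": [
            "A bowl of fresh strawberries, soft natural lighting, "
            "warm neutral background.",
            "A sliced sourdough loaf, soft natural lighting, "
            "warm neutral background.",
        ],
    },
}


def render_prompt(template_name: str, subject: str, **overrides) -> str:
    """Build a prompt from template defaults; overrides win per project."""
    tmpl = dict(TEMPLATES[template_name])
    tmpl.update(overrides)  # edit access: teams can replace any field
    examples = "\n".join(f"Example: {e}" for e in tmpl["few_shot"])
    return f"{examples}\n{tmpl['base'].format(subject=subject)}"
```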

3. Fine-tuned models via DreamBooth

Instacart trains DreamBooth fine-tunes on top of Stable Diffusion per product category. "This technique was highly useful to generate images of products in different backgrounds based on the retailer requirements and other characteristics such as packaging and quantity. This could be used for unbranded products like produce or meat items to get custom images trained on top of photographed resources. It can also be used for advertising to display the same product across different backgrounds."

Unbranded produce + meat is the obvious use case — every item looks category-distinct, manual photography is uneconomical, and DreamBooth's "few reference images → realistic generation in new contexts" shape fits exactly. See patterns/fine-tuned-model-per-product-category.
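One way the per-category fine-tunes might be routed, sketched under the assumption of a simple registry mapping product category to a DreamBooth checkpoint, with the Stable Diffusion base as fallback. Checkpoint names are invented for illustration.

```python
# Hypothetical per-category model registry: requests for a category
# with a DreamBooth fine-tune get that checkpoint; everything else
# falls back to the base Stable Diffusion model.
CATEGORY_CHECKPOINTS = {
    "produce": "sd-dreambooth-produce-v2",
    "meat": "sd-dreambooth-meat-v1",
}
BASE_MODEL = "stable-diffusion-base"


def resolve_model(category: str) -> str:
    """Pick the fine-tuned checkpoint for a category, if one exists."""
    return CATEGORY_CHECKPOINTS.get(category, BASE_MODEL)
```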

4. Automated VLM-based quality evaluation

The four-step refinement loop:

  1. LLM generates a first-pass prompt
  2. LLM generates curated evaluation questions based on project needs
  3. VLM scores the generated image against the questions
  4. On fail → failed-question-text feeds back into the prompt-generator LLM → revised prompt → goto 1

Example VLM questions: "does the given image contain ?", "does the given image contain a warm neutral background?", "does the given image contain non food content?"

Quantified impact: human-judge approval rate went from 20% to 85% after the VLM-loop shipped.

See concepts/vlm-as-image-judge + concepts/iterative-prompt-refinement + patterns/vlm-evaluator-quality-gate.
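The four-step loop can be sketched as follows. The LLM, image model, and VLM are stand-ins injected as callables (`generate`, `judge`); only the loop structure reflects the source, everything else is an assumption.

```python
# Hypothetical skeleton of the VLM-gated refinement loop:
# prompt -> image -> score against curated yes/no questions ->
# feed failed question text back into the prompt generator -> retry.
def refine(brief, questions, generate, judge, max_rounds=3):
    feedback = []
    image, prompt = None, None
    for _ in range(max_rounds):
        prompt = generate(brief, feedback)   # step 1: LLM writes a prompt
        image = f"image for: {prompt}"       # image-model call stubbed out
        failed = [q for q in questions       # step 3: VLM scores the image
                  if not judge(image, q)]
        if not failed:
            return image, prompt             # passed the quality gate
        feedback = failed                    # step 4: failures refine the prompt
    return image, prompt                     # best effort after max_rounds
```

Step 2 (the LLM generating the curated questions) is assumed to have happened upstream; `questions` is its output.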

5. Infra integration

"We built PIXEL on top of Instacart's existing service infrastructure which creates an RPC service, giving teams access to PIXEL for their workflows through an API call. We also let users store the generated images and easily access their URLs through an unique ID stored in Snowflake."

  • RPC service — reuses Instacart's existing service substrate
  • S3 — generated-image blob storage
  • Snowflake — image URLs addressable by unique ID

The plumbing is deliberately unremarkable. The value is that it is shared plumbing.
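A sketch of the storage path under those three bullets, with S3 and the Snowflake table mocked as in-memory dicts; key formats and the CDN hostname are invented.

```python
# Hypothetical storage path: image blob -> object storage (S3 in the
# post), its URL registered under a unique ID (Snowflake in the post).
# Both backends are mocked as dicts for illustration.
import uuid

blob_store = {}  # stand-in for S3
url_index = {}   # stand-in for the Snowflake URL-by-ID table


def store_image(image_bytes: bytes) -> str:
    """Persist a generated image and return its unique ID."""
    image_id = str(uuid.uuid4())
    key = f"pixel/{image_id}.png"
    blob_store[key] = image_bytes                       # upload blob
    url_index[image_id] = f"https://cdn.example/{key}"  # register URL
    return image_id


def url_for(image_id: str) -> str:
    """Look up a generated image's URL by its unique ID."""
    return url_index[image_id]
```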

Disclosed production applications

  • Butcher Cuts — different meat cuts; visual search instead of all-text navigation — navigation + add-to-cart time ↓ >25%
  • Lifestyle Imagery — category images for product carousels (e.g. cheese platter composed from cheese + crackers + meats + pickled-item recommendations) — personalised-carousel cart conversion ↑ 15%
  • FoodStorm — prepared foods / catering; retailer-facing image generation for prepared-food offerings and ingredients in order-management systems — no numeric outcome disclosed

A non-obvious architectural observation from the launches: "An interesting outcome we realized from launching various applications was that the best performing model varied project by project." PIXEL let project leads start from pre-configured defaults and rapidly test other models against sample datasets before scaling to production. This is the core reason the unified parameter protocol earns its place — no single model would satisfy all projects, so the platform optimises for cheap experimentation rather than model lock-in.
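That workflow can be sketched as a small selection loop, assuming a scoring function (e.g. the VLM judge) and an image generator injected as callables; all names here are hypothetical.

```python
# Hypothetical per-project model selection: because switching models is
# just a name change, each candidate can be scored on a small sample
# set before committing to one for production.
def pick_model(candidates, sample_briefs, generate, score):
    """Return the candidate model with the highest mean sample score."""
    def mean_score(model):
        total = sum(score(generate(model, brief)) for brief in sample_briefs)
        return total / len(sample_briefs)
    return max(candidates, key=mean_score)
```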

Impact headline numbers

  • 10× reduction in time-to-image for teams using PIXEL
  • 20% → 85% human-judge approval rate on VLM-gated output
  • >25% reduction in Butcher Cuts navigation + add-to-cart time
  • 15% uplift in Lifestyle Imagery personalised-carousel cart conversion

Roadmap disclosed

"We're actively investing in the next phase of the platform. We're integrating newer models to expand the creative range and quality of output. For teams seeking more expressive control, PIXEL will soon offer fine-tuned knobs for adjusting image composition, lighting, and background. Finally we will offer easier access control to fine tune image models and serve them through the PIXEL platform."

Three open investments:

  1. Integrate newer image-generation models
  2. Expose fine-tuned control knobs (composition / lighting / background) to teams
  3. Access-control + self-serve model-fine-tune + model-serve directly through PIXEL

Caveats

  • Announcement-voice post; implementation details gestured at rather than specified.
  • Specific model names not disclosed (Stable Diffusion named as the DreamBooth base; other models in the PIXEL catalog unspecified).
  • VLM / LLM identities not disclosed.
  • No throughput / latency / cost / availability numbers.
  • VLM-judge calibration with human judgement not characterised beyond the 20% → 85% approval-rate headline; no cross-application breakdown, no drift monitoring disclosed.
  • DreamBooth dataset size / training cost / identifier scheme not disclosed.
  • No cross-vendor comparison with external image-generation platforms.
