Instacart¶
Instacart Engineering is a Tier-2 source on the sysdesign-wiki. Instacart is a US-based grocery delivery + pick-up platform; their engineering blog covers ML-for-catalog, search + recommendations, generative-AI applications to grocery imagery, ads platform, and customer-support automation.
Key systems¶
- systems/capsight + Caper — Capsight edge-to-cloud data flywheel (2026-02-17): Instacart's ML data platform for the Caper smart-cart fleet. Three components — Collector (on-device agent, trigger-based capture on activity signal + recognised barcode, dedicated hardware video encoder for zero AI-task regression, resilient uploader with storage-threshold pause + auto-cleanup to protect the retailer store network), Depot (cloud ingestion + indexing + searchable web UI + VLM + teacher-model pre-labelling with human correction rather than human-from-scratch annotation), and Learner (Ray-based distributed training + automated evaluation gate). Outcomes vs pre-Capsight baseline: >70% annotation cost reduction; multi-day labelling tasks → hours; model training stage 1 week → 2 days; end-to-end iteration cycle 1 month → 1 week; >5% model accuracy improvement within weeks of deployment. Canonical wiki instance of concepts/edge-cloud-data-flywheel + concepts/production-data-diversity + patterns/distributed-fleet-as-data-pipeline + patterns/trigger-based-edge-capture + patterns/vlm-assisted-pre-labeling + patterns/resilient-edge-uploader. First wiki instance of edge-fleet-as-ML-data-pipeline at Instacart (sixth platform-consolidation play after PIXEL, PARSE, Maple, Intent Engine, AI Gateway / Cost Tracker — but this one sits upstream of training rather than at serving time, and crosses the edge / cloud boundary).
- systems/instacart-flyer-digitization-pipeline + systems/segment-anything-model-sam — Flyer digitization pipeline (2026-02-09): the internal computer-vision + LLM system converting retailer-supplied weekly grocery flyer images into tap-to-shop interactive tiles on the Instacart app. Replaces a manual bounding-box-and-match workflow (3–4 hours per flyer, "hundreds of hours each week" across retailers) with a two-phase pipeline (<30 minutes end-to-end). Phase 1 — Image Segmentation: hybrid detector choice tiered by flyer complexity — simple flyers use iterative-grid multimodal-LLM probing (~90% accuracy); complex flyers use SAM as base detector with four post-processing stages: (1) text-box removal, (2) Weighted Boxes Fusion to merge overlapping boxes (explicitly rejecting NMS as "may discard valuable information"), (3) model ensembling with classical contour detection gated per-retailer on flyer density, (4) heuristic + ML filters on aspect ratio + size. Phase 2 — Product Identification: OCR + LLM + internal catalog search to match each box to a SKU; captured-body truncation means Phase-2 details are at component-level only. Named failure mode: FoodSAM (food-specific SAM variant) "fell short of addressing the breadth and variety of products featured in retail flyers." Canonical wiki instances of patterns/hybrid-cv-plus-llm-pipeline + patterns/complexity-tiered-model-selection. Fourth Instacart visual-ML system on the wiki alongside PIXEL (generation), PARSE (attribute extraction), and the Caper mobile-UI migration — same "decompose the problem, match model to sub-task, route by complexity" engineering stance.
- systems/jetpack-compose + systems/android-fragment + systems/paparazzi — Caper smart-cart Android migration (2026-02-03): Instacart's in-store scan-and-pay smart cart (stability-critical hardware — "a crash can lead to cart abandonment") Android app migrated from Fragments + XML layouts to Jetpack Compose in a four-phase plan: Phase 1 (implicit Fragment hosts via Google's navigation-fragment-compose, manual, seeds pattern knowledge); Phase 2 (type-safe Kotlin-DSL navigation, 30+ sub-graphs / 130+ destinations, iterative AI workflow, 5–7× speed increase, 300–350 engineering hours saved); Phase 3 (Fragments → Compose screens, 100+ features, 17-step AI skill with Paparazzi visual-parity engineer-verification checkpoints, progressive-disclosure context-window discipline); Phase 4 (Compose Navigation, in progress, feature-flagged dual-system rollout). Load-bearing architectural pattern: outer-parameterless / inner-testable Composable split (`MyFeatureScreen()` binds DI + nav, `MyFeatureScreenInternal(...)` is pure Compose with callbacks) established in Phase 1 makes the Phase-4 Compose-Navigation migration cheap. First Instacart mobile-platform source on the wiki + first wiki instance of AI-skill-driven Android UI framework migration. Canonical instances of patterns/phased-framework-migration + patterns/ai-migration-skill-workflow + patterns/visual-parity-screenshot-gate; concepts/ai-assisted-refactoring-economics + concepts/ai-instructions-as-code.
- systems/instacart-intent-engine — Intent Engine (2025-11-13): Instacart's LLM-backed query-understanding system replacing a bespoke multi-model legacy stack (FastText classifier + session-mined rewrites + separate SRL). Three-lever adaptation hierarchy stated explicitly: prompting → context-engineering (RAG) → fine-tuning. Three QU sub-tasks rebuilt: (i) query category classification (retrieve top-K converted categories → LLM re-ranks with context → semantic-similarity guardrail filters); (ii) query rewrites with three specialised prompts — Substitutes / Broader / Synonyms — each with chain-of-thought + few-shot (>95% coverage at 90%+ precision, up from 50% legacy coverage); (iii) SRL via the load-bearing hybrid cache + real-time fine-tuned model pattern. SRL stack: offline RAG "teacher" pipeline (conversion history + catalog + brand-embedding similarity + frontier LLM) dual-purposed to populate a head cache AND train a Llama-3-8B + LoRA student; student is adapter-merged and served on H100 at ~300 ms (from ~700 ms out-of-box on A100). FP8 quantization gave another 10% but was not shipped due to a slight recall regression. Cache-miss fraction: ~2% of queries. Production outcomes: 6% reduction in average scroll depth on tail queries, 50% reduction in user complaints on tail-query search quality, millions of cold-start queries served weekly. Named strategic argument: "A generic LLM is a commodity; your business context is what makes your application defensible." Canonical wiki instances of patterns/head-cache-plus-tail-finetuned-model + patterns/offline-teacher-online-student-distillation. Third Instacart platform-consolidation play after PIXEL (image generation) and PARSE (attribute extraction).
- systems/maple-instacart — Maple (2025-08-27): Instacart's internal batch-LLM processing service. CSV/Parquet in, CSV/Parquet out RPC; hides the LLM provider's 50K-prompt / 200 MB / 24 h batch API behind a single interface. Stack: Python + PyArrow + orjson + Temporal for durable execution + S3 + Parquet (claimed 25× vs CSV). Proxies through the internal AI Gateway which integrates with Cost Tracker for per-team attribution. Scales to 10M+ prompt jobs; reports ~50% cost reduction vs real-time calls and "hundreds of thousands of dollars per year to just thousands of dollars per year" on specific processes. Four-class failure taxonomy (expired / rate-limited / refused / invalid-image) with per-class retry policies (patterns/infinite-retry-by-failure-class). Extends the same CSV interface to real-time-only providers via patterns/batch-then-real-time-fallback. Canonical patterns/llm-batch-processing-service.
- systems/instacart-ai-gateway — AI Gateway (2025-08-27): internal provider-abstraction + cost-tracking layer that every LLM call from Maple / PIXEL / PARSE flows through. Canonical internal-gateway instance of patterns/ai-gateway-provider-abstraction.
- systems/instacart-cost-tracker — Cost Tracker (2025-08-27): per-team LLM usage/spend accounting, integrated into AI Gateway.
- systems/instacart-pixel — PIXEL (2025-07-17): Instacart's unified internal image-generation platform. Single RPC service fronting a catalog of image-generation models; five architectural components (unified parameter protocol + few-shot prompt template library + DreamBooth fine-tunes on Stable Diffusion per product-category + VLM-based iterative quality evaluation + S3-plus-Snowflake infra). Reported outcomes: 10× team time-to-image reduction; 20% → 85% human-judge approval rate via the VLM evaluation loop; >25% reduction in Butcher Cuts add-to-cart time; 15% uplift in Lifestyle Imagery personalised-carousel cart conversion.
- systems/instacart-parse — PARSE (2025-08-01): Product Attribute Recognition System for E-commerce. Self-serve, multi-modal LLM platform for structured catalog-attribute extraction. Four components: declarative + versioned Platform UI (attribute name / type / description / prompt template / few-shot examples / input-data SQL / LLM choice) → ML extraction endpoint emitting extracted-value + confidence score via entailment-prompt self-verification → Quality Screening with dev/prod modes (LLM-as-judge + human auditors + low-confidence HITL routing) → catalog ingestion. Reported outcomes: `organic` attribute 1 day (PARSE) vs. 1 week (traditional) at 95% accuracy; complex `low_sugar` iteration down to 3 days; multi-modal LLM +10% recall over text-only on `sheet_count`; -70% cost for cheap LLM on simple attributes / -60% accuracy for cheap LLM on hard attributes — motivating per-attribute model choice. Shares architectural DNA with PIXEL (self-serve, model-agnostic, LLM-evaluator-in-the-loop).
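The Capsight Collector's resilient-uploader discipline (storage-threshold pause + auto-cleanup-oldest-on-upload-failure, first bullet above) reduces to a small amount of state. A minimal Python sketch, with all names hypothetical and the retry/bandwidth logic elided:

```python
from collections import deque

class ResilientUploader:
    """Sketch of the Collector's uploader discipline: pause capture when
    local storage crosses a threshold, and free the oldest pending clip
    when an upload completes or fails, so the cart never fills its disk."""

    def __init__(self, storage_limit_bytes: int):
        self.storage_limit = storage_limit_bytes
        self.used = 0
        self.pending: deque = deque()  # oldest clip sits at the left

    @property
    def capture_paused(self) -> bool:
        # Storage-threshold pause: stop capturing rather than evict blindly.
        return self.used >= self.storage_limit

    def enqueue(self, clip_id: str, size: int) -> bool:
        if self.capture_paused:
            return False  # collection stays paused until uploads drain storage
        self.pending.append((clip_id, size))
        self.used += size
        return True

    def on_upload(self, ok: bool) -> None:
        """On success the clip is uploaded and freed; on persistent failure the
        oldest pending clip is auto-cleaned so storage pressure cannot grow
        without bound (the real uploader retries first; this sketch skips that)."""
        if self.pending:
            _, size = self.pending.popleft()
            self.used -= size
```

The load-bearing choice is that back-pressure pauses capture rather than evicting arbitrary data, so both the cart's disk and the retailer's store network are protected.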
Key patterns / concepts¶
Flyer digitization (2026-02-09 flyer-digitization post)¶
- patterns/hybrid-cv-plus-llm-pipeline — canonical wiki instance at Instacart: Phase 1 (purpose-trained segmentation + CV post-processing) decomposed from Phase 2 (OCR + LLM + catalog search). Localization separated from identification; each phase uses the best-in-class tool for its sub-problem.
- patterns/complexity-tiered-model-selection — canonical wiki instance at Instacart: route simple flyers to iterative-grid multimodal-LLM probing (~90% accuracy); route complex flyers to the SAM + post-processing stack. Per-retailer density gating of the contour-detection ensemble branch is a second instance of the same pattern.
- concepts/weighted-boxes-fusion — the Phase-1 box-merge technique; confidence-weighted coordinate averaging chosen over NMS because NMS discards lower-confidence information. Cited prior art: +3–10% mAP in medical-imaging ensembles.
- concepts/non-maximum-suppression — the classical alternative Instacart explicitly rejected in favour of WBF for the reasons above.
- concepts/model-ensembling-for-detection — Phase-1 ensemble of SAM-style segmentation with classical contour detection; contour branch gated per retailer based on flyer density — a dynamic, input-conditioned ensemble rather than always running every branch.
- concepts/iterative-coordinate-grid-probing — the simple-flyer detector: overlay uniform grid → ask VLM for the first box's starting cell → subdivide → recurse. Works purely via prompting + image manipulation on an off-the-shelf VLM; ~90% accuracy on simple flyers, fails on complex ones.
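The Weighted Boxes Fusion step above is small enough to sketch. A toy Python version under the usual WBF assumptions (greedy clustering by IoU against a cluster seed, then confidence-weighted coordinate averaging; real WBF also rescales scores by cluster support across models):

```python
def iou(a, b):
    # a, b are (x1, y1, x2, y2) boxes
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def weighted_boxes_fusion(boxes, scores, iou_thr=0.55):
    """Toy WBF: instead of discarding lower-confidence overlapping boxes
    (as NMS does), average the coordinates of each overlapping cluster,
    weighting by confidence, so every detection contributes."""
    clusters = []  # each cluster is a list of (box, score); first entry is the seed
    for box, score in sorted(zip(boxes, scores), key=lambda p: -p[1]):
        for c in clusters:
            if iou(c[0][0], box) >= iou_thr:
                c.append((box, score))
                break
        else:
            clusters.append([(box, score)])
    fused = []
    for c in clusters:
        total = sum(s for _, s in c)
        coords = tuple(sum(b[i] * s for b, s in c) / total for i in range(4))
        fused.append((coords, total / len(c)))  # mean confidence for the cluster
    return fused
```

Two heavily overlapping boxes fuse into one box between them, biased toward the higher-confidence detection; disjoint boxes pass through untouched.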
AI-assisted mobile UI migration (2026-02-03 Jetpack Compose post)¶
- patterns/phased-framework-migration — canonical wiki instance at Instacart Caper: four orthogonal phases (implicit Fragment hosts → type-safe nav → Fragment→Compose → Compose Navigation) each validated in production before the next; per-phase AI-involvement level calibrated to novelty + risk.
- patterns/ai-migration-skill-workflow — canonical wiki instance at Instacart Caper Phase 3: 17-step AI skill in four stages (Analysis+Baselining / Compose Implementation / Verification+Integration / Cleanup) with engineer verification checkpoints; formalised from an earlier 325+ line markdown migration guide after 5–6 prior migrations.
- patterns/visual-parity-screenshot-gate — canonical wiki instance: Paparazzi JVM-side screenshot baseline of the pre-migration Fragment informs the AI's Compose implementation; post-migration Paparazzi screenshot is diffed and engineer-reviewed; cleanup is gated on pixel-parity sign-off.
- concepts/ai-assisted-refactoring-economics — canonical wiki instance: 5–7× speed increase, 300–350 engineering hours saved on Phase 2 alone; the thesis is that "the economics of technical debt have changed" and previously-deprioritized mechanical migrations are now feasible.
- concepts/ai-instructions-as-code — canonical wiki instance: 325+ line migration guide "effectively a program that the AI executes," triple-duty (AI executes, humans checklist, reviewers verify), iterated like code over 5–6 migrations, formalised into a structured Agent Skill for progressive disclosure.
- patterns/migration-as-agent-skill — cross-vendor extension: Cloudflare/vinext (2026-02-24) is the web-framework sibling, Instacart Caper (2026-02-03) is the Android-UI-framework sibling — same architectural shape applied to a different platform's framework migration.
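The visual-parity gate itself is framework-agnostic, even though Instacart drives it with Paparazzi on the JVM. A hedged Python sketch of the gate logic only, treating a screenshot as a flat pixel buffer:

```python
def pixel_parity_gate(baseline, candidate, max_diff_fraction=0.0):
    """Sketch of the visual-parity gate: the pre-migration screenshot is the
    baseline; the post-migration screenshot must match pixel-for-pixel (or
    within an explicit tolerance) before cleanup is allowed. In the real
    workflow Paparazzi renders both images; here a screenshot is just a
    flat sequence of pixel values."""
    if len(baseline) != len(candidate):
        return False  # a dimension change always fails the gate
    differing = sum(1 for a, b in zip(baseline, candidate) if a != b)
    return differing / len(baseline) <= max_diff_fraction
```

The default tolerance of zero matches the pixel-parity sign-off described above; any looser tolerance is an explicit, reviewable decision.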
Query understanding / Intent Engine (2025-11-13 Intent-Engine post)¶
- patterns/head-cache-plus-tail-finetuned-model — canonical instance at Instacart: ~98% of queries served from a pre-computed head cache; ~2% (the tail) routed to a fine-tuned Llama-3-8B real-time student. The 98/2 split is the load-bearing economic number.
- patterns/offline-teacher-online-student-distillation — the training-architecture counterpart. Instacart's offline RAG pipeline is dual-purposed — its output populates the live head cache and becomes the student's supervised training set. No duplicate pipeline cost.
- patterns/teacher-student-model-compression — the more general shape; Intent Engine SRL is the LLM-serving instance complementing the prior on-device-CV instance (YouTube effects).
- concepts/query-understanding — the parent concept; QU's three sub-tasks (classification, rewrites, SRL) are all rebuilt in the post.
- concepts/semantic-role-labeling — the load-bearing QU sub-task where the hybrid-cache architecture lives.
- concepts/long-tail-query — the traffic shape forcing the hybrid architecture; Instacart ships a 50% reduction in user complaints on the bottom 2% of queries.
- concepts/context-engineering — extends the existing wiki framing (Fly.io / Dropbox / Datadog) into the retrieval-relevance axis: the post gives three concrete Instacart data streams injected into the teacher prompt (top converted brand + top converted categories + product-catalog brand embeddings) + a post-generation guardrail. "Context is the defensible moat."
- concepts/lora-low-rank-adaptation — the fine-tuning mechanism for the Llama-3-8B student.
- concepts/adapter-merging — the load-bearing latency move that got the student to 300 ms alongside an H100 upgrade.
- concepts/knowledge-distillation — the academic framing; Instacart uses response distillation (supervised fine-tuning on teacher outputs) rather than soft-label Hinton-style distillation.
- concepts/quantization — FP8 evaluated, rejected due to recall regression: canonical instance of latency-vs-quality trade-off resolved in favour of quality.
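At serving time, the head-cache-plus-tail-finetuned-model split above is a one-branch router. A minimal Python sketch (names hypothetical; in the real system the cache is populated offline by the RAG teacher pipeline and the student is the Llama-3-8B + LoRA model):

```python
def srl_route(query: str, head_cache: dict, student_model) -> dict:
    """Sketch of the head/tail split: ~98% of query volume hits the
    pre-computed head cache and pays no model-serving cost at all; only
    cache misses (the ~2% long tail) trigger a real-time call to the
    fine-tuned student. `student_model` is any callable returning SRL tags."""
    cached = head_cache.get(query)
    if cached is not None:
        return cached              # head query: served from cache
    return student_model(query)    # tail query: real-time fine-tuned model
```

The economics follow directly: GPU capacity is sized for ~2% of traffic, and the cache entries are free by-products of the teacher pipeline that also produced the student's training set.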
Batch LLM processing (2025-08-27 Maple post)¶
- patterns/llm-batch-processing-service — canonical instance at Instacart: one platform fronting the LLM provider's batch API with CSV/Parquet-in, CSV/Parquet-out interface; Temporal-backed durable workflow; S3-Parquet intermediate storage; per-team cost accounting via AI Gateway.
- patterns/batch-then-real-time-fallback — unified-interface extension to providers without batch APIs; auto-parallelisation + exponential backoff behind the same CSV interface.
- patterns/infinite-retry-by-failure-class — class-specific retry policy keyed on the provider's four-class failure taxonomy (expired + rate-limited = infinite, refused = max 2×, invalid-image = optional with pre-check on retry #2).
- patterns/csv-in-parquet-intermediate-output-merge — accept CSV at boundary, Parquet internally, output format mirrors input; 25× compression wins at scale.
- concepts/llm-batch-api — the provider API surface Maple abstracts (50K / 200 MB / 24 h SLA / ~50% cost discount).
- concepts/provider-failure-taxonomy — the four-class Maple framework for typed-failure dispatching.
- concepts/stream-based-file-processing — memory-safety discipline load-bearing at 10M+ prompt scale.
- concepts/cost-tracking-per-team — the AI-Gateway-level governance primitive.
- concepts/durable-execution — Maple's Temporal-backed property; sharpens the motivation beyond crash recovery to cost protection (LLM batch APIs bill on submit, not on completion).
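Maple's per-class retry dispatch can be sketched directly from the taxonomy above (class names and policy shape are illustrative, not Maple's actual code):

```python
RETRY_POLICY = {
    # Mirrors the four-class provider-failure taxonomy:
    "expired":       {"max_attempts": None},  # transient: retry forever
    "rate_limited":  {"max_attempts": None},  # transient: retry forever
    "refused":       {"max_attempts": 2},     # content-level: at most 2 tries
    "invalid_image": {"max_attempts": 2, "precheck_from_attempt": 2},
}

def should_retry(failure_class: str, attempt: int) -> bool:
    """Infinite retry is safe only for transient classes; refusals are
    bounded because retrying the same content rarely changes the answer."""
    limit = RETRY_POLICY[failure_class]["max_attempts"]
    return limit is None or attempt < limit

def needs_image_precheck(failure_class: str, attempt: int) -> bool:
    # Image validation is expensive at batch scale, so it only runs from
    # retry #2 of the invalid-image class, never eagerly on every task.
    start = RETRY_POLICY[failure_class].get("precheck_from_attempt")
    return start is not None and attempt >= start
```

Keying policy on a typed failure class (rather than one global retry loop) is what lets "retry forever" coexist safely with "give up after two".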
Structured attribute extraction (2025-08-01 PARSE post)¶
- patterns/llm-attribute-extraction-platform — canonical instance at Instacart: one platform consolidating per- attribute SQL rules + per-attribute ML models into declarative LLM-driven config.
- patterns/low-confidence-to-human-review — proactive error detection: low-confidence extractions route to human auditors before catalog ingestion.
- patterns/human-in-the-loop-quality-sampling — orthogonal drift-detection loop: periodic random sample reviewed by humans + LLM-as-judge.
- patterns/multi-attribute-multi-product-prompt-batching — future-work cost-reduction: batch attributes-per-product or products-per-attribute to amortise shared-context tokens.
- patterns/llm-extraction-cache-by-similarity — future-work cost-reduction: cache extraction results keyed by product-similarity function (blocked on duplicate-product detection).
- concepts/llm-self-verification — entailment prompt + yes-token logit → per-extraction confidence score. Cites AutoMix [2] as literature basis.
- concepts/llm-cascade — per-attribute cheap-vs-expensive LLM choice; Instacart's 70% cost reduction on simple attributes and 60% accuracy drop on hard attributes is the motivating number.
- concepts/multi-modal-attribute-extraction — cross-modal (text + image) reasoning for attributes like `sheet_count` whose value may be image-only or require text+image cross-reference. +10% recall over text-only.
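PARSE's entailment-prompt self-verification (yes-token probability → confidence → HITL routing) can be sketched with stubbed logits standing in for a real model's output:

```python
import math

def entailment_confidence(yes_logit: float, no_logit: float) -> float:
    """Sketch of self-verification: after extracting a value, the model is
    asked an entailment question ("does the product data entail
    attribute = X?") and the probability mass on the yes-token becomes the
    extraction's confidence score. Logits here are stand-ins."""
    m = max(yes_logit, no_logit)          # stabilised two-way softmax
    ey = math.exp(yes_logit - m)
    en = math.exp(no_logit - m)
    return ey / (ey + en)

def route_extraction(value, yes_logit, no_logit, threshold=0.8):
    """Low-confidence extractions route to human auditors before catalog
    ingestion; the threshold is illustrative."""
    conf = entailment_confidence(yes_logit, no_logit)
    return {"value": value, "confidence": conf,
            "route": "ingest" if conf >= threshold else "human_review"}
```

The same score drives both the quality-screening gate and the low-confidence-to-human-review routing described above.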
Image-generation platform (2025-07-17 PIXEL post)¶
- patterns/unified-image-generation-platform — canonical instance at Instacart: one platform fronting multiple models with unified parameter translation, prompt-template defaults, VLM quality gate, fine-tunes, and infra integration.
- patterns/vlm-evaluator-quality-gate — the four-step loop (prompt-LLM → generate → VLM-judge → failed-questions-fed-back) that raised approval rate 20% → 85%.
- patterns/prompt-template-library — per-application prompt templates with few-shot exemplars encoding lighting / background / composition defaults.
- patterns/fine-tuned-model-per-product-category — DreamBooth fine-tunes for unbranded produce + meat categories.
- concepts/unified-parameter-protocol — `style` / `size` / `cfg_scale` normalised across providers; model swap is a model-name string edit.
- concepts/cross-model-portability — the consequence: "the best performing model varied project by project" so portability is load-bearing.
- concepts/model-agnostic-ml-platform — platform stance.
- concepts/self-serve-generative-ai — UI usable by anyone at Instacart regardless of technical background.
- concepts/vlm-as-image-judge — the core quality-evaluation primitive.
- concepts/iterative-prompt-refinement — the loop structure (4 steps: prompt → generate → score → feed-failed-questions-back).
- concepts/few-shot-prompt-template — the prompt-template primitive.
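The unified-parameter-protocol idea reduces to a per-provider translation table. A hypothetical Python sketch (provider names and parameter spellings invented for illustration):

```python
# Callers always pass the same unified keys (style / size / cfg_scale);
# a per-provider table renames them, so swapping models really is just a
# model-name string edit at the call site.
PROVIDER_PARAM_MAP = {
    "stable-diffusion": {"style": "prompt_suffix", "size": "resolution",
                         "cfg_scale": "cfg_scale"},
    "other-provider":   {"style": "style_preset", "size": "dimensions",
                         "cfg_scale": "guidance"},
}

def translate_params(model_name: str, unified: dict) -> dict:
    """Translate the unified parameter protocol into one provider's native
    parameter names, dropping keys the provider does not understand."""
    mapping = PROVIDER_PARAM_MAP[model_name]
    return {mapping[k]: v for k, v in unified.items() if k in mapping}
```

Cheap cross-model A/B testing follows: the caller's request is identical for every provider, and only the table grows when a new model is onboarded.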
Recent articles¶
- 2026-02-17 — Turning Data into Velocity: Caper's Edge and Cloud Data Flywheel with Capsight → sources/2026-02-17-instacart-turning-data-into-velocity-capers-edge-and-cloud-data-flywheel-with-capsight — Instacart Engineering post (authors: Youming Luo, Andrew Tanner, Matas Sriubiskis, Sylvia Lin, Sikun Zhu, Lei Li, Xiao Zhou) introducing Capsight, the edge→cloud data flywheel for Instacart's Caper smart-cart fleet. Three-component architecture (Collector on-device → Depot in the cloud → Ray-based Learner); closed loop Collect → Manage → Label → Train → Deploy. Core problem named: Caper models trained on manually-collected data underfit production diversity (concepts/production-data-diversity: lighting, occlusion, damaged packaging, motion blur, store-specific SKUs); each cart emits "gigabytes" of multi-modal data; end-to-end iteration cycle was a month; annotation cost grew linearly with fleet size by default. Load-bearing design goal: iteration cost must not grow linearly with deployment size. Collector: trigger-based capture (activity signal + recognised barcode), dedicated hardware video encoder + dedicated weight/location protocol for zero regression on the cart's primary AI tasks (concepts/hardware-offload), resilient uploader (bandwidth-aware to not hurt retailer store networks + storage-threshold-pauses-collection + auto-cleanup-oldest-on-upload-failure). Depot: distributed ingestion/processing, metadata indexing + searchable web UI (observability of the fleet), and the cost-moving innovation — [[patterns/vlm-assisted-pre-labeling|VLM + teacher-model pre-labelling]] where empty backgrounds are auto-filtered, a VLM plus internal teacher models generate pre-labels for items + barcodes, and humans correct rather than create. Projected >70% annotation cost reduction; multi-day tasks → hours; same pipeline cleans historical ground-truth errors. Learner: "distributed, Ray-based training platform" (Ray is now canonical for Capsight training) with automated evaluation against standard test sets; drops model training stage from 1 week to 2 days. End-to-end outcomes: iteration cycle 1 month → 1 week; early models trained on Capsight-curated data show >5% accuracy improvement within weeks, with continued gains as fleet scales. Future work: full multi-modal sensor-fusion foundation model (concepts/multi-modal-attribute-extraction applied to real-world physical-environment understanding), intent detection for complex multi-item interactions, automatically-surfaced highest-value training data. Seventh Instacart source on the wiki; first one crossing the edge/cloud boundary and the first ML-data-platform source on the wiki (PIXEL / PARSE / Maple are ML-serving platforms; Capsight is an ML-training-data platform). Canonical wiki instances of [[concepts/edge-cloud-data-flywheel]] + concepts/production-data-diversity + patterns/distributed-fleet-as-data-pipeline + patterns/trigger-based-edge-capture + patterns/vlm-assisted-pre-labeling + patterns/resilient-edge-uploader.
- 2026-02-09 — From Print to Digital: Making Weekly Flyers Shoppable at Instacart Through Computer Vision and LLMs → sources/2026-02-09-instacart-from-print-to-digital-making-weekly-flyers-shoppable — Instacart Engineering post (author: Prithvi Srinivasan per inline Medium byline) on the internal flyer-digitization pipeline. Two-phase architecture: Phase 1 segments each flyer image into per-product bounding boxes; Phase 2 matches each box to a concrete Instacart catalog SKU via OCR + LLM + internal search. Before/after: 3–4 hours manual work per flyer → <30 minutes end-to-end after automation. Rejected approaches: FoodSAM (food-specific SAM variant) for insufficient product breadth; pure multimodal-LLM bounding-box prediction on complex flyers for imprecision; classical contour detection standalone for noise. Shipped Phase-1 architecture: complexity-tiered routing — simple flyers use iterative-grid multimodal-LLM probing (~90% accuracy); complex flyers use SAM + four post-processing stages (text-box removal, WBF box-merging, SAM + contour ensemble gated per-retailer on flyer density, heuristic + ML filters on aspect ratio + size). WBF vs NMS is explicitly motivated: NMS "may discard valuable information by eliminating lower-confidence boxes" — cited prior art: +3–10% mAP in medical-imaging ensembles. Phase-2 is captured-body truncated — named challenges (multi-item deals → N-SKU, generic produce with no branded text to OCR) are documented but the LLM stack + catalog-search integration are not elaborated in the captured body. Canonical wiki instances of patterns/hybrid-cv-plus-llm-pipeline + patterns/complexity-tiered-model-selection. Sixth Instacart source on the wiki; fourth Instacart visual-ML system alongside PIXEL (generation), PARSE (attribute extraction), and the Caper Jetpack-Compose migration.
- 2026-02-03 — Migrating to Jetpack Compose: How AI Accelerated Our Journey at Caper → sources/2026-02-03-instacart-migrating-to-jetpack-compose — Instacart Engineering post (author Matt Kranzler) on migrating the Android app powering Caper smart carts (AI + computer-vision in-store scan-and-pay carts, stability-critical hardware) from Fragments + XML layouts to Jetpack Compose via a deliberate four-phase plan accelerated by AI coding assistants. Phase 1 (manual, no AI) removed explicit Fragment wrappers using Google's navigation-fragment-compose for implicit Fragment hosts; established the load-bearing outer-parameterless / inner-testable Composable split that makes the eventual Phase-4 Compose-Navigation migration cheap. Phase 2 migrated 30+ sub-navigation graphs and 130+ destinations from XML resource-ID navigation to type-safe Kotlin DSL via an iterative AI workflow (Learn-by-Doing → Git-Diff-as-Context → Correct-and-Refine → Update-the-Guide → Repeat); reported 5–7× faster, 300–350 engineering hours saved, "migrations previously too tedious to justify" became feasible. Phase 3 converts 100+ Fragment features to pure Compose via a 17-step AI skill with engineer verification checkpoints across four stages (Analysis+Baselining using Paparazzi screenshots / Compose Implementation / Verification+Integration with visual-parity diff / Cleanup); formalised from an earlier 325+ line markdown migration guide into a structured Agent Skill after 5–6 prior migrations made the workflow predictable — "Skills enable progressive disclosure of information, allowing the AI to access exactly what it needs at each step without overwhelming the context window." Phase 4 (Compose Navigation) runs in parallel with the tail of Phase 3 behind feature flags. Four named principles for AI-assisted refactoring: (1) the economics of technical debt have changed, (2) treat AI instructions as code — the guide "is effectively a program that the AI executes," triple-duty (AI executes, humans checklist, reviewers verify), (3) incrementalism mitigates AI risk, (4) invest in the workflow not just the tool — when Agent Skills emerged mid-project the workflow evolved. Engineer-role shift named: "from execution to definition and validation" — architecture + pattern definition + oversight, not typing the thousands of mechanical edits. Scaling claim (directional, not quantified): "other engineering teams across Instacart are now using AI skills to tackle their own large-scale refactoring challenges" — explicit playbook posture vs. single-migration retrospective. Canonical wiki instances of patterns/phased-framework-migration + patterns/ai-migration-skill-workflow + patterns/visual-parity-screenshot-gate + concepts/ai-assisted-refactoring-economics + concepts/ai-instructions-as-code. First Instacart mobile-platform source on the wiki (after PIXEL / PARSE / Maple / Intent Engine on the ML-platform axis) and first wiki instance canonicalising AI-skill-driven mobile UI framework migration. Fifth Instacart source on the wiki.
- 2025-11-13 — Building The Intent Engine: How Instacart is Revamping Query Understanding with LLMs → sources/2025-11-13-instacart-building-the-intent-engine — Instacart Engineering post replacing the legacy query-understanding stack (multiple bespoke ML models) with an LLM-backed Intent Engine, layered across three progressively-more-invasive adaptation techniques: prompting → context-engineering (RAG) → fine-tuning. Three QU sub-tasks rebuilt: (1) query category classification via retrieve-top-K-converted-categories → LLM re-rank with injected Instacart context → semantic-similarity guardrail filter — replaces legacy flat-multi-class FastText that emitted taxonomically-inconsistent pairs and lacked world knowledge; (2) query rewrites via three specialised prompts — Substitutes / Broader / Synonyms — each with chain-of-thought + few-shot exemplars + post-processing relevance guardrail — lifted coverage from legacy ~50% to >95% at 90%+ precision; (3) SRL (query tagging — product/brand/attribute) via a load-bearing hybrid cache + real-time fine-tuned model architecture. SRL deep-dive is the post's substance: offline RAG "teacher" pipeline (conversion data + catalog + brand-embedding similarity + frontier LLM + post-processing guardrail) is dual-purposed — its output populates the head-query cache AND becomes the supervised training set for a Llama-3-8B + LoRA student. Latency path for the student, out-of-box to production: ~700 ms on A100 → 300 ms target after adapter merging + H100 upgrade; FP8 quantization gave another 10% but was not shipped because of a slight recall regression; GPU autoscaling at off-peak manages cost. Only 2% of queries hit the real-time model; ~98% served from cache. Production quality: precision 96.4% vs 95.4% frontier baseline, recall 95.0% vs 96.2%, F1 95.7% vs 95.8% — F1-parity with precision-bias. A/B outcomes: 6% reduction in average scroll depth on tail queries, 50% reduction in user complaints on tail-query search quality, millions of cold-start queries served weekly. Strategic framing: "A generic LLM is a commodity; your business context is what makes your application defensible, because domain knowledge is the most valuable asset." Authors: Yuanzheng Zhu, Guanghua Shu, Raochuan Fan, Vinesh Gudla, Tejaswi Tenneti. Fourth Instacart source on the wiki — extends the PIXEL (content generation) + PARSE (structured extraction) platform-consolidation pattern-graph into the retrieval-relevance axis; same "stop every team from DIY'ing this" architectural stance, different data surface (search queries). Canonical wiki instances of patterns/head-cache-plus-tail-finetuned-model + patterns/offline-teacher-online-student-distillation.
- 2025-08-27 — Simplifying Large-Scale LLM Processing across Instacart with Maple → sources/2025-08-27-instacart-simplifying-large-scale-llm-processing-with-maple — Instacart Engineering post on Maple, the internal batch-LLM-processing service consolidating every team's batch workflows into one. CSV/Parquet in, CSV/Parquet out RPC hiding the provider's 50K-prompt / 200 MB / 24 h batch API. Technology stack: Python + PyArrow + orjson + Temporal for durable execution + S3-Parquet intermediate storage. Architectural layers: Maple on top of the internal AI Gateway on top of the external LLM provider; the AI Gateway integrates with Cost Tracker for per-team attribution. Production numbers from a ~580-batch / 40–50K-tasks-per-batch sample: mean 2.6 prompts/sec/batch; most batches complete in < 12 h (SLA 24 h); scale to 10M+ prompt jobs; ~50% cost reduction vs real-time; "hundreds of thousands of dollars per year to just thousands" on specific processes. Four-class failure taxonomy (expired + rate-limited = infinite retry; refused = max 2×; invalid-image = optional with image-check on retry #2 only — "checking each image in a large batch can add significant overhead"). Real-time fallback for providers without batch APIs, behind the same CSV interface — platform hides provider-capability heterogeneity. Three scale optimisations forced by growth to 10M+ prompts: (1) DB → S3 Parquet intermediate storage (25× compression), (2) stream-based processing, (3) orjson for JSON parsing. Adoption: Catalog, Fulfillment, Search, ML-training teams each with distinct workloads. Canonical patterns/llm-batch-processing-service + patterns/batch-then-real-time-fallback + patterns/infinite-retry-by-failure-class + patterns/csv-in-parquet-intermediate-output-merge. Same "stop every team from DIY'ing this" architectural stance as PIXEL + PARSE, at the batch-inference layer. Third Instacart source on the wiki.
- 2025-08-01 — Scaling Catalog Attribute Extraction with Multi-modal LLMs (PARSE) → sources/2025-08-01-instacart-scaling-catalog-attribute-extraction-with-multi-modal-llms — Instacart Engineering post announcing PARSE (Product Attribute Recognition System for E-commerce), the internal self-serve multi-modal LLM platform for structured attribute extraction across the catalog. Four components (declarative + versioned Platform UI → ML extraction endpoint with self-verification confidence score → quality screening with dev/prod HITL loops → catalog ingestion). Three reusable architectural ideas surfaced: (1) multi-modal reasoning closes the text-only blind spot — `sheet_count` recall +10% over text-only LLM, with two archetypal examples: 80-sheets-on-packaging (image-only signal) and "3 boxes of 124 tissues" (text-only but needs multiplication); (2) per-attribute prompt-tuning effort + LLM size are load-bearing — `organic` 1 day / 95% accuracy first prompt, `low_sugar` 3 days; cheap LLM gives -70% cost at equivalent quality on simple attributes but -60% accuracy on hard ones, motivating per-attribute model choice; (3) future cost reduction comes from prompt batching (multi-attribute or multi-product) + extraction cache keyed by a product-similarity function. Same architectural DNA as sibling PIXEL: self-serve UX, model-agnostic platform stance, LLM-as-judge in the evaluation loop — applied to structured extraction instead of image generation. Second Instacart source on the wiki.
- 2025-07-17 — Introducing PIXEL: Instacart's Unified Image Generation Platform → sources/2025-07-17-instacart-introducing-pixel-instacarts-unified-image-generation-platform — Instacart Engineering announcement post on PIXEL, their internal unified image-generation platform. Five architectural components (unified parameter protocol across models + prompt-template + few-shot library + DreamBooth fine-tunes on Stable Diffusion for product-specific categories + automated VLM-based quality evaluation in a 4-step iterative refinement loop + RPC service on existing Instacart infra with S3 storage + [[systems/snowflake|Snowflake]]-addressable image URLs). Reported headline numbers: 10× team time-to-image reduction; 20% → 85% human-judge approval rate after the VLM loop shipped; >25% reduction in Butcher Cuts navigation + add-to-cart time; 15% uplift in Lifestyle Imagery personalised-carousel cart conversion. Canonical design argument: "the best performing model varied project by project" — so PIXEL optimises for cheap cross-model A/B testing via the unified parameter protocol rather than standardising on one model. Key contributor: Shishir Kumar Prasad. First Instacart source on the wiki.
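The Depot pre-labelling flow from the Capsight article above (auto-filter empty backgrounds → VLM/teacher pre-labels → human corrects) is a pipeline over three pluggable functions. A minimal sketch with stubbed callables standing in for the real filter, VLM, and annotation UI:

```python
def prelabel_pipeline(frames, is_empty, vlm_prelabel, human_correct):
    """Sketch of Depot's VLM-assisted pre-labelling: empty backgrounds are
    auto-filtered (zero human cost), a VLM plus teacher models propose
    labels, and humans only correct proposals instead of annotating from
    scratch. This is the step that moves annotation cost from
    O(all frames) of human work to O(corrections)."""
    labelled = []
    for frame in frames:
        if is_empty(frame):
            continue                      # auto-filtered, never shown to a human
        proposal = vlm_prelabel(frame)    # VLM + teacher-model pre-labels
        labelled.append(human_correct(frame, proposal))
    return labelled
```

The claimed >70% annotation-cost reduction comes from the two human-free branches: filtered frames cost nothing, and correcting a mostly-right proposal is far cheaper than labelling from scratch.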