Segment Anything Model (SAM)¶
Definition¶
Segment Anything Model (SAM) is Meta AI's open-source, promptable image-segmentation foundation model, released in April 2023. Given an input image and a prompt — a point, a bounding box, a mask, or (in follow-on variants) text — SAM outputs segmentation masks for the object(s) the prompt indicates. In "automatic mask generation" mode, SAM is instead prompted with a regular grid of points to produce candidate masks for every object in the image.
SAM was trained on the SA-1B dataset (1 billion masks over 11 million images), released alongside the model. Its design goal is to be a general-purpose segmentation primitive — a foundation model for segmentation, analogous to what large language models are for text — on top of which downstream applications build domain-specific post-processing.
Project site: segment-anything.com
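The promptable interface above has a consequence worth seeing concretely: for a single prompt, SAM (with `multimask_output=True` in the reference `segment_anything` package) returns several candidate masks plus a predicted-IoU score for each, to resolve part-vs-whole ambiguity, and callers typically keep the highest-scoring one. The real predictor needs downloaded model weights, so the sketch below uses toy stand-in arrays for SAM's output; only the selection step is the point.

```python
import numpy as np

def select_best_mask(masks: np.ndarray, iou_scores: np.ndarray) -> np.ndarray:
    """Keep the candidate mask with the highest predicted IoU.

    SAM returns multiple candidates per prompt (part vs. whole object);
    the usual client-side step is an argmax over the predicted scores.
    """
    return masks[int(np.argmax(iou_scores))]

# Toy stand-in for SAM's multimask output: 3 boolean masks (4x4 image)
# and their predicted IoU scores. A real call would resemble
#   masks, scores, _ = predictor.predict(point_coords=..., point_labels=...,
#                                        multimask_output=True)
masks = np.zeros((3, 4, 4), dtype=bool)
masks[0, 1:3, 1:3] = True   # small "part" hypothesis
masks[1, 0:3, 0:3] = True   # medium hypothesis
masks[2, :, :] = True       # whole-frame hypothesis
scores = np.array([0.55, 0.91, 0.40])

best = select_best_mask(masks, scores)
print(best.sum())  # area (pixel count) of the chosen mask
```
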
Why the sysdesign-wiki cares¶
SAM is load-bearing in production visual-ML pipelines at scale. The wiki's interest is not the ML architecture — the upstream paper covers ViT encoder + prompt encoder + mask decoder — but the system-design properties of treating SAM as a foundation component in a larger pipeline:
- SAM's output is a raw candidate mask set, not a shippable answer. Production pipelines layer domain-specific post-processing (text-box removal, merging overlapping detections via WBF, model-ensembling with classical contour detectors, heuristic + ML filters on aspect ratio and size) before the output is usable.
- Domain-specific SAM variants exist and often aren't enough alone. The flyer-digitization post explicitly evaluated FoodSAM (a food-specific SAM fine-tune / wrapper) and rejected it for being too narrow for retail-flyer diversity.
- SAM works best when composed with other detectors — the ensemble of SAM-style segmentation with classical contour detection covers different feature regimes than either does alone.
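The "heuristic filters on aspect ratio and size" mentioned above can be sketched in a few lines. This is a minimal illustration, not Instacart's implementation: the mask representation (a dict carrying an `xyxy` bounding box) and all thresholds are assumptions chosen for the example.

```python
def filter_masks(masks, img_w, img_h,
                 min_area_frac=0.001, max_area_frac=0.5, max_aspect=5.0):
    """Heuristic filter over candidate masks' bounding boxes.

    Drops boxes that are implausibly small or large relative to the image,
    or too elongated (e.g. text banners). Thresholds are illustrative.
    """
    kept = []
    img_area = img_w * img_h
    for m in masks:
        x0, y0, x1, y1 = m["xyxy"]
        w, h = x1 - x0, y1 - y0
        if w <= 0 or h <= 0:
            continue
        area_frac = (w * h) / img_area
        aspect = max(w / h, h / w)
        if min_area_frac <= area_frac <= max_area_frac and aspect <= max_aspect:
            kept.append(m)
    return kept

cands = [
    {"xyxy": (10, 10, 210, 160)},   # plausible product tile
    {"xyxy": (0, 0, 2, 2)},         # speck: dropped, too small
    {"xyxy": (0, 0, 1000, 20)},     # banner strip: dropped, aspect 50
    {"xyxy": (0, 0, 900, 900)},     # near-full-page: dropped, too large
]
print(len(filter_masks(cands, img_w=1000, img_h=1400)))  # 1
```

In production this kind of rule-based pass is typically paired with a learned filter, since fixed thresholds alone misfire on unusual flyer layouts.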
FoodSAM¶
FoodSAM is a food-specific segmentation-and-classification system built on top of SAM, referenced by the Instacart team as an off-the-shelf candidate for flyer digitization. Cited paper: FoodSAM: Any Food Segmentation (Lan et al., 2023). FoodSAM combines SAM's class-agnostic segmentation with a semantic food-category classifier to produce food-annotated masks.
The Instacart team rejected FoodSAM for retail-flyer digitization because flyer content extends well beyond food (branded packaged goods, household products, personal care), and FoodSAM's food specialization "fell short of addressing the breadth and variety of products featured in retail flyers." A useful cautionary note: foundation-model variants specialised for a subdomain are not necessarily useful for adjacent domains, and a generic SAM plus domain-specific post-processing may beat a pre-specialised variant.
Seen in¶
- sources/2026-02-09-instacart-from-print-to-digital-making-weekly-flyers-shoppable — Instacart's flyer-digitization pipeline uses SAM as the Phase-1 base detector for complex flyers. SAM's raw output is passed through four post-processing stages — WBF-based merging, text-box removal, contour-detection ensembling gated per retailer, and heuristic + ML filtering on aspect ratio + size — before being usable. The domain-specific FoodSAM variant was explicitly evaluated and rejected as insufficient for retail-flyer diversity.
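The WBF-based merging stage named above can be illustrated with its core fusion step. This sketch covers only the fusion of one cluster of already-matched overlapping boxes: fused coordinates are the confidence-weighted average of the members, and the fused score is their mean. Full WBF (see concepts/weighted-boxes-fusion) also clusters boxes by IoU and can rescale scores by cluster size; those steps are omitted here.

```python
def fuse_cluster(boxes, scores):
    """Fuse one cluster of overlapping detections, WBF-style.

    boxes: list of (x0, y0, x1, y1) tuples; scores: matching confidences.
    Coordinates are averaged weighted by confidence, so higher-confidence
    boxes pull the fused box toward themselves (unlike NMS, which simply
    discards all but the top box).
    """
    total = sum(scores)
    fused = tuple(
        sum(s * b[i] for b, s in zip(boxes, scores)) / total
        for i in range(4)
    )
    return fused, sum(scores) / len(scores)

# Two overlapping detections of the same flyer tile:
boxes = [(100, 100, 200, 200), (110, 104, 210, 196)]
scores = [0.9, 0.6]
box, score = fuse_cluster(boxes, scores)
print([round(v, 1) for v in box], round(score, 2))
# [104.0, 101.6, 204.0, 198.4] 0.75
```
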
Related¶
- systems/instacart-flyer-digitization-pipeline — canonical production use of SAM on the wiki
- concepts/weighted-boxes-fusion — Instacart's chosen post-SAM box-merging technique
- concepts/non-maximum-suppression — the classical alternative WBF replaces
- concepts/model-ensembling-for-detection — SAM-plus-contour ensemble pattern
- companies/meta — SAM's developer
- companies/instacart