
Segment Anything Model (SAM)

Definition

Segment Anything Model (SAM) is Meta AI's open-source, promptable image-segmentation foundation model released in April 2023. Given an input image and a prompt (a point, a bounding box, a mask, or, in follow-on variants, text) SAM outputs segmentation masks for the object(s) the prompt indicates. Run without a prompt, SAM can instead generate segmentation masks for every object in an image ("automatic mask generation" mode).
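In automatic mode, the reference `segment_anything` library's `SamAutomaticMaskGenerator` returns a list of dicts per image, each carrying the mask plus quality metadata such as `bbox`, `area`, `predicted_iou`, and `stability_score`. A minimal sketch of filtering that output by SAM's own quality scores; the synthetic records below stand in for real model output, and the thresholds are illustrative assumptions, not values from the source:

```python
# Records mimic the dict format returned by SAM's SamAutomaticMaskGenerator.
# Real output also carries a binary 'segmentation' array, omitted for brevity.
def filter_masks(masks, min_iou=0.88, min_stability=0.9):
    """Keep only candidate masks that SAM itself scores as high quality."""
    return [m for m in masks
            if m["predicted_iou"] >= min_iou
            and m["stability_score"] >= min_stability]

candidates = [
    {"bbox": [10, 10, 120, 80], "area": 9600, "predicted_iou": 0.95, "stability_score": 0.97},
    {"bbox": [0, 0, 5, 400],    "area": 2000, "predicted_iou": 0.62, "stability_score": 0.70},
]
kept = filter_masks(candidates)  # only the first, high-confidence record survives
```

Score-based filtering like this is typically the first, cheapest stage before the domain-specific post-processing described below.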

SAM was trained on the SA-1B dataset (1 billion masks over 11 million images), released alongside the model. It is positioned as a general-purpose segmentation primitive, a foundation model for segmentation analogous to what large language models are for text, on top of which downstream applications build domain-specific post-processing.

Project site: segment-anything.com

Why the sysdesign-wiki cares

SAM is load-bearing in production visual-ML pipelines at scale. The wiki's interest is not the ML architecture itself (the upstream paper covers the ViT image encoder + prompt encoder + mask decoder) but the system-design properties of treating SAM as a foundation component in a larger pipeline:

  • SAM's output is a raw candidate mask set, not a shippable answer. Production pipelines layer domain-specific post-processing (text-box removal, merging overlapping detections via weighted boxes fusion (WBF), model-ensembling with classical contour detectors, heuristic + ML filters on aspect ratio and size) before the output is usable.
  • Domain-specific SAM variants exist and often aren't enough alone. The flyer-digitization post explicitly evaluated FoodSAM (a food-specific SAM fine-tune / wrapper) and rejected it for being too narrow for retail-flyer diversity.
  • SAM works best when composed with other detectors — the ensemble of SAM-style segmentation with classical contour detection covers different feature regimes than either does alone.
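To make the post-processing bullets concrete, here is a self-contained sketch of two of the listed steps: a simplified greedy IoU-based merge standing in for weighted boxes fusion (real WBF also confidence-weights coordinates), and a heuristic aspect-ratio/size filter. Function names and thresholds are illustrative assumptions, not the actual production pipeline:

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def merge_overlapping(boxes, thresh=0.5):
    """Greedily union any box into an existing cluster when IoU > thresh.
    A simplified stand-in for weighted boxes fusion, which additionally
    averages coordinates weighted by detection confidence."""
    merged = []
    for box in boxes:
        for i, m in enumerate(merged):
            if iou(box, m) > thresh:
                merged[i] = (min(m[0], box[0]), min(m[1], box[1]),
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(tuple(box))
    return merged

def plausible_product(box, min_area=400, max_aspect=4.0):
    """Heuristic filter: drop tiny fragments and extreme slivers
    (likely text rules, borders, or layout artifacts)."""
    w, h = box[2] - box[0], box[3] - box[1]
    if w <= 0 or h <= 0 or w * h < min_area:
        return False
    return max(w / h, h / w) <= max_aspect

candidates = [(0, 0, 100, 100), (10, 10, 110, 110),  # overlapping pair -> merged
              (200, 0, 205, 300),                    # sliver -> filtered out
              (300, 300, 360, 360)]                  # clean detection -> kept
boxes = [b for b in merge_overlapping(candidates) if plausible_product(b)]
# -> [(0, 0, 110, 110), (300, 300, 360, 360)]
```

The point of the sketch is the layering, not the specific thresholds: SAM proposes, and cheap geometric heuristics plus a merge step turn the raw candidate set into something a downstream classifier can consume.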

FoodSAM

FoodSAM is a food-specific segmentation-and-classification system built on top of SAM, referenced by the Instacart team as an off-the-shelf candidate for flyer digitization. Cited paper: FoodSAM: Any Food Segmentation (Lan et al., 2023). FoodSAM combines SAM's class-agnostic segmentation with a semantic food-category classifier to produce food-annotated masks.

The Instacart team rejected FoodSAM for retail-flyer digitization because flyer content extends well beyond food (branded packaged goods, household products, personal care); in their words, FoodSAM's food specialization "fell short of addressing the breadth and variety of products featured in retail flyers." This is a useful cautionary note: a foundation-model variant specialised for one subdomain is not necessarily useful for adjacent domains, and a generic SAM plus domain-specific post-processing may beat a pre-specialised variant.
