CONCEPT Cited by 1 source
Model Ensembling for Detection¶
Definition¶
Model ensembling for detection is the practice of running multiple detection models on the same input and combining their outputs into a single consensus detection set, rather than picking one model. The combining is typically done at the bounding-box level via NMS or WBF, with WBF generally better when detectors carry complementary coordinate information.
This is specifically about object-detection / segmentation ensembling; the general concept of ML ensembling (bagging, boosting, stacking) is broader.
Why it matters¶
Different detection models excel on different feature regimes. A foundation segmentation model like SAM is strong on general-object mask extraction but produces noisy output on densely-packed scenes. Classical contour detection is strong on clear edge-bounded regions but drowns in clutter. An ensemble that combines both recovers the best of each without the user having to commit globally to either model.
The argument for ensembling is the same as for WBF: multiple detectors encode complementary priors, and combining them covers a broader range of object representations than any one model does alone. The wiki's canonical framing from Instacart:
"To leverage the strengths of different detection approaches, we combined outputs from segmentation models and contour detection algorithms. This ensemble strategy allows us to capture a broader range of product representations, as different models may excel in detecting various features. By integrating their outputs, we achieve a more comprehensive and robust detection system." (Source: sources/2026-02-09-instacart-from-print-to-digital-making-weekly-flyers-shoppable)
Mechanism¶
A typical ensemble-for-detection pipeline:
- Run each detector model independently on the input image. Each emits `(box, confidence)` pairs.
- Union all boxes across all detectors into a single candidate set.
- Cluster overlapping boxes (IoU > threshold).
- Fuse each cluster:
- NMS → keep the highest-confidence box, drop the rest.
- WBF → weighted-average the coordinates + confidences.
- Output the fused box set.
Optionally, apply further filtering — heuristic (aspect ratio, size) or ML-based (a classifier trained to reject noise boxes) — to clean up false positives.
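The pipeline above can be sketched in a few dozen lines. This is a minimal illustration, not a production implementation: the function names, the greedy seed-based clustering, and the mean-confidence rule are all simplifications (real WBF, per concepts/weighted-boxes-fusion, also rescales confidence by how many models agreed).

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def ensemble_boxes(detections: List[List[Tuple[Box, float]]],
                   iou_thr: float = 0.5) -> List[Tuple[Box, float]]:
    """Union (box, confidence) pairs from all detectors, cluster by IoU,
    and fuse each cluster with a confidence-weighted average (WBF-style)."""
    # Union all candidates, highest-confidence first so cluster seeds are strong.
    pool = sorted((p for det in detections for p in det),
                  key=lambda p: p[1], reverse=True)
    clusters: List[List[Tuple[Box, float]]] = []
    for box, conf in pool:
        for cluster in clusters:
            if iou(box, cluster[0][0]) > iou_thr:  # compare against cluster seed
                cluster.append((box, conf))
                break
        else:
            clusters.append([(box, conf)])
    fused = []
    for cluster in clusters:
        total = sum(c for _, c in cluster)
        coords = tuple(sum(b[i] * c for b, c in cluster) / total
                       for i in range(4))  # confidence-weighted coordinates
        fused.append((coords, total / len(cluster)))  # mean confidence
    return fused
```

Swapping the fusion step for NMS would instead keep `cluster[0]` (the highest-confidence member) and discard the rest.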
Dynamic ensembling¶
In the Instacart pipeline, the ensemble is not static — the contour-detection branch is gated per retailer based on flyer density:
"The decision whether or not to use contour detection models was based on how densely the flyer images were packed. This varied from retailer to retailer."
This is a specific instance of a broader pattern: dynamically turn ensemble branches on or off based on input characteristics, rather than always paying the full ensemble cost. Related to patterns/complexity-tiered-model-selection — same intuition, different axis: routing vs. gating.
Tradeoffs / gotchas¶
- Cost scales with the number of detectors. An N-model ensemble is ~N times the compute per input. Dynamic gating (like Instacart's per-retailer contour toggle) is the mitigation.
- Fusion choice matters more than model count. Adding a fourth detector with NMS fusion often under-performs a two-detector WBF ensemble. See concepts/weighted-boxes-fusion.
- Coordinated failure modes don't get fixed by ensembling. If every detector has the same blind spot (e.g. all trained on the same dataset that never showed curved packaging), the ensemble will miss those boxes uniformly. Diversity of priors is the load-bearing property, not just diversity of weights.
- Confidence calibration across detectors. When two detectors' confidence scales differ, naive WBF-weighting gives a systematically biased average. In practice: either train a calibration head or switch to ranked-confidence fusion.
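The ranked-confidence idea from the last bullet can be sketched as follows: replace each detector's raw scores with their rank percentile before fusing, so detectors with different score scales meet on a common [0, 1] axis. The function name and the simple no-ties ranking are illustrative assumptions:

```python
from typing import List

def rank_normalize(confidences: List[float]) -> List[float]:
    """Map one detector's raw confidences to rank percentiles in (0, 1],
    so its scores are comparable with other detectors' before fusion.
    (Ties are broken arbitrarily in this sketch.)"""
    n = len(confidences)
    order = sorted(range(n), key=lambda i: confidences[i])
    ranks = [0.0] * n
    for rank, i in enumerate(order):
        ranks[i] = (rank + 1) / n  # lowest score -> 1/n, highest -> 1.0
    return ranks
```

Applied per detector, this sidesteps the biased weighted average without training a calibration head, at the cost of discarding the absolute score magnitudes.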
Seen in¶
- sources/2026-02-09-instacart-from-print-to-digital-making-weekly-flyers-shoppable — canonical wiki instance. Instacart's flyer-digitization pipeline ensembles SAM-style segmentation outputs with classical contour-detection outputs for Phase 1 bounding-box extraction, with the contour branch gated per retailer on flyer density.
Related¶
- concepts/weighted-boxes-fusion — the preferred fusion technique for ensembled detectors
- concepts/non-maximum-suppression — the classical fusion technique for single-detector de-duplication
- patterns/complexity-tiered-model-selection — the sibling pattern of routing inputs by complexity rather than gating branches
- systems/instacart-flyer-digitization-pipeline — canonical production use
- systems/segment-anything-model-sam — one of the detectors in Instacart's ensemble
- companies/instacart