CONCEPT Cited by 1 source
Weighted Boxes Fusion (WBF)
Definition
Weighted Boxes Fusion (WBF) is a post-processing technique for object detection that merges overlapping bounding boxes (typically from multiple detectors or multiple passes of the same detector) into a single output box by computing a confidence-weighted average of the input boxes' coordinates and combining their confidences. It is an alternative to Non-Maximum Suppression (NMS), which keeps the single highest-confidence box and discards all overlapping lower-confidence boxes outright.
Introduced in "Weighted Boxes Fusion: Ensembling boxes from different object detection models" (Solovyev et al., 2019).
Why it matters
When multiple detectors each produce a bounding box for the same underlying object, their coordinates almost never match exactly. The right way to combine them depends on what you believe about the disagreement:
- If disagreement = noise and the higher-confidence box is authoritative → use NMS. Keep the top box, drop the rest. Fast, simple, dominant baseline.
- If disagreement = complementary information (each detector got part of the truth) → use WBF. Average the coordinates weighted by confidence, producing a consensus box that's usually more accurate than any single input.
The argument for WBF is that NMS throws away information by eliminating low-confidence boxes that may still carry useful coordinate signal, while WBF incorporates every overlapping box's contribution into the merged output.
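The contrast can be made concrete with two detections of the same object. The following is a minimal sketch, not the paper's algorithm: the box values are made up, and `nms_pick` and `wbf_merge` are hypothetical helper names that each handle a single group of boxes already known to overlap.

```python
# Two detections of the same object as (x1, y1, x2, y2, confidence).
# The numbers are illustrative assumptions, not data from any source.
box_a = (10.0, 10.0, 50.0, 50.0, 0.9)
box_b = (14.0, 12.0, 54.0, 56.0, 0.6)


def nms_pick(boxes):
    """NMS-style outcome for one overlapping group: keep the top-scoring box."""
    return max(boxes, key=lambda b: b[4])


def wbf_merge(boxes):
    """WBF-style outcome: confidence-weighted coordinates, mean confidence."""
    total = sum(b[4] for b in boxes)
    coords = tuple(sum(b[4] * b[i] for b in boxes) / total for i in range(4))
    return coords + (total / len(boxes),)


nms_result = nms_pick([box_a, box_b])   # box_a unchanged; box_b is discarded
wbf_result = wbf_merge([box_a, box_b])  # roughly (11.6, 10.8, 51.6, 52.4, 0.75)
```

Under NMS, `box_b` has no effect on the output. Under WBF it pulls every edge part of the way toward its own coordinates.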
Mechanism
The simplified recipe (the paper has the full algorithm):
- Group overlapping boxes into clusters by IoU > threshold.
- For each cluster, compute the merged box as:
  - merged_confidence = mean(confidences) (or sum, scaled)
  - merged_box = sum(confidence_i × box_i) / sum(confidences), i.e. each box's coordinates contribute proportionally to its confidence.
- Output one merged box per cluster.
The key property: a low-confidence box still shifts the merged coordinates — it just shifts them less than a high-confidence box would.
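The simplified recipe can be sketched in plain Python. This is an illustrative, single-class sketch under stated assumptions, not the paper's full algorithm: `simple_wbf`, `fuse`, and `iou` are hypothetical names, the `iou_thr=0.55` default is an assumed value, each box joins the first cluster whose current fused box it overlaps, and the merged confidence is a plain mean with no further rescaling.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(cluster):
    """Confidence-weighted average of coordinates, plus the mean confidence."""
    total = sum(conf for _, conf in cluster)
    box = tuple(sum(conf * b[i] for b, conf in cluster) / total for i in range(4))
    return box, total / len(cluster)


def simple_wbf(detections, iou_thr=0.55):
    """detections: list of ((x1, y1, x2, y2), confidence) for one class.

    Returns one (fused_box, fused_confidence) pair per cluster.
    """
    clusters = []
    # Visit boxes from most to least confident.
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        for cluster in clusters:
            fused_box, _ = fuse(cluster)
            if iou(fused_box, det[0]) > iou_thr:
                cluster.append(det)
                break
        else:
            # No existing cluster overlaps enough: start a new one.
            clusters.append([det])
    return [fuse(c) for c in clusters]


# Made-up detections: two boxes on one object, one box on another.
detections = [
    ((10, 10, 50, 50), 0.9),
    ((12, 12, 52, 52), 0.3),
    ((200, 200, 240, 240), 0.8),
]
fused = simple_wbf(detections)
```

In this example the first two boxes fuse to roughly (10.5, 10.5, 50.5, 50.5) with confidence 0.6. The 0.3-confidence box moves each edge by 0.5, a quarter of the 2-unit gap, rather than the half an unweighted average would give.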
Why use WBF over NMS
The canonical motivating case is ensembling detectors that each give correct-but-imprecise boxes. Published results cited for this case: in medical imaging, combining outputs from multiple detectors using WBF gave +3–10% mAP over the best single model, a substantially larger gain than NMS-based ensembling typically delivers.
In Instacart's flyer-digitization pipeline, WBF is used to consolidate overlapping detections from SAM and its post-processed variants:
"Unlike traditional Non-Maximum Suppression (NMS), which may discard valuable information by eliminating lower-confidence boxes, WBF combines all overlapping boxes by computing a confidence-weighted average of their coordinates. This approach retains more information and often results in more precise bounding boxes."
Tradeoffs / gotchas
- Not a drop-in NMS replacement in all pipelines. WBF assumes overlapping boxes refer to the same object; if two genuinely distinct objects are near each other with overlapping boxes, WBF will incorrectly merge them into a single averaged box. NMS has a related failure mode (it suppresses the lower-confidence box and so drops one of the objects), but the box it keeps still fits a real object; WBF's blended box may fit neither.
- Requires well-calibrated confidence scores. WBF weights by confidence; if detector confidences are mis-calibrated (one detector is systematically over-confident), WBF will over-weight its outputs.
- Compute cost is higher than NMS. Greedy NMS is an O(N log N) sort followed by pairwise IoU suppression checks (O(N²) in the worst case); WBF additionally has to build clusters and compute a weighted average for each one. This is rarely the bottleneck, but worth noting for high-throughput pipelines.
- Only helps when detectors disagree in coordinates. If all detectors produce identical boxes, WBF ≈ NMS. The win comes from coordinate complementarity, not just multiplicity.
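The first gotcha above can be illustrated with two adjacent, distinct objects whose boxes overlap. This is a minimal sketch with made-up coordinates and thresholds; `iou` and `fuse_pair` are hypothetical helpers, and the two thresholds are chosen only to straddle this example's overlap.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_pair(a, conf_a, b, conf_b):
    """Confidence-weighted average of two boxes' coordinates."""
    total = conf_a + conf_b
    return tuple((conf_a * a[i] + conf_b * b[i]) / total for i in range(4))


# Two distinct, side-by-side objects (illustrative values).
left, left_conf = (10.0, 10.0, 50.0, 50.0), 0.9
right, right_conf = (25.0, 10.0, 65.0, 50.0), 0.8

overlap = iou(left, right)            # 1000 / 2200, about 0.45
merges_at_loose = overlap > 0.40      # True: a loose threshold clusters them
merges_at_strict = overlap > 0.55     # False: a stricter one keeps them apart

# If they are clustered, the blended box sits between the two objects
# and matches neither original box exactly.
blended = fuse_pair(left, left_conf, right, right_conf)
```

The clustering threshold is therefore the main control over this failure mode: set too low for the scene's object spacing, it produces a box that straddles two objects instead of reporting either one.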
Seen in
- sources/2026-02-09-instacart-from-print-to-digital-making-weekly-flyers-shoppable (canonical wiki instance). Instacart's flyer-digitization pipeline uses WBF to merge overlapping bounding boxes from SAM-style detection in Phase 1. "In our application, merging nearby boxes that likely represent the same product enhances detection accuracy and reduces redundancy." WBF is one of four Phase-1 post-processing stages sitting on top of SAM's raw output.
Related
- concepts/non-maximum-suppression — the classical alternative this concept improves on
- concepts/model-ensembling-for-detection — the broader ensembling pattern WBF enables
- systems/instacart-flyer-digitization-pipeline — canonical production use
- systems/segment-anything-model-sam — the detector whose outputs Instacart merges via WBF
- companies/instacart