CONCEPT Cited by 1 source

Weighted Boxes Fusion (WBF)

Definition

Weighted Boxes Fusion (WBF) is a post-processing technique for object detection that merges overlapping bounding boxes (typically from multiple detectors or multiple passes of the same detector) into a single output box. The merged coordinates are a confidence-weighted average of the input boxes' coordinates, and the merged confidence is derived from the input confidences. It is a principled alternative to Non-Maximum Suppression (NMS), which keeps the single highest-confidence box and discards all overlapping lower-confidence boxes outright.

Introduced in "Weighted Boxes Fusion: Ensembling boxes from different object detection models" (Solovyev et al., 2019).

Why it matters

When multiple detectors each produce a bounding box for the same underlying object, their coordinates almost never match exactly. The right way to combine them depends on what you believe about the disagreement:

If disagreement = noise and the higher-confidence box is authoritative → use NMS. Keep the top box, drop the rest. Fast, simple, dominant baseline.
If disagreement = complementary information (each detector got part of the truth) → use WBF. Average the coordinates weighted by confidence, producing a consensus box that's usually more accurate than any single input.

The argument for WBF is that NMS throws away information by eliminating low-confidence boxes that may still carry useful coordinate signal, while WBF incorporates every overlapping box's contribution into the merged output.
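The contrast can be made concrete with a minimal sketch. The two boxes and their confidences below are made-up values for illustration, and `nms_pick` and `weighted_average` are hypothetical helper names, not part of any library:

```python
# Two hypothetical detections of the same object, as (x1, y1, x2, y2).
boxes = [(10.0, 10.0, 50.0, 50.0), (14.0, 12.0, 56.0, 54.0)]
scores = [0.9, 0.6]

def nms_pick(boxes, scores):
    """NMS-style outcome for one overlapping group: keep only the top-scoring box."""
    best = max(range(len(boxes)), key=lambda i: scores[i])
    return boxes[best]

def weighted_average(boxes, scores):
    """WBF-style outcome: confidence-weighted average of each coordinate."""
    total = sum(scores)
    return tuple(
        sum(s * b[k] for b, s in zip(boxes, scores)) / total for k in range(4)
    )

nms_box = nms_pick(boxes, scores)
wbf_box = weighted_average(boxes, scores)

print(nms_box)                         # (10.0, 10.0, 50.0, 50.0)
print([round(v, 2) for v in wbf_box])  # [11.6, 10.8, 52.4, 51.6]
```

The NMS result ignores the second box entirely, while the averaged result lands between the two inputs, closer to the higher-confidence one.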

Mechanism

The simplified recipe (the paper has the full algorithm):

Group overlapping boxes into clusters by IoU > threshold.
For each cluster, compute the merged box as:
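merged_confidence = mean(confidences); the paper then rescales this by the number of boxes in the cluster relative to the number of models, so a box found by only a few models is down-weighted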
merged_box = sum(confidence_i × box_i) / sum(confidences) — i.e. each box's coordinates contribute proportionally to its confidence.
Output one merged box per cluster.

The key property: a low-confidence box still shifts the merged coordinates — it just shifts them less than a high-confidence box would.
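The simplified recipe above could be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's full algorithm: it handles a single class, applies no per-model weights, skips the confidence rescaling by model count, and uses an assumed IoU threshold of 0.55. The names `iou`, `fuse`, and `simple_wbf` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def fuse(cluster):
    """Merge a cluster of (box, score) pairs into one (box, confidence) pair."""
    total = sum(s for _, s in cluster)
    box = tuple(sum(s * b[k] for b, s in cluster) / total for k in range(4))
    return box, total / len(cluster)  # plain mean confidence, no rescaling

def simple_wbf(boxes, scores, iou_thr=0.55):
    """Cluster boxes by IoU against the running fused boxes, then average them."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    clusters = []  # clusters[j] is a list of (box, score) pairs
    fused = []     # fused[j] is the current merged (box, confidence) of clusters[j]
    for i in order:
        best_j, best_iou = -1, iou_thr
        for j, (fused_box, _) in enumerate(fused):
            overlap = iou(boxes[i], fused_box)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j < 0:
            # No fused box overlaps enough: start a new cluster.
            clusters.append([(boxes[i], scores[i])])
            fused.append((tuple(boxes[i]), scores[i]))
        else:
            # Join the best-matching cluster and refresh its merged box.
            clusters[best_j].append((boxes[i], scores[i]))
            fused[best_j] = fuse(clusters[best_j])
    return fused

# Made-up detections: two overlapping boxes plus one far-away box.
boxes = [(10.0, 10.0, 50.0, 50.0), (14.0, 12.0, 56.0, 54.0), (200.0, 200.0, 240.0, 240.0)]
scores = [0.9, 0.6, 0.8]
result = simple_wbf(boxes, scores)
for box, conf in result:
    print([round(v, 2) for v in box], round(conf, 2))
# [11.6, 10.8, 52.4, 51.6] 0.75
# [200.0, 200.0, 240.0, 240.0] 0.8

# Same pair, but the second box has confidence 0.1: it still moves x1, only less.
low = simple_wbf(boxes[:2], [0.9, 0.1])
print(round(low[0][0][0], 2))  # 10.4
```

Lowering the second box's confidence from 0.6 to 0.1 moves the merged x1 from 11.6 to 10.4: the low-confidence box still contributes, just with less pull.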

Why use WBF over NMS

The canonical motivating case is ensembling detectors that each give correct-but-imprecise boxes. Published results cited for medical imaging report that combining the outputs of multiple detectors with WBF gave +3–10% mAP over the best single model, a substantially larger gain than NMS-based ensembling typically delivers.

In Instacart's flyer-digitization pipeline, WBF is used to consolidate overlapping detections from SAM and its post-processed variants:

"Unlike traditional Non-Maximum Suppression (NMS), which may discard valuable information by eliminating lower-confidence boxes, WBF combines all overlapping boxes by computing a confidence-weighted average of their coordinates. This approach retains more information and often results in more precise bounding boxes."

Tradeoffs / gotchas

Not a drop-in NMS replacement in all pipelines. WBF assumes overlapping boxes refer to the same object; if two genuinely distinct objects sit close enough that their boxes overlap above the IoU threshold, WBF will incorrectly merge them into a single averaged box that fits neither object. NMS has a milder version of the same failure mode: it drops one of the two boxes, but the surviving box still fits a real object, whereas WBF actively blends the coordinates.
Requires well-calibrated confidence scores. WBF weights by confidence; if detector confidences are mis-calibrated (one detector is systematically over-confident), WBF will over-weight its outputs.
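Compute cost is higher than NMS. Greedy NMS sorts the boxes in O(N log N) and then performs up to O(N²) pairwise IoU checks in the worst case; WBF does comparable matching work against its clusters and additionally maintains a weighted average per cluster. Not usually load-bearing, but worth noting for high-throughput pipelines.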
Only helps when detectors disagree in coordinates. If all detectors produce identical boxes, the weighted average equals the input box, so WBF ≈ NMS apart from the merged confidence. The win comes from coordinate complementarity, not just multiplicity.
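Two of the gotchas above can be illustrated with a small sketch. All boxes and confidences are made-up values, the "recalibrated" score of 0.60 for detector A is a hypothetical stand-in for whatever a real calibration step would produce, and `iou` and `weighted_average` are hypothetical helpers:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def weighted_average(boxes, scores):
    """Confidence-weighted average of each coordinate."""
    total = sum(scores)
    return tuple(
        sum(s * b[k] for b, s in zip(boxes, scores)) / total for k in range(4)
    )

# Gotcha: two distinct neighbouring objects whose boxes overlap heavily.
left, right = (0.0, 0.0, 100.0, 100.0), (20.0, 0.0, 120.0, 100.0)
overlap = iou(left, right)  # above a 0.55 threshold, so they would be clustered
blended = weighted_average([left, right], [0.8, 0.8])  # straddles both objects

# Gotcha: detector A reports 0.99 where a calibrated score might be about 0.60.
box_a, box_b = (10.0, 10.0, 50.0, 50.0), (20.0, 20.0, 60.0, 60.0)
raw = weighted_average([box_a, box_b], [0.99, 0.60])
recalibrated = weighted_average([box_a, box_b], [0.60, 0.60])

print(round(overlap, 3))                            # 0.667
print([round(v, 1) for v in blended])               # [10.0, 0.0, 110.0, 100.0]
print(round(raw[0], 1), round(recalibrated[0], 1))  # 13.8 15.0
```

In the first case the blended box matches neither object; raising the IoU threshold above the overlap would keep the two boxes apart. In the second, the over-confident detector pulls the merged x1 toward its own box (13.8 instead of 15.0).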

Seen in

sources/2026-02-09-instacart-from-print-to-digital-making-weekly-flyers-shoppable — canonical wiki instance. Instacart's flyer-digitization pipeline uses WBF to merge overlapping bounding boxes from SAM-style detection in Phase 1. "In our application, merging nearby boxes that likely represent the same product enhances detection accuracy and reduces redundancy." WBF is one of four Phase-1 post-processing stages sitting on top of SAM's raw output.

concepts/non-maximum-suppression — the classical alternative this concept improves on
concepts/model-ensembling-for-detection — the broader ensembling pattern WBF enables
systems/instacart-flyer-digitization-pipeline — canonical production use
systems/segment-anything-model-sam — the detector whose outputs Instacart merges via WBF
companies/instacart