
CONCEPT Cited by 1 source

Weighted Boxes Fusion (WBF)

Definition

Weighted Boxes Fusion (WBF) is an object-detection post-processing technique that merges overlapping bounding boxes (typically from multiple detectors, or from multiple passes of the same detector) into a single output box. It computes a confidence-weighted average of the input boxes' coordinates and averages their confidences. It is an alternative to Non-Maximum Suppression (NMS), which keeps the single highest-confidence box and discards all overlapping lower-confidence boxes outright.
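For contrast, the NMS baseline can be sketched as a greedy loop. This is a minimal illustration, not any library's implementation. The (x1, y1, x2, y2) box format, the 0.5 IoU threshold, and the helper names `iou` and `greedy_nms` are assumptions.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float], iou_thr: float = 0.5) -> List[int]:
    """Hypothetical helper: return indices of kept boxes, highest score first.

    A box survives only if it does not overlap an already-kept (higher-scoring)
    box by more than iou_thr; suppressed boxes are discarded outright.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.6, 0.5]
print(greedy_nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 (IoU ≈ 0.68) and is dropped
```

Box 1's coordinates never reach the output. That discarded coordinate signal is the information WBF aims to keep.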

Introduced in Weighted Boxes Fusion: Ensembling boxes from different object detection models (Solovyev et al., 2019).

Why it matters

When multiple detectors each produce a bounding box for the same underlying object, their coordinates almost never match exactly. The right way to combine them depends on what you believe about the disagreement:

  • If disagreement = noise and the higher-confidence box is authoritative → use NMS. Keep the top box, drop the rest. Fast, simple, dominant baseline.
  • If disagreement = complementary information (each detector got part of the truth) → use WBF. Average the coordinates weighted by confidence, producing a consensus box that's usually more accurate than any single input.

The argument for WBF is that NMS throws away information by eliminating low-confidence boxes that may still carry useful coordinate signal, while WBF incorporates every overlapping box's contribution into the merged output.

Mechanism

The simplified recipe (the paper has the full algorithm):

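  1. Sort all boxes by confidence, highest first, and group overlapping boxes into clusters. A box joins the cluster whose current fused box it overlaps with IoU > threshold; otherwise it starts a new cluster.
  2. For each cluster, compute the merged box as:
      • merged_confidence = mean(confidences). The paper then rescales this by min(T, N) / N (or by T / N, i.e. sum(confidences) / N), where T is the number of boxes in the cluster and N is the number of models, so clusters that only a few models predicted are down-weighted.
      • merged_box = sum(confidence_i × box_i) / sum(confidences), applied to each coordinate, i.e. each box's coordinates contribute proportionally to its confidence.
  3. Output one merged box per cluster.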
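The recipe above can be sketched in Python as follows. This is a simplified, single-class reading of the steps, not the paper's reference code. It assumes (x1, y1, x2, y2) boxes, equal weight per model, an IoU threshold of 0.55, and the min(T, N) / N confidence rescaling described in the paper. `weighted_boxes_fusion` and its helpers are hypothetical names.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_average(boxes: Sequence[Box], scores: Sequence[float]) -> Box:
    """Per-coordinate confidence-weighted mean: sum(c_i * box_i) / sum(c_i)."""
    total = sum(scores)
    return tuple(sum(s * b[k] for b, s in zip(boxes, scores)) / total for k in range(4))


def weighted_boxes_fusion(boxes, scores, num_models, iou_thr=0.55):
    """Hypothetical helper: return a list of (fused_box, fused_score), one per cluster."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    clusters: List[List[int]] = []  # indices of member boxes per cluster
    fused: List[Box] = []           # current fused box per cluster
    for i in order:
        # Compare against each cluster's fused box and keep the best overlap above iou_thr.
        best, best_iou = -1, iou_thr
        for c, fbox in enumerate(fused):
            overlap = iou(boxes[i], fbox)
            if overlap > best_iou:
                best, best_iou = c, overlap
        if best == -1:
            clusters.append([i])
            fused.append(tuple(boxes[i]))
        else:
            clusters[best].append(i)
            members = clusters[best]
            # Recompute the fused box from every member, weighted by confidence.
            fused[best] = weighted_average([boxes[j] for j in members], [scores[j] for j in members])
    results = []
    for members, fbox in zip(clusters, fused):
        conf = sum(scores[j] for j in members) / len(members)  # mean confidence
        conf *= min(len(members), num_models) / num_models   # down-weight boxes few models saw
        results.append((fbox, conf))
    return results


# Two models: model 1 predicted boxes 0 and 2, model 2 predicted box 1.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.75, 0.25, 0.5]
for box, conf in weighted_boxes_fusion(boxes, scores, num_models=2):
    print(box, conf)
# (0.25, 0.25, 10.25, 10.25) 0.5
# (50, 50, 60, 60) 0.25
```

Matching each new box against the cluster's running fused box, rather than against its individual members, follows the paper's description. Per-model weights and multi-class handling are omitted to keep the sketch short.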

The key property: a low-confidence box still shifts the merged coordinates — it just shifts them less than a high-confidence box would.
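A two-box worked example makes this concrete. The boxes and confidences below are made up for illustration. `fuse` is a hypothetical helper that implements only the weighted-average step, with no clustering.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)


def fuse(boxes: Sequence[Box], scores: Sequence[float]) -> Box:
    """Confidence-weighted mean of each coordinate."""
    total = sum(scores)
    return tuple(sum(s * b[k] for b, s in zip(boxes, scores)) / total for k in range(4))


high = (0.0, 0.0, 10.0, 10.0)
low = (1.0, 1.0, 11.0, 11.0)

# The low-confidence box still pulls the result, but only a quarter of the way.
print(fuse([high, low], [0.75, 0.25]))  # (0.25, 0.25, 10.25, 10.25)
# Swap the confidences and `low` now pulls the result three quarters of the way.
print(fuse([high, low], [0.25, 0.75]))  # (0.75, 0.75, 10.75, 10.75)
```

In the first case NMS would return `high` unchanged and discard `low` entirely.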

Why use WBF over NMS

The canonical motivating case is ensembling detectors that each give correct-but-imprecise boxes. Published results cited for WBF report that, in medical imaging, fusing the outputs of multiple detectors with WBF gave +3–10% mAP over the best single model, a substantially larger gain than NMS-based ensembling typically delivers.

In Instacart's flyer-digitization pipeline, WBF is used to consolidate overlapping detections from SAM and its post-processed variants:

"Unlike traditional Non-Maximum Suppression (NMS), which may discard valuable information by eliminating lower-confidence boxes, WBF combines all overlapping boxes by computing a confidence-weighted average of their coordinates. This approach retains more information and often results in more precise bounding boxes."

Tradeoffs / gotchas

  • Not a drop-in NMS replacement in all pipelines. WBF assumes overlapping boxes refer to the same object. If two genuinely distinct objects sit close together with overlapping boxes, WBF will incorrectly merge them into a single averaged box. NMS shares this failure mode in a milder form: it suppresses one of the two boxes, but the surviving box still fits a real object. WBF makes it worse because it actively blends coordinates, so the merged box may match neither object.
  • Requires well-calibrated confidence scores. WBF weights by confidence; if detector confidences are mis-calibrated (one detector is systematically over-confident), WBF will over-weight its outputs.
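  • Compute cost is higher than NMS. Greedy NMS sorts boxes by confidence (O(N log N)) and then suppresses by pairwise IoU, which is O(N²) in the worst case. WBF does similar clustering work but must also recompute each cluster's weighted average as boxes join it. This is not usually load-bearing, but it is worth noting for high-throughput pipelines.
  • Only helps when detectors disagree in coordinates. If all detectors produce identical boxes, WBF returns the same coordinates NMS would; only the fused confidence differs. The win comes from coordinate complementarity, not just multiplicity.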
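The first and last gotchas can be checked with the same weighted-average step. This sketch uses made-up boxes. Two distinct, side-by-side objects whose boxes overlap above an assumed 0.55 IoU threshold get blended into one box, while identical boxes fuse back to the input coordinates. `iou` and `fuse` are hypothetical helpers, and boxes are assumed to be (x1, y1, x2, y2).

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(boxes: Sequence[Box], scores: Sequence[float]) -> Box:
    """Confidence-weighted mean of each coordinate (the WBF merge step only)."""
    total = sum(scores)
    return tuple(sum(s * b[k] for b, s in zip(boxes, scores)) / total for k in range(4))


# Gotcha 1: two different objects sitting side by side.
left = (0.0, 0.0, 10.0, 10.0)
right = (2.0, 0.0, 12.0, 10.0)
print(round(iou(left, right), 3))  # 0.667, above the assumed 0.55 threshold, so WBF clusters them
merged = fuse([left, right], [0.5, 0.5])
print(merged)  # (1.0, 0.0, 11.0, 10.0): one box instead of two, matching neither object
# NMS would instead keep `left` and drop `right`: one missed object, but no mislocated box.

# Gotcha 4: identical boxes fuse back to the same coordinates NMS would keep.
print(fuse([left, left], [0.75, 0.25]))  # (0.0, 0.0, 10.0, 10.0)
```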

Seen in
