Skip to content

PATTERN Cited by 1 source

Safe midpoint computation

Pattern

When computing the midpoint of two indices low and high (both non-negative) in a divide-and-conquer algorithm, do not write (low + high) / 2 — the sum can overflow the fixed-width integer type for sufficiently large arrays. Use either of these shapes instead:

int mid = low + ((high - low) / 2);     // Fix A: never forms the sum
int mid = (low + high) >>> 1;           // Fix B (Java): logical/unsigned right shift
mid = ((unsigned int)low + (unsigned int)high) >> 1;   // Fix B (C/C++, portable)

Both compute the mathematical average of low and high (truncated down toward zero) for all 0 ≤ low ≤ high ≤ INT_MAX. The naive (low + high) / 2 fails once low + high > INT_MAX, which occurs for arrays of roughly 2^30 (≈1 billion) elements or more.

The hazard

From Bloch, 2006:

int mid = (low + high) / 2;   // BROKEN

On the face of it, this assertion might appear correct, but it fails for large values of the int variables low and high. Specifically, it fails if the sum of low and high is greater than the maximum positive int value (2^31 − 1). The sum overflows to a negative value, and the value stays negative when divided by two. In C this causes an array index out of bounds with unpredictable results. In Java, it throws ArrayIndexOutOfBoundsException.

The bug lay dormant in Programming Pearls (Bentley, 1986 / 2000) for two decades and in java.util.Arrays.binarySearch — which Bloch himself authored — for about nine years, because arrays that large didn't exist when the code was written. See concepts/integer-overflow for the underlying substrate.

Why Fix A works

mid = low + ((high - low) / 2):

  • low ≤ high, so high - low ≥ 0 — no underflow.
  • high - low ≤ INT_MAX (because high ≤ INT_MAX and low ≥ 0) — no overflow.
  • (high - low) / 2 ≤ (INT_MAX / 2) — no overflow.
  • low + (something ≤ INT_MAX/2) ≤ INT_MAX — no overflow.

Every intermediate value stays in range. Language-agnostic and portable.

Why Fix B works

mid = (low + high) >>> 1 relies on the interaction between two's-complement wraparound and logical right shift:

  • low + high can overflow signed INT_MAX; in two's-complement the high bit is set and interpretation as signed gives a negative number.
  • Interpreted as unsigned, the same bit pattern is the true arithmetic sum (mod 2^32), which for non-negative inputs summing in [0, 2 · INT_MAX] is exactly the mathematical sum.
  • Logical right shift by 1 halves the unsigned view → correct midpoint.

Likely one machine instruction — Bloch notes it's "probably faster, and arguably as clear" as Fix A.

The C99 undefined-behaviour footnote

Bloch's 2008 update (from Antoine Trux, Nokia Research) warns:

The older C Standard, C89/90, and the C++ Standard are both identical to C99 in this respect. … if you add two signed quantities and get an overflow, the result is undefined.

So the C/C++ shape must cast to unsigned before adding:

mid = ((unsigned int)low + (unsigned int)high) >> 1;

A signed-addition-then-logical-shift shape is "works on most compilers" but not guaranteed. See concepts/memory-safety for why "undefined behaviour" is not merely a theoretical concern — C compilers are allowed to optimize based on the assumption that signed overflow does not occur, which can delete surrounding code.

Generalisation: every divide-and-conquer over indices

The same shape bug afflicts every algorithm that averages two indices via (low + high) / 2:

  • Binary search — the canonical case.
  • Mergesort — recursive midpoint split.
  • Quicksort partitioning — median-of-three and midpoint-pivot variants.
  • Heap operations — parent/child index arithmetic in some implementations.
  • Segment-tree / Fenwick-tree range queries.
  • Radix-tree, B-tree, and other index-structure internal nodes with key-range midpoints.

Bloch: "The binary-search bug applies equally to mergesort, and to other divide-and-conquer algorithms. If you have any code that implements one of these algorithms, fix it now before it blows up."

Modern-language variants

  • Rust — the typed response is usize::midpoint(a, b) (stable since Rust 1.85) or (a + b) / 2 under usize (unsigned — the same modular recovery as Java's >>>). checked_add / wrapping_add / saturating_add make the invariant a compile-time choice.
  • Python — integers are arbitrary-precision; the naive formula cannot overflow. The pattern is vacuous here.
  • Goint is typically 64-bit on modern targets, making overflow astronomically improbable but not impossible; idiomatic code often still uses low + (high-low)/2 out of habit.
  • JavaScript — historically 32-bit-signed bitwise operations on Number; the >>> operator exists specifically for this class of problem. Post-BigInt the hazard is sharper on Number, safer on BigInt.

Pattern discipline

Use patterns/invariant-driven-programming: the invariant you want is "mid is the mathematical midpoint of low and high, truncated down", and the substrate that threatens it is "fixed-width signed integer arithmetic". Write the invariant; check it against the substrate; pick the idiom (Fix A or Fix B) that makes the invariant hold unconditionally.

Seen in

Last updated · 200 distilled / 1,178 read