Skip to content

SYSTEM Cited by 1 source

OpenJDK java.util.Arrays.binarySearch

Definition

java.util.Arrays.binarySearch is the standard-library binary-search routine in the Java Development Kit (JDK / OpenJDK), authored by Joshua Bloch. It sits in java.util.Arrays with overloads for each primitive element type (int[], long[], double[], Object[] with a Comparator, etc.). Contract: for a sorted array a and key k, returns the index of k if present, or -(insertionPoint + 1) if absent.

Wiki relevance — why there's a page at all

For ~9 years (roughly 1997 to ~2006), the int-overload of this routine shipped with a classic concepts/integer-overflow bug in its midpoint computation:

int mid = (low + high) / 2;

For arrays of 2^30 elements or more, low + high can exceed INT_MAX = 2^31 − 1; the sum wraps to a negative value; division by 2 preserves the sign; a[mid] throws ArrayIndexOutOfBoundsException.

The bug was user-reported to Sun after it broke production code. Bloch's canonical retrospective (post) uses this exact JDK source as the running example — the wiki page exists specifically because the JDK shipped the bug and Bloch's post is the industry's best-known retelling.

The code, pre- and post-fix

Pre-fix (the bug):

public static int binarySearch(int[] a, int key) {
    int low = 0;
    int high = a.length - 1;
    while (low <= high) {
        int mid = (low + high) / 2;   // ← OVERFLOW FOR a.length ≥ 2^30
        int midVal = a[mid];
        if      (midVal < key) low  = mid + 1;
        else if (midVal > key) high = mid - 1;
        else                    return mid;
    }
    return -(low + 1);
}

Post-fix (Bloch's recommended shapes, either works):

int mid = low + ((high - low) / 2);   // Fix A — portable
int mid = (low + high) >>> 1;         // Fix B — Java's unsigned right shift

See patterns/safe-midpoint-computation for the pattern and concepts/unsigned-right-shift for the language primitive behind Fix B.

Why it took ~9 years to surface

Two preconditions had to coincide:

  1. A sorted int[] of at least 2^30 (≈1 billion) elements — essentially unheard-of in the late-'90s JDK deployment context; increasingly common at Google and comparable shops by the mid-2000s.
  2. binarySearch called with low and high eventually close enough to INT_MAX / 2 for their sum to overflow.

Until both were true in the same JVM, the bug was invisible. See concepts/program-correctness for the broader correctness-under-tested-inputs ≠ correctness-under-all-inputs framing Bloch uses this case study for.

The same arithmetic shape afflicted — and had to be separately fixed in — every divide-and-conquer algorithm in the JDK that averaged two indices:

  • java.util.Arrays.sort (mergesort / TimSort internals)
  • java.util.Collections.binarySearch
  • Primitive sort routines for long[], double[], etc.
  • Any user code copying the (low + high) / 2 idiom from Programming Pearls or the JDK prior to the fix.

Bloch's post is a call to audit the whole family — not just to patch binarySearch.

Source provenance note

The exact JDK commit / bug-ID / version where the fix landed is not cited in the post; the wiki page does not fabricate it. The fix is in the Programming Pearls 2nd-edition (2000) errata and in JDK versions post-report (~2006-era, JDK 6 timeframe). What the wiki captures from primary evidence is Bloch's narrative: he wrote the broken code; a user hit it; it was fixed.

Seen in

Last updated · 200 distilled / 1,178 read