Skip to content

PATTERN Cited by 1 source

Redundant hash-plus-value predicate

Pattern: when using a generated hash column for equality lookups, write the WHERE clause with both the hash-equality predicate (which drives index use) and the original-column equality predicate (which eliminates any possibility of a hash collision affecting correctness). The second predicate is logically redundant under any collision-free assumption but is cheap to add, costs nothing at runtime once the index narrows the row set, and turns the hash into a redundant condition — an index-steering hint that does not change results.

(Source: sources/2026-04-21-planetscale-generated-hash-columns-in-mysql.)

Canonical shape

SELECT * FROM visits
  WHERE
    url_md5 = UNHEX(MD5('https://planetscale.com'))   -- enables index path
    AND
    url     = 'https://planetscale.com';              -- eliminates collisions

Why each piece is present:

  • The hash predicate is the only predicate the optimiser can turn into an index seek, because url_md5 carries the KEY(url_md5) and url can't be directly indexed as a TEXT.
  • The value predicate is the correctness guard — in the astronomically rare case that two distinct URLs hash to the same MD5, the second predicate re-filters those candidate rows against the true column value.

Runtime cost

Near zero once the index is in play. The optimiser's execution plan is:

  1. Seek url_md5 index with UNHEX(MD5(?)) — O(log n) pages + one or a handful of leaf hits.
  2. Read the matching rows from the clustered index.
  3. Evaluate url = ? on each — one memcmp per candidate row.

For any workload where the hash is meaningfully selective (which is what makes the pattern worth deploying in the first place), the candidate row count at step 3 is small, and the extra predicate is a rounding error.

When to use vs upgrade to SHA-256

The decision matrix is cost-of-collision vs cost-of-wider- hash:

Cost of MD5 collision Hash-plus-value predicate SHA-256 upgrade
Irrelevant (analytics, best-effort dedup) Unnecessary Unnecessary
Bounded (returns an extra row, app handles) Adequate Overkill
Unbounded (silent data corruption) Use Use + hash-plus-value

If the schema is already in production on MD5 and a hash upgrade requires a migration, the redundant-predicate pattern is the zero-downtime fix — add the predicate to the hot queries and move on. If the schema is new, SHA2(... , 256) in BINARY(32) removes the collision concern structurally, and the redundant-predicate pattern becomes optional belt-and-braces.

Contrast with MySQL index hints

Stronger steering is available via USE INDEX / FORCE INDEX:

SELECT * FROM visits USE INDEX (url_md5)
  WHERE url = 'https://planetscale.com';  -- only value predicate

This forces the index but requires MySQL-specific syntax and — more subtly — still can't work here: url isn't a predicate on the indexed column url_md5, so the optimiser has no seek key even with the hint. The redundant-predicate pattern is the right shape because it gives the optimiser something to seek with; a hint only steers which of multiple candidate indexes to use, it doesn't invent a seek key.

Composite variant

For a composite hash column (patterns/composite-hash-uniqueness-constraint), the redundant predicate expands to the full tuple:

SELECT * FROM addresses
  WHERE
    address_hash = UNHEX(MD5(CONCAT_WS('|', ?, ?, ?, ?)))  -- index path
    AND primary_line = ?
    AND secondary_line = ?
    AND urbanization = ?
    AND last_line = ?;                                      -- collision guard

Same mechanism: hash-equality drives the index seek; per-column equalities re-verify the tuple bit-for-bit.

Seen in

Last updated · 347 distilled / 1,201 read