PATTERN Cited by 1 source
Redundant hash-plus-value predicate¶
Pattern: when using a
generated hash column
for equality lookups, write the WHERE clause with both
the hash-equality predicate (which drives index use) and
the original-column equality predicate (which eliminates
any possibility of a hash collision affecting correctness).
The second predicate is logically redundant under any
collision-free assumption but is cheap to add, costs nothing
at runtime once the index narrows the row set, and turns
the hash into a
redundant
condition — an index-steering hint that does not change
results.
(Source: sources/2026-04-21-planetscale-generated-hash-columns-in-mysql.)
Canonical shape¶
SELECT * FROM visits
WHERE
url_md5 = UNHEX(MD5('https://planetscale.com')) -- enables index path
AND
url = 'https://planetscale.com'; -- eliminates collisions
Why each piece is present:
- The hash predicate is the only predicate the
optimiser can turn into an index seek, because
url_md5carries theKEY(url_md5)andurlcan't be directly indexed as aTEXT. - The value predicate is the correctness guard — in the astronomically rare case that two distinct URLs hash to the same MD5, the second predicate re-filters those candidate rows against the true column value.
Runtime cost¶
Near zero once the index is in play. The optimiser's execution plan is:
- Seek
url_md5index withUNHEX(MD5(?))— O(log n) pages + one or a handful of leaf hits. - Read the matching rows from the clustered index.
- Evaluate
url = ?on each — onememcmpper candidate row.
For any workload where the hash is meaningfully selective (which is what makes the pattern worth deploying in the first place), the candidate row count at step 3 is small, and the extra predicate is a rounding error.
When to use vs upgrade to SHA-256¶
The decision matrix is cost-of-collision vs cost-of-wider- hash:
| Cost of MD5 collision | Hash-plus-value predicate | SHA-256 upgrade |
|---|---|---|
| Irrelevant (analytics, best-effort dedup) | Unnecessary | Unnecessary |
| Bounded (returns an extra row, app handles) | Adequate | Overkill |
| Unbounded (silent data corruption) | Use | Use + hash-plus-value |
If the schema is already in production on MD5 and a hash
upgrade requires a migration, the redundant-predicate
pattern is the zero-downtime fix — add the predicate to
the hot queries and move on. If the schema is new,
SHA2(... , 256) in BINARY(32) removes the collision
concern structurally, and the redundant-predicate pattern
becomes optional belt-and-braces.
Contrast with MySQL index hints¶
Stronger steering is available via USE INDEX / FORCE
INDEX:
SELECT * FROM visits USE INDEX (url_md5)
WHERE url = 'https://planetscale.com'; -- only value predicate
This forces the index but requires MySQL-specific syntax
and — more subtly — still can't work here: url isn't
a predicate on the indexed column url_md5, so the
optimiser has no seek key even with the hint. The
redundant-predicate pattern is the right shape because it
gives the optimiser something to seek with; a hint only
steers which of multiple candidate indexes to use, it
doesn't invent a seek key.
Composite variant¶
For a composite hash column (patterns/composite-hash-uniqueness-constraint), the redundant predicate expands to the full tuple:
SELECT * FROM addresses
WHERE
address_hash = UNHEX(MD5(CONCAT_WS('|', ?, ?, ?, ?))) -- index path
AND primary_line = ?
AND secondary_line = ?
AND urbanization = ?
AND last_line = ?; -- collision guard
Same mechanism: hash-equality drives the index seek; per-column equalities re-verify the tuple bit-for-bit.
Seen in¶
- sources/2026-04-21-planetscale-generated-hash-columns-in-mysql — canonical wiki introduction. Aaron Francis explicitly frames the second predicate as "a redundant condition… does not change the results of the query but does provide MySQL with the option to choose a different execution path, usually an index-assisted path."
- systems/mysql — planner being steered.