PATTERN Cited by 1 source
Generated hash column for equality lookup¶
Pattern: when a column is too large or too wide to index directly for strict-equality lookups, add a MySQL generated column that holds a compact hash of the value, index the hash column, and issue equality queries against the hash. The compact index is dramatically smaller than the original column, fits comfortably in a B+tree, and lives fully inside the storage engine so the application sees only normal SQL.
The canonical substrate is MySQL / InnoDB; the canonical disclosure is Aaron Francis's PlanetScale post.
(Source: sources/2026-04-21-planetscale-generated-hash-columns-in-mysql.)
When to use¶
- The searched column is
TEXT/BLOB/LONGTEXT/JSONand therefore can only be indexed via a prefix index — which doesn't support strict equality on the full value. - The searched column is a long
VARCHAR(500+ bytes) that the engine technically can index but a hash index is much narrower and cheaper to maintain. - The searched column stores normalised data (URLs, canonical file paths, normalised addresses) where equality is the dominant access pattern and range / sort / prefix queries are rare.
When not to use¶
- Range queries or prefix matches (
LIKE 'foo%',ORDER BY) are also needed — hashing destroys all ordering, so an actual B-tree index on (a prefix of) the real column is required. - Full-text search is needed — use a
FULLTEXTindex or a dedicated search system (systems/opensearch, systems/elasticsearch). - The column is already narrow (e.g. a 32-byte UUID) — the original column is already a cheap index key; hashing it adds bytes, not saves them.
Shape¶
CREATE TABLE visits (
url TEXT,
url_md5 BINARY(16) AS (UNHEX(MD5(url))), -- generated hash
-- [other columns...]
KEY(url_md5)
);
-- Application query:
SELECT * FROM visits WHERE url_md5 = UNHEX(MD5('https://planetscale.com'));
Three design decisions visible in the shape:
- Binary, not character.
BINARY(16)overCHAR(32)— MD5 output is 128 bits; stored as hex it doubles to 32 bytes and drags in a character set and collation that don't matter for byte-equality.UNHEX(MD5(...))round-trips through raw bytes. - Generated, not hand-maintained. The hash value is
derived from the row's own column via a deterministic
function; MySQL regenerates it on every write. No
dual-write consistency problem, no backfill race during
schema change, no forgotten
UPDATE. - Indexed, not just stored. The hash is useless
without
KEY(url_md5)— the whole point is to have something that fits in a B+tree.
Upgrading hash strength¶
| Algorithm | Output bytes | Column type | Collision profile |
|---|---|---|---|
CRC32 |
4 | INT UNSIGNED |
Collisions common; reliability bad |
MD5 |
16 | BINARY(16) |
Cryptographically broken but collision-free in practice for any reasonable schema |
SHA2(... , 256) |
32 | BINARY(32) |
Collision probability beneath hardware failure rate |
If the correctness cost of an MD5 collision is non-zero, either upgrade to SHA-256 or pair the hash equality with a redundant predicate on the original column — see patterns/redundant-hash-plus-value-predicate and the concepts/redundant-condition-query-hint framing.
Interaction with functional indexes¶
MySQL 8.0.13+ offers functional indexes — the equivalent of this pattern with the hash column hidden inside the index. Runtime behaviour is identical; the named-column form is preferred when the hash is referenced by name in application SQL.
Composite-column variant¶
For multi-column equality or uniqueness (enforce that an
address tuple is unique across four fields), the single-
column shape extends to the composite-hash pattern — see
patterns/composite-hash-uniqueness-constraint — which
adds CONCAT_WS + UNIQUE INDEX on the hash column.
Seen in¶
- sources/2026-04-21-planetscale-generated-hash-columns-in-mysql
— canonical wiki introduction via Aaron Francis's worked
visits+addressesexamples, including theBINARY(16)vsCHAR(32)storage argument and theUNHEXround-trip. - systems/mysql — substrate.
- systems/innodb — storage engine carrying the hash-column secondary index.
Related¶
- concepts/generated-column-mysql
- concepts/generated-hash-column
- concepts/blob-text-index-prefix-requirement
- concepts/redundant-condition-query-hint
- concepts/functional-index-mysql
- concepts/secondary-index
- patterns/composite-hash-uniqueness-constraint
- patterns/redundant-hash-plus-value-predicate
- systems/mysql
- systems/innodb