PATTERN Cited by 1 source
Composite hash uniqueness constraint¶
Pattern: enforce uniqueness or enable fast equality
lookup across multiple columns at once by (a) building
a MySQL generated column
that hashes a delimiter-separated concatenation of the
columns, and (b) attaching a UNIQUE INDEX (or a plain
index, if you only want the lookup speed) to the hash
column.
One fixed-width index replaces a wide composite index, and
UNIQUE on that hash is enforced by InnoDB at commit time
as the single authoritative serialisation point for
multi-column uniqueness.
(Source: sources/2026-04-21-planetscale-generated-hash-columns-in-mysql.)
Shape¶
ALTER TABLE addresses ADD COLUMN address_hash BINARY(16)
GENERATED ALWAYS AS (
UNHEX(MD5(
CONCAT_WS('|', primary_line, secondary_line, urbanization, last_line)
))
);
ALTER TABLE addresses ADD UNIQUE INDEX (address_hash);
Four design points compress into this shape:
1. CONCAT_WS, not CONCAT¶
CONCAT_WS ("concat with separator") is load-bearing over
plain CONCAT for two reasons that map directly to
correctness bugs in the hash:
- Ambiguous tuples.
CONCAT('a','b','c')andCONCAT('ab','c')both yield'abc'— two distinct tuples collapse to the same hash.CONCAT_WS('|', 'a','b','c')='a|b|c'vsCONCAT_WS('|', 'ab','c')='ab|c'— distinct tuples stay distinct. NULLpropagation.CONCAT('ab', NULL, 'c')returnsNULL— anyNULLargument nukes the whole result, so any row with a missing field has aNULLhash, collapsing all such rows to one bucket.CONCAT_WSskipsNULLs and emits a separator between non-null arguments, preserving shape information.
2. Delimiter that can't appear in data¶
The separator character (| in the example) must not
appear in any source column value, otherwise tuple
boundaries ambiguate again. For normalised data (CASS
addresses, URLs) a non-ASCII separator or an ASCII control
character (e.g. CHAR(0x1F) — the ASCII "unit separator"
byte) guarantees uniqueness of delimitation.
3. BINARY(N) sized to the hash¶
BINARY(16) for MD5, BINARY(32) for SHA-256. Stored
hex-character-wide it would double — and the character-set
and collation apparatus doesn't help byte-equality anyway.
See concepts/generated-hash-column.
4. UNIQUE INDEX is the serialisation point¶
A plain secondary index on the hash supports fast equality
lookups but allows duplicates; a UNIQUE INDEX turns every
insert into a cross-row check at commit time. This is the
structural mechanism that enforces multi-column uniqueness
on the database side — the application can issue parallel
INSERTs and INSERT ... ON DUPLICATE KEY UPDATE
statements confident that only one of them can win in the
collision case. It parallels the discipline canonicalised
in
patterns/database-as-final-arbiter-of-uniqueness — the
database's unique index is the only primitive that
serialises at commit time.
Composite-index alternative and its shape¶
Composite indexes in MySQL can
hold up to 16 columns and have the orthogonal advantage of
prefix usability — a composite index on
(primary_line, secondary_line, urbanization, last_line)
serves queries on just primary_line or
(primary_line, secondary_line), whereas a composite hash
serves only full-tuple equality.
| Axis | Composite index | Composite hash |
|---|---|---|
| Full-tuple equality | ✓ | ✓ (via hash) |
| Prefix query on leading columns | ✓ | ✗ |
| Range query on any column | ✓ | ✗ |
| Index width | Sum of column widths × tree depth | Fixed (16 or 32 bytes) × tree depth |
| Uniqueness enforcement | ✓ via UNIQUE on composite |
✓ via UNIQUE on hash |
| Null-handling | Per-column | CONCAT_WS absorbs cleanly |
The composite index is the right choice when the schema needs prefix / range queries as well as equality; the composite hash is the right choice when the access pattern is tuple-equality only (dedup, uniqueness, exact-match lookup) and the tuple is wide.
Collision handling¶
Same three escalation paths as single-column hashing:
(1) tolerate MD5 collisions (astronomically unlikely at
schema scale), (2) upgrade to SHA-256 BINARY(32), or
(3) pair the hash-equality predicate with a
redundant
full-tuple predicate via
WHERE address_hash = ? AND primary_line = ? AND ....
Seen in¶
- sources/2026-04-21-planetscale-generated-hash-columns-in-mysql
— canonical wiki introduction via Aaron Francis's CASS-
standardised
addressestable, including the four-fieldCONCAT_WS('|', ...)concatenation, theBINARY(16)storage choice, and theADD UNIQUE INDEXstep. - systems/mysql — substrate.
- systems/innodb — storage engine enforcing the unique-index cross-row check.