Skip to content

PATTERN Cited by 1 source

Composite hash uniqueness constraint

Pattern: enforce uniqueness or enable fast equality lookup across multiple columns at once by (a) building a MySQL generated column that hashes a delimiter-separated concatenation of the columns, and (b) attaching a UNIQUE INDEX (or a plain index, if you only want the lookup speed) to the hash column.

One fixed-width index replaces a wide composite index, and UNIQUE on that hash is enforced by InnoDB at commit time as the single authoritative serialisation point for multi-column uniqueness.

(Source: sources/2026-04-21-planetscale-generated-hash-columns-in-mysql.)

Shape

ALTER TABLE addresses ADD COLUMN address_hash BINARY(16)
  GENERATED ALWAYS AS (
    UNHEX(MD5(
      CONCAT_WS('|', primary_line, secondary_line, urbanization, last_line)
    ))
  );
ALTER TABLE addresses ADD UNIQUE INDEX (address_hash);

Four design points compress into this shape:

1. CONCAT_WS, not CONCAT

CONCAT_WS ("concat with separator") is load-bearing over plain CONCAT for two reasons that map directly to correctness bugs in the hash:

  • Ambiguous tuples. CONCAT('a','b','c') and CONCAT('ab','c') both yield 'abc' — two distinct tuples collapse to the same hash. CONCAT_WS('|', 'a','b','c') = 'a|b|c' vs CONCAT_WS('|', 'ab','c') = 'ab|c' — distinct tuples stay distinct.
  • NULL propagation. CONCAT('ab', NULL, 'c') returns NULL — any NULL argument nukes the whole result, so any row with a missing field has a NULL hash, collapsing all such rows to one bucket. CONCAT_WS skips NULLs and emits a separator between non-null arguments, preserving shape information.

2. Delimiter that can't appear in data

The separator character (| in the example) must not appear in any source column value, otherwise tuple boundaries ambiguate again. For normalised data (CASS addresses, URLs) a non-ASCII separator or an ASCII control character (e.g. CHAR(0x1F) — the ASCII "unit separator" byte) guarantees uniqueness of delimitation.

3. BINARY(N) sized to the hash

BINARY(16) for MD5, BINARY(32) for SHA-256. Stored hex-character-wide it would double — and the character-set and collation apparatus doesn't help byte-equality anyway. See concepts/generated-hash-column.

4. UNIQUE INDEX is the serialisation point

A plain secondary index on the hash supports fast equality lookups but allows duplicates; a UNIQUE INDEX turns every insert into a cross-row check at commit time. This is the structural mechanism that enforces multi-column uniqueness on the database side — the application can issue parallel INSERTs and INSERT ... ON DUPLICATE KEY UPDATE statements confident that only one of them can win in the collision case. It parallels the discipline canonicalised in patterns/database-as-final-arbiter-of-uniqueness — the database's unique index is the only primitive that serialises at commit time.

Composite-index alternative and its shape

Composite indexes in MySQL can hold up to 16 columns and have the orthogonal advantage of prefix usability — a composite index on (primary_line, secondary_line, urbanization, last_line) serves queries on just primary_line or (primary_line, secondary_line), whereas a composite hash serves only full-tuple equality.

Axis Composite index Composite hash
Full-tuple equality ✓ (via hash)
Prefix query on leading columns
Range query on any column
Index width Sum of column widths × tree depth Fixed (16 or 32 bytes) × tree depth
Uniqueness enforcement ✓ via UNIQUE on composite ✓ via UNIQUE on hash
Null-handling Per-column CONCAT_WS absorbs cleanly

The composite index is the right choice when the schema needs prefix / range queries as well as equality; the composite hash is the right choice when the access pattern is tuple-equality only (dedup, uniqueness, exact-match lookup) and the tuple is wide.

Collision handling

Same three escalation paths as single-column hashing: (1) tolerate MD5 collisions (astronomically unlikely at schema scale), (2) upgrade to SHA-256 BINARY(32), or (3) pair the hash-equality predicate with a redundant full-tuple predicate via WHERE address_hash = ? AND primary_line = ? AND ....

Seen in

Last updated · 347 distilled / 1,201 read