CONCEPT Cited by 1 source
BLOB/TEXT index prefix requirement¶
MySQL requires that any B-tree index on a BLOB or TEXT
column declare a prefix length — the number of leading
bytes the index covers — because the full column value is
variable-length and may be too large to fit in an index
node. The server refuses to create a full BLOB/TEXT
index and instead forces the schema to choose a prefix:
This is a purely structural constraint of the B+tree: index entries must fit in a fixed node-size budget (concepts/b-plus-tree|16-KB InnoDB pages by default), and an unbounded variable-length value has no fixed budget.
(Source: sources/2026-04-21-planetscale-generated-hash-columns-in-mysql and the MySQL column-indexes reference.)
Consequence: strict equality on long values needs a different mechanism¶
A prefix index supports ORDER BY, LIKE 'prefix%', and
range scans on the prefix, but it cannot support strict
equality on the full value — two distinct values that share
the same prefix land on the same index entry, so a
WHERE url = '...' query has to re-check the full column
for every prefix match. For long URLs, document hashes,
or CASS-normalised addresses stored in TEXT, the prefix
that would need to be indexed to preserve distinctness could
easily exceed the whole value.
MySQL's own manual recommends the workaround explicitly:
Rather than testing for equality against a very long text string, you can store a hash of the column value in a separate column, index that column, and test the hashed value in queries. (Use the
MD5()orCRC32()function to produce the hash value.) — MySQL 8 manual — Optimising BLOB
This is the canonical motivation for the
generated hash column
pattern — a BINARY(16) MD5 of the TEXT is both small
enough to fit in an index node and narrow enough to make a
B+tree shallow and a buffer-pool working set small.
Related structural limits¶
- InnoDB index-key length cap. Historically 767 bytes
per index key (or 3,072 bytes with
innodb_large_prefix+DYNAMIC/COMPRESSEDrow format); the modern MySQL 8 default is 3,072 bytes. utf8mb4character width. A 4-byte-per-character encoding turns aVARCHAR(N)into up to4Nbytes at the index altitude, compressing the maximum indexable prefix in character terms. See concepts/utf8mb4-vs-utf8.
Seen in¶
- sources/2026-04-21-planetscale-generated-hash-columns-in-mysql
— canonical wiki reference for the prefix-requirement
motivation of the generated-hash-column pattern. Aaron
Francis quotes the MySQL manual's BLOB-optimisation
recommendation to motivate
url_md5 BINARY(16) AS (UNHEX(MD5(url))). - systems/mysql — the substrate.
- systems/innodb — where the page-size budget and index-key limit live.