CONCEPT Cited by 1 source
Functional-index collation mismatch¶
Definition¶
Functional-index collation mismatch is a silent-failure
mode in MySQL 8.0.13+ where a
functional index is
declared on an expression whose result collation differs from
the collation the query predicate is evaluated in. The index
exists on disk, is healthy, and the expressions are
otherwise identical — but EXPLAIN reports type: ALL,
key: NULL, possible_keys: NULL, and the query
table-scans with no error or warning.
The canonical case is indexing an extracted value from a
JSON column: JSON_UNQUOTE returns its result in the
utf8mb4_bin collation, while a CAST(... AS CHAR(N)) in
the functional-index expression defaults to the session's
connection collation (utf8mb4_0900_ai_ci in MySQL 8 by
default). The two expressions look identical in source text
but parse to expression trees whose collation annotations
don't match — so
expression
rewrite can't substitute the indexed reference.
(Source: sources/2026-04-21-planetscale-indexing-json-in-mysql.)
Worked failure-fix trace¶
From Aaron Francis's pedagogical PlanetScale post, the error-then-silent-fail-then-finally-works progression:
Attempt 1 — Naive functional index on JSON extraction:
ALTER TABLE activity_log ADD INDEX email
((properties->>"$.request.email")) USING BTREE;
-- ERROR: Cannot create a functional index on an expression
-- that returns a BLOB or TEXT. Please consider using CAST.
Loud and helpful — JSON_UNQUOTE returns LONGTEXT, which
isn't directly indexable.
Attempt 2 — Add CAST to CHAR(255):
ALTER TABLE activity_log ADD INDEX email
((CAST(properties->>"$.request.email" AS CHAR(255))))
USING BTREE;
-- Statement succeeds. No error, no warning.
EXPLAIN on the original WHERE clause:
EXPLAIN SELECT * FROM activity_log
WHERE properties->>"$.request.email" = 'x@example.com';
-- type: ALL, key: NULL, possible_keys: NULL, rows: <all>
The index exists but isn't even a candidate. No error at
CREATE INDEX time, no warning at EXPLAIN time, no hint
in the documentation adjacent to CAST syntax. Silent
performance collapse.
Attempt 3 — Explicit COLLATE utf8mb4_bin on the cast:
ALTER TABLE activity_log ADD INDEX email ((
CAST(properties->>"$.request.email" AS CHAR(255))
COLLATE utf8mb4_bin
)) USING BTREE;
EXPLAIN now reports possible_keys: email, key: email,
key_len: 1023. Index used.
Why it happens¶
JSON_UNQUOTE is documented to return a string in the
utf8mb4_bin collation — binary
comparison, no case-folding, no Unicode normalisation.
CAST(expr AS CHAR(N)) returns a string in the session's
default collation, which under MySQL 8's utf8mb4_0900_ai_ci
default is case-insensitive, accent-insensitive Unicode
collation.
The expression tree for the indexed expression carries the
collation annotation utf8mb4_0900_ai_ci; the expression
tree for the predicate (which just uses ->> — i.e.
JSON_UNQUOTE(JSON_EXTRACT(...)) — and doesn't cast) carries
utf8mb4_bin. The two trees differ, so the optimizer's
expression-matching step fails and no
rewrite
happens. The index sits there unused.
The fix (explicit COLLATE utf8mb4_bin inside the CAST)
pins the indexed expression's collation to match what
JSON_UNQUOTE returns — so the two expression trees now
agree.
Diagnostic signature¶
EXPLAIN SELECT ... WHERE <indexed-expression> = ...reports:type: ALL,key: NULL,possible_keys: NULL.SHOW INDEX FROM <table>confirms the index exists.- The
CREATE INDEXstatement did not fail or warn. - The source-text expression in the query literally matches the source-text expression inside the functional index.
When all four hold, collation mismatch is the first
hypothesis to check. Secondary check: query
information_schema.columns for the hidden generated
column the functional index created and inspect its
collation_name — if it's utf8mb4_0900_ai_ci and the
source expression returns utf8mb4_bin, the mismatch is
confirmed.
The named-generated-column form sidesteps the issue¶
A named generated column
declared as VARCHAR(255) (not CHAR/CAST) inherits the
collation from the column's declared type + the table
default, and the subsequent ADD INDEX is on the named
column, not on an expression — so collation matching isn't
needed at all. This is a concrete argument for preferring
named-column + index over the seemingly-terser functional
index when indexing JSON extraction.
Generalisation beyond JSON¶
The failure mode generalises: any functional index whose
expression returns an explicit collation (e.g. binary
functions, CONVERT(... USING utf8mb4), or expressions that
depend on a column declared with an explicit COLLATE) while
the query predicate is cast to a different collation will
hit the same silent-miss pattern.
Canonical guidance: when declaring a functional index over an expression with a non-default collation, match the collation explicitly in the index expression. Don't rely on CAST defaulting to the session connection collation.
Seen in¶
- sources/2026-04-21-planetscale-indexing-json-in-mysql
— canonical wiki introduction. Aaron Francis walks through
the three-step failure-fix-silent-fail-final-fix progression
on the JSON-extraction case and notes that "Unless
otherwise specified, casting a value to a string sets the
collation to
utf8mb4_0900_ai_ci. The JSON extraction functions, on the other hand, return a string with autf8mb4_bincollation. Therein lies our problem!"