Skip to content

CONCEPT Cited by 1 source

Functional-index collation mismatch

Definition

Functional-index collation mismatch is a silent-failure mode in MySQL 8.0.13+ where a functional index is declared on an expression whose result collation differs from the collation the query predicate is evaluated in. The index exists on disk, is healthy, and the expressions are otherwise identical — but EXPLAIN reports type: ALL, key: NULL, possible_keys: NULL, and the query table-scans with no error or warning.

The canonical case is indexing an extracted value from a JSON column: JSON_UNQUOTE returns its result in the utf8mb4_bin collation, while a CAST(... AS CHAR(N)) in the functional-index expression defaults to the session's connection collation (utf8mb4_0900_ai_ci in MySQL 8 by default). The two expressions look identical in source text but parse to expression trees whose collation annotations don't match — so expression rewrite can't substitute the indexed reference.

(Source: sources/2026-04-21-planetscale-indexing-json-in-mysql.)

Worked failure-fix trace

From Aaron Francis's pedagogical PlanetScale post, the error-then-silent-fail-then-finally-works progression:

Attempt 1 — Naive functional index on JSON extraction:

ALTER TABLE activity_log ADD INDEX email
  ((properties->>"$.request.email")) USING BTREE;
-- ERROR: Cannot create a functional index on an expression
-- that returns a BLOB or TEXT. Please consider using CAST.

Loud and helpful — JSON_UNQUOTE returns LONGTEXT, which isn't directly indexable.

Attempt 2 — Add CAST to CHAR(255):

ALTER TABLE activity_log ADD INDEX email
  ((CAST(properties->>"$.request.email" AS CHAR(255))))
  USING BTREE;
-- Statement succeeds. No error, no warning.

EXPLAIN on the original WHERE clause:

EXPLAIN SELECT * FROM activity_log
  WHERE properties->>"$.request.email" = 'x@example.com';
-- type: ALL, key: NULL, possible_keys: NULL, rows: <all>

The index exists but isn't even a candidate. No error at CREATE INDEX time, no warning at EXPLAIN time, no hint in the documentation adjacent to CAST syntax. Silent performance collapse.

Attempt 3 — Explicit COLLATE utf8mb4_bin on the cast:

ALTER TABLE activity_log ADD INDEX email ((
  CAST(properties->>"$.request.email" AS CHAR(255))
    COLLATE utf8mb4_bin
)) USING BTREE;

EXPLAIN now reports possible_keys: email, key: email, key_len: 1023. Index used.

Why it happens

JSON_UNQUOTE is documented to return a string in the utf8mb4_bin collation — binary comparison, no case-folding, no Unicode normalisation. CAST(expr AS CHAR(N)) returns a string in the session's default collation, which under MySQL 8's utf8mb4_0900_ai_ci default is case-insensitive, accent-insensitive Unicode collation.

The expression tree for the indexed expression carries the collation annotation utf8mb4_0900_ai_ci; the expression tree for the predicate (which just uses ->> — i.e. JSON_UNQUOTE(JSON_EXTRACT(...)) — and doesn't cast) carries utf8mb4_bin. The two trees differ, so the optimizer's expression-matching step fails and no rewrite happens. The index sits there unused.

The fix (explicit COLLATE utf8mb4_bin inside the CAST) pins the indexed expression's collation to match what JSON_UNQUOTE returns — so the two expression trees now agree.

Diagnostic signature

  • EXPLAIN SELECT ... WHERE <indexed-expression> = ... reports: type: ALL, key: NULL, possible_keys: NULL.
  • SHOW INDEX FROM <table> confirms the index exists.
  • The CREATE INDEX statement did not fail or warn.
  • The source-text expression in the query literally matches the source-text expression inside the functional index.

When all four hold, collation mismatch is the first hypothesis to check. Secondary check: query information_schema.columns for the hidden generated column the functional index created and inspect its collation_name — if it's utf8mb4_0900_ai_ci and the source expression returns utf8mb4_bin, the mismatch is confirmed.

The named-generated-column form sidesteps the issue

A named generated column declared as VARCHAR(255) (not CHAR/CAST) inherits the collation from the column's declared type + the table default, and the subsequent ADD INDEX is on the named column, not on an expression — so collation matching isn't needed at all. This is a concrete argument for preferring named-column + index over the seemingly-terser functional index when indexing JSON extraction.

Generalisation beyond JSON

The failure mode generalises: any functional index whose expression returns an explicit collation (e.g. binary functions, CONVERT(... USING utf8mb4), or expressions that depend on a column declared with an explicit COLLATE) while the query predicate is cast to a different collation will hit the same silent-miss pattern.

Canonical guidance: when declaring a functional index over an expression with a non-default collation, match the collation explicitly in the index expression. Don't rely on CAST defaulting to the session connection collation.

Seen in

  • sources/2026-04-21-planetscale-indexing-json-in-mysql — canonical wiki introduction. Aaron Francis walks through the three-step failure-fix-silent-fail-final-fix progression on the JSON-extraction case and notes that "Unless otherwise specified, casting a value to a string sets the collation to utf8mb4_0900_ai_ci. The JSON extraction functions, on the other hand, return a string with a utf8mb4_bin collation. Therein lies our problem!"
Last updated · 550 distilled / 1,221 read