CONCEPT Cited by 1 source
Collation¶
A collation is a set of rules for comparing and sorting
strings. Every string character
set in MySQL has at least one collation,
and one of those is designated the default collation —
used when a charset is declared without an explicit
collation. Collations govern equality (=), ordering
(ORDER BY, <, >), pattern matching (LIKE), and
deduplication (DISTINCT, GROUP BY) semantics on string
data.
Definition¶
From PlanetScale's Aaron Francis: "While character sets define the legal characters that can be stored in a column, collations are rules that determine how string comparisons are made. If you are sorting or comparing strings, MySQL uses the collation to decide the order and whether the strings are the same." A single character set can have multiple collations (different comparison rules for the same character repertoire), but a collation belongs to exactly one character set. (Source: sources/2026-04-21-planetscale-character-sets-and-collations-in-mysql.)
The suffix taxonomy¶
MySQL collations follow a naming scheme where the charset forms a prefix and the suffix encodes the comparison axes:
| Suffix | Meaning |
|---|---|
_ai |
Accent-insensitive |
_as |
Accent-sensitive |
_ci |
Case-insensitive |
_cs |
Case-sensitive |
_ks |
Kana-sensitive (Japanese hiragana vs katakana distinction) |
_bin |
Binary (byte-for-byte comparison) |
These compose: utf8mb4_0900_ai_ci decomposes as
utf8mb4 (character set) + 0900 (Unicode
Collation Algorithm version 9.0.0 weight keys) + ai
(accent-insensitive) + ci (case-insensitive).
utf8mb4_0900_as_cs is the same UCA 9.0.0 weight keys but
accent- and case-sensitive.
String comparison sensitivity is a collation property¶
The answer to "are string comparisons case-sensitive in
MySQL?" is "it depends on the collation" — not on
SQL, not on the charset, not on some server-wide mode.
Demonstrated with the COLLATE cast operator:
-- Case-insensitive: "MySQL" = "mysql"
SELECT "MySQL" COLLATE utf8mb4_0900_ai_ci = "mysql" COLLATE utf8mb4_0900_ai_ci; -- 1
-- Case-sensitive: "MySQL" != "mysql"
SELECT "MySQL" COLLATE utf8mb4_0900_as_cs = "mysql" COLLATE utf8mb4_0900_as_cs; -- 0
Same strings, same characters, opposite answers, determined
entirely by collation choice. The same logic applies to
accents: under _ai collations, résumé = resume; under
_as, they are distinct.
The inheritance hierarchy¶
Collations follow the same four-level hierarchy as
character sets, with the
most-specific winning: column > table > database > server
default. If a collation is not explicitly declared at any
level, MySQL falls back to the default collation of the
character set at whichever level owns the charset
declaration. So a table declared DEFAULT CHARSET=utf8mb4
with no explicit COLLATE clause will use
utf8mb4_0900_ai_ci (the DEFAULT_COLLATE_NAME for
utf8mb4 on MySQL 8).
(Source: sources/2026-04-21-planetscale-character-sets-and-collations-in-mysql.)
Table-level override:
CREATE TABLE table_with_collation (
my_column VARCHAR(255)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
Column-level override + mixed inheritance:
CREATE TABLE table_with_collation (
`explicitly_set` VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
`implicitly_set` VARCHAR(255)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
-- explicitly_set: utf8mb4_general_ci (column override)
-- implicitly_set: utf8mb4_0900_ai_ci (table charset default collation)
Runtime COLLATE cast¶
Collations can be overridden per-query via the
COLLATE operator, not just in schema declarations. This
is useful for case-sensitive searches on case-insensitive
columns, or vice versa, without having to add a separate
column or change the schema:
UCA versions¶
Modern MySQL 8 collations are prefixed with UCA
(Unicode Collation Algorithm) version keys — 0900 refers
to UCA 9.0.0 weight keys
which govern the language-specific ordering and
equivalence rules. Upgrading UCA version changes the
definition of "equal" or "less than" for some code
points; MySQL 5.7's default utf8mb4_general_ci
pre-dates UCA-versioned collations and uses a simpler
(and in places wrong) language-agnostic rule.
Language-specific UCA collations exist alongside the
generic utf8mb4_0900_ai_ci — e.g. utf8mb4_cs_0900_ai_ci
(Czech), utf8mb4_da_0900_ai_cs (Danish, case-sensitive),
utf8mb4_zh_0900_as_cs (Chinese, accent + case sensitive).
These implement locale-accurate sort orders for
user-facing data.
Binary collation¶
utf8mb4_bin (and the older utf8mb4_bin variant with
PAD_ATTRIBUTE=PAD SPACE) compares strings byte-for-byte.
Useful when you want strict byte equality (e.g. storing
tokens, hashes, or case-sensitive identifiers where any
case-folding would be a bug) and want to opt out of
Unicode normalisation / case-folding / accent-folding
entirely.
Operational gotchas¶
JOINbetween columns with different collations fails withERROR 1267(Illegal mix of collations). This is MySQL refusing to guess which collation should govern the comparison. Fix:COLLATEcast one side, or align collations at schema-design time.GROUP BYdeduplicates under collation rules — under_ai_ci,GROUP BY namecollapsesMySQL,mysql, andMYSQLinto a single group. Under_bin, they are three distinct groups.- Index ordering is collation-aware — an index on a
column with
utf8mb4_0900_ai_cicannot be used to accelerate comparisons underutf8mb4_bin, and vice versa. Collation mismatch between query and index forces a full scan.
Seen in¶
- sources/2026-04-21-planetscale-character-sets-and-collations-in-mysql
— canonical wiki introduction of MySQL collations, the
_ai/_as/_ci/_cs/_ks/_binsuffix taxonomy, theCOLLATEcast operator, theutf8mb4_0900_ai_cidefault on MySQL 8 with UCA 9.0.0 weight keys, and demonstration that string-comparison sensitivity is a collation property rather than a SQL-standard property.