CONCEPT

Index selectivity¶

Selectivity is the fraction of a table's rows that a query predicate against an indexed column is expected to match — expressed as COUNT(DISTINCT col) / COUNT(*), a number between 0.0000 and 1.0000. The higher the ratio, the more effectively the index narrows the row set, and the more attractive it is to the query optimiser when multiple indexes could satisfy the same query.

(Source: .)

Formula and intuition¶

SELECT
  COUNT(DISTINCT first_name) / COUNT(*) AS first_name,
  COUNT(DISTINCT state)      / COUNT(*) AS state
FROM people;

Aaron Francis's worked numbers on a 491,583-row people table:

column	selectivity
`id`	1.0000
`first_name`	0.0060
`state`	0.0000 (rounds down from ~4 × 10⁻⁶)

A unique index is perfectly selective by definition: COUNT(DISTINCT id) = COUNT(*), so the ratio is 1.0000.

Why it matters to the planner¶

When two or more indexes could answer a query, the MySQL optimiser picks the more selective one:

the optimizer has decided that the first_name index is the best choice because the first_name index is more selective.

— Aaron Francis, Why isn't MySQL using my index?

The reasoning is mechanical: more selective index → fewer matching rows found in the B+tree walk → fewer clustered- index walks to fetch full rows in the two-step secondary-index lookup. Francis's worked EXPLAIN on WHERE first_name = 'Aaron' AND state = 'TX' picks the first_name index: possible_keys: first_name, state, key: first_name, rows: 180. The state index with 2 distinct values is structurally a worse choice: rows: ~246,944 estimated when forced via USE INDEX (state) — a ~1,370× regression.

Selectivity vs cardinality¶

Selectivity and cardinality are related but distinct:

Cardinality is the count of distinct values — 3,009 distinct first_names on the people table.
Selectivity is the ratio — 3,009 / 491,583 = 0.0060.

Cardinality alone is misleading without knowing the table size. "Knowing that there are 3,009 distinct values in the first_name column is interesting, but we have no idea if that's a relatively high or low number compared to the whole table!" The MySQL optimiser's persistent statistics store cardinality (see concepts/mysql-index-statistics) and derive selectivity from it at query-plan time.

The skew caveat¶

Average selectivity is a rule of thumb, not a guarantee. Skewed data breaks the model: a type column with 99% "user" + 1% "admin" has average selectivity of ~2 distinct / N rows ≈ 0, but is highly selective for admin queries (matching 1% of rows) and useless for user queries (matching 99%). Francis canonicalises the caveat verbatim:

while checking average selectivity is a good rule of thumb, pay careful attention to unevenly distributed data.

— Aaron Francis, Why isn't MySQL using my index?

Mitigation: partial indexes (Postgres), functional indexes that narrow to the rare value, or tolerate that the index is only useful for the infrequent value.

Seen in¶

— root-of-framing post: formula, worked 491,583-row example, optimiser-picks-more-selective rule, 1,370× regression when forcing wrong index, skew caveat.

concepts/index-cardinality — the count underneath the ratio.
concepts/mysql-index-statistics — the sampled cardinality the planner actually uses.
concepts/skewed-column-selectivity — when average selectivity lies.
concepts/secondary-index — the two-step lookup cost selectivity minimises.
concepts/composite-index — selectivity of the leading column dominates composite-index utility.
concepts/mysql-explain — the diagnostic surface that exposes which index the planner picked.