CONCEPT Cited by 1 source
Skewed column selectivity¶
Skewed column selectivity is the condition where a column's selectivity computed on average (across the whole table) is a poor predictor of selectivity for specific predicate values. An index on a skewed column can be highly useful for one range of values and useless for another — a structural pitfall in both index design and query-planner reasoning.
(Source: sources/2026-04-21-planetscale-why-isnt-mysql-using-my-index.)
The canonical worked example¶
Consider a table of one million users, where 99% are of
type = "user"and 1% are oftype = "admin". In this case, an index ontypemight seem useless because it's not very selective on average. But when you're querying for admins, it is highly selective. So while checking average selectivity is a good rule of thumb, pay careful attention to unevenly distributed data.— Aaron Francis, Why isn't MySQL using my index?
Running the numbers on a 1,000,000-row table with 2 distinct values:
- Average selectivity:
COUNT(DISTINCT type) / COUNT(*) = 2 / 1,000,000 = 0.000002. Essentially zero. - Selectivity for
type = 'admin': the index walk returns ~10,000 rows (1% of 1M) out of 1M → narrowing ratio 0.01. Highly useful for filtering. - Selectivity for
type = 'user': the index walk returns ~990,000 rows out of 1M → narrowing ratio 0.99. Worse than a table scan when you have to walk the index then walk the clustered index for each row.
Why the planner gets it wrong¶
The MySQL optimiser uses the stored index statistics (sampled cardinality) to estimate selectivity at plan time. The statistics are a single average number per index, not a histogram of value frequencies. So for a skewed column the optimiser either:
- Underuses the index (thinking it's not selective enough on average, ignoring the rare-value win).
- Overuses the index (thinking it's selective enough on average, losing on the common-value path).
Modern MySQL (≥8.0) supports
histogram statistics
via ANALYZE TABLE ... UPDATE HISTOGRAM ON which records a
value-frequency distribution — addressing skewed-column
planning specifically. Aaron Francis's 2023 post predates
widespread histogram adoption and does not cite the
mechanism.
Mitigation patterns¶
Partial index (Postgres; not native MySQL)¶
PostgreSQL supports CREATE INDEX ... WHERE type = 'admin'
— index only the rare values:
Storage-efficient (only 1% of rows indexed) and perfectly selective for admin lookups. MySQL has no native partial- index syntax; the workaround is a functional/generated- column trick.
Functional index on the narrowing condition (MySQL 8.0.13+)¶
-- Index only NOT NULL admin rows via a generated column
ALTER TABLE users
ADD COLUMN admin_marker TINYINT
GENERATED ALWAYS AS (CASE WHEN type = 'admin' THEN 1 END) STORED,
ADD INDEX (admin_marker);
Rows where type = 'user' get NULL in admin_marker and
(depending on MySQL version) are not indexed in the
secondary B+tree — approximating the partial-index
behaviour. See concepts/functional-index-mysql for the
full mechanism.
Histogram statistics (MySQL 8.0+)¶
Gives the optimiser a value-frequency distribution it can
use to distinguish type = 'admin' (rare, use index) from
type = 'user' (common, prefer scan). Independent of the
index — histograms work even on unindexed columns.
Separate the rare table¶
If the skew is severe and persistent, a vertical split into
users_admin and users_regular tables (with a union-view
for general queries) trades write-path complexity for
predictable query plans.
When to apply the caveat¶
Run a distribution query before trusting an index on a categorical column:
SELECT type, COUNT(*) AS cnt,
COUNT(*) / (SELECT COUNT(*) FROM users) AS fraction
FROM users
GROUP BY type
ORDER BY cnt DESC;
Any column where the top-N values cover >80-90% of rows
is a skew candidate. Feature flags, status enums
(active / deleted / pending), rare-event markers
(is_fraudulent, is_banned), tenant IDs in
multi-tenant tables where one tenant dominates.
Seen in¶
- sources/2026-04-21-planetscale-why-isnt-mysql-using-my-index — canonical framing: "Calculating selectivity across an entire table can be misleading if the data is not evenly distributed" with the 99%-user / 1%-admin worked example.
Related¶
- concepts/index-selectivity — the metric this caveat qualifies.
- concepts/index-cardinality — the value underneath.
- concepts/functional-index-mysql — the MySQL mitigation surface.
- concepts/secondary-index — the structure affected.