Skip to content

PATTERN Cited by 1 source

Cardinality-based unused-index detection

Intent

Identify candidate secondary indexes that are structurally unused — either empty, degenerate, or never populated with distinct values — by querying information_schema.STATISTICS for rows where cardinality = 0 OR cardinality IS NULL. First-pass filter for index audits without production telemetry.

Context

You have a MySQL database that has accumulated secondary indexes over time: some from initial schema design, some added during past performance incidents, some from speculative "just in case" additions. You suspect some are carrying unneeded write-amplification + storage cost (concepts/secondary-index), but you don't have production query-level telemetry to know which ones. You need a cheap static-time filter to generate a candidate drop list that you'll then validate individually.

Mechanism

Query information_schema.STATISTICS — the standard MySQL per-index metadata view populated from mysql.innodb_index_stats / per-engine stats — filtering for indexes whose recorded cardinality is zero or NULL:

SELECT table_name, index_name, non_unique, seq_in_index,
       column_name, collation, cardinality, sub_part,
       packed, index_type, comment, index_comment
FROM information_schema.STATISTICS
WHERE table_schema = 'your_database_name'
  AND index_name != 'PRIMARY'
  AND (cardinality IS NULL OR cardinality = 0)
ORDER BY table_name, index_name, seq_in_index;

(Canonical source: sources/2026-04-21-planetscale-what-are-the-disadvantages-of-database-indexes)

Rows returned are candidates for removal. For each candidate:

  1. Optionally confirm with SELECT COUNT(*) FROM table — if the table is non-empty, the zero-cardinality index is suspicious on its own.
  2. Flip candidate to INVISIBLE and measure production workload; this is the validation step.
  3. If invisible-state performance is fine over at least one full workload cycle, commit via DROP INDEX.
  4. If any query regresses, flip back to VISIBLE immediately.

Why it works (when it works)

information_schema.STATISTICS.cardinality stores the number of distinct indexed-column values seen by the storage engine's sampler (InnoDB samples innodb_stats_persistent_sample_pages random B+tree pages per table by default). A value of zero or NULL typically indicates one of:

  • The index was just created and never populated (rare in production).
  • The table is empty (usually a lookup table that was never used).
  • Every row shares the same value for the indexed column (degenerate — the index has no filtering power).
  • The stats sampler hasn't run or has been disabled.

In each case the index is carrying its write-amplification + storage cost with zero read benefit, so dropping it is safe even absent measured-usage telemetry.

When this pattern is insufficient

The cardinality-equals-zero heuristic misses the far more common production scenario: an index with high cardinality (distinct values aplenty) that no production query actually uses. Example: a CREATE INDEX email_idx ON user(email) that was added in 2019 for a feature that got deprecated in 2020 — cardinality is healthy (millions of distinct emails) but the index is dead weight. The cardinality heuristic is blind to this case by construction.

For this class of unused-but-high-cardinality index, operators need production-measured query-level telemetry: see Hazen 2024 Tracking index usage with Insights for PlanetScale Insights' implementation via a patched InnoDB index_init() hook populating a per-query used-index set, surfaced as time-series in the Insights Indexes tab. That telemetry is the measured-usage counterpart to this pattern's structural heuristic.

Consequences

Benefits

  • Zero cost — one SQL query against an in-memory metadata view; runs in milliseconds on any MySQL.
  • Standard MySQL — no schema changes, no observability infra, no third-party tools.
  • Safe by construction — returns a candidate list; the actual drop decision is still downstream.
  • Pre-observability first-pass — operators without production telemetry stacks can still get useful signal.

Costs / caveats

  • Lossy heuristic — an index with cardinality = 0 might be in active use on an empty table that is about to be bulk-loaded; dropping it would then require recreation. Verify table size before drop.
  • Misses unused-but-populated indexes (the common case). High-cardinality indexes never touched by production queries pass this filter. This pattern is necessary-but-not-sufficient for index auditing.
  • Stats-sampler dependency — if innodb_stats_persistent = OFF or the sampler hasn't run since a bulk load, cardinality can show zero for genuinely active indexes. Check ANALYZE TABLE recency before trusting the signal.
  • Does not measure usage. The name information_schema.STATISTICS.cardinality is not "usage count" — it's "distinct-value count." Operators who mistake it for a usage counter will drop actively-used indexes on high-cardinality but rarely-queried columns.
  • Destructive without invisibility validation. The candidate list is not an "automatically safe to drop" list — each candidate still needs the invisible-validate-drop cycle to confirm no production query relies on it.

Comparison to sibling patterns

Pattern Detection signal Strength Weakness
Cardinality-based (this pattern) information_schema.STATISTICS.cardinality = 0 Zero infra cost Misses populated-but-unused indexes
Handler-counter table_io_waits performance_schema.table_io_waits_summary_by_index_usage.count_star = 0 Measures actual access Table-granularity, SELECT-only
patterns/bidirectional-index-query-attribution / Insights Indexes tab Per-query used-index set via index_init() hook Per-query-pattern time-series PlanetScale-only; SELECT-only caveat

The three patterns are complementary altitudes, not mutually exclusive: cardinality filter is the cheap first-pass; performance_schema is the OSS-MySQL measured-usage signal; PlanetScale Insights is the production-telemetry canonical.

Seen in

  • sources/2026-04-21-planetscale-what-are-the-disadvantages-of-database-indexes — JD Lien (PlanetScale, 2023-02-17) canonicalises the information_schema.STATISTICS cardinality query as the detection half of the two-step safe-drop workflow paired with invisible-index validation. Verbatim: "This query checks the cardinality of each index, which is the number of unique values in the index. If this value is 0, then the index is not being used." Pragmatic first-pass filter for pre-observability operators; the post itself flags the lossiness by recommending invisibility validation before the actual drop.
Last updated · 470 distilled / 1,213 read