
PLANETSCALE 2023-02-17


PlanetScale — What are the disadvantages of database indexes?

Summary

JD Lien (PlanetScale, 2023-02-17) publishes a short pedagogical essay cataloguing the costs of secondary indexes to counter the naive reflex of "add an index to every column." The post names three first-order disadvantages (storage, write slowdown, query-planner complexity) and pairs them with a two-step operational workflow for safely removing indexes you may have over-provisioned: (1) detect candidate unused indexes via the information_schema.STATISTICS cardinality heuristic, (2) validate by flipping the candidate to INVISIBLE and measuring before finally DROP INDEX-ing. Canonical load-bearing claim: "Every index takes up additional storage, can slow down write operations, and can complicate the query optimizer's job, so they aren't always guaranteed to improve performance." Sits alongside Ben Dicken's [[sources/2024-09-09-planetscale-b-trees-and-database-indexes|2024-09-09 B-trees and database indexes]] (structural mechanism of write amplification) and Rafer Hazen's [[sources/2026-04-21-planetscale-tracking-index-usage-with-insights|2024-08-14 Tracking index usage with Insights]] (production-telemetry approach to the same detection problem) as the third leg of the PlanetScale index-audit corpus, with Lien 2023 supplying the pedagogy-altitude trade-off catalogue.

(Source: sources/2026-04-21-planetscale-what-are-the-disadvantages-of-database-indexes.)

Key takeaways

  1. Indexes cost storage, but usually a small percentage. "The exact amount of space depends on the size of the table and the number of columns in the index, but it's usually a small percentage of the total size of the table." A basic index stores the indexed-column values plus a pointer to the row; an index on an integer column stores only the integer values, while one on a string column stores the strings plus their lengths. The storage cost scales with row count × indexed-column width, multiplied per secondary index on the table. (Source: sources/2026-04-21-planetscale-what-are-the-disadvantages-of-database-indexes.)
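The per-table data-versus-index split can be checked directly rather than estimated. A minimal sketch against MySQL's information_schema.TABLES (not from the post; DATA_LENGTH and INDEX_LENGTH are InnoDB estimates, and 'your_database_name' is a placeholder in the post's own naming style):

```sql
-- Approximate data vs. index footprint per table (InnoDB estimates).
SELECT table_name,
       ROUND(data_length  / 1024 / 1024, 1) AS data_mb,
       ROUND(index_length / 1024 / 1024, 1) AS index_mb,
       ROUND(index_length / NULLIF(data_length, 0) * 100, 1) AS index_pct_of_data
FROM information_schema.TABLES
WHERE table_schema = 'your_database_name'
ORDER BY index_length DESC;
```

A table whose index_mb rivals or exceeds its data_mb is a natural first stop for the audit workflow the post describes.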

  2. Indexes slow down writes proportionally to index count. "When you add an index, it has to be updated whenever a row is inserted, updated, or deleted. This means that writes will be slower." Every INSERT/UPDATE/DELETE on the base row must also walk and modify every secondary index's B+tree. This is the write-amplification tax canonicalised elsewhere on the wiki — JD Lien frames it as a practical decision axis: "Before you add an index, you should consider whether you will be doing a lot of writes to the table and whether or not you can afford to slow down the writes." (Source: sources/2026-04-21-planetscale-what-are-the-disadvantages-of-database-indexes.)

  3. Worked operational number: 1M-row bulk insert 10–15s → ~2 min after adding indexes. "In one application I worked on, doing a bulk insert of about a million records only took around 10-15 seconds without any indexes. Unfortunately, the performance of certain frequently used queries was quite slow, taking a few seconds to run and causing a bad user experience. Adding several indexes for such queries improved the performance significantly, but the bulk insert now takes closer to two minutes." The specific ~8–12× write slowdown on a bulk-insert path is the most concrete number in the post. Decision framing: acceptable because the bulk insert is off-peak + infrequent; "If something like this bulk insert was triggered by users who had to sit and wait for it, then it might be a different story." Write-amplification is a workload-specific trade-off, not a universal verdict. (Source: sources/2026-04-21-planetscale-what-are-the-disadvantages-of-database-indexes.)
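A common mitigation for exactly this bulk-insert pattern (not from the post, and only workable when the table can tolerate a window without the index) is to drop the secondary indexes before an off-peak bulk load and rebuild them afterwards. A sketch with hypothetical table and index names:

```sql
-- Hypothetical table `events` with secondary index `idx_user_id`.
ALTER TABLE events DROP INDEX idx_user_id;

-- Bulk load now pays no per-row secondary-index maintenance.
INSERT INTO events (user_id, payload)
SELECT user_id, payload FROM staging_events;

-- Rebuild once, in bulk -- typically cheaper than a million
-- incremental B+tree updates interleaved with the inserts.
ALTER TABLE events ADD INDEX idx_user_id (user_id);
```

The trade is the same one the post frames: queries that relied on idx_user_id are slow during the load window, which is why this only fits off-peak, infrequent jobs.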

  4. Query-planner complexity is the third cost. Indexes "can complicate the query optimizer's job, so they aren't always guaranteed to improve performance." With many indexes available, the planner must choose among them — and may pick suboptimally, especially when statistics are stale or cardinalities are skewed. The post names the cost but does not walk through the mechanism (no Index Merge vs composite-index discussion, no statistics-freshness caveat, no planner-time measurement). Mentioned in passing; the detailed mechanics live on other wiki pages (see MySQL EXPLAIN, composite index for AND-predicate). (Source: sources/2026-04-21-planetscale-what-are-the-disadvantages-of-database-indexes.)
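The planner's choice among candidate indexes can be inspected with EXPLAIN (standard MySQL, not walked in the post). A sketch with hypothetical table and index names — possible_keys lists every index the optimizer considered, key shows the one it actually picked:

```sql
-- Hypothetical `orders` table with single-column indexes on both predicate columns.
EXPLAIN
SELECT id, total
FROM orders
WHERE customer_id = 42
  AND status = 'shipped';
-- possible_keys would list e.g. idx_customer_id, idx_status;
-- key shows the single index chosen. If the planner picks the
-- low-selectivity one (idx_status), that is the suboptimal-choice
-- cost the post names but does not expand.
```

Comparing possible_keys against key after adding each new index is the cheapest way to see this third cost in practice.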

  5. Detection heuristic: information_schema.STATISTICS cardinality = 0. The operational workflow for auditing existing indexes: query information_schema.STATISTICS filtering index_name != 'PRIMARY' and cardinality IS NULL OR cardinality = 0. Canonical query verbatim:

SELECT table_name, index_name, non_unique, seq_in_index,
       column_name, collation, cardinality, sub_part,
       packed, index_type, comment, index_comment
FROM information_schema.STATISTICS
WHERE table_schema = 'your_database_name'
  AND index_name != 'PRIMARY'
  AND (cardinality IS NULL OR cardinality = 0)
ORDER BY table_name, index_name, seq_in_index;

Load-bearing claim: "This query checks the cardinality of each index, which is the number of unique values in the index. If this value is 0, then the index is not being used." Note: this is the zero-cardinality interpretation of information_schema.STATISTICS — which is actually the number of distinct values in the index, not a usage counter. An index whose column has 0 distinct values in practice means the index has no rows or all rows share the same value — a structural anti-pattern, not directly a usage signal. The mapping from "cardinality = 0" to "index is not being used" is a rough heuristic the post canonicalises as a pragmatic first-pass filter, not a measured-usage observability primitive. Operators seeking measured-usage telemetry need Hazen 2024's production-telemetry approach via the InnoDB index_init() hook + PlanetScale Insights' Indexes tab. (Source: sources/2026-04-21-planetscale-what-are-the-disadvantages-of-database-indexes.)
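Two practical refinements worth noting alongside this heuristic (both standard MySQL, neither from the post): the cardinality column is an estimate refreshed by ANALYZE TABLE, so refresh it before trusting the filter; and MySQL's bundled sys schema exposes a genuinely usage-based view, sys.schema_unused_indexes, built on performance_schema counters (unused since the last server restart):

```sql
-- Refresh the optimizer statistics that the cardinality heuristic reads.
ANALYZE TABLE your_table_name;

-- Usage-based alternative: indexes with zero recorded accesses since
-- server start. Requires performance_schema (enabled by default in 8.0).
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes;
```

The sys-schema view is closer in spirit to Hazen 2024's measured-usage telemetry than the cardinality heuristic, though it resets on restart and so needs a representative uptime window.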

  6. Drop is irreversible; test via invisibility first. The canonical safe-drop workflow is to first flip the candidate index to INVISIBLE ("you can keep the index intact but essentially hide the index from MySQL so that queries do not use the index") and measure whether affected query performance degrades — the reversible proxy for a destructive DROP INDEX. Visibility flip:
ALTER TABLE your_table_name ALTER INDEX
  your_index_name INVISIBLE;   -- test-drop
ALTER TABLE your_table_name ALTER INDEX
  your_index_name VISIBLE;     -- revert

After successful measurement of the invisible window, actually drop via ALTER TABLE your_table_name DROP INDEX your_index_name;. "When doing this, ensure that you test the performance of any affected queries before and after removing the index to make sure that you are not inadvertently making things worse." Invisibility is a reversible prelude to irreversible drop — a rollback-friendly staging mechanism, not a permanent state. (Source: sources/2026-04-21-planetscale-what-are-the-disadvantages-of-database-indexes.)
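During the invisible window, a single test session can still compare both plans without flipping the index back for everyone: MySQL 8.0's optimizer_switch flag use_invisible_indexes re-enables invisible indexes for that session only (standard MySQL, not from the post; the query below is a hypothetical placeholder):

```sql
-- Session-scoped: this connection's optimizer may use invisible indexes again.
SET SESSION optimizer_switch = 'use_invisible_indexes=on';

-- Plan WITH the candidate index (other sessions still ignore it).
EXPLAIN SELECT * FROM your_table_name WHERE some_column = 'value';

-- Flip back to compare the plan WITHOUT it.
SET SESSION optimizer_switch = 'use_invisible_indexes=off';
EXPLAIN SELECT * FROM your_table_name WHERE some_column = 'value';
```

This keeps the production-wide invisible state untouched while still letting the operator do the before/after measurement the post insists on.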

  7. PlanetScale operational wrinkle: DDL goes through deploy requests. The post flags a production-workflow detail specific to PlanetScale: "we don't allow direct DDL on production branches, unless they have safe migrations disabled (not recommended). So, you'll have to go through the deploy request process to test using invisible indexes." Index-visibility changes are schema changes and therefore route through the deploy request workflow — with the redemption that PlanetScale's Revert feature lets operators "simply click the 'Revert' button if you decide you want to undo an altered or dropped index and it will be reverted near instantaneously." First-party disclosure that the invisible-index operational workflow integrates with PlanetScale's instant revert substrate. (Source: sources/2026-04-21-planetscale-what-are-the-disadvantages-of-database-indexes.)

Canonical disclosures

Three-cost framing of indexes

| Cost | Scales with | Post's framing |
| --- | --- | --- |
| Storage | rows × indexed-column width, per secondary index | "usually a small percentage of the total size of the table" |
| Write amplification | row-modify QPS × secondary-index count | "every index … has to be updated whenever a row is inserted, updated, or deleted"; 1M bulk insert 10–15s → ~2min |
| Query-planner complexity | index count × query count | "can complicate the query optimizer's job" |

Two-step safe-drop workflow

  1. Detect: information_schema.STATISTICS query filtered on cardinality = 0 OR cardinality IS NULL — heuristic candidate list.
  2. Validate: ALTER INDEX ... INVISIBLE → measure affected queries → ALTER INDEX ... VISIBLE to revert or DROP INDEX to commit.

See patterns/cardinality-based-unused-index-detection.

Caveats

  • Pedagogy voice, no benchmarks. The one operational number (1M-row bulk insert 10-15s → ~2min) is a single-anecdote, single-workload observation — no p50/p99, no schema disclosure, no index count or column types. Decisive framing but not quantitative.
  • cardinality = 0 heuristic is lossy. The information_schema.STATISTICS.cardinality column is the number of distinct indexed-column values, not a usage counter. An index with cardinality = 0 may be genuinely empty (zero rows in the table) or structurally degenerate (all rows share the same indexed value) — both are reasons to drop, but neither is directly "the index is not being used by production queries." An index with high cardinality may still be never-queried in production. This heuristic is a first-pass filter, not the signal Hazen 2024 canonicalises via the innodb_index_init_hook.
  • ALTER TABLE ... ALTER INDEX ... INVISIBLE syntax. The post's shown SQL, ALTER TABLE your_table_name; ALTER INDEX your_index_name INVISIBLE;, splits what should be one statement into two — the bare ALTER TABLE before the semicolon is a no-op and probably a formatting artefact. The working MySQL 8.0 syntax is the single statement ALTER TABLE your_table_name ALTER INDEX your_index_name INVISIBLE;.
  • Query-optimiser complexity cost unexpanded. Named as third cost but no mechanism depth: no Index Merge discussion, no statistics-freshness caveat, no optimizer_search_depth tuning, no measured planner-time cost. Operators looking for the mechanism here will need to cross-reference concepts/mysql-explain + MySQL planner internals.
  • Primary-index exclusion is load-bearing but unexplained. "You should always include a primary index on every table in your database. However, too many secondary indexes can begin to cause issues in some instances." The post's scope is explicitly secondary indexes — the PK is table structure, not audit-eligible. Not stated why primary indexes are exempt from the three costs framing (they aren't — PK is also a B+tree that takes storage and complicates the planner — but the write-amplification cost is different because the PK is the clustered-index row location in InnoDB).
  • Detection/validation decoupling invites orphaned candidates. The two-step workflow has no scheduler/reminder for the "has the candidate been invisible for long enough to validate?" step — invisible indexes forgotten in the invisible state still occupy storage and incur write-amplification. Leaving a candidate invisible indefinitely is a storage-cost + write-amp bug, not a safety interlock.
  • JD Lien is not a canonical-pedagogy byline on the wiki. This is Lien's first (and likely only) PlanetScale corpus ingest in this batch. Contrast the canonical database-internals voices (Ben Dicken's 10+ ingests, Brian Morrison II's 7 ingests, Shlomi Noach's 9 ingests); Lien's post is a general pedagogy-tier piece without the domain depth those bylines bring. The post clears the bar via durable trade-off framing and the operational cardinality+invisibility workflow, not via database-internals depth.
