Skip to content

CONCEPT Cited by 2 sources

Index write cost

Every secondary index is a tax on writes. Adding an index speeds up reads that can use it and slows down every INSERT and UPDATE that touches an indexed column. Three distinct cost axes compound: storage, write latency, and planner overhead. This is the central trade-off that makes "add an index to every column" an anti-pattern — indexes are a scarce resource, not a free upgrade.

Canonical framing

Justin Gage names the three-part cost structure in one paragraph:

One thing to remember about indexes is that while they're beneficial for read queries (including JOINs and aggregates), they can negatively impact your database performance as a whole if you're not careful. Indexes take up a lot of space, even if that space is smaller than the initial table, and that space is precious; you'll need to keep an eye on how close you're inching towards your filesystem's limit. They also make INSERT queries take longer and force the query engine to consider more options before choosing how to execute a query.

sources/2026-04-21-planetscale-how-do-database-indexes-work

The three cost axes

1. Storage

Every secondary index is a separate B+tree that duplicates the indexed column(s) plus pointers to the base rows. For a 10-column 1-TB table:

  • 1 primary-key B+tree (the table itself, in a clustered engine)
  • N secondary-index B+trees, each holding (indexed column + PK pointer) per row

Even though each secondary index's per-row payload is smaller than the full row, the index tree depth is similar (same row count, same leaf capacity) so the aggregate storage footprint of N indexes often exceeds the base table once N ≥ 4–5. Gage's framing — "space is precious … filesystem's limit" — is load- bearing: indexes compete with data for disk and for page-cache capacity.

2. Write latency

Every INSERT writes to the clustered index plus every secondary index. Every UPDATE that changes an indexed column writes to the clustered index plus every index that covers that column. For N secondary indexes on a table:

  • INSERT cost: 1 + N B+tree inserts per row (ignoring buffer-pool locality).
  • UPDATE col = ... cost: 1 + (# indexes that include col) B+tree updates.
  • DELETE cost: 1 + N B+tree deletes.

Each B+tree write can trigger page splits, hot-page contention, and buffer-pool evictions. On high-write workloads (OLTP, append-heavy time-series), adding a fifth or sixth secondary index to an already-hot table often visibly regresses INSERT throughput. Gage's framing: "they also make INSERT queries take longer."

3. Planner overhead

The query optimiser considers every candidate index that could serve a given predicate. For each query:

  • Retrieve index statistics (cardinality, histogram) per candidate
  • Estimate rows-read / rows-returned per candidate
  • Choose the cheapest plan

Adding more indexes directly enlarges possible_keys (see concepts/mysql-explain's possible_keys column) and increases the planner's branching factor. In pathological cases the planner picks a suboptimal candidate (see " Why isn't MySQL using my index?" — the first_name vs state canonical example). Gage's framing: indexes "force the query engine to consider more options before choosing how to execute a query."

Canonical quantification (not given in the post)

Gage's cost framing is qualitative. The post does not quantify the N+1 multiplier, INSERT throughput regression per additional index, or the planner-overhead / plan-time cost at specific N. Downstream PlanetScale posts (Lien's What are the disadvantages of database indexes?) develop the write-cost discipline further — proposing invisible indexes as the safe-drop validation primitive when auditing unused indexes.

Mitigations

Three standard disciplines for keeping index write cost in check:

  • Audit unused indexes. Use patterns/cardinality-based-unused-index-detection and performance_schema.table_io_waits_summary_by_index_usage to find indexes with zero reads and drop them (or first mark invisible).
  • Prefer composite indexes over many single-column indexes. One three-column index on (a, b, c) satisfies queries filtering on a, (a, b), (a, b, c) via the leftmost-prefix rule — cheaper than three separate single-column indexes on the write path.
  • Make indexes covering only when profitable. A covering index widens the secondary-index leaves, further inflating write cost. Reserve for hot read paths whose projection is stable.

Seen in

Last updated · 550 distilled / 1,221 read