Skip to content

CONCEPT Cited by 3 sources

Secondary index

A secondary index is a database index on a non-primary-key column (or column set). In a system with a clustered index like MySQL's InnoDB, a secondary index is itself a B+tree — keyed on the indexed column, with primary-key values stored in its leaves:

The key is the column(s) that the user selected the index to be built for. The values are the primary key of the associated row.

sources/2024-09-09-planetscale-b-trees-and-database-indexes

Two-step lookup

Every query that matches a secondary index does two B+tree walks in a clustered-index system:

  1. Walk the secondary B+tree with the predicate value to find matching primary key(s).
  2. Walk the clustered index (the table itself) with each primary key to fetch the full row.

Worked example from the article:

CREATE TABLE user (
  user_id BIGINT UNSIGNED AUTO_INCREMENT NOT NULL,
  username VARCHAR(256) NOT NULL,
  email VARCHAR(256) NOT NULL,
  PRIMARY KEY (user_id)
);
CREATE INDEX email_index ON user(email);

-- Query:
SELECT username FROM user WHERE email = 'x@planetscale.com';

The query engine first walks email_index (a separate B+tree keyed on email with user_id values), then walks the primary-key B+tree with the user_id to fetch the row and project username. Two B+tree traversals per matching row.

Covering indexes avoid the second walk

A covering index is a secondary index that includes all columns the query projects, stored inside the secondary-index leaves. Queries satisfied entirely from the secondary index skip the clustered-index lookup entirely. The trade-off:

  • Read: faster — one B+tree walk instead of two.
  • Write: slower — every update to a covered column must update the secondary index.
  • Storage: larger — secondary index now holds more data.

InnoDB covering indexes are typically expressed as composite indexes: CREATE INDEX cover ON user(email, username).

Write amplification

Every INSERT / UPDATE of an indexed column writes to the clustered index and every secondary index. Each secondary index is a separate B+tree with its own insert path and split mechanics. Costs that grow with secondary-index count:

  • Insert I/O: each index walked + modified per row.
  • Buffer-pool pressure: each index's working set of hot pages competes for cache.
  • Lock contention: each index has its own locks.
  • Storage: each index is a separate B+tree.

Practical rule: secondary indexes aren't free — every one added is a tax on every write to that table.

Primary-key size cascades into secondary-index size

Because every secondary index leaf stores the primary-key value, the PK choice affects the size of every secondary index:

Primary key Leaf entry size (index + PK pointer)
8-byte BIGINT Indexed column + 8 bytes
16-byte UUID Indexed column + 16 bytes

For a table with 5 secondary indexes, switching the PK from BIGINT to UUID adds 8 bytes × 5 indexes × row count to total storage — before counting fan-out reductions from larger key sizes.

Seen in

Last updated · 319 distilled / 1,201 read