CONCEPT Cited by 3 sources
Secondary index¶
A secondary index is a database index on a non-primary-key column (or column set). In a system with a clustered index like MySQL's InnoDB, a secondary index is itself a B+tree — keyed on the indexed column, with primary-key values stored in its leaves:
The key is the column(s) that the user selected the index to be built for. The values are the primary key of the associated row.
— sources/2024-09-09-planetscale-b-trees-and-database-indexes
Two-step lookup¶
Every query that matches a secondary index does two B+tree walks in a clustered-index system:
- Walk the secondary B+tree with the predicate value to find matching primary key(s).
- Walk the clustered index (the table itself) with each primary key to fetch the full row.
Worked example from the article:
CREATE TABLE user (
user_id BIGINT UNSIGNED AUTO_INCREMENT NOT NULL,
username VARCHAR(256) NOT NULL,
email VARCHAR(256) NOT NULL,
PRIMARY KEY (user_id)
);
CREATE INDEX email_index ON user(email);
-- Query:
SELECT username FROM user WHERE email = 'x@planetscale.com';
The query engine first walks email_index (a separate B+tree
keyed on email with user_id values), then walks the
primary-key B+tree with the user_id to fetch the row and
project username. Two B+tree traversals per matching row.
Covering indexes avoid the second walk¶
A covering index is a secondary index that includes all columns the query projects, stored inside the secondary-index leaves. Queries satisfied entirely from the secondary index skip the clustered-index lookup entirely. The trade-off:
- Read: faster — one B+tree walk instead of two.
- Write: slower — every update to a covered column must update the secondary index.
- Storage: larger — secondary index now holds more data.
InnoDB covering indexes are typically expressed as composite
indexes: CREATE INDEX cover ON user(email, username).
Write amplification¶
Every INSERT / UPDATE of an indexed column writes to the
clustered index and every secondary index. Each secondary
index is a separate B+tree with its own insert path and split
mechanics. Costs that grow with secondary-index count:
- Insert I/O: each index walked + modified per row.
- Buffer-pool pressure: each index's working set of hot pages competes for cache.
- Lock contention: each index has its own locks.
- Storage: each index is a separate B+tree.
Practical rule: secondary indexes aren't free — every one added is a tax on every write to that table.
Primary-key size cascades into secondary-index size¶
Because every secondary index leaf stores the primary-key value, the PK choice affects the size of every secondary index:
| Primary key | Leaf entry size (index + PK pointer) |
|---|---|
8-byte BIGINT |
Indexed column + 8 bytes |
| 16-byte UUID | Indexed column + 16 bytes |
For a table with 5 secondary indexes, switching the PK from
BIGINT to UUID adds 8 bytes × 5 indexes × row count to
total storage — before counting fan-out reductions from
larger key sizes.
Seen in¶
- sources/2024-09-09-planetscale-b-trees-and-database-indexes — secondary-index structure in InnoDB; two-lookup query pattern; index creation syntax; primary-key size cascade.
- systems/innodb — implementation substrate.
- sources/2026-04-21-planetscale-achieving-data-consistency-with-the-consistent-lookup-vindex
— Vitess's Lookup
Vindex tables are literally relational tables with a
secondary-index-like structure —
(name, id) → keyspace_idorphone → keyspace_idstored as a separate MySQL table sharded by the lookup column. Canonical wiki example of the secondary-index shape escaping the storage-engine and becoming a sharding-layer routing primitive. The Consistent Lookup Vindex mechanism exists to keep these cross-shard "secondary indexes" consistent with their backing user table without 2PC. - sources/2026-04-21-planetscale-ai-powered-postgres-index-suggestions — production instance of the wiki's canonical discipline that secondary-index write-amplification is a real cost that has to be earned back by read-side wins before recommending a new index. PlanetScale Insights' AI index-suggestion feature encodes this as three pre-filter gates (rows-read:returned ratio, ≥ 0.1% runtime share, minimum frequency) because "indexes incur storage, memory and write overhead." Uses HypoPG + hypothetical-index evaluation to cost candidate indexes against Postgres's own planner before paying the write tax of actually creating them.
Related¶
- concepts/b-plus-tree
- concepts/clustered-index
- concepts/vindex
- concepts/consistent-lookup-vindex
- concepts/hypothetical-index-evaluation
- concepts/llm-generated-database-changes
- concepts/index-candidate-filtering
- patterns/sequential-primary-key
- patterns/ordered-commit-without-2pc
- patterns/llm-plus-planner-validation
- patterns/workload-aware-llm-prompting
- concepts/uuid-primary-key-antipattern
- concepts/innodb-buffer-pool
- systems/innodb
- systems/mysql
- systems/postgresql
- systems/vitess
- systems/hypopg
- systems/planetscale-insights