CONCEPT Cited by 2 sources
Secondary index¶
A secondary index is a database index on a non-primary-key column (or column set). In a system with a clustered index like MySQL's InnoDB, a secondary index is itself a B+tree — keyed on the indexed column, with primary-key values stored in its leaves:
The key is the column(s) that the user selected the index to be built for. The values are the primary key of the associated row.
—
Two-step lookup¶
Every query that matches a secondary index does two B+tree walks in a clustered-index system:
- Walk the secondary B+tree with the predicate value to find matching primary key(s).
- Walk the clustered index (the table itself) with each primary key to fetch the full row.
Worked example from the article:
CREATE TABLE user (
user_id BIGINT UNSIGNED AUTO_INCREMENT NOT NULL,
username VARCHAR(256) NOT NULL,
email VARCHAR(256) NOT NULL,
PRIMARY KEY (user_id)
);
CREATE INDEX email_index ON user(email);
-- Query:
SELECT username FROM user WHERE email = 'x@planetscale.com';
The query engine first walks email_index (a separate B+tree
keyed on email with user_id values), then walks the
primary-key B+tree with the user_id to fetch the row and
project username. Two B+tree traversals per matching row.
Covering indexes avoid the second walk¶
A covering index is a secondary index that includes all columns the query projects, stored inside the secondary-index leaves. Queries satisfied entirely from the secondary index skip the clustered-index lookup entirely. The trade-off:
- Read: faster — one B+tree walk instead of two.
- Write: slower — every update to a covered column must update the secondary index.
- Storage: larger — secondary index now holds more data.
InnoDB covering indexes are typically expressed as composite
indexes: CREATE INDEX cover ON user(email, username).
Write amplification¶
Every INSERT / UPDATE of an indexed column writes to the
clustered index and every secondary index. Each secondary
index is a separate B+tree with its own insert path and split
mechanics. Costs that grow with secondary-index count:
- Insert I/O: each index walked + modified per row.
- Buffer-pool pressure: each index's working set of hot pages competes for cache.
- Lock contention: each index has its own locks.
- Storage: each index is a separate B+tree.
Practical rule: secondary indexes aren't free — every one added is a tax on every write to that table.
Primary-key size cascades into secondary-index size¶
Because every secondary index leaf stores the primary-key value, the PK choice affects the size of every secondary index:
| Primary key | Leaf entry size (index + PK pointer) |
|---|---|
8-byte BIGINT |
Indexed column + 8 bytes |
| 16-byte UUID | Indexed column + 16 bytes |
For a table with 5 secondary indexes, switching the PK from
BIGINT to UUID adds 8 bytes × 5 indexes × row count to
total storage — before counting fan-out reductions from
larger key sizes.
Seen in¶
- — Justin Gage (PlanetScale, 2022-07-14) canonicalises the
full mental model of secondary indexes from first
principles at pedagogy-101 altitude: indexes as
"sort-by-additional-column" (because a table can
physically be sorted by only one column, the PK, indexes
buy binary-search efficiency on other filter columns),
the "pseudo-table with two columns" mental model
(indexed value + PK pointer), the
heap-vs-clustered
taxonomic split (MS SQL / Azure SQL Database even expose
it as a toggle), the MySQL-vs-Postgres PK-storage
divergence ("In MySQL, primary keys are stored with
their data … In Postgres, primary keys are treated like
other indexes"), the
covering-index / index-only-
scan optimisation mapped across both engines, and the
three-part write-side cost
discipline (storage / write latency / planner overhead).
Pointer size quantified: <5 bytes in MySQL secondary
indexes. Worked complexity numbers: linear scan O(N/2)
→ B-tree scan O log(N); worked example: 300K-row table,
5–6-sec unindexed scan → sub-second indexed scan. Full
MySQL index-type taxonomy canonicalised — non-unique
(
KEY),UNIQUE,FULLTEXT(with threeMATCH() AGAINST()modes: natural-language / boolean / query-expansion), plus prefix / descending as footnotes. Composite-index column limit: 16. Operational introspection canonicalised:SHOW INDEX FROM table(enumerate existing indexes) +EXPLAIN(verify index use viapossible_keys/key) → new patterns/explain-driven-index-verification pattern. Aaron-Francis-style companion to Dicken's 2024-09-09 B-trees post — Gage (2022) teaches what indexes are and how to use them; Dicken (2024) teaches how B+trees actually store them. Together they are the canonical PlanetScale pedagogy for database indexes on the wiki. - — JD Lien (PlanetScale, 2023-02-17) canonicalises the
pedagogy-altitude trade-off catalogue for secondary
indexes: three first-order costs (storage, write
amplification, query-planner complexity) paired with a
two-step safe-drop workflow — (1) detect candidates
via the
information_schema.STATISTICScardinality heuristic and (2) validate each candidate viaALTER INDEX ... INVISIBLEbefore the destructiveDROP INDEX. Canonical pedagogical anchor "Every index takes up additional storage, can slow down write operations, and can complicate the query optimizer's job, so they aren't always guaranteed to improve performance." Worked operational number: a 1M-row bulk insert took 10-15s without indexes → ~2 minutes after adding several — the ~8-12× write-amplification tax quantified on a single production path. Composes with Dicken's 2024-09-09 (structural mechanism of write amplification) and Hazen's 2024-08-14
(production-telemetry approach) as the third altitude of the PlanetScale index-audit corpus — JD Lien is the pedagogy-altitude trade-off catalogue, Dicken is the structural-mechanism canon, Hazen is the production-telemetry canon.
- — Savannah Longoria canonicalises
EXPLAIN'skey_lencolumn as the only way to detect whether a composite secondary index is being used in full or only as a prefix — the column names the chosen index inkey, butkey_len(bytes) is the diagnostic that reveals whether later columns of the composite are binding the lookup or just post-filtering. The composite secondary index is canonicalised as a distinct shape from single-column secondary indexes via the(last_name, first_name)worked example (rows: 299,202 → 1); the post also names "two separate indexes" as the standard anti-pattern when a composite is needed. Canonical source for the composite-index for AND-predicate pattern — a composite secondary index is usually the right answer over Index Merge forWHERE A=x AND B=yworkloads. - — Aaron Francis canonicalises the
generated-hash-column pattern: an indexed
BINARY(16)MD5 column is a compact fixed-width secondary index replacing a would-be-wide index on aTEXTcolumn (which can only be prefix-indexed and therefore can't support strict equality). The composite variant (patterns/composite-hash-uniqueness-constraint) extends the shape: one 16-byte hash column with a singleUNIQUE INDEXreplaces a multi-column composite index as the multi-column uniqueness primitive. Canonical instance of the index-compactness lever — narrow index key = shallower B+tree = smaller hot working set = cheaper equality lookups, trading off any range/prefix/sort capability the original composite index would have provided. - — secondary-index structure in InnoDB; two-lookup query pattern; index creation syntax; primary-key size cascade.
- systems/innodb — implementation substrate.
- — Vitess's Lookup
Vindex tables are literally relational tables with a
secondary-index-like structure —
(name, id) → keyspace_idorphone → keyspace_idstored as a separate MySQL table sharded by the lookup column. Canonical wiki example of the secondary-index shape escaping the storage-engine and becoming a sharding-layer routing primitive. The Consistent Lookup Vindex mechanism exists to keep these cross-shard "secondary indexes" consistent with their backing user table without 2PC. - sources/2026-04-21-planetscale-ai-powered-postgres-index-suggestions — production instance of the wiki's canonical discipline that secondary-index write-amplification is a real cost that has to be earned back by read-side wins before recommending a new index. PlanetScale Insights' AI index-suggestion feature encodes this as three pre-filter gates (rows-read:returned ratio, ≥ 0.1% runtime share, minimum frequency) because "indexes incur storage, memory and write overhead." Uses HypoPG + hypothetical-index evaluation to cost candidate indexes against Postgres's own planner before paying the write tax of actually creating them.
- — Rafer Hazen (2024-08-14) canonicalises the
production-observability counterpart to the
static-time secondary-index decision: a per-query-
pattern time-series surface
(concepts/index-usage-time-series) showing which
indexes are actually touched by the production
workload, over time, in
PlanetScale Insights'
new
Indexestab. Captured via a patched InnoDBindex_init()hook populating a per-query used-index set that rides on the final MySQL wire-protocol packet back to VTGate (patterns/handler-hook-sidecar-telemetry). Productises both directions of index attribution (patterns/bidirectional-index-query-attribution): per-query → indexes (the tab) + per-index → queries (theindex:table.indexpredicate). Canonical drop-index signal: a secondary index showing 0% Insights hits across a week is a candidate for removal — but theSELECT-only caveat means operators must independently verify againstUPDATE/DELETEpaths (viaEXPLAIN+performance_schema.table_io_waits_summary_by_index_usage) before acting. Canonical shift from the 2024-09-09 Dicken B-trees-altitude write-amplification framing (the structural cost of a secondary index) to the 2024-08-14 Hazen production-observability altitude (the measured return the workload gets from it) — the same decision surface, at two different altitudes.
Related¶
- concepts/b-plus-tree
- concepts/clustered-index
- concepts/composite-index
- concepts/mysql-explain
- concepts/mysql-access-type
- concepts/vindex
- concepts/consistent-lookup-vindex
- concepts/hypothetical-index-evaluation
- concepts/llm-generated-database-changes
- concepts/index-candidate-filtering
- concepts/offset-pagination-cost
- concepts/deferred-join
- concepts/cursor-pagination
- patterns/sequential-primary-key
- patterns/composite-index-for-and-predicate
- patterns/ordered-commit-without-2pc
- patterns/llm-plus-planner-validation
- patterns/workload-aware-llm-prompting
- patterns/deferred-join-for-offset-pagination
- concepts/uuid-primary-key-antipattern
- concepts/innodb-buffer-pool
- concepts/mysql-invisible-index
- systems/innodb
- systems/mysql
- systems/postgresql
- systems/vitess
- systems/hypopg
- systems/planetscale-insights
- systems/fastpage-gem
Seen in — two-step lookup as deferred-join mechanism¶
- — Mike Coutermarsh's FastPage post. The secondary
index's two-step lookup property (walk the index
B+tree to find primary keys, then walk the
clustered index to
hydrate full rows) is exactly the mechanism the
deferred-join optimisation
exploits. The rewrite forces the first query to run
index-only (primary keys from the secondary-index
leaves, no clustered-index descent) and reserves the
full-row hydration cost for only the
Nprimary keys that actually survive theLIMITfilter — collapsing the hydration count fromM + NtoN. Quantified gain on a 1M-row RailsPosttable at offset 100: 1,228.7 ms → 457.3 ms (2.7×). Published in thefast_pageRuby gem (planetscale/fast_page). Cross-references High Performance MySQL for the pedagogical source: "It lets the server examine as little data as possible in an index without accessing rows."
Seen in — cost-driven externalisation at petabyte scale¶
- sources/2024-08-01-segment-0-6m-year-savings-by-using-s3-for-change-data-capture-for-dynamodb — Secondary index as a cost-bound query surface that can be priced out of the base store at petabyte scale. Twilio Segment's objects pipeline needed a secondary index on (item-id, modified-timestamp) for warehouse-integration consumers of a ~1 PB / ~958B-item DynamoDB table, and explicitly rejected the in-database answer (DynamoDB Global Secondary Index) on cost grounds. Verbatim: "Ideally, we could have easily achieved this using DynamoDB's Global Secondary Index ... But due to the very large size of our table, creating a GSI for the table is not cost-efficient." Canonical wiki instance of the economic sign-flip: below some threshold, the in-database secondary index is operationally free; above it, the index must be externalised. Segment materialised the index as a CDC changelog in Bigtable (V1) then S3 (V2) — see concepts/changelog-as-secondary-index and patterns/object-store-as-cdc-log-store.