Skip to content

PLANETSCALE 2023-06-07

Read original ↗

PlanetScale — Using redundant conditions to unlock indexes in MySQL

Summary

Aaron Francis (PlanetScale, 2023-06-07) publishes a short pedagogical essay on the redundant-condition pattern as a portable SQL escape hatch for the index-obfuscation failure mode — the situation where a query filters on an indexed column but the column is wrapped in a function (YEAR(created_at), ADDTIME(due_date, due_time), etc.) and the planner can no longer use the index. The post distinguishes two flavours of redundant condition: a logically-redundant condition (the extra predicate cannot change the result set by construction — a broader range of the same column that necessarily contains every row the real predicate would select) and a domain-knowledge-redundant condition (the extra predicate could logically change the result set in general, but cannot in this application because of an invariant the database doesn't know about — e.g. created_at ≤ updated_at by application construction). Both flavours cost nothing beyond the query rewrite, require no schema change, and leave the result set unchanged. Where Francis's generated-hash-columns post uses a redundant condition to close a hash-collision safety hole, this post generalises redundant conditions into a first-class query-planner steering primitive whose canonical use case is the function-wrapped-column case.

(Source: sources/2026-04-21-planetscale-using-redundant-conditions-to-unlock-indexes-in-mysql.)

Key takeaways

  1. Index obfuscation is the root failure mode. Wrapping an indexed column in a function hides the indexed value from MySQL — "we're asking MySQL to do an index lookup on YEAR(created_at), which is not an index MySQL maintains. It is only maintaining the created_at index." The EXPLAIN signature is unambiguous: type flips from rangeALL, key blanks out, rows jumps from O(1) to the full table cardinality (39,746 in the post's example). (Source: sources/2026-04-21-planetscale-using-redundant-conditions-to-unlock-indexes-in-mysql.)

  2. First-choice fix: de-obfuscate via range rewrite. For WHERE YEAR(created_at) = 2023, the semantically equivalent range rewrite WHERE created_at BETWEEN '2023-01-01 00:00:00' AND '2023-12-31 23:59:59' restores type: range + key: created_at. Always prefer this when the function is invertible — single-column, deterministic, bijective on the interval you care about. (Source: sources/2026-04-21-planetscale-using-redundant-conditions-to-unlock-indexes-in-mysql.)

  3. Second-choice fix: redundant condition. When the function wraps multiple columns and has no single-column inverse, de-obfuscation fails. Canonical example from the post: ADDTIME(due_date, due_time) BETWEEN NOW() AND NOW() + INTERVAL 1 DAYdue_time varies per row, so there is no rewrite of the predicate alone that avoids the function call. Add a logically-redundant condition on due_date aloneAND due_date BETWEEN CURRENT_DATE AND CURRENT_DATE + INTERVAL 1 DAY — chosen to be broader than the true predicate so it cannot exclude a row the real predicate would include. The planner uses the due_date index to narrow the row set, then the expensive ADDTIME comparison runs on the few remaining candidates. EXPLAIN flips from ALL / 39,746 rows → range / due_date / 1 row / Using index condition; Using where. (Source: sources/2026-04-21-planetscale-using-redundant-conditions-to-unlock-indexes-in-mysql.)

  4. The broadness requirement is load-bearing. Francis is explicit: "we need to make sure that it's logically impossible to change the result set, which means our redundant condition should be broader than our actual condition." A narrower redundant condition would silently drop rows and corrupt the result set — the bug would manifest as occasional missing records, invisible in testing. Worked safety margin: 1-day real window → 2-day redundant window (today + tomorrow) to absorb midnight-crossing cases. (Source: sources/2026-04-21-planetscale-using-redundant-conditions-to-unlock-indexes-in-mysql.)

  5. Third flavour: domain-knowledge-redundant conditions. The final example leaves strict logical redundancy and uses application-level invariants the database can't prove. Table has only created_at indexed, query filters on unindexed updated_at: WHERE updated_at < '2023-01-01 00:00:00' does a full scan. Add AND created_at < '2023-01-01 00:00:00'. This is not logically redundant — a row updated before 2023-01-01 could have been created in 2024 in principle (time travel). But by application construction, created_at ≤ updated_at for every row ("a record cannot be modified before it's created"), so the redundant predicate is a true superset in practice. The index on created_at narrows the scan; updated_at then filters out false positives. This flavour is more powerful but more fragile: if the application invariant is ever violated (bulk backfill with a synthetic updated_at; migration from another system; clock skew; manual UPDATE created_at for GDPR compliance) the query silently returns wrong results. (Source: sources/2026-04-21-planetscale-using-redundant-conditions-to-unlock-indexes-in-mysql.)

  6. Redundant conditions are free. Unlike the alternative "add an index on updated_at" or "add a functional index on ADDTIME(due_date, due_time)", rewriting the query costs no schema change, no write amplification, no storage, no ALTER TABLE downtime. "You can modify the query or the application generating the query, and suddenly everything gets faster. This makes them useful for queries that are only sometimes run or where indexes can't be easily added to the main conditions." They are the right tool when the query is ad-hoc, when the table is too large for a cheap ALTER, or when adding another index is blocked by write-amplification budget.

Extracted systems

  • MySQL — the substrate whose planner is being steered. The post is MySQL-specific in EXPLAIN syntax but the core mechanism applies to any B+tree-based relational planner.
  • InnoDB — where the created_at / due_date secondary-index B+trees live and where the range scan actually executes.

Extracted concepts

  • Index obfuscation — new canonical wiki page; the root failure mode the post names. Wrapping an indexed column in a function hides the index from the planner. Industry-standard term: non- sargable predicate ("Search ARGument ABLE"). Francis uses the folk-framing "obfuscation" because it's more intuitive to developers who don't know the SQL-optimiser term of art.
  • Redundant condition — pre-existing concept (the hash-collision canonical) now extended at the top-of-Seen-in to cover the general function-wrapping case. The Francis 2023-06-07 post predates the Francis [2026-04-21-planetscale-generated-hash-columns-in-mysql|Generated hash columns in MySQL] post by ~2 years and is the root-of-framing for the redundant-condition vocabulary the later hash post reuses.
  • Domain- knowledge redundant condition — new canonical wiki page; the more-powerful-but-more-fragile variant where the predicate's redundancy depends on an application invariant, not on logic. The created_at ≤ updated_at case.
  • Secondary index — extended with the index-obfuscation hazard framing. Indexes exist but invisibly silently fail when the column is wrapped in a function.
  • MySQL EXPLAIN — the instrument used throughout. type / key / rows columns are the canonical obfuscation-detection signal.
  • MySQL access type — the rangeALL flip is the unambiguous symptom of index obfuscation.

Extracted patterns

  • De- obfuscate via redundant condition — new canonical wiki pattern. Three-part decision: (1) can you invert the function? rewrite to a range predicate; (2) if not, is there a single-column projection of the wrapped expression that bounds it? add a logically-redundant predicate on that column; (3) if not, is there a domain-knowledge invariant between an indexed column and the wrapped column? add a domain-knowledge-redundant predicate.
  • Composite index for multi-column AND-predicate — related but orthogonal: when you can add an index and the query filters on two columns via AND, this is the preferred fix over redundant-condition steering.

Operational numbers

  • Obfuscation signaturetype: ALL, key: NULL, rows: 39,746 on the post's example todos table.
  • De-obfuscation via range rewritetype: range, key: created_at, rows: 1, Extra: Using index condition.
  • De-obfuscation via redundant conditiontype: range, key: due_date, rows: 1, Extra: Using index condition; Using where. The Using where suffix is the signal that the non-redundant ADDTIME(...) predicate is still filtering after the index narrows the set.
  • Broadness margin — the post uses a 2-day due_date window (today + tomorrow) to cover a 1-day real (NOW()NOW() + INTERVAL 1 DAY) window — ~2× safety margin to absorb midnight-crossing.
  • Domain-knowledge redundant-condition efficacy — because created_at ≤ updated_at holds universally in the example app, the redundant predicate's false-positive rate depends only on the created_at vs updated_at distribution shape; for most apps where updates are clustered near creation, the index path cuts row count by orders of magnitude.

Caveats

  1. Pedagogy post, no production numbers. Francis uses EXPLAIN-shape-of-an-example-table to illustrate the mechanism; there are no p99 latency before/after measurements, no row-count reductions on real workloads, no query-plan-cost-estimator numbers. The wiki canonicalises the mechanism, not a performance claim.
  2. Aaron Francis 2023-era voice. This is part of Francis's MySQL-for-developers pedagogy corpus (predating his generated-hash-columns, how-do-indexes-work, and instant-deploy posts by 1–2 years). Voice is deliberately accessible; technical depth is moderate.
  3. Domain-knowledge-redundant conditions are fragile. Francis names the invariant ("a record cannot be modified before it's created") but does not walk the failure modes: backfill violations, migration-from-other-system, clock skew, manual admin updates. Wiki page concepts/domain-knowledge-redundant-condition adds the failure-mode discussion the source omits.
  4. Function-wrapping isn't the only obfuscation class. Implicit type casts (WHERE string_col = 123 — MySQL casts every string_col row to integer before comparison and ignores the index), OR predicates that split into unrelated indexes, and leading-wildcard LIKE 'xxx%' anchored differently are all obfuscation classes the post doesn't cover. Canonical detection is the same EXPLAIN type: ALL + key: NULL signature across all classes.
  5. Range-rewrite boundary bugs. Francis's BETWEEN '2023-01-01 00:00:00' AND '2023-12-31 23:59:59' is a year boundary where TIMESTAMP precision of 1s admits a half-open-interval-off-by-one bug (a record at 2023-12-31 23:59:59.500 in a fractional-seconds schema is excluded). Production-grade rewrite would use < '2024-01-01 00:00:00' upper-bound form.
  6. Redundant conditions don't compose with OR. The pattern requires the planner to treat the redundant predicate as an AND conjunction that narrows the index path. WHERE (complex_expression) OR (redundant_simple) does not steer the planner toward the index — the OR branch widens the set, not narrows it.
  7. The 2023 post predates MySQL 8.0.13 functional indexes for most production deployments by a version or two — Francis doesn't mention functional indexes as an alternative, but today a functional index on ADDTIME(due_date, due_time) is another valid fix for the post's second example. The post's generalisation "ad-hoc queries and can't-easily-add-index cases" is where redundant conditions remain strictly better than functional indexes because no schema change is needed.

Source

Last updated · 470 distilled / 1,213 read