Skip to content

CONCEPT Cited by 1 source

Optimizer index-expression rewrite

Optimizer index-expression rewrite is the query-planner capability of recognising that a WHERE predicate expression matches the expression underneath an indexed generated column or a functional index, and substituting the indexed-column reference for the raw expression so the index is actually used.

It's the load-bearing ergonomic property that makes indexed expressions viable in a real codebase: callers don't have to know the indexed column exists — they write the natural predicate, the optimizer rewrites it under the hood.

Why it matters

Without expression rewrite, an indexed derived value is leaky: every caller has to reference the generated column by name to hit the index. A query like

SELECT * FROM activity_log
  WHERE properties->>"$.request.email" = 'x@example.com';

would table-scan even though the DBA has added a generated email column + B-tree index. With expression rewrite, the planner recognises that properties->>"$.request.email" matches the generated column's GENERATED ALWAYS AS (...) expression and rewrites the query to

SELECT * FROM activity_log
  WHERE activity_log.email = 'x@example.com';

before execution. The raw-expression predicate now uses the index with no application change.

Worked example (MySQL)

From Aaron Francis's pedagogical PlanetScale post:

-- Schema
ALTER TABLE activity_log ADD COLUMN email VARCHAR(255)
  GENERATED ALWAYS AS (properties->>"$.request.email");
ALTER TABLE activity_log ADD INDEX email(email) USING BTREE;

-- Query written against the raw JSON extraction:
EXPLAIN SELECT * FROM activity_log
  WHERE properties->>"$.request.email" = 'little.bobby@tables.com';

EXPLAIN reports possible_keys: email, key: email, key_len: 768 — the index is used. SHOW WARNINGS immediately after reveals the rewrite:

Note 1003: /* select#1 */ select `activity_log`.`id` ...
  from `activity_log` where
  (`activity_log`.`email` = 'little.bobby@tables.com')

The optimizer substituted activity_log.email for the properties->>"$.request.email" expression. The longhand equivalent is rewritten too:

WHERE JSON_UNQUOTE(JSON_EXTRACT(properties, "$.request.email"))
      = 'little.bobby@tables.com'

also hits the index, via the same rewrite (Source: sources/2026-04-21-planetscale-indexing-json-in-mysql).

Why third-party-ORM cases benefit most

Aaron Francis's canonical framing: "This is especially useful if you're unable to control the access pattern because the query is being issued from a 3rd party package in your codebase." A reporting tool, ORM, or query builder that emits the raw JSON-extraction predicate still gets the index — no need to fork the library or translate queries at an application layer.

Expression-matching constraints

The rewrite requires the planner's internal expression tree to match byte-for-byte after parsing. Practical consequences:

  • Argument order matters. JSON_EXTRACT(col, path) and JSON_EXTRACT(col, path) match; any variation in argument spelling (literal quoting, whitespace inside function calls) can prevent match.
  • Equivalent but distinct expressions don't match. col + 1 - 2 and col - 1 are arithmetically equal but parse to different trees.
  • Collation must match. See concepts/functional-index-collation-mismatch for the JSON-specific failure mode: a CAST(... AS CHAR(255)) without COLLATE utf8mb4_bin parses identically to one with it, but the collation annotation on the expression differs and the rewrite silently fails.
  • Escape hatch: query the named column directly. If the rewrite is flaky, writing WHERE email = '...' sidesteps it entirely.

A named generated column is strictly safer than a functional index for this reason — the expression lives once in the schema, and application queries can bind to the column name instead of restating the expression.

Generalises beyond JSON

The PlanetScale post surfaces this as a JSON-indexing footnote, but the property is substrate-general. Any planner that matches expression signatures against indexed expressions can do this:

  • MySQL 8.x — generated columns, functional indexes.
  • Postgres — expression indexes (CREATE INDEX idx ON t ((lower(email)));) with matching WHERE lower(email) = '...'. Same optimizer behaviour.
  • SQL Server — persisted computed columns with WITH SCHEMABINDING; the optimizer matches predicates that compute the same expression.
  • Oracle — function-based indexes; same property.

Worth canonicalising as a cross-engine concept because every expression-index design in every RDBMS depends on it to be useful in practice.

Seen in

  • sources/2026-04-21-planetscale-indexing-json-in-mysql — canonical wiki introduction via Aaron Francis's worked example on JSON-extraction expressions. Demonstrates rewrite both through EXPLAIN (shows key: email) and through SHOW WARNINGS (reveals the rewritten query text) for both ->>-shorthand and JSON_UNQUOTE(JSON_EXTRACT(...)) longhand forms.
Last updated · 550 distilled / 1,221 read