CONCEPT Cited by 2 sources
JSON data type storage overhead¶
Storage overhead is the single largest reason to
prefer classical relational columns over a MySQL
JSON column when both shapes would work: the
binary JSON encoding carries per-document metadata
and a key-name dictionary on top of the raw payload,
and the column as a whole has the query-time and
indexing properties of LONGBLOB/LONGTEXT, which is
to say more buffer-pool pressure, larger result
sets, and no direct index.
The 4–10 bytes rule-of-thumb¶
From Mike Stojan:
[T]heir storage footprint is similar to the LONGBLOB or LONGTEXT data type. There is an overhead, though, due to the binary encoding and the added metadata and dictionaries which exist to speed up database reads. A good rule of thumb is that a string stored in JSON uses roughly 4 to 10 bytes of additional storage compared to a LONGBLOB or LONGTEXT column.
This is per-string overhead, not per-document: the dictionary grows with unique key-name cardinality, the metadata with nesting depth and array length.
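MySQL exposes the binary footprint directly via JSON_STORAGE_SIZE(), so the overhead can be measured rather than estimated. A minimal sketch; the table and column names here are illustrative, not from the source:

```sql
-- JSON_STORAGE_SIZE() reports the bytes used by the binary representation
-- of a stored JSON value; comparing it with the serialised text length
-- gives a rough view of the per-value encoding overhead.
CREATE TABLE events_json (
  id      BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  payload JSON NOT NULL
);

INSERT INTO events_json (payload)
VALUES ('{"user": "alice", "action": "login", "ok": true}');

SELECT
  JSON_STORAGE_SIZE(payload)       AS binary_bytes,
  LENGTH(CAST(payload AS CHAR))    AS text_bytes
FROM events_json;
```

Running this for documents with growing key counts and nesting makes the dictionary and metadata growth visible row by row.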
If storage efficiency is the goal, CHAR / VARCHAR /
INT and their relatives are strictly cheaper:
If you want to optimize your database schema towards storage efficiency, it is best to go with MySQL's more traditional data types (CHAR, VARCHAR, INT, and alike), as they are all more storage-efficient than JSON can likely ever be.
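As a concrete contrast, the same logical record can be laid out both ways; a sketch with illustrative table and column names:

```sql
-- Relational shape: each field has a typed, individually indexable column.
CREATE TABLE events_relational (
  id     BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  user   VARCHAR(64) NOT NULL,
  action VARCHAR(32) NOT NULL,
  ok     TINYINT(1)  NOT NULL
);

-- JSON shape: one document column; every row also pays the binary
-- encoding's per-document metadata and key-name dictionary.
CREATE TABLE events_json (
  id      BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  payload JSON NOT NULL  -- e.g. {"user": "alice", "action": "login", "ok": true}
);
```

In the JSON shape the key names "user", "action", and "ok" are re-encoded into every row's dictionary, which is exactly the overhead the rule of thumb above prices at roughly 4 to 10 bytes per string.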
Query-time consequence¶
JSON columns inherit the LONGBLOB/LONGTEXT
buffer-economics profile: variable length, potentially
large, and typically not small enough to keep
hot in the InnoDB buffer
pool at the same density as fixed-width rows. A query
against a JSON column usually pulls more bytes into
the buffer pool per row than a query against classical
columns covering the same logical fields.
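The row-byte cost shows up even when only one field is wanted, because inline ->> extraction happens after the full value has been read; illustrative names again:

```sql
SELECT payload               FROM events_json WHERE id = 42;  -- whole document
SELECT payload->>'$.action'  FROM events_json WHERE id = 42;  -- still reads the whole value
```

The second query returns fewer bytes to the client, but the server-side buffer-pool cost of fetching the row is the same.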
No direct indexing¶
A JSON column cannot be the target of a
secondary index directly:
JSON documents cannot be indexed directly. This, and the variable amount of data you can store in a JSON document, means that querying a JSON column often uses more buffer space and returns larger result sets, leading to more data exchange.
The pragmatic path is to index parts of the document:
- Named generated column over the extracted key, then a normal index on the column (MySQL 5.7+).
- Functional index directly on the extraction expression (MySQL 8.0.13+).
- Multi-valued index over array elements under one key (MySQL 8.0.17+).
Canonicalised as patterns/index-json-key-via-generated-column in the companion Indexing JSON in MySQL ingest.
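The three paths, sketched as MySQL DDL against a hypothetical events_json table with a JSON payload column; names and JSON paths are illustrative:

```sql
-- 1. Generated column over the extracted key, then a normal index (MySQL 5.7+).
ALTER TABLE events_json
  ADD COLUMN action VARCHAR(32)
    GENERATED ALWAYS AS (payload->>'$.action') VIRTUAL,
  ADD INDEX idx_action (action);

-- 2. Functional index directly on the extraction expression (MySQL 8.0.13+);
--    the CAST plus an explicit collation gives the key an indexable SQL type.
ALTER TABLE events_json
  ADD INDEX idx_user ((CAST(payload->>'$.user' AS CHAR(64)) COLLATE utf8mb4_bin));

-- 3. Multi-valued index over array elements under one key (MySQL 8.0.17+).
ALTER TABLE events_json
  ADD INDEX idx_tag_ids ((CAST(payload->'$.tag_ids' AS UNSIGNED ARRAY)));

-- A multi-valued index is consulted through MEMBER OF / JSON_CONTAINS:
SELECT * FROM events_json WHERE 7 MEMBER OF (payload->'$.tag_ids');
```

Each index still covers one extracted key; none of these indexes the document as a whole, which is the limitation the "When it isn't" section below turns on.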
When the overhead is worth paying¶
- The access pattern is "fetch the whole document" — buffer-pool inefficiency doesn't matter because readers always want the full value.
- The schema shape is variable enough that the alternative is a sparsely-populated wide table with many NULL columns (which also wastes storage).
- A REST API / log-line / config blob is already JSON; parsing and re-normalising on every write costs more than the storage overhead.
- Aggregation / analytics is out of scope — the value is read as a unit and deserialised in the application.
When it isn't¶
- The fields are well-known and queried individually.
- Row-width + buffer-pool density matter for InnoDB performance.
- The data will drive joins, aggregates, or predicate-heavy analytical queries; the functional-index path handles one key per index, not the whole document.
- Multi-key predicate queries would need a Postgres-style GIN over the document (not available in MySQL, see concepts/postgres-jsonb-vs-mysql-json).
Seen in¶
- sources/2026-04-21-planetscale-the-mysql-json-data-type — canonical 4–10 bytes/string rule-of-thumb + no-direct-indexing caveat + PlanetScale's 67 MB per-document ceiling + "only a few MB" practical recommendation.
- sources/2026-04-21-planetscale-indexing-json-in-mysql — Aaron Francis's canonical workaround: index parts of the document via generated column or functional index.