CONCEPT Cited by 2 sources
JSON data type storage overhead¶
Storage overhead is the single largest reason to
prefer classical relational columns over a MySQL
JSON column when both shapes would work: the
binary JSON encoding carries per-document metadata
and a key-name dictionary on top of the raw payload,
and the column as a whole has the query-time and
indexing properties of LONGBLOB/LONGTEXT, which is
to say more buffer-pool pressure, larger result
sets, and no direct index.
The 4–10 bytes rule-of-thumb¶
From Mike Stojan:
[T]heir storage footprint is similar to the LONGBLOB or LONGTEXT data type. There is an overhead, though, due to the binary encoding and the added metadata and dictionaries which exist to speed up database reads. A good rule of thumb is that a string stored in JSON uses roughly 4 to 10 bytes of additional storage compared to a LONGBLOB or LONGTEXT column.
This is per-string overhead, not per-document: the dictionary grows with unique key-name cardinality, the metadata with nesting depth and array length.
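MySQL exposes the binary footprint directly via JSON_STORAGE_SIZE(), so the overhead can be measured rather than estimated. A minimal sketch; the table and column names here are illustrative, not from the source:

```sql
-- JSON_STORAGE_SIZE() reports the bytes used by the binary representation
-- of a stored JSON value; comparing it with the serialised text length
-- gives a rough view of the per-value encoding overhead.
CREATE TABLE events_json (
  id      BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  payload JSON NOT NULL
);

INSERT INTO events_json (payload)
VALUES ('{"user": "alice", "action": "login", "ok": true}');

SELECT
  JSON_STORAGE_SIZE(payload)       AS binary_bytes,
  LENGTH(CAST(payload AS CHAR))    AS text_bytes
FROM events_json;
```

Running this for documents with growing key counts and nesting makes the dictionary and metadata growth visible row by row.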
If storage efficiency is the goal, CHAR / VARCHAR /
INT and their relatives are strictly cheaper:
If you want to optimize your database schema towards storage efficiency, it is best to go with MySQL's more traditional data types (CHAR, VARCHAR, INT, and alike), as they are all more storage-efficient than JSON can likely ever be.
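As a concrete contrast, the same logical record can be laid out both ways; a sketch with illustrative table and column names:

```sql
-- Relational shape: each field has a typed, individually indexable column.
CREATE TABLE events_relational (
  id     BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  user   VARCHAR(64) NOT NULL,
  action VARCHAR(32) NOT NULL,
  ok     TINYINT(1)  NOT NULL
);

-- JSON shape: one document column; every row also pays the binary
-- encoding's per-document metadata and key-name dictionary.
CREATE TABLE events_json (
  id      BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  payload JSON NOT NULL  -- e.g. {"user": "alice", "action": "login", "ok": true}
);
```

In the JSON shape the key names "user", "action", and "ok" are re-encoded into every row's dictionary, which is exactly the overhead the rule of thumb above prices at roughly 4 to 10 bytes per string.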
Query-time consequence¶
JSON columns inherit the LONGBLOB/LONGTEXT
buffer-economics profile: variable length, potentially
large, and typically not small enough to keep
hot in the InnoDB buffer
pool at the same density as fixed-width rows. A query
against a JSON column usually pulls more bytes into
the buffer pool per row than a query against classical
columns covering the same logical fields.
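The row-byte cost shows up even when only one field is wanted, because inline ->> extraction happens after the full value has been read; illustrative names again:

```sql
SELECT payload               FROM events_json WHERE id = 42;  -- whole document
SELECT payload->>'$.action'  FROM events_json WHERE id = 42;  -- still reads the whole value
```

The second query returns fewer bytes to the client, but the server-side buffer-pool cost of fetching the row is the same.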
No direct indexing¶
A JSON column cannot be the target of a
secondary index directly:
JSON documents cannot be indexed directly. This, and the variable amount of data you can store in a JSON document, means that querying a JSON column often uses more buffer space and returns larger result sets, leading to more data exchange.
The pragmatic path is to index parts of the document:
- Named generated column over the extracted key, then a normal index on the column (MySQL 5.7+).
- Functional index directly on the extraction expression (MySQL 8.0.13+).
- Multi-valued index over array elements under one key (MySQL 8.0.17+).
Canonicalised as patterns/index-json-key-via-generated-column in the companion Indexing JSON in MySQL ingest.
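The three paths, sketched as MySQL DDL against a hypothetical events_json table with a JSON payload column; names and JSON paths are illustrative:

```sql
-- 1. Generated column over the extracted key, then a normal index (MySQL 5.7+).
ALTER TABLE events_json
  ADD COLUMN action VARCHAR(32)
    GENERATED ALWAYS AS (payload->>'$.action') VIRTUAL,
  ADD INDEX idx_action (action);

-- 2. Functional index directly on the extraction expression (MySQL 8.0.13+);
--    the CAST plus an explicit collation gives the key an indexable SQL type.
ALTER TABLE events_json
  ADD INDEX idx_user ((CAST(payload->>'$.user' AS CHAR(64)) COLLATE utf8mb4_bin));

-- 3. Multi-valued index over array elements under one key (MySQL 8.0.17+).
ALTER TABLE events_json
  ADD INDEX idx_tag_ids ((CAST(payload->'$.tag_ids' AS UNSIGNED ARRAY)));

-- A multi-valued index is consulted through MEMBER OF / JSON_CONTAINS:
SELECT * FROM events_json WHERE 7 MEMBER OF (payload->'$.tag_ids');
```

Each index still covers one extracted key; none of these indexes the document as a whole, which is the limitation the "When it isn't" section below turns on.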
When the overhead is worth paying¶
- The access pattern is "fetch the whole document" — buffer-pool inefficiency doesn't matter because readers always want the full value.
- The schema shape is variable enough that the alternative is a sparsely-populated wide table with many NULL columns (which also wastes storage).
- A REST API / log-line / config blob is already JSON; parsing and re-normalising on every write costs more than the storage overhead.
- Aggregation / analytics is out of scope — the value is read as a unit and deserialised in the application.
When it isn't¶
- The fields are well-known and queried individually.
- Row-width + buffer-pool density matter for InnoDB performance.
- The data will drive joins, aggregates, or predicate-heavy analytical queries; the functional-index path handles one key per index, not the whole document.
- Multi-key predicate queries would need a Postgres-style GIN over the document (not available in MySQL, see concepts/postgres-jsonb-vs-mysql-json).
Seen in¶
- sources/2026-04-21-planetscale-the-mysql-json-data-type — canonical 4–10 bytes/string rule-of-thumb + no-direct-indexing caveat + PlanetScale's 67 MB per-document ceiling + "only a few MB" practical recommendation.
- sources/2026-04-21-planetscale-indexing-json-in-mysql — Aaron Francis's canonical workaround: index parts of the document via generated column or functional index.