Skip to content

PLANETSCALE 2024-01-25

Read original ↗

PlanetScale — Working with Geospatial Features in MySQL

Summary

PlanetScale's Savannah Longoria publishes a pedagogical walkthrough of MySQL's geospatial features — the GIS-flavoured subset of the storage engine that handles points, paths, and polygons. The article taxonomises real-world geographic features into Entities (discrete objects with boundaries: landmarks, roads, cities, points-of-interest) and Spaces (continuous areas: land-cover, elevation, soil, environmental data), then walks MySQL's seven spatial data types (GEOMETRY, POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, GEOMETRYCOLLECTION), the three on-wire / on-disk representations (Well-Known Text, Well-Known Binary, and MySQL's internal WKB+SRID format), and the spatial-function toolbox (ST_Distance, ST_Contains, ST_Intersects, ST_Area, ST_Buffer, ST_Union, etc.). The load-bearing architectural content is in the final two sections: MySQL 8's first-class spatial reference system (SRS) support and its spatial-indexing requirements. Pre-8.0, every spatial function operated on an infinite flat Cartesian plane (SRID 0) regardless of what the column's SRID declared — users had to hand-roll unit conversion and ellipsoid math for any geographic query that wanted real-world distances. 8.0 made SRID a computation-affecting attribute (projected SRS → Cartesian with distance units; geographic SRS → lat/lon on an ellipsoid with angular units; SRID 0 → default flat plane). Spatial indexing has two hard prerequisites: columns must be NOT NULL and must pin a single SRID across all rows. And historically, many spatial relationship functions only evaluated against the minimum bounding rectangle (MBR) of a geometry rather than its actual shape — a pre-filter optimisation that trades precision for speed and still leaks through as a correctness gotcha on some older code paths.

Key takeaways

  1. Real-world geographic features split into Entities and Spaces. "Entities [are] specific objects with defined boundaries and individual properties" (landmarks, roads, administrative areas, POIs). "Spaces [are] continuous areas defined by their location and characteristics" (land-cover, elevation, soil types, environmental data). The Entity/Space split maps cleanly onto MySQL's geometry types: Entities become POINT / LINESTRING / POLYGON; Spaces become POLYGON / MULTIPOLYGON or raster data managed outside the database.

  2. Seven spatial data types, single vs multi. Single- geometry: GEOMETRY (the abstract base class), POINT, LINESTRING, POLYGON. Multi-geometry: MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, plus the heterogeneous GEOMETRYCOLLECTION. A POLYGON must be closed ("must have at least three unique points, and the first and last point-pairs must be equal") and can carry interior rings as holes. MySQL "does NOT support empty GEOMETRYCOLLECTIONs except for the single GEOMETRYCOLLECTION object itself" — a quirk worth knowing for code paths that build collections incrementally.

  3. Three spatial data formats: WKT, WKB, internal.

  4. Well-Known Text (WKT): human-readable, POINT(-74.044514 40.689244) style. Uses keywords
    • coordinates + optional metadata.
  5. Well-Known Binary (WKB): compact binary, machine-parseable, "more efficient storage and transmission than WKT".
  6. MySQL internal format: WKB-like but with 4 extra bytes prepended for the SRID. "SRID defines the coordinate system of the geometry, ensuring accurate interpretation." The internal format is why a SELECT location FROM customers; on a MySQL POINT column returns a hex blob (needs ST_AsText() to get WKT back) — the canonical cross-engine-migration gotcha documented separately as concepts/mysql-spatial-binary-wkb-vs-postgres-point.

  7. Table schema: SRID declaration pins the coordinate system. Canonical create-table from the article:

    CREATE TABLE locations (
      id int NOT NULL,
      city varchar(255) NOT NULL,
      country varchar(255) NOT NULL,
      population int NOT NULL,
      g geometry NOT NULL SRID 4326,
      PRIMARY KEY (id),
      SPATIAL KEY g (g),
      FULLTEXT KEY city_ascii (city_ascii)
    );
    
    The NOT NULL SRID 4326 on column g is load-bearing: it's the only way to make SPATIAL KEY g (g) usable (see takeaway 8). SRID 4326 is WGS 84 lat/lon — the default for GPS-derived data.

  8. Spatial-function taxonomy. The article groups functions into six families:

  9. Location: ST_GeomFromText(wkt), ST_X(geom), ST_Y(geom) — convert between WKT and internal representation, extract scalar coordinates.
  10. Distance: ST_Distance(g1, g2) — planar; ST_Distance_Sphere(g1, g2) — haversine on a sphere.
  11. Area / perimeter: ST_Area(geom).
  12. Intersection / containment: ST_Contains(g1, g2), ST_Intersects(g1, g2).
  13. Buffering: ST_Buffer(geom, distance) — generates a polygon distance units around a geometry.
  14. Analysis: ST_Union(g1, g2), ST_Difference(g1, g2).
  15. Relationship: ST_Touches, ST_Crosses, ST_Overlaps — boolean predicates for the DE-9IM- style relationship matrix. Canonical query shapes from the post: SELECT * FROM restaurants WHERE ST_Distance(ST_GeomFromText('POINT(10 20)'), location) <= 1000; and SELECT SUM(ST_Area(geom)) FROM forests;.

  16. MySQL 8 made SRID computation-affecting; 5.x didn't. "Prior to MySQL 8.0, spatial features were stored with a spatial reference system identifier (SRID), but the database couldn't utilize this information for calculations. Instead, all functions operated on a flat plane (SRID 0). This meant users had to create custom functions to convert units and perform accurate calculations, requiring a deep understanding of math and geometry." 8.0 introduces three SRS categories:

SRS type Coordinates Units
Projected SRS Cartesian (x, y) metres, feet
Geographic SRS lat-lon on ellipsoid angular (degrees)
SRID 0 (default) infinite flat plane unitless

"There are more than 5,000 spatial reference systems to choose from." Picking the right one is a trade-off: projected SRS give real distance in metres but distort area/shape far from the projection centre; geographic SRS preserve geometry at the cost of requiring ellipsoid-aware distance math (haversine / Vincenty); SRID 0 is cheap and wrong for anything curvature-sensitive.

  1. Two hard prerequisites for spatial indexing. "Two requirements have to be met for spatial indexing to work properly:
  2. The geometry columns to be included in the index need to be defined as NOT NULL.
  3. Columns need to be restricted to a spatial reference system identifier (SRID), and all column values must have the same SRID."

The SRID constraint is the interesting one: it's the spatial-data cousin of the "partial index must match predicate" rule. MySQL's R-tree-style spatial index needs a fixed coordinate system to compute bounding rectangles; mixing SRIDs breaks the spatial ordering. A workload that ingests both WGS 84 GPS points and NAD 83-projected survey data in the same table cannot share one spatial index.

  1. Minimum bounding rectangle (MBR) is the pre-filter primitive. "Many spatial relationship functions only used the minimum bounding rectangle (MBR) instead of the object's actual shape, limiting their accuracy." The MBR is the smallest axis-aligned rectangle that contains a geometry. R-tree spatial indexes store MBRs because they're cheap to compare (4 numbers) vs an arbitrary polygon (N vertices). Every spatial query conceptually has two phases: Phase 1 (MBR pre-filter) — index descent returns rows whose MBRs overlap the query MBR; Phase 2 (exact geometry check) — filter survivors against the actual geometry. Historical MySQL MBRContains / MBRIntersects / etc. functions expose phase 1 directly; ST_Contains / ST_Intersects in modern MySQL run both phases.

  2. Haversine distance is ST_Distance_Sphere, not ST_Distance. The worked example does st_distance_sphere((select g from locations where city_ascii = 'Santos'), (select g from locations where city_ascii = 'Sao Paulo')) — two correlated subqueries pulling POINT values, then spherical great-circle distance in metres. Naive ST_Distance on WGS 84 lat/lon returns degrees squared, which is almost never the user's intent. The sphere approximation is accurate enough for most km-scale work; WGS-84-ellipsoid-exact math requires ST_Distance with a geographic SRS and projected units (MySQL 8+).

  3. Spatial index + ST_Distance_Sphere still works for radius queries. The post's <= 15000 metre radius query from New York or Santos illustrates the common pattern: a bounding-box pre-filter on the spatial index narrows candidates to an MBR around the query centre, then ST_Distance_Sphere filters survivors to the actual radius. The index alone can't do circle-containment (R-trees store rectangles) but pairs well with a tighter predicate.

Operational numbers

  • 7 spatial data types: GEOMETRY, POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, plus GEOMETRYCOLLECTION (8 counting the collection).
  • 3 spatial data formats: WKT (text), WKB (binary), MySQL-internal (WKB + 4-byte SRID prefix).
  • 3 SRS categories in MySQL 8: Projected, Geographic, SRID 0 (default).
  • >5,000 named spatial reference systems available in MySQL 8.
  • SRID 4326 (WGS 84 lat/lon) is the idiomatic choice for GPS / web-map data.
  • 2 prerequisites for spatial indexing: NOT NULL column + single-SRID constraint.
  • 4 bytes SRID prefix on MySQL's internal-format geometry values (vs bare WKB).

Caveats

  • Pedagogical post, no production numbers. No query latency curves, no index-size-vs-row-count data, no MBR-prefilter selectivity measurements, no spatial-vs-nonspatial-index comparisons. This is a reference primer, not a war-story retrospective.
  • PlanetScale product pitch embedded mid-post ("PlanetScale supports geospatial objects. If you'd like to follow along… sign up to spin a database cluster in seconds") — the article is hosted on a vendor blog; the MySQL content is independent of the pitch but worth flagging.
  • No coverage of PostGIS / Postgres spatial comparison. Real-world geospatial workloads often compare MySQL spatial vs PostGIS (richer function set, 3D support, raster, topology); the post is MySQL-only.
  • No mention of sharding / distributed spatial queries. PlanetScale's Vitess substrate shards MySQL; spatial queries that don't have a sharding-key predicate fan out across all shards, which is a known pain point. Not discussed here.
  • No coverage of WKB return-format gotcha on cross-engine migration. The post shows the internal format exists but doesn't walk the reader through the ST_AsText wrapper needed to recover text output.
  • MBR discussion is historical framing, not a current correctness warning for 8.0+. Modern ST_* functions do exact geometry checks; the MBR limitation described is for pre-8.0 code paths and for the MBR*-prefixed functions explicitly.
  • ST_Distance_Sphere is spherical, not ellipsoidal. Over intercontinental distances the Earth's flattening (≈0.3%) introduces metres-scale error. Flight-planning / geodesy workloads need ST_Distance with a geographic SRS (ellipsoid math) or an external library like GeographicLib.
  • No numbers on R-tree fan-out / node size / page size in MySQL's spatial index. Compared to the B-tree / InnoDB coverage elsewhere in PlanetScale's blog, the spatial-index internals are left unexplored.
  • Article predates MySQL 8.0's ST_FrechetDistance and other 8.0.x additions (post appears to be 2024-01 based on the author byline — the feed published: 2026-04-21 is a re-crawl artefact).

Source

Last updated · 470 distilled / 1,213 read