Skip to content

PLANETSCALE 2022-03-29

Read original ↗

PlanetScale — Why we chose NanoIDs for PlanetScale's API

Summary

Mike Coutermarsh (PlanetScale, 2022-03-29, re-fetched 2026-04-21) publishes the first-party canonical disclosure of why PlanetScale's API uses NanoIDs rather than UUIDs as externally-visible identifiers. The post is the forward-pointer article that Brian Morrison II's later 2024-03-19 UUID-PK post defers to with "(NanoIDs, which we use at PlanetScale)" — this is where the choice is motivated and the mechanism disclosed. Four first-order contributions: (1) four explicit API-identifier requirements that led PlanetScale away from UUIDs: shorter than a UUID, double-click-selectable (hyphens break browser word-selection — 7cb776c5-8c12-4b1a-84aa-9941b815d873 selects as 5 separate words), low collision probability, and cross-language generation (Ruby + Go on PlanetScale's backend). (2) Specific NanoID parameterisation at PlanetScale: 12 characters drawn from the alphabet 0123456789abcdefghijklmnopqrstuvwxyz (base-36, no hyphens/underscores — narrower than NanoID's default 64-char URL-safe alphabet by choice). Canonical collision budget: "1% probability of a collision in the next ~35 years if we are generating 1,000 IDs per hour" — calculated via the NanoID collision tool. (3) Canonical public_id-alongside-BIGINT-PK pattern with full SQL + Rails code: id BIGINT AUTO_INCREMENT PRIMARY KEY clustered PK + public_id VARCHAR(12) with UNIQUE KEY idx_public_id (public_id) as secondary index. (4) Full production Ruby + Go code for the generator: Rails PublicIdGenerator ActiveSupport concern with before_create :set_public_id callback + MAX_RETRY = 1000 retry loop on uniqueness collision + PUBLIC_ID_REGEX format validator; companion publicid Go package using github.com/matoous/go-nanoid/v2. Sample values (12-char lowercase alphanumeric): izkpm55j334u, z2n60bhrj7e8, qoucu12dag1x. Post frames itself in the developer-experience altitude: "It may seem minor, but to build a product that developers love to use, we need to care about details like these."

Key takeaways

  1. Four explicit NanoID-vs-UUID selection criteria — PlanetScale set four requirements before picking the identifier format: "Shorter than a UUID. Easy to select with double clicking. Low chance of collisions. Easy to generate in multiple programming languages (we use Ruby and Go on our backend)." The double-click-selectability axis is the novel design signal — it's a pure UX property of how browsers tokenise strings that a-z/0-9 satisfies but UUID's hyphens break. Canonical anchoring of this property as a legitimate design constraint on API identifier format. (Source: this post)

  2. Double-click word-selection is broken by UUID hyphens"Try double clicking on that ID to select and copy it. You can't. The browser interprets it as 5 different words." (7cb776c5-8c12-4b1a-84aa-9941b815d873). NanoID's compact alphanumeric form izkpm55j334u selects as one token. Implication: API URLs with NanoIDs are copy-paste-easier in the developer inner loop (curl, Slack messages, bug reports, documentation). Documented UX rationale; no hard metric. (Source: this post). See concepts/double-click-selectable-identifier.

  3. PlanetScale's NanoID is 12 characters from a 36-character alphabet, not the 21/64 default"we used the NanoID collision tool and decided to use 12 character long IDs with the alphabet of 0123456789abcdefghijklmnopqrstuvwxyz." Canonical choice of base-36 (no hyphens, no underscores, no uppercase) — narrower than NanoID's default 64-char URL-safe alphabet — explicitly chosen because uppercase vs lowercase is a footgun in case-insensitive contexts (URLs, DNS, some filesystems) and _/- break some legacy parsers. 12 chars × log2(36) ≈ 62 bits of entropy per ID — substantially below the 126 bits of NanoID-21 default. See concepts/nanoid-identifier. (Source: this post)

  4. Collision budget: 1% probability over 35 years at 1,000 IDs/hour"This gives us a 1% probability of a collision in the next ~35 years if we are generating 1,000 IDs per hour." Back-of-envelope: 1,000 × 24 × 365 × 35 ≈ 3×10^8 IDs total; birthday bound at 62-bit entropy crosses 1% around ~10^8-10^9 — matches. This is the canonical calculation anchor for the 12-char × 36-char-alphabet choice. Escape hatch: "If we ever need to increase this, the change would be as simple as increasing the length in our ID generator and updating our database schema to accept the new size." — length is a knob, not a contract. See concepts/nanoid-identifier. (Source: this post)

  5. Collision handling: 1000-retry bounded loop in before_create callback — PlanetScale's Rails PublicIdGenerator ActiveSupport concern wraps insert with MAX_RETRY = 1000 retries, generating a fresh NanoID each iteration and checking existence via self.class.where(public_id: public_id).exists? before committing: "raise "Failed to generate a unique public id after #{MAX_RETRY} attempts"" if all retries collide. At PlanetScale's entropy budget + generation rate, MAX_RETRY is essentially dead code — a true collision is ~10^-8 per ID — but the bounded retry is the correctness contract that makes the design not rely on statistical-never happening. See concepts/nanoid-collision-retry. (Source: this post)

  6. Canonical public_id-alongside-BIGINT-PK schema verbatim:

    CREATE TABLE `user` (
      `id` bigint NOT NULL AUTO_INCREMENT,
      `public_id` varchar(12) DEFAULT NULL,
      `name` varchar(255) NOT NULL,
      `created_at` datetime(6) NOT NULL,
      `updated_at` datetime(6) NOT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `idx_public_id` (`public_id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
    

    Two IDs per row: id BIGINT AUTO_INCREMENT is the clustered-index primary key (sequential — preserves B+tree insert locality per patterns/sequential-primary-key); public_id VARCHAR(12) is the externally-visible NanoID, enforced unique via a secondary index. "We still use standard auto-incrementing BigInts for our primary key. The public_id is only used as an external identifier." This is the canonical realisation of the pattern that the 2024-03-19 UUID-PK post references without disclosing mechanism. See patterns/public-id-alongside-bigint-pk. (Source: this post)

  7. Rails PublicIdGenerator concern as reusable model mixin — full source code disclosed:

    # app/models/concerns/public_id_generator.rb
    require "nanoid"
    
    module PublicIdGenerator
      extend ActiveSupport::Concern
    
      included do
        before_create :set_public_id
      end
    
      PUBLIC_ID_ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"
      PUBLIC_ID_LENGTH = 12
      MAX_RETRY = 1000
    
      PUBLIC_ID_REGEX = /[#{PUBLIC_ID_ALPHABET}]{#{PUBLIC_ID_LENGTH}}\z/
    
      class_methods do
        def generate_nanoid(alphabet: PUBLIC_ID_ALPHABET, size: PUBLIC_ID_LENGTH)
          Nanoid.generate(size: size, alphabet: alphabet)
        end
      end
    
      def set_public_id
        return if public_id.present?
        MAX_RETRY.times do
          self.public_id = generate_public_id
          return unless self.class.where(public_id: public_id).exists?
        end
        raise "Failed to generate a unique public id after #{MAX_RETRY} attempts"
      end
    
      def generate_public_id
        self.class.generate_nanoid(alphabet: PUBLIC_ID_ALPHABET)
      end
    end
    

    Usage: include PublicIdGenerator in each model. The Rails before_create callback fires once per row insert; ActiveRecord's callback ordering ensures the public_id is populated before the INSERT statement is issued. Canonical realisation of the public-id-alongside-BIGINT-PK pattern on Rails. (Source: this post)

  8. Go implementation: publicid package using matoous/go-nanoid/v2 — PlanetScale also runs Go services that must generate public IDs, so the alphabet + length are re-declared as Go constants with matching constraints:

    const (
        alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
        length   = 12
    )
    
    func New() (string, error) { return nanoid.Generate(alphabet, length) }
    func Must() string { return nanoid.MustGenerate(alphabet, length) }
    func Validate(fieldName, id string) error { ... }
    

    Plus a Validate helper that checks length + alphabet membership — enforces the format at the Go service boundary. Cross-language consistency is the fourth of the four explicit requirements that drove the NanoID choice. (Source: this post)

  9. NanoID's cross-language ecosystem is the differentiator"NanoID generators are available in many languages". PlanetScale's Ruby backend uses nanoid gem; Go services use github.com/matoous/go-nanoid/v2. Because the NanoID spec is just "pick N random bytes from a given alphabet", reimplementing it in any language is trivial — no timestamp format, no bit-packing, no canonical representation concerns. Contrast with Snowflake IDs which require coordinating machine-ID assignment across generators. (Source: this post)

Systems / concepts / patterns extracted

  • PlanetScale — the managed-Vitess vendor whose API uses NanoIDs for external identifiers.
  • PlanetScale HTTP API — the public-facing API surface whose URL shape (api.planetscale.com/v1/deploy-requests/izkpm55j334u) motivates the NanoID choice.
  • Ruby on Rails — the application-tier framework hosting the PublicIdGenerator ActiveSupport concern.
  • MySQL — the underlying database holding public_id VARCHAR(12) + UNIQUE KEY idx_public_id (public_id) secondary index.

  • NanoID (extend — this post is now canonical first-party disclosure of PlanetScale's use) — URL-safe random identifier family. Post parameterises at 12 chars × base-36.

  • Public ID column (new) — public_id column holding an externally-visible opaque identifier, separate from the internal numeric PK. Generated application-side; enforced unique at the database layer via secondary index.
  • NanoID collision retry (new) — bounded-retry loop (MAX_RETRY = 1000) that regenerates a NanoID on INSERT ... ON DUPLICATE KEY or explicit pre-insert exists? check. Correctness contract against the statistically-never-happens collision event.
  • Double-click-selectable identifier (new) — a pure UX property of API identifiers: clickable-once word-selection in browsers and terminals. Hyphens, periods, and slashes break this; a-z0-9 alphanumeric preserves it. A design constraint on API URL design distinct from security, uniqueness, or sortability.
  • UUID primary-key antipattern (extend) — canonical forward-link from UUID critique to NanoID adoption.

  • Public ID alongside BIGINT PK (new) — two-column identifier strategy: BIGINT AUTO_INCREMENT clustered PK for internal B+tree locality + external opaque identifier column (NanoID / UUID / ULID) with unique secondary index for API URLs.

  • Sequential primary key (extend — PlanetScale's public_id-alongside-BIGINT-PK schema is the canonical realisation of the "UUID/NanoID as external ID, BIGINT as internal PK" pattern that the existing page anticipates).

Operational numbers

  • Alphabet: 0123456789abcdefghijklmnopqrstuvwxyz36 characters (base-36 lowercase alphanumeric; no hyphens, no underscores, no uppercase). Narrower than NanoID's default 64-char URL-safe alphabet; chosen for case-insensitivity safety + legacy-parser compatibility.
  • Length: 12 characters. Shorter than NanoID's 21-char default. Shorter than UUID's 36-char hyphenated form.
  • Entropy: 12 × log2(36) ≈ 62 bits per ID.
  • Collision budget: "1% probability of a collision in the next ~35 years if we are generating 1,000 IDs per hour" (Mike Coutermarsh, calculation via zelark.github.io/nano-id-cc). At 1,000 IDs/hour × 24 × 365 × 35 = ~3×10^8 total IDs; birthday-bound 1% at 62-bit entropy.
  • MAX_RETRY: 1000 regeneration attempts on collision before raising. Effectively dead-code at PlanetScale's scale (true collision is ~10^-8 per ID) but the correctness contract.
  • Sample values (12-char lowercase alphanumeric): izkpm55j334u, z2n60bhrj7e8, qoucu12dag1x, kw2c0khavhql (from the sibling 2024 UUID-PK post).

Caveats

  • Pedagogy / developer-marketing voice, not a production retrospective. Mike Coutermarsh is PlanetScale's developer advocate at the time of publication, not a database-internals engineer — this is a "how we picked our ID format" tutorial with full code rather than an incident retrospective or mechanism deep-dive. Architecture density ~70% on ~800-word body. No production numbers beyond the birthday-bound calculation; no measurements of the retry-loop firing in production (it essentially never does).
  • The 12-char × base-36 choice is not universally appropriate — 62 bits of entropy is enough for API identifiers that are scoped per tenant or per resource type but would be weak for global-replacement of UUIDs where collision resistance against adversarial generation matters. The post doesn't quantify the threat model (e.g. does an attacker brute-forcing /api/v1/deploy-requests/<12-char> break anything?). Security-by-obscurity is implicitly rejected ("we wouldn't reveal the count of records in all our tables") but the adversarial-enumeration cost is not estimated.
  • MAX_RETRY = 1000 is a magic number — at 62-bit entropy and 10^8 IDs in circulation, the probability of even one retry firing across the table's lifetime is astronomically low; at 200-bit entropy it would be zero. The post doesn't explain why MAX_RETRY is set at 1000 rather than 10 or 10,000.
  • The where(public_id: ...).exists? pre-insert check is a race condition — two concurrent before_create callbacks generating the same NanoID would both see exists? == false, both attempt INSERT, and one would fail on the unique-index violation. The Rails code doesn't handle ActiveRecord::RecordNotUnique — at PlanetScale's entropy budget the race is essentially impossible (~10^-16 across two concurrent inserts) but the post doesn't engage with it. The MySQL-side UNIQUE KEY idx_public_id is the actual correctness backstop, not the Rails-side exists? check.
  • Double-click-selectability is not formally defined — different browsers + terminals use slightly different word-boundary rules. The post's "5 different words" example is Chrome-specific behaviour; some tools tokenise UUIDs as a single word. The axis is real but the metric is soft.
  • No cost framing for the secondary index — adding UNIQUE KEY idx_public_id on VARCHAR(12) doubles the B+tree write-amplification per insert (one clustered PK update + one unique-secondary-index update) plus adds ~30 bytes/row of index overhead. The post frames this as free; it's not.
  • NanoID is not a standard — no RFC, no IANA registration. PlanetScale is a commercial-vendor choice, not a safe default for interop. Contrast with UUIDv4 (RFC 4122) or UUIDv7 (draft RFC 9562). A downstream consumer parsing api.planetscale.com/v1/.../izkpm55j334u has no canonical grammar to validate against beyond PlanetScale's own PUBLIC_ID_REGEX.
  • Length is not actually freely knob-able in production — Coutermarsh claims "the change would be as simple as increasing the length in our ID generator and updating our database schema to accept the new size." In practice: existing 12-char IDs would still exist in URLs, emails, screenshots, log files; client code that assumes \[a-z0-9\]{12}\z regex would reject new longer IDs; the "simple" change is a multi-year backward-compatibility challenge. This is hand-wave.
  • Mike Coutermarsh is a marketing/devrel byline — not at the Ben Dicken / Shlomi Noach / Sam Lambert / Aaron Francis database-internals altitude. Tier-3 borderline: passes on first-party-disclosure grounds (this is where PlanetScale's NanoID choice is actually motivated; the sibling 2024-03-19 UUID-PK post forward-references this one) and on full-production-code disclosure grounds (the PublicIdGenerator Ruby concern + publicid Go package are both reproducible).

Source

Last updated · 470 distilled / 1,213 read