Skip to content

PATTERN Cited by 2 sources

Public ID alongside BIGINT PK

Problem

An application must expose opaque, stable, URL-friendly identifiers in its public API — so that:

  • API consumers can't enumerate rows by sequential scan (GET /users/1, GET /users/2, ...).
  • The API doesn't leak row counts ("we wouldn't reveal the count of records in all our tables").
  • The identifier is opaque — clients don't depend on internal row ordering.

But the application also wants fast InnoDB clustered-index writes, which requires a sequential primary key (see concepts/uuid-primary-key-antipattern, patterns/sequential-primary-key). A random UUID / NanoID as the clustered PK collapses write performance through:

  • Unpredictable insert paths → cache miss per insert.
  • Scattered node splits → write amplification.
  • ~50% page fill factor instead of ~94% on sequential keys.

You can't satisfy both properties with a single ID column.

Solution

Use two IDs per row:

  1. Internal PK: BIGINT UNSIGNED AUTO_INCREMENT as the clustered-index primary key. Preserves B+tree insert locality. Never exposed to API consumers.
  2. External ID: a VARCHAR(N) column holding an opaque, randomly-generated identifier — typically a NanoID, Snowflake, ULID, or UUID — with a UNIQUE KEY secondary index. Exposed in URLs, JSON payloads, emails.

Canonical schema

From PlanetScale's 2022-03-29 post (Source: sources/2026-04-21-planetscale-why-we-chose-nanoids-for-planetscales-api):

CREATE TABLE `user` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `public_id` varchar(12) DEFAULT NULL,
  `name` varchar(255) NOT NULL,
  `created_at` datetime(6) NOT NULL,
  `updated_at` datetime(6) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx_public_id` (`public_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Two properties: - id BIGINT AUTO_INCREMENT PRIMARY KEY → monotonically-increasing clustered PK; inserts land on the right-most B+tree path; ~94% page fill factor. - public_id VARCHAR(12) with UNIQUE KEY idx_public_id → NanoID (PlanetScale's choice; see concepts/nanoid-identifier) enforced unique at DB layer. Secondary-index B+tree takes the random-PK-style ~50% fill-factor hit — but it's a single index, not the whole clustered table, so the cost is bounded.

Generation and enforcement

The external ID is generated application-side before INSERT, not server-side. PlanetScale's Rails realisation (concepts/public-id-column):

module PublicIdGenerator
  extend ActiveSupport::Concern
  included do
    before_create :set_public_id
  end

  PUBLIC_ID_ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"
  PUBLIC_ID_LENGTH = 12
  MAX_RETRY = 1000

  def set_public_id
    return if public_id.present?
    MAX_RETRY.times do
      self.public_id = generate_public_id
      return unless self.class.where(public_id: public_id).exists?
    end
    raise "Failed to generate a unique public id after #{MAX_RETRY} attempts"
  end
end

Collision handling is bounded-retry: 1000 regeneration attempts before giving up. At reasonable entropy levels (~60+ bits) the retry loop is effectively dead code but the correctness contract against statistical birthday-paradox accidents.

Parallel Go implementation for services that also need to issue public IDs:

const (
    alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
    length   = 12
)
func New() (string, error) { return nanoid.Generate(alphabet, length) }

Cross-language ecosystem support is a first-order selection criterion when the backend mixes languages (PlanetScale runs Ruby + Go).

Structural properties

  • Writes scale like sequential PK — inserts hit the right-most leaf of the clustered B+tree. Buffer-pool hit rate for writes approaches 100% regardless of table size.
  • Reads by public_id cost one extra B+tree walk — primary-key lookup is O(log N); public-id lookup is O(log N) on the secondary index + O(log N) on the clustered PK to fetch the row. 2× the logical lookup cost but still O(log N).
  • API URLs are opaque, stable, and collision-resistantapi.planetscale.com/v1/deploy-requests/izkpm55j334u satisfies the double-click-selectability property (alphanumeric, no hyphens) that UUID URLs break.
  • Row count is hidden — external observers can't infer table cardinality from consecutive public_ids (unlike /users/1, /users/2).
  • Internal joins stay cheap — FK relationships still go through the BIGINT PK. public_id is only surfaced at the API boundary; every internal table relationship uses the sequential integer.

Why it works

The pattern is "use the right ID for the right axis":

  • Physical storage axis — wants monotonic increment (B+tree locality, page fill, cache hits).
  • API surface axis — wants opacity + uniqueness + URL-friendliness + human-copyability.

These are orthogonal requirements that a single ID column can't satisfy. The pattern acknowledges the conflict and resolves it by separating concerns into two columns, each optimised for its own axis, connected by a unique-index B+tree walk.

Variations

Variant External ID Rationale
PlanetScale API 12-char NanoID, base-36 Double-click-selectable, compact, cross-language generator ecosystem
GitHub API Numeric, but surface-obfuscated Row count visible but rate-limited; different threat model
Stripe API prefix_ + opaque alphanumeric (e.g. cus_abc123) Prefixes encode object type; alphanumeric tail is effectively a NanoID
UUIDv7 36-char UUID Self-sortable + standardised; loses double-click but gains interop
ULID 26-char Crockford base32 Time-ordered + URL-safe; uppercase ambiguity vs case-sensitivity

PlanetScale's choice tunes heavily for developer experience of the URL (12-char, lowercase, alphanumeric, no hyphens); other vendors pick different points on the same design surface.

Caveats

  • Secondary-index write cost is not free — every INSERT writes two B+tree branches: clustered PK (sequential, cheap) + public_id unique index (random, ~2× the pages of a sequential-inserted index). The pattern isolates the random-write cost to the one secondary index rather than smearing it across the whole table, but doesn't eliminate it.
  • Index widthVARCHAR(12) public_id is ~12-14 bytes + overhead per secondary-index entry. Still dramatically less than a CHAR(36) UUID (36 bytes) or BINARY(16) (16 bytes) index. See concepts/uuid-string-representation-mysql for byte-width comparisons.
  • Race condition on pre-insert exists? check — the Ruby generator's where(public_id: ...).exists? is a read-then-write race. Two concurrent requests generating the same NanoID would both see exists? == false and both attempt INSERT; one would fail on the unique-index violation and raise ActiveRecord::RecordNotUnique. At ~60+ bit entropy the race is ~10^-16 per concurrent-insert pair; in practice the MySQL-side UNIQUE KEY is the actual correctness backstop, not the application-side exists? check.
  • Doesn't help with cross-DB migrations — if rows ever move across sharded databases or replicated to a warehouse, the BIGINT PK is shard-local (AUTO_INCREMENT is not globally unique across shards). Downstream consumers must join on public_id, not id. This is the usual sharding-breaks-AUTO_INCREMENT caveat; NanoID saves you here as the globally-unique anchor.
  • Inbound API idempotency still needs care — NanoIDs don't give retry-idempotency for free. A POST that times out before returning the NanoID leaves the client unsure whether the row was created. Separate idempotency tokens (Stripe-style) are orthogonal to the public-ID pattern.
  • DEFAULT NULL is a footgun — the PlanetScale example's public_id varchar(12) DEFAULT NULL allows rows without public IDs if the before_create callback is bypassed (raw SQL inserts, Rails seed scripts, activerecord-import). The unique-index allows multiple NULLs in MySQL. Consider NOT NULL + a DB-generated default if application-generation is ever bypassed.
  • Changing the external-ID format is a backward-compatibility cliff — existing 12-char public_ids are in URLs, emails, Slack messages, screenshots, documentation, client-side caches. "Just increase the length" is hand-wave; in practice any format change is a multi-year dual-format migration.
  • Doesn't prevent enumeration via timestamp joins — a determined attacker can still estimate row-creation rate by observing created_at on retrieved rows (if that column is exposed). The pattern hides count but not time. Use Snowflake / ULID if you want time-encoded-in-ID for client-side sorting; NanoID specifically discards this.

Seen in

  • sources/2026-04-21-planetscale-why-we-chose-nanoids-for-planetscales-api — Mike Coutermarsh (PlanetScale, 2022-03-29) canonicalises the pattern with the full CREATE TABLE user schema, the PublicIdGenerator Rails concern, and the Go publicid package. "We still use standard auto-incrementing BigInts for our primary key. The public_id is only used as an external identifier." First-party disclosure that PlanetScale's external API IDs (deploy-request IDs, database IDs, branch IDs) are 12-char base-36 NanoIDs while the underlying rows use BIGINT AUTO_INCREMENT PKs.
  • sources/2026-04-21-planetscale-the-problem-with-using-a-uuid-primary-key-in-mysql — Brian Morrison II (PlanetScale, 2024-03-19) references this pattern indirectly when recommending UUID alternatives: "Snowflake IDs ... ULIDs ... NanoIDs (which we use at PlanetScale)" — the "which we use at PlanetScale" disclosure is for external IDs, not for clustered PKs, per the 2022 Coutermarsh post this defers to.
Last updated · 470 distilled / 1,213 read