CONCEPT Cited by 1 source

NanoID collision retry

NanoID collision retry is a bounded application-side retry loop that regenerates a random identifier on collision: either before INSERT, when an exists? pre-check finds a match, or after INSERT, when the unique-index constraint raises RecordNotUnique. The pattern is the correctness contract that protects application-generated unique identifiers (NanoID, UUIDv4, short-ID schemes) against birthday-paradox collisions.

Canonical realisation

From PlanetScale's 2022-03-29 post (Source: sources/2026-04-21-planetscale-why-we-chose-nanoids-for-planetscales-api):

MAX_RETRY = 1000

def set_public_id
  return if public_id.present?
  MAX_RETRY.times do
    self.public_id = generate_public_id
    return unless self.class.where(public_id: public_id).exists?
  end
  raise "Failed to generate a unique public id after #{MAX_RETRY} attempts"
end

Loop structure:

  1. Generate a fresh NanoID via the crypto-strong random source.
  2. Issue a pre-check: SELECT 1 FROM table WHERE public_id = ? LIMIT 1.
  3. If no match, assign and return — the next step (INSERT) will succeed.
  4. If match, loop up to MAX_RETRY times.
  5. If all retries collide, raise an exception.
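The excerpt doesn't show generate_public_id itself. A minimal sketch of what such a generator could look like, assuming the 12-character base-36 parameterisation discussed here and SecureRandom as the crypto-strong source (the constant names and method body are illustrative, not PlanetScale's):

```ruby
require "securerandom"

# Hypothetical generator matching the parameterisation discussed here:
# 12 characters drawn from a 36-symbol alphabet (~62 bits of entropy).
PUBLIC_ID_ALPHABET = [*"a".."z", *"0".."9"].freeze
PUBLIC_ID_LENGTH   = 12

def generate_public_id
  # SecureRandom.random_number(n) draws uniformly, so no modulo bias.
  Array.new(PUBLIC_ID_LENGTH) {
    PUBLIC_ID_ALPHABET[SecureRandom.random_number(PUBLIC_ID_ALPHABET.size)]
  }.join
end
```

Production NanoID libraries also take care to avoid modulo bias when mapping random bytes onto the alphabet; SecureRandom.random_number sidesteps that concern here.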

Why the retry loop exists

At PlanetScale's NanoID parameterisation (12 characters × base-36 alphabet ≈ 62 bits of entropy), the probability that a single INSERT collides with an existing N-row table is approximately N / 2^62. At N = 10^9 rows, the per-INSERT collision probability is ~2 × 10^-10: effectively zero but non-zero. Over a decade of production operation at ~1000 inserts/hour, the retry loop fires approximately never (expected ~0 retries across the full fleet lifetime).

So why write it?

  1. Correctness contract without statistical arguments. The code must be correct even if the probability calculation is wrong — e.g. if the PRNG is compromised, if the alphabet is accidentally narrowed, if a test environment seeds a deterministic RNG, if the entropy budget is misread.
  2. Protection against deterministic generation bugs. Someone forgetting to seed SecureRandom properly, a test fixture that hard-codes an ID, a library bug that re-emits the same NanoID under concurrent load — any of these would turn "statistically impossible" into "routinely happens" and the retry loop contains the blast radius.
  3. Composition with the unique index. The real correctness primitive is the UNIQUE KEY idx_public_id at the database layer. The application-side retry is a graceful degradation: it avoids surfacing RecordNotUnique exceptions to users in the rare case the pre-check races. Without the index, the retry alone is not enough; without the retry, collisions would surface as 500 errors.
  4. Freedom to tune the length aggressively. If the application uses a short NanoID (say 6-8 chars), the retry loop becomes essential. The same pattern handles both extremes without code changes.
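The unique-index backstop lives in the schema. A hypothetical Rails migration sketch, assuming a branches table (the post names the index UNIQUE KEY idx_public_id; the table and column details here are illustrative, not PlanetScale's migration):

```ruby
class AddPublicIdToBranches < ActiveRecord::Migration[7.0]
  def change
    # Nullable column: non-Rails insert paths won't fail outright,
    # which is exactly the silent-failure caveat noted below.
    add_column :branches, :public_id, :string, limit: 12
    # The unique index is the actual correctness primitive.
    add_index :branches, :public_id, unique: true, name: "idx_public_id"
  end
end
```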

The MAX_RETRY knob

PlanetScale sets MAX_RETRY = 1000. The number is a magic constant — higher than needed at their entropy budget (1 retry is already astronomically unlikely), lower than infinite so the failure mode is bounded.

Alternative values:

  • 10: aggressive; crashes fast if something is structurally wrong.
  • 100: balanced; almost certainly never hit.
  • 1000: PlanetScale's choice; very high safety margin.
  • Infinite: dangerous; masks structural bugs (hard-coded IDs, broken PRNG) as infinite loops.
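The arithmetic behind these margins fits in a few lines. A quick sketch (my calculation, not code from the post):

```ruby
# Bits of entropy for an id of `length` characters over an alphabet of
# `alphabet_size` symbols, and the approximate per-insert collision
# probability against an existing table of `rows` rows (N / 2^bits).
def entropy_bits(length, alphabet_size)
  length * Math.log2(alphabet_size)
end

def per_insert_collision_probability(length, alphabet_size, rows)
  rows / 2.0**entropy_bits(length, alphabet_size)
end

entropy_bits(12, 36)                             # ~62 bits
per_insert_collision_probability(12, 36, 10**9)  # ~2e-10 per INSERT
entropy_bits(6, 36)                              # ~31 bits: retries matter
```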

Retry vs re-insert approaches

Three strategies for handling NanoID collision:

Strategy                | Pre-check     | On failure                    | Code shape
Pre-check (PlanetScale) | exists? query | Generate new ID               | Rails before_create
Insert-then-retry       | None          | Rescue RecordNotUnique, retry | Exception handler around save
ON DUPLICATE KEY        | None          | INSERT ... ON DUPLICATE KEY UPDATE id = id (no-op) | Raw SQL

PlanetScale picks pre-check because it keeps the correctness primitive (the unique index) as a backstop while giving clean Rails semantics (the RecordNotUnique exception never surfaces in the happy path). The pre-check is race-prone but at 62-bit entropy the race is far outside any realistic threat model.
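The insert-then-retry row can be sketched without Rails; here DuplicateKeyError stands in for ActiveRecord::RecordNotUnique, and the generate/insert callables stand in for the model layer (all names and structure are mine, not PlanetScale's):

```ruby
# Insert-then-retry: no pre-check SELECT. The unique index is the sole
# arbiter, and a violation triggers regeneration of the id.
class DuplicateKeyError < StandardError; end

MAX_RETRY = 1000

def insert_with_retry(generate:, attempt_insert:)
  MAX_RETRY.times do
    id = generate.call
    begin
      return attempt_insert.call(id)  # would be `save!` in Rails
    rescue DuplicateKeyError
      # Collision surfaced by the unique index: fall through, regenerate.
    end
  end
  raise "Failed to insert a unique public id after #{MAX_RETRY} attempts"
end
```

This shape saves one SELECT per INSERT in the happy path, at the cost of using exceptions for control flow in the (vanishingly rare) collision case.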

Race condition on pre-check

The where(public_id: ...).exists? check is not atomic with the subsequent INSERT. Two concurrent callbacks generating the same NanoID would both see exists? == false and both attempt INSERT. One would succeed; the other would fail on the UNIQUE KEY idx_public_id with ActiveRecord::RecordNotUnique.

At PlanetScale's entropy budget (~62 bits), two concurrent inserts draw the same fresh ID with probability ~2^-62 ≈ 2 × 10^-19 per concurrent-insert pair, so even at a typical concurrent-write rate (~10-100 concurrent inserts) the race is essentially impossible. In lower-entropy schemes (e.g. 6-char base-36 ≈ 31 bits) the race is observable under concurrent load and the DB-side unique index becomes the real correctness primitive, not the Rails-side exists? check.

Caveats

  • Pre-check is not atomic — under high concurrency + low entropy the race fires. The pattern relies on the DB unique index as the actual correctness primitive, with the pre-check as a UX optimisation (it avoids surfacing RecordNotUnique exceptions).
  • MAX_RETRY of 1000 is arbitrary — at PlanetScale's entropy budget even MAX_RETRY = 10 would be sufficient. The choice is a safety margin, not a measured parameter.
  • Retry cost is dominated by the pre-check SELECT — each retry iteration issues a DB round-trip. In the happy path (0 retries), overhead is 1 SELECT per INSERT. In the pathological case (e.g. PRNG broken to return constant) the loop fails fast but burns 1000 SELECT queries before raising.
  • Loop does not handle ActiveRecord::RecordNotUnique from INSERT — if the pre-check passes but a concurrent INSERT lands the same public_id, the outer INSERT raises and the record fails. The Rails code doesn't catch this. At PlanetScale's entropy, unlikely enough to ignore; in lower-entropy schemes, this is the biggest gap.
  • Silent failure on non-Rails insert paths — raw SQL inserts, activerecord-import, background-job bulk inserts can bypass the before_create callback entirely and produce rows with public_id = NULL (allowed by DEFAULT NULL) or duplicate IDs (if pre-generated incorrectly). The retry-loop contract applies only to the before_create path.
  • No exponential backoff — the retry loop has no sleep between iterations. In the extreme case of a broken PRNG, the loop burns CPU + DB round-trips as fast as possible for 1000 iterations (typically <1 second) before raising. For NanoID the high-collision case is impossible; for lower-entropy schemes, consider backoff.
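For the lower-entropy schemes where backoff matters, a sketch of what a backoff variant could look like (my addition; the sleeper parameter is injectable so the delay policy is visible and testable):

```ruby
# Retry with exponential backoff between iterations. The block returns
# a candidate id, or nil when the pre-check found a collision.
def generate_with_backoff(max_retry: 10, base_delay: 0.01, sleeper: method(:sleep))
  max_retry.times do |attempt|
    id = yield
    return id if id
    sleeper.call(base_delay * (2**attempt))  # 10ms, 20ms, 40ms, ...
  end
  raise "Failed to generate a unique id after #{max_retry} attempts"
end
```

For the NanoID parameterisation above this is unnecessary; it only pays off when collisions are frequent enough that hammering the pre-check SELECT in a tight loop would add real DB load.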

Seen in

  • sources/2026-04-21-planetscale-why-we-chose-nanoids-for-planetscales-api — Mike Coutermarsh (PlanetScale, 2022-03-29) discloses the full PublicIdGenerator Ruby concern with MAX_RETRY = 1000: "this code runs and generates the ID for us. [...] handles retries in the small chance of a duplicate." First wiki canonicalisation of the bounded-retry collision-handling primitive for NanoIDs specifically. The retry loop is structurally dead-code at PlanetScale's entropy budget but is the correctness contract the design doesn't want to skip.