
CONCEPT Cited by 1 source

Snowflake ID

A Snowflake ID is a 64-bit time-ordered identifier originally designed by Twitter (2010) for generating unique post IDs across a sharded fleet without coordination. The bit layout packs a millisecond timestamp, a machine ID, and a per-millisecond sequence number into a single BIGINT-compatible value. Inserts into a clustered-index B+tree therefore keep the sequential-insert locality of an auto-incrementing integer, while any machine in the fleet can still generate unique IDs without central coordination. (Source: sources/2026-04-21-planetscale-the-problem-with-using-a-uuid-primary-key-in-mysql.)

Bit layout

Twitter's canonical Snowflake layout (64 bits total):

Bits | Field | Meaning
1 | sign | always 0 (keeps value positive for signed BIGINT)
41 | timestamp | ms since a custom epoch (e.g. Twitter's 2010-11-04)
10 | machine ID | 5 bits datacenter + 5 bits worker
12 | sequence | per-ms counter, resets every ms
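
The layout in the table can be sketched as plain shifts and masks. This is a minimal illustration, not Twitter's code: the names `pack` and `unpack` are hypothetical, and the 10-bit machine ID is treated as a single field rather than split into datacenter and worker.

```python
# Field widths taken from the table above; all names here are illustrative.
TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12

MACHINE_SHIFT = SEQUENCE_BITS                    # 12
TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_BITS   # 22


def pack(ms_since_epoch: int, machine_id: int, sequence: int) -> int:
    """Combine the three fields into one integer; the sign bit stays 0."""
    if not 0 <= ms_since_epoch < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp does not fit in 41 bits")
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id does not fit in 10 bits")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence does not fit in 12 bits")
    return (
        (ms_since_epoch << TIMESTAMP_SHIFT)
        | (machine_id << MACHINE_SHIFT)
        | sequence
    )


def unpack(snowflake: int) -> tuple[int, int, int]:
    """Split an ID back into (ms_since_epoch, machine_id, sequence)."""
    ms_since_epoch = snowflake >> TIMESTAMP_SHIFT
    machine_id = (snowflake >> MACHINE_SHIFT) & ((1 << MACHINE_BITS) - 1)
    sequence = snowflake & ((1 << SEQUENCE_BITS) - 1)
    return ms_since_epoch, machine_id, sequence
```

Because the timestamp occupies the most significant bits, two IDs compare first by time, then by machine ID, then by sequence.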

Sample from the PlanetScale post:

7167350074945572864

This is a 64-bit integer — fits in MySQL's BIGINT (signed) or BIGINT UNSIGNED. Compare to UUIDs: 128 bits (2× wider), stored as BINARY(16) or CHAR(36).
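
The size difference is easy to check with the standard library. The sketch below only measures encoded widths; it makes no assumption about which generator or epoch produced the sample value.

```python
import struct
import uuid

SAMPLE_ID = 7167350074945572864  # sample value quoted in the text

# The value is below 2**63, so it fits a signed 64-bit integer (MySQL BIGINT).
fits_signed_bigint = 0 <= SAMPLE_ID < 2**63

snowflake_bytes = struct.pack(">q", SAMPLE_ID)  # big-endian signed 64-bit
random_uuid = uuid.uuid4()

print(fits_signed_bigint)         # True
print(len(snowflake_bytes))       # 8  bytes, as BIGINT
print(len(random_uuid.bytes))     # 16 bytes, as BINARY(16)
print(len(str(random_uuid)))      # 36 characters, as CHAR(36)
```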

Why it's a good primary key

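  1. Time-ordered. Numeric sort order follows generation time to millisecond resolution (IDs minted in the same millisecond on different machines order by machine ID); in big-endian form, byte-wise order matches too. Inserts land on the right-most path of a clustered-index B+tree, the same locality property as BIGINT AUTO_INCREMENT.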
  2. Half the size of a UUID. 8 bytes vs 16. Halves the B+tree key width, increases fan-out, halves secondary-index PK-amplification overhead. See concepts/uuid-primary-key-antipattern.
  3. Distributed-generatable. Machine ID guarantees no collision across the fleet without coordination — as long as machine IDs are unique.
  4. Fits a machine word. Every comparison is one 64-bit CPU instruction — same as a regular integer.
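
The time-ordering property in point 1 can be illustrated with a few hand-built IDs. This is a sketch under the canonical 41/10/12 layout; `make_id` is a hypothetical helper, and the event list is chosen so that generation order is unambiguous (distinct milliseconds, or the same machine within a millisecond).

```python
import random
import struct


def make_id(ms: int, machine_id: int, sequence: int) -> int:
    """Hypothetical helper: canonical layout, timestamp in the top bits."""
    return (ms << 22) | (machine_id << 12) | sequence


# (ms since epoch, machine ID, sequence), listed in generation order.
events = [(1000, 7, 0), (1000, 7, 1), (1001, 3, 0), (1005, 900, 0), (1005, 900, 1)]
ids_in_time_order = [make_id(*event) for event in events]

shuffled = ids_in_time_order[:]
random.shuffle(shuffled)

# Sorting numerically recovers generation order.
by_number = sorted(shuffled)

# Sorting the big-endian byte strings gives the same order.
by_bytes = sorted(struct.pack(">Q", i) for i in shuffled)

print(by_number == ids_in_time_order)                                  # True
print(by_bytes == [struct.pack(">Q", i) for i in ids_in_time_order])   # True
```

Little-endian encodings would not sort this way byte-wise, which is why the comparison here uses big-endian packing.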

Limits

  • ~70-year timestamp ceiling. 41 bits of ms is 2^41 ms ≈ 70 years from the custom epoch. Most deployments choose an epoch near their launch date to maximise remaining lifetime.
  • 1024 unique machine IDs. 10 bits = 1024 workers fleet-wide. Large fleets need wider machine-ID fields or hierarchical allocation.
  • 4096 IDs per millisecond per machine. 12 sequence bits per ms. Exceeding this rate requires blocking, bumping the timestamp forward, or falling back.
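
The three ceilings follow directly from the field widths. A quick arithmetic check, assuming an average year of 365.25 days:

```python
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25  # average year, in milliseconds

timestamp_years = 2**41 / MS_PER_YEAR   # lifetime of the 41-bit timestamp
machine_ids = 2**10                     # distinct generators fleet-wide
ids_per_ms = 2**12                      # per machine, per millisecond

ids_per_second_per_machine = ids_per_ms * 1000
ids_per_second_fleet = ids_per_second_per_machine * machine_ids

print(f"{timestamp_years:.1f} years")   # 69.7 years
print(machine_ids)                      # 1024
print(ids_per_second_per_machine)       # 4096000
print(ids_per_second_fleet)             # 4194304000
```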

Generation

Not standardised as an RFC — multiple implementations with different bit allocations:

  • Twitter Snowflake (2010): the original, open-sourced, but Twitter stopped maintaining it.
  • Discord Snowflake: same 64-bit shape and the same 5/5/12 split of the low bits, but a different epoch (2015-01-01), and the two 5-bit fields hold an internal worker ID and a process ID rather than datacenter and worker.
  • Instagram shard IDs: similar shape, 13-bit shard ID instead of datacenter+worker.
  • Sony's Sonyflake: 39-bit timestamp at 10 ms resolution, 16-bit machine ID, 8-bit sequence; trades ms resolution for larger fleets.
  • YouTube video IDs: base64-encoded 64-bit, not strictly time-ordered internally.
  • Mastodon: uses 64-bit time-ordered IDs for status_id / account_id, the same trick.
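
Since these variants differ only in field widths and order, one decoder can handle them if the layout is passed in as data. The sketch below is illustrative: `LAYOUTS`, `encode`, and `decode` are hypothetical names, and the exact field order and widths for Discord, Instagram, and Sonyflake are assumptions based on each project's public description, so verify them before relying on this.

```python
from typing import Dict, List, Tuple

# A layout is a list of (field name, width in bits), most significant first.
Layout = List[Tuple[str, int]]

# Assumed layouts; check each project's documentation before relying on them.
LAYOUTS: Dict[str, Layout] = {
    "twitter": [("sign", 1), ("timestamp", 41), ("datacenter", 5),
                ("worker", 5), ("sequence", 12)],
    "discord": [("timestamp", 42), ("worker", 5), ("process", 5),
                ("increment", 12)],
    "instagram": [("timestamp", 41), ("shard", 13), ("sequence", 10)],
    "sonyflake": [("sign", 1), ("timestamp_10ms", 39), ("sequence", 8),
                  ("machine", 16)],
}


def encode(fields: Dict[str, int], layout: Layout) -> int:
    """Pack named fields into one integer; missing fields default to 0."""
    value = 0
    for name, width in layout:
        part = fields.get(name, 0)
        if not 0 <= part < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        value = (value << width) | part
    return value


def decode(value: int, layout: Layout) -> Dict[str, int]:
    """Split an integer into named fields according to the layout."""
    fields: Dict[str, int] = {}
    shift = sum(width for _, width in layout)
    for name, width in layout:
        shift -= width
        fields[name] = (value >> shift) & ((1 << width) - 1)
    return fields


example = encode({"timestamp": 123456, "shard": 4242, "sequence": 7},
                 LAYOUTS["instagram"])
print(decode(example, LAYOUTS["instagram"]))
# {'timestamp': 123456, 'shard': 4242, 'sequence': 7}
```

Decoding an ID with the wrong layout still yields numbers, just meaningless ones, so the layout and epoch need to travel with the ID's provenance.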

vs UUIDs

Property | UUIDv4 | UUIDv7 | Snowflake ID
Width | 128 bits | 128 bits | 64 bits
Time-ordered | No | Yes | Yes
Distributed-generatable | Yes | Yes | Yes (with machine ID)
Traceable to generator | No | No | Yes (machine ID)
Standardised | RFC 4122 (superseded by RFC 9562) | RFC 9562 (published 2024) | No standard
B+tree insert locality | Bad | Good | Good
Fits in BIGINT | No | No | Yes
Client-library support | Every language | Emerging | Per-implementation

Snowflake wins on width (half the storage) and on ubiquitous support for its storage type (every database and most languages have a native 64-bit integer). UUIDv7 wins on standardisation, opacity (no machine-ID leak), and client-library generality.

vs other alternatives

Caveats

  • Clock-skew sensitivity. If a machine's wall clock goes backwards (NTP jump, leap second correction), a naive generator produces IDs with timestamps in the past — which can collide with previously-generated IDs or violate monotonicity. Production implementations detect clock-skew and either block until the clock catches up or use a fallback counter.
  • Machine-ID allocation is a distributed-systems problem. Getting unique machine IDs to every generator at boot requires a registry (ZooKeeper, etcd, Consul, a managed DB sequence) — coordination overhead that UUIDs avoid.
  • Not globally unique across organisations. Snowflake IDs only guarantee uniqueness within a single fleet's machine-ID namespace. UUIDs are designed to be unique across independent systems (probabilistically, for the random versions).
  • Timestamp epoch is custom. Comparing IDs across two systems requires knowing each system's epoch. This is an operational gotcha during migrations or forensic analysis.
  • Not browser-friendly. JavaScript numbers are IEEE-754 doubles and cannot represent every integer above 2^53 exactly. Serialise Snowflake IDs as strings on the wire for web APIs, either as decimal strings or in a base62 or base64 encoding.
  • Bit layout isn't standardised. Different systems use different splits, different epochs, different resolutions — the 64-bit shape is the only thing they agree on.
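
The clock-skew caveat, together with the 4096-per-millisecond limit, can be sketched as a small generator. This is a minimal illustration under stated assumptions, not any production implementation: the class name, the injectable `clock` parameter, and the choice to block (rather than raise or use a fallback counter) when the clock moves backwards are all hypothetical. The default epoch is Twitter's published value.

```python
import threading
import time
from typing import Callable


def _wall_clock_ms() -> int:
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    """Illustrative generator using the canonical 41/10/12 layout."""

    def __init__(self, machine_id: int, epoch_ms: int = 1288834974657,
                 clock: Callable[[], int] = _wall_clock_ms) -> None:
        if not 0 <= machine_id < 1024:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.epoch_ms = epoch_ms
        self._clock = clock          # injectable so tests can script time
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()

            # Clock went backwards: block until it catches up, so no ID
            # is ever issued with a timestamp older than the last one.
            while now < self._last_ms:
                time.sleep(0.001)
                now = self._clock()

            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # 4096 IDs already issued this ms: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0

            self._last_ms = now
            elapsed = now - self.epoch_ms
            if not 0 <= elapsed < (1 << 41):
                raise OverflowError("timestamp does not fit in 41 bits")
            return (elapsed << 22) | (self.machine_id << 12) | self._sequence
```

Blocking while holding the lock is deliberate in this sketch: every caller on the machine has to wait out the skew, otherwise two threads could still mint IDs with a stale timestamp. A real deployment would likely cap the wait and fail loudly on a large backwards jump.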

Seen in
