
Kafka offset+partition uniqueness constraint

Kafka offset+partition uniqueness constraint is a database-side idempotence primitive for at-least-once Kafka consumers: every row persisted from a Kafka message carries the source (topic, partition, offset) triple under a uniqueness constraint, and an INSERT … ON DUPLICATE KEY UPDATE or INSERT IGNORE turns any consumer retry into a silent no-op.

The triple (topic, partition, offset) is a natural unique identifier for every record in a Kafka cluster. Using it as a uniqueness constraint in the downstream DB pushes deduplication into the storage layer, where the SQL engine handles the race natively.
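A minimal sketch of the schema pattern, using Python's stdlib sqlite3 as a stand-in for MySQL (SQLite's INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE); the table and column names are illustrative, not the source's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE slow_queries (
        topic        TEXT    NOT NULL,
        partition_id INTEGER NOT NULL,  -- "partition"/"offset" renamed to
        msg_offset   INTEGER NOT NULL,  -- dodge SQL reserved words
        fingerprint  TEXT,
        UNIQUE (topic, partition_id, msg_offset)  -- the dedup key
    )
""")

row = ("query-stats", 3, 1042, "SELECT ...")
# First insert persists the row; a replayed insert hits the unique
# constraint and becomes a silent no-op.
conn.execute("INSERT OR IGNORE INTO slow_queries VALUES (?, ?, ?, ?)", row)
conn.execute("INSERT OR IGNORE INTO slow_queries VALUES (?, ?, ?, ?)", row)

count = conn.execute("SELECT COUNT(*) FROM slow_queries").fetchone()[0]
# count == 1: the SQL engine absorbed the duplicate
```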

(Source: sources/2026-04-21-planetscale-storing-time-series-data-in-sharded-mysql-to-power-query-insights.)

Canonical PlanetScale framing

Rafer Hazen, 2023-08-10: "For both the aggregate and slow query topics, we track the offset and partition from the underlying Kafka messages in the MySQL tables and use uniqueness constraints to avoid duplicating data if Kafka consumers retry batches following a failure."

Why this is load-bearing

Kafka consumer semantics are at-least-once by default. The consumer commits offsets after processing; if the process crashes between "DB write succeeded" and "offset committed", the next consumer instance replays the batch and attempts to re-insert the same records.

Without a uniqueness constraint, replay produces duplicate rows — for a slow-query-log row this is a visible product bug; for an aggregate-coalesced row this is a double-count.

The uniqueness constraint closes the gap: the DB rejects the duplicate insert with a constraint violation (or silently no-ops with INSERT IGNORE / ON DUPLICATE KEY UPDATE), and the retry proceeds past the duplicate to any unprocessed records in the batch.
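The crash-and-replay scenario above can be simulated end to end. This sketch (again sqlite3 standing in for MySQL, with hypothetical table names) delivers the same batch twice, as if the consumer had died after the DB commit but before the Kafka offset commit:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE events (
    topic TEXT, partition_id INTEGER, msg_offset INTEGER, payload TEXT,
    UNIQUE (topic, partition_id, msg_offset))""")

batch = [("t", 0, i, f"payload-{i}") for i in range(5)]

def process(batch):
    # Idempotent write: duplicates from a replayed batch are no-ops,
    # and the loop proceeds past them to any unprocessed records.
    for msg in batch:
        db.execute("INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?)", msg)
    db.commit()
    # ...a real consumer would commit the Kafka offset here; if it
    # crashes first, the whole batch is redelivered.

process(batch)   # first delivery: DB write succeeds, offset commit "lost"
process(batch)   # redelivery after restart: every insert is a no-op

n = db.execute("SELECT COUNT(*) FROM events").fetchone()[0]
# n == 5, not 10: replay produced no duplicate rows
```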

Composition with in-memory coalescing

When the consumer coalesces multiple Kafka messages into one DB row, the uniqueness constraint must be on some derived key of the coalesced batch — not the individual per-message offsets. PlanetScale's aggregate table tracks the offset+partition of the last message merged into each row; a retry after a partial write resumes at the records beyond the last committed offset.

The slow-query topic maps one-to-one (one Kafka message → one MySQL row), so the offset+partition uniqueness constraint is trivially applicable there.
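One plausible shape for the coalesced case (not PlanetScale's actual schema, and simplified to a single partition per aggregate row): each aggregate row records the last merged offset, and an upsert guard rejects any replayed message at or below it. Sketched with sqlite3's ON CONFLICT … DO UPDATE as the analogue of MySQL's ON DUPLICATE KEY UPDATE:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE agg (
    query_id       TEXT PRIMARY KEY,
    total          INTEGER NOT NULL,
    last_partition INTEGER NOT NULL,
    last_offset    INTEGER NOT NULL)""")

def merge(query_id, count, partition_id, msg_offset):
    # Upsert a coalesced row, recording the offset of the last message
    # merged in.  On replay, a message at or below the stored offset
    # fails the WHERE guard and is dropped, so nothing double-counts.
    db.execute("""
        INSERT INTO agg VALUES (?, ?, ?, ?)
        ON CONFLICT(query_id) DO UPDATE SET
            total = total + excluded.total,
            last_partition = excluded.last_partition,
            last_offset = excluded.last_offset
        WHERE excluded.last_offset > agg.last_offset
    """, (query_id, count, partition_id, msg_offset))

merge("q1", 10, 0, 100)   # first message creates the row
merge("q1", 5, 0, 101)    # next message merges in
merge("q1", 5, 0, 101)    # replayed message: guard rejects it

total = db.execute(
    "SELECT total FROM agg WHERE query_id = 'q1'").fetchone()[0]
# total == 15, not 20
```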

Exactly-once semantics — the cheap version

Exactly-once semantics (EOS) are usually achieved via Kafka's transactional producer plus read-committed consumers, or via Kafka Streams' transactional writes. The (topic, partition, offset) uniqueness constraint achieves equivalent correctness for the downstream DB without needing Kafka-side EOS — at the cost of one extra composite key per row and a small constraint-check overhead per insert.

For storage-backed consumers (as opposed to stream-processing downstream), this is usually the simpler choice.

Variants

  • INSERT IGNORE — silently skip duplicate keys.
  • INSERT … ON DUPLICATE KEY UPDATE — treat duplicate as an upsert; update other columns if needed.
  • Separate dedup table — keep a compact (topic, partition, offset) lookup table; insert the triple there first (in the same transaction as the payload write) and skip the payload when the constraint fires.
