CONCEPT Cited by 1 source

Content-addressed ID¶

Definition¶

A content-addressed ID is an identifier derived deterministically from the content of a record via a cryptographic hash, so that two records with identical content necessarily share the same ID. Storage writes become idempotent by construction: INSERT OR IGNORE silently skips duplicates; there is no separate dedup pass, no duplicate-detection bookkeeping, no risk of a race creating two rows for the same logical item.

The canonical hash choice is a truncated cryptographic hash (SHA-256, BLAKE3, etc.); truncation to 128 bits is sufficient for deduplication at realistic scales (collision probability far below real-world cardinality).

The pattern¶

id = HASH(stable-identity-fields)[:N bits]
writes = INSERT OR IGNORE (id, ...)

The load-bearing property: the writer doesn't coordinate with any other writer to avoid duplicates. Two parallel paths can both race to store the same message; SQLite / the storage engine sees the same id, takes the first, ignores the second.

Canonical wiki instance: Cloudflare Agent Memory¶

Agent Memory assigns every ingested message a content-addressed ID:

id = SHA-256(sessionId + role + content)[:128 bits]

"If the same conversation is ingested twice, every message resolves to the same ID, making re-ingestion idempotent."

— (Cloudflare, 2026-04-17)

At storage time, "everything is written to storage using INSERT OR IGNORE so that content-addressed duplicates are silently skipped."

The input fields are deliberate: - sessionId — scopes the same text spoken in two different sessions to different IDs (correct — they're different instances of the same sentence). - role — distinguishes "user: yes" from "assistant: yes". - content — the actual material.

Stripping any of these would either produce over-aggressive deduplication (two users saying "yes" collapsing to one row) or pointless differentiation (same message ingested twice producing two rows). The three-tuple is the minimum stable identity.

Why a truncated hash?¶

Collisions are still astronomically unlikely at 128 bits. For N = 10^15 records, collision probability ≈ N² / 2^129 ≈ 10^−9 — far below real-world concerns for per-profile memory stores.
Storage savings vs full 256-bit IDs. Each ID is half the bytes; indexes are narrower; cache footprint smaller.
Faster equality checks. Often fits in two machine words.

Content addressing vs sequence-number IDs¶

Property	Content-addressed	Sequence number
Dedup cost	zero (constraint-based)	separate dedup query per write
Distributed-write coordination	none needed	coordinator / counter
Idempotent retry	yes	no (retry = new ID)
Human-readable	no (opaque hex)	yes (monotonic integer)
Sortable by creation time	no	yes

Content-addressed IDs excel for idempotent ingest paths where the same logical item may be re-presented by different paths (harness retry, client-side crash + replay, two parallel workers consuming the same bus message).

concepts/content-addressed-caching — content-addressed cache keys so identical binary content serves one cache hit across independent producers (build systems, container registries).
concepts/immutable-object-storage — objects keyed by hash so a key change implies a content change.
Git's object model — files + trees + commits keyed by content hash; the origin idiom that popularised content addressing in software engineering.
Build-system reproducibility — Bazel / Nix content-addressed outputs.

Agent Memory is the canonical wiki instance of the idiom applied at the individual message / memory granularity for idempotent agent-loop ingestion.

Seen in¶

sources/2026-04-17-cloudflare-agents-that-remember-introducing-agent-memory — SHA-256(sessionId + role + content) truncated to 128 bits; INSERT OR IGNORE storage writes.

concepts/agent-memory
concepts/memory-supersession — sibling dedup mechanism at topic-key granularity.
systems/cloudflare-agent-memory
concepts/content-addressed-caching
concepts/immutable-object-storage