CONCEPT Cited by 1 source
Content-addressed ID¶
Definition¶
A content-addressed ID is an identifier derived deterministically from the content of a record via a cryptographic hash, so that two records with identical content necessarily share the same ID. Storage writes become idempotent by construction: INSERT OR IGNORE silently skips duplicates; there is no separate dedup pass, no duplicate-detection bookkeeping, no risk of a race creating two rows for the same logical item.
The canonical hash choice is a truncated cryptographic hash (SHA-256, BLAKE3, etc.); truncation to 128 bits is sufficient for deduplication at realistic scales (collision probability far below real-world cardinality).
The pattern¶
The load-bearing property: the writer doesn't coordinate with any other writer to avoid duplicates. Two parallel paths can both race to store the same message; SQLite / the storage engine sees the same id, takes the first, ignores the second.
Canonical wiki instance: Cloudflare Agent Memory¶
Agent Memory assigns every ingested message a content-addressed ID:
"If the same conversation is ingested twice, every message resolves to the same ID, making re-ingestion idempotent."
At storage time, "everything is written to storage using INSERT OR IGNORE so that content-addressed duplicates are silently skipped."
The input fields are deliberate:
- sessionId — scopes the same text spoken in two different sessions to different IDs (correct — they're different instances of the same sentence).
- role — distinguishes "user: yes" from "assistant: yes".
- content — the actual material.
Stripping any of these would either produce over-aggressive deduplication (two users saying "yes" collapsing to one row) or pointless differentiation (same message ingested twice producing two rows). The three-tuple is the minimum stable identity.
Why a truncated hash?¶
- Collisions are still astronomically unlikely at 128 bits. For N = 10^15 records, collision probability ≈ N² / 2^129 ≈ 10^−9 — far below real-world concerns for per-profile memory stores.
- Storage savings vs full 256-bit IDs. Each ID is half the bytes; indexes are narrower; cache footprint smaller.
- Faster equality checks. Often fits in two machine words.
Content addressing vs sequence-number IDs¶
| Property | Content-addressed | Sequence number |
|---|---|---|
| Dedup cost | zero (constraint-based) | separate dedup query per write |
| Distributed-write coordination | none needed | coordinator / counter |
| Idempotent retry | yes | no (retry = new ID) |
| Human-readable | no (opaque hex) | yes (monotonic integer) |
| Sortable by creation time | no | yes |
Content-addressed IDs excel for idempotent ingest paths where the same logical item may be re-presented by different paths (harness retry, client-side crash + replay, two parallel workers consuming the same bus message).
Related idioms in this wiki¶
- concepts/content-addressed-caching — content-addressed cache keys so identical binary content serves one cache hit across independent producers (build systems, container registries).
- concepts/immutable-object-storage — objects keyed by hash so a key change implies a content change.
- Git's object model — files + trees + commits keyed by content hash; the origin idiom that popularised content addressing in software engineering.
- Build-system reproducibility — Bazel / Nix content-addressed outputs.
Agent Memory is the canonical wiki instance of the idiom applied at the individual message / memory granularity for idempotent agent-loop ingestion.
Seen in¶
- sources/2026-04-17-cloudflare-agents-that-remember-introducing-agent-memory — SHA-256(sessionId + role + content) truncated to 128 bits;
INSERT OR IGNOREstorage writes.
Related¶
- concepts/agent-memory
- concepts/memory-supersession — sibling dedup mechanism at topic-key granularity.
- systems/cloudflare-agent-memory
- concepts/content-addressed-caching
- concepts/immutable-object-storage