CONCEPT Cited by 1 source
Tagged pointer
Pack non-pointer data into the architecturally-unused high or low bits of a machine pointer, then mask/shift at access time to recover pointer and tag separately. Gains density (no separate tag field) without a wider container, at the cost of type-erasure from the language's view of the pointer and careful lifetime management through the masked representation.
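Mechanics
x86-64 and ARM64 both have 64-bit pointers, but common configurations translate only 48 bits of virtual address (2^48 bytes = 256 TiB, more than any real machine has as of 2026). The remaining 16 high bits carry no address information and can hold auxiliary data, but the hardware treats them differently. On ARM64 with Top Byte Ignore (TBI) enabled, the CPU ignores the top byte on loads and stores. On x86-64, addresses must be canonical (the upper bits must replicate bit 47) and a non-canonical address faults, so the tag has to be masked off before every dereference.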
Figma's Multiplayer property map is the canonical wiki realization: sources/2026-04-21-figma-supporting-faster-file-load-times-memory-optimizations-rust
// Packed representation: top 16 bits = field_id, bottom 48 bits = pointer.
// Assumes the address fits in 48 bits, i.e. its upper 16 bits are zero.
const PTR_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;
fn pack<T>(field_id: u16, ptr: *const T) -> u64 { ((field_id as u64) << 48) | (ptr as u64 & PTR_MASK) }
fn field_id(packed: u64) -> u16 { (packed >> 48) as u16 }
fn pointer<T>(packed: u64) -> *const T { (packed & PTR_MASK) as *const T }
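A minimal round-trip sketch of the high-bit packing, assuming a 64-bit target whose user-space addresses have zero upper 16 bits. The names `try_pack`, `unpack`, and `demo` are hypothetical, and the runtime check is just one way to make the 48-bit assumption explicit; it is not taken from the Figma source.

```rust
const TAG_SHIFT: u32 = 48;
const PTR_MASK: u64 = (1u64 << TAG_SHIFT) - 1;

/// Hypothetical helper: refuses to pack if the address needs more than 48 bits.
fn try_pack<T>(tag: u16, ptr: *const T) -> Option<u64> {
    let addr = ptr as usize as u64;
    if addr & !PTR_MASK != 0 {
        return None; // upper bits in use (e.g. 57-bit addressing): cannot tag safely
    }
    Some(((tag as u64) << TAG_SHIFT) | addr)
}

/// Splits a packed word back into its tag and its (masked) pointer.
fn unpack<T>(packed: u64) -> (u16, *const T) {
    ((packed >> TAG_SHIFT) as u16, (packed & PTR_MASK) as usize as *const T)
}

/// Packs a pointer to a live heap value, unpacks it, and reads through it.
fn demo() -> (u16, u32) {
    let value = Box::new(1234u32);
    let raw: *const u32 = &*value;
    let packed = try_pack(7, raw).expect("address fits in 48 bits");
    let (tag, ptr) = unpack::<u32>(packed);
    // The mask has already been applied; `value` is still alive here.
    (tag, unsafe { *ptr })
}
```

Returning `None` instead of silently truncating keeps a wider address space from turning into a corrupted pointer.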
Canonical instances across languages
- V8 (JavaScript engine) Smi tagging: small integers (Smis) are stored directly in a pointer-sized word, and the lowest bit distinguishes a Smi from a pointer to a heap-allocated object. Saves a pointer-chase per arithmetic op.
- OCaml unboxed integer representation: integers are encoded as (n << 1) | 1, while pointers have 0 in the low bit. The GC distinguishes them by that bit.
- Objective-C tagged pointers (macOS/iOS): small NSNumber, NSDate, NSString values are encoded directly in the pointer word and marked by tag bits; this avoids heap allocation entirely for small values.
- JavaScriptCore NaN-boxing: exploits the IEEE-754 NaN payload space to pack pointers/integers/booleans into double slots.
- LLVM PointerIntPair<PtrT, IntBits>: first-class utility in LLVM's ADT library for packing a small integer tag into the low bits of an aligned pointer.
- Linux kernel struct alignment tags: low bits of naturally-aligned pointers (e.g. to 16-byte-aligned slab objects) are used as type-discriminator bits throughout the kernel.
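The low-bit schemes in this list can be illustrated with a small sketch. It follows the OCaml-style convention (low bit 1 = immediate integer, low bit 0 = aligned pointer); the `Word` type and its methods are hypothetical names for illustration and do not reproduce any engine's actual representation.

```rust
/// A pointer-sized word holding either an immediate integer or an aligned pointer.
/// Convention: low bit 1 = integer, low bit 0 = pointer.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Word(usize);

impl Word {
    fn from_int(n: isize) -> Word {
        // Shift left by one and set the tag bit; the top bit of `n` is lost,
        // so integers have one bit less range than the machine word.
        Word(((n as usize) << 1) | 1)
    }

    fn from_ptr<T>(ptr: *const T) -> Word {
        // Any type with alignment >= 2 has a zero low bit in every valid address.
        assert!(std::mem::align_of::<T>() >= 2);
        Word(ptr as usize)
    }

    fn is_int(self) -> bool {
        self.0 & 1 == 1
    }

    fn as_int(self) -> Option<isize> {
        // Arithmetic shift right drops the tag bit and restores the sign.
        if self.is_int() { Some((self.0 as isize) >> 1) } else { None }
    }

    fn as_ptr<T>(self) -> Option<*const T> {
        if self.is_int() { None } else { Some(self.0 as *const T) }
    }
}
```

Because the pointer case needs no masking at all, this layout keeps pointer dereferences free and charges the shift only to integer operations.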
When it pays off
Three conditions should hold:
- Hot data structure with pointer + small-enum co-location. If every pointer entry also carries a fixed-width tag (type flag, field ID, generation counter, …), tagging can halve the per-entry footprint, because alignment padding otherwise rounds a pointer plus a small tag up to 16 bytes.
- Tag fits in available bits. On x86-64 with 48-bit addressing: ≤ 16 bits at the top, or a few bits at the bottom if pointers are aligned (3 bits for 8-byte alignment, 4 for 16-byte). On ARM with TBI: up to 8 bits at the top.
- The saving is large enough to justify the ergonomic cost. Every dereference pays a mask/shift instruction, and the compiler sees an integer rather than a typed pointer, so some inlining and autovectorization opportunities may be lost.
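The footprint claim in the first condition can be checked directly. The sketch below assumes a 64-bit target; `PlainEntry`, `PackedEntry`, and `entry_sizes` are hypothetical names used only for this comparison.

```rust
use std::mem::size_of;

/// Untagged entry: a 2-byte field id next to an 8-byte pointer.
/// Alignment padding rounds the pair up to 16 bytes on 64-bit targets.
#[allow(dead_code)]
struct PlainEntry {
    field_id: u16,
    ptr: *const u8,
}

/// Tagged entry: field id in the top 16 bits, pointer in the low 48 bits.
#[allow(dead_code)]
struct PackedEntry(u64);

/// Returns (size of the untagged entry, size of the tagged entry) in bytes.
fn entry_sizes() -> (usize, usize) {
    (size_of::<PlainEntry>(), size_of::<PackedEntry>())
}
```

On a 64-bit target this reports 16 bytes against 8, which is where the "halves the container footprint" figure comes from.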
Failure modes and caveats
- Architecture-specific. x86-64's 48-bit address cap is not guaranteed forever: Linux's CONFIG_X86_5LEVEL enables 57-bit addressing, which leaves only 7 free high bits, and future CPUs may narrow the budget further. Code must make the address-width assumption explicit, gated by #ifdef or a feature flag.
- Memory-safety risk. With refcounted pointers (Rc/Arc in Rust), insert and get must correctly increment/decrement refcounts through the masked pointer; a bug means a double-free or use-after-free. Figma explicitly cited this as the reason its pointer-tagging experiment wasn't productionized: the benchmark win (~5% RSS) wasn't worth the potential memory-corruption footguns.
- Debugging / tooling degradation. gdb, lldb, and heap profilers interpret raw pointers; a packed u64 shows as garbage. Custom pretty-printers are needed.
- RSS-vs-allocated divergence. Cutting allocated bytes (a 20% theoretical reduction in Figma's case) doesn't necessarily cut resident set size by the same proportion: allocators, page-commit patterns, and OS alignment interact. Figma's 20% allocated-bytes win rendered as only ~5% RSS (concepts/go-runtime-memory-model / same lesson from Datadog Go 1.24). RSS is what the OOM-killer and Kubernetes care about.
- Not compatible with sanitizers that use the high bits of pointers for shadow-memory tracking (HWASan uses the ARM top-byte explicitly). Choose one.
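The refcount hazard described in the memory-safety bullet can be made concrete with a small sketch. `TaggedArc` is a hypothetical type, not Figma's implementation; it assumes heap addresses fit in 48 bits and shows the three places where the count has to be handled through the masked pointer: construction, `get`, and `Drop`.

```rust
use std::marker::PhantomData;
use std::sync::Arc;

const PTR_MASK: u64 = (1u64 << 48) - 1;

/// Hypothetical owner of one strong reference to an `Arc<T>`, with a 16-bit
/// tag packed into the upper bits of the stored address.
struct TaggedArc<T> {
    packed: u64,
    _owns: PhantomData<Arc<T>>,
}

impl<T> TaggedArc<T> {
    fn new(tag: u16, value: Arc<T>) -> Self {
        // `into_raw` transfers the strong reference to us without decrementing.
        let addr = Arc::into_raw(value) as usize as u64;
        assert_eq!(addr & !PTR_MASK, 0, "address does not fit in 48 bits");
        TaggedArc { packed: ((tag as u64) << 48) | addr, _owns: PhantomData }
    }

    fn tag(&self) -> u16 {
        (self.packed >> 48) as u16
    }

    fn raw(&self) -> *const T {
        (self.packed & PTR_MASK) as usize as *const T
    }

    /// Returns a new `Arc`, bumping the strong count through the masked pointer.
    fn get(&self) -> Arc<T> {
        unsafe {
            Arc::increment_strong_count(self.raw());
            Arc::from_raw(self.raw())
        }
    }
}

impl<T> Drop for TaggedArc<T> {
    fn drop(&mut self) {
        // Rebuild the Arc from the masked pointer so the count drops exactly once.
        unsafe { drop(Arc::from_raw(self.raw())) }
    }
}
```

Forgetting the increment in `get` would produce a use-after-free once both handles are dropped, and forgetting the `Drop` impl would leak; neither mistake is caught by the type system because the pointer is stored as a plain integer.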
Ergonomic approach
Encapsulate tagging in a newtype with only safe operations exposed:
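use std::marker::PhantomData;
const PTR_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;
#[repr(transparent)]
struct TaggedPointer<T>(u64, PhantomData<*const T>);
impl<T> TaggedPointer<T> {
    // Assumes the address fits in the low 48 bits (upper 16 bits zero).
    fn new(tag: u16, ptr: *const T) -> Self { Self(((tag as u64) << 48) | (ptr as u64 & PTR_MASK), PhantomData) }
    fn tag(&self) -> u16 { (self.0 >> 48) as u16 }
    fn ptr(&self) -> *const T { (self.0 & PTR_MASK) as *const T }
}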
Confine the unsafe code to the encapsulation layer; callers never see the
raw u64. This is how LLVM's PointerIntPair and the Rust
tagged-pointer crate ship.
Relationship to adjacent techniques
- Orthogonal to concepts/small-map-as-sorted-vec: the two compose. Flat-map gives you Vec<(K, V)>; tagged pointer collapses it to Vec<u64> if K + V fit in 64 bits.
- Sibling of concepts/bitpacking: both exploit spare bits; bitpacking is the quantization use case (pack sub-byte elements into uint8/32 containers), tagged pointer is the pointer use case.
- Cousin of struct-of-arrays: both sacrifice a clean abstraction for tighter in-memory layout.
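How the flat-map and tagged-pointer techniques compose can be sketched as a sorted Vec<u64>. `PackedMap` is a hypothetical type; to keep the example safe it stores a plain 48-bit integer payload where the real use case would store a masked pointer.

```rust
const PAYLOAD_MASK: u64 = (1u64 << 48) - 1;

/// Hypothetical flat map: each entry is one u64 with a 16-bit key in the top
/// bits and a 48-bit payload in the low bits.
struct PackedMap {
    entries: Vec<u64>, // kept sorted; the key occupies the most significant bits
}

impl PackedMap {
    fn new() -> Self {
        PackedMap { entries: Vec::new() }
    }

    fn insert(&mut self, key: u16, payload: u64) {
        assert!(payload <= PAYLOAD_MASK, "payload must fit in 48 bits");
        let packed = ((key as u64) << 48) | payload;
        match self.entries.binary_search_by_key(&key, |e| (*e >> 48) as u16) {
            Ok(i) => self.entries[i] = packed,        // overwrite the existing key
            Err(i) => self.entries.insert(i, packed), // insert at the sorted position
        }
    }

    fn get(&self, key: u16) -> Option<u64> {
        self.entries
            .binary_search_by_key(&key, |e| (*e >> 48) as u16)
            .ok()
            .map(|i| self.entries[i] & PAYLOAD_MASK)
    }
}
```

Placing the key in the most significant bits means the natural u64 ordering is also the key ordering, so the vector stays searchable without a separate key column.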
Lesson
Pointers carry spare bits on modern CPUs; using them is a classical density trick, but the benchmark win is often bigger than the RSS win because real allocators don't pass savings through proportionally. In memory-safe languages with refcounting or GC, the encapsulation that guards against pointer corruption is load-bearing: test and fuzz the refcount path before shipping.