
CONCEPT Cited by 1 source

Tagged pointer

Pack non-pointer data into the architecturally-unused high or low bits of a machine pointer, then mask/shift at access time to recover pointer and tag separately. Gains density (no separate tag field) without a wider container, at the cost of type-erasure from the language's view of the pointer and careful lifetime management through the masked representation.

Mechanics

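x86-64 and ARM64 both have 64-bit pointers, but common configurations translate only 48 bits of virtual address (2^48 bytes = 256 TiB, more than any real machine has as of 2026). That leaves up to 16 high bits that can carry auxiliary data, with one important difference between the two architectures. On ARM64 with Top Byte Ignore (TBI) enabled, the CPU ignores the top 8 bits on loads and stores. On x86-64 the CPU does not ignore the high bits: addresses must be canonical (bits 48 to 63 are copies of bit 47), and dereferencing a non-canonical address faults, so software must mask the tag off before every access.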
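The canonical-address rule on x86-64 can be illustrated with a few lines of bit arithmetic. This is a minimal sketch assuming 48-bit virtual addressing; `canonicalize_48` and `is_canonical_48` are hypothetical helper names, not part of any library.

```rust
/// Restore canonical form for a 48-bit virtual address by copying
/// bit 47 into bits 48..=63 (sign extension).
fn canonicalize_48(word: u64) -> u64 {
    // Shift the top 16 bits out, then arithmetic-shift back in as i64.
    (((word << 16) as i64) >> 16) as u64
}

/// True if the word is already a canonical 48-bit address.
fn is_canonical_48(addr: u64) -> bool {
    canonicalize_48(addr) == addr
}

fn main() {
    // An illustrative user-space address: bit 47 is zero, so the top bits are zero.
    let user_addr: u64 = 0x0000_7FFF_DEAD_BEE0;
    // The same address with a 16-bit tag stored in the high bits.
    let tagged = (0xABCD_u64 << 48) | user_addr;

    assert!(is_canonical_48(user_addr));
    assert!(!is_canonical_48(tagged)); // would fault if dereferenced on x86-64
    assert_eq!(canonicalize_48(tagged), user_addr);

    // Kernel-half address: bit 47 is set, so the top bits must all be ones.
    assert_eq!(canonicalize_48(0x0000_8000_0000_0000), 0xFFFF_8000_0000_0000);
}
```

For user-space pointers the top bits are zero, which is why a plain mask with 0x0000_FFFF_FFFF_FFFF is enough to recover the address there.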

Figma's Multiplayer property map is the canonical wiki realization: sources/2026-04-21-figma-supporting-faster-file-load-times-memory-optimizations-rust

// Packed representation: top 16 bits = field_id, bottom 48 = pointer.
// Assumes the address fits in 48 bits (user-space pointer, high bits zero).
const PTR_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;

fn pack<T>(field_id: u16, ptr: *const T) -> u64 {
    ((field_id as u64) << 48) | (ptr as u64 & PTR_MASK)
}

fn field_id(packed: u64) -> u16 { (packed >> 48) as u16 }
fn pointer<T>(packed: u64) -> *const T { (packed & PTR_MASK) as *const T }
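A round trip through the packed representation might look like the sketch below. The names `pack_checked` and `unpack` are illustrative, and the assertion encodes the assumption that the allocator returns addresses that fit in 48 bits; this is not Figma's code.

```rust
const LOW_48: u64 = 0x0000_FFFF_FFFF_FFFF; // low 48 bits hold the address

fn pack_checked<T>(field_id: u16, ptr: *const T) -> u64 {
    let addr = ptr as u64;
    // Assumption: the address fits in 48 bits; otherwise packing would drop bits.
    assert_eq!(addr & !LOW_48, 0, "address uses more than 48 bits");
    ((field_id as u64) << 48) | addr
}

fn unpack<T>(packed: u64) -> (u16, *const T) {
    ((packed >> 48) as u16, (packed & LOW_48) as *const T)
}

fn main() {
    let value = Box::new(1234_u32);
    let raw: *const u32 = &*value;

    let packed = pack_checked(7, raw);
    let (id, ptr) = unpack::<u32>(packed);

    assert_eq!(id, 7);
    assert_eq!(ptr, raw);
    // Dereferencing is sound only because `value` is still alive here.
    assert_eq!(unsafe { *ptr }, 1234);
}
```

The tag and the address come back unchanged, but nothing in the `u64` keeps the allocation alive: the lifetime has to be tracked by whoever owns the packed word.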

Canonical instances across languages

  • V8 (JavaScript engine) Smi tagging: small integers (Smis) are stored directly in a pointer-sized word, and the low bit distinguishes them from pointers to heap-allocated objects. Avoids a heap allocation and a pointer chase for small-integer arithmetic.
  • OCaml unboxed integer representation: integers are stored as (n << 1) | 1; pointers are word-aligned, so their low bit is 0. The GC distinguishes the two by that bit.
  • Objective-C tagged pointers (macOS/iOS): small NSNumber, NSDate, NSString values are encoded directly in the pointer-sized word, with a tag bit marking it as a non-pointer (the low bit on x86-64, the high bit on arm64); avoids heap allocation entirely for small values.
  • JavaScriptCore NaN-boxing: exploits the IEEE-754 NaN payload space to pack pointers/integers/booleans into double slots.
  • LLVM PointerIntPair<PtrT, IntBits>: first-class utility in LLVM's ADT library for packing a small integer tag into the low bits of an aligned pointer.
  • Linux kernel struct alignment tags: low bits of naturally-aligned pointers (e.g. to 16-byte-aligned slab objects) are used as type-discriminator or flag bits throughout the kernel; the red-black tree node, for example, keeps its color in the low bits of the parent pointer.
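Several of the instances above use the low bits freed by alignment rather than the high bits. A minimal sketch of that variant, in the spirit of PointerIntPair but not its API: `LowBitTagged` and `Node` are hypothetical names, and the number of spare bits is derived from the pointee's alignment.

```rust
use std::marker::PhantomData;
use std::mem::align_of;

/// Illustrative low-bit tagged pointer: the tag lives in the bits that are
/// always zero for a pointer aligned to `align_of::<T>()`.
struct LowBitTagged<T> {
    bits: usize,
    _marker: PhantomData<*const T>,
}

impl<T> LowBitTagged<T> {
    // Alignment is a power of two, so `align - 1` masks the spare low bits.
    const TAG_MASK: usize = align_of::<T>() - 1;

    fn new(ptr: *const T, tag: usize) -> Self {
        let addr = ptr as usize;
        assert_eq!(addr & Self::TAG_MASK, 0, "pointer is not aligned");
        assert!(tag <= Self::TAG_MASK, "tag does not fit in the spare bits");
        Self { bits: addr | tag, _marker: PhantomData }
    }

    fn tag(&self) -> usize { self.bits & Self::TAG_MASK }
    fn ptr(&self) -> *const T { (self.bits & !Self::TAG_MASK) as *const T }
}

// Force 8-byte alignment so three low bits are free on every target.
#[repr(align(8))]
struct Node(u64);

fn main() {
    let node = Node(99);
    let tagged = LowBitTagged::new(&node as *const Node, 0b101);

    assert_eq!(tagged.tag(), 0b101);
    assert_eq!(tagged.ptr(), &node as *const Node);
    assert_eq!(unsafe { (*tagged.ptr()).0 }, 99);
}
```

Low-bit tags are independent of the virtual address width, which makes them the more portable choice when only a few bits are needed.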

When it pays off

Three conditions should hold:

  1. Hot data structure with pointer + small-enum co-location. If every pointer entry also carries a small fixed-width tag (type flag, field ID, generation counter, …), a separate tag field pads each entry to 16 bytes on 64-bit targets; tagging brings it back to 8, halving the container footprint.
  2. Tag fits in available bits. On x86-64: ≤ 16 bits at the top, or a few bits at the bottom if pointers are aligned. On ARM with TBI: up to 8 bits at the top.
  3. The structure is hot enough to justify the ergonomic cost. Every dereference pays a mask/shift instruction, and the compiler sees an integer rather than a typed pointer, so some inlining / autovectorization opportunities may be lost.
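The footprint claim in the first condition can be checked with `size_of`. The two structs below are illustrative stand-ins for a container entry, not types from any real codebase.

```rust
use std::mem::size_of;

// A pointer stored next to a 16-bit tag; alignment pads this to two words
// on 64-bit targets.
#[allow(dead_code)]
struct Separate {
    ptr: *const u8,
    tag: u16,
}

// The same information packed into one word (tag in the high 16 bits).
#[allow(dead_code)]
struct Packed(u64);

fn main() {
    println!("separate: {} bytes", size_of::<Separate>());
    println!("packed:   {} bytes", size_of::<Packed>());

    if cfg!(target_pointer_width = "64") {
        // 8-byte pointer + 2-byte tag + 6 bytes of padding versus one 8-byte word.
        assert_eq!(size_of::<Separate>(), 2 * size_of::<Packed>());
    }
}
```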

Failure modes and caveats

  • Architecture-specific. x86-64's 48-bit address cap is not guaranteed forever: Linux's CONFIG_X86_5LEVEL enables 57-bit virtual addressing, which leaves only 7 free high bits, and future CPUs may narrow the budget further. Code should state the assumption explicitly and gate it behind #ifdef (#[cfg] in Rust) or a feature flag.
  • Memory-safety risk. With refcounted pointers (Rc / Arc in Rust), insert and get must correctly increment/decrement refcounts through the masked pointer; a bug there means a double-free or use-after-free. Figma explicitly cited this as the reason its pointer-tagging experiment wasn't productionized: the measured win (~5% RSS) wasn't worth the potential memory-corruption footguns.
  • Debugging / tooling degradation. gdb, lldb, and heap profilers interpret raw pointers; a packed u64 shows up as a meaningless integer, so custom pretty-printers are needed.
  • RSS-vs-allocated divergence. Cutting allocated bytes (a 20% theoretical reduction in Figma's case) doesn't cut resident set size proportionally: allocators, page-commit patterns, and OS alignment interact. Figma's 20% allocated-bytes win showed up as only ~5% RSS (concepts/go-runtime-memory-model; same lesson from Datadog Go 1.24). RSS is what the OOM-killer and Kubernetes care about.
  • Sanitizer conflicts. Not compatible with sanitizers that use the high bits of pointers for their own memory tags (HWASan uses the ARM top byte explicitly). Choose one.
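The refcount hazard is easiest to see in code. The sketch below packs an `Rc<String>` into a tagged word using `Rc::into_raw`, `Rc::increment_strong_count`, and `Rc::from_raw` from the standard library; `TaggedRc` is a hypothetical type written for illustration, assumes 48-bit addresses, and is not Figma's implementation.

```rust
use std::rc::Rc;

const PTR_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;

/// Hypothetical owner of exactly one strong reference, stored as tag + address.
struct TaggedRc {
    packed: u64,
}

impl TaggedRc {
    fn new(tag: u16, value: Rc<String>) -> Self {
        // into_raw transfers the strong count to us without decrementing it.
        let addr = Rc::into_raw(value) as u64;
        assert_eq!(addr & !PTR_MASK, 0, "address uses more than 48 bits");
        Self { packed: ((tag as u64) << 48) | addr }
    }

    fn tag(&self) -> u16 { (self.packed >> 48) as u16 }
    fn raw(&self) -> *const String { (self.packed & PTR_MASK) as *const String }

    /// Hands out a new strong reference; the one held here stays alive.
    fn get(&self) -> Rc<String> {
        unsafe {
            // Forgetting this increment would lead to a double free later.
            Rc::increment_strong_count(self.raw());
            Rc::from_raw(self.raw())
        }
    }
}

impl Drop for TaggedRc {
    fn drop(&mut self) {
        // Rebuild the Rc exactly once so the count we own is released.
        unsafe { drop(Rc::from_raw(self.raw())) }
    }
}

fn main() {
    let shared = Rc::new(String::from("hello"));
    let tagged = TaggedRc::new(42, Rc::clone(&shared));
    assert_eq!(Rc::strong_count(&shared), 2);

    let again = tagged.get();
    assert_eq!((tagged.tag(), again.as_str()), (42, "hello"));
    assert_eq!(Rc::strong_count(&shared), 3);

    drop(again);
    drop(tagged);
    assert_eq!(Rc::strong_count(&shared), 1);
}
```

Every path that touches the count goes through the masked address, so a wrong mask or a missed increment corrupts memory silently; the strong-count assertions are the kind of check worth running under a fuzzer.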

Ergonomic approach

Encapsulate tagging in a newtype with only safe operations exposed:

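use std::marker::PhantomData;

const PTR_MASK: u64 = 0x0000_FFFF_FFFF_FFFF; // low 48 bits hold the address

#[repr(transparent)]
struct TaggedPointer<T>(u64, PhantomData<*const T>);

impl<T> TaggedPointer<T> {
    // Assumes `ptr` fits in 48 bits; the tag occupies the top 16.
    fn new(tag: u16, ptr: *const T) -> Self {
        Self(((tag as u64) << 48) | (ptr as u64 & PTR_MASK), PhantomData)
    }
    fn tag(&self) -> u16 { (self.0 >> 48) as u16 }
    fn ptr(&self) -> *const T { (self.0 & PTR_MASK) as *const T }
}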

Confine the unsafe code (dereferencing the recovered pointer, any refcount handling) to the encapsulation layer; callers never see the raw u64. This is how LLVM's PointerIntPair and the Rust tagged-pointer crate ship.

Relationship to adjacent techniques

  • Orthogonal to concepts/small-map-as-sorted-vec — the two compose. Flat-map gives you Vec<(K, V)>; tagged pointer collapses it to Vec<u64> if K + V fit in 64 bits.
  • Sibling of concepts/bitpacking — both exploit spare bits; bitpacking is the quantization use case (pack sub-byte elements into uint8/32 containers), tagged pointer is the pointer use case.
  • Cousin of struct-of-arrays — both sacrifice a clean abstraction for tighter in-memory layout.
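The composition with a sorted flat map can be sketched as follows. `PackedMap` is a hypothetical type: it stores a 16-bit key in the high bits and a 48-bit address in the low bits of each `u64`, and it uses `&'static u32` values only to keep ownership out of the example.

```rust
const PTR_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;

/// Hypothetical flat map: each entry is (key << 48) | address, kept sorted.
/// Because the key sits in the high bits, ordering by key orders the u64s.
struct PackedMap {
    entries: Vec<u64>,
}

impl PackedMap {
    fn new() -> Self { Self { entries: Vec::new() } }

    fn insert(&mut self, key: u16, value: &'static u32) {
        let addr = value as *const u32 as u64;
        assert_eq!(addr & !PTR_MASK, 0, "address uses more than 48 bits");
        let entry = ((key as u64) << 48) | addr;
        match self.entries.binary_search_by_key(&key, |e| (*e >> 48) as u16) {
            Ok(i) => self.entries[i] = entry,        // overwrite an existing key
            Err(i) => self.entries.insert(i, entry), // insert at the sorted position
        }
    }

    fn get(&self, key: u16) -> Option<&'static u32> {
        let i = self.entries.binary_search_by_key(&key, |e| (*e >> 48) as u16).ok()?;
        let ptr = (self.entries[i] & PTR_MASK) as *const u32;
        // Sound only because every stored address came from a &'static u32.
        Some(unsafe { &*ptr })
    }
}

static A: u32 = 10;
static B: u32 = 20;

fn main() {
    let mut map = PackedMap::new();
    map.insert(9, &B);
    map.insert(2, &A);

    assert_eq!(map.get(2), Some(&10));
    assert_eq!(map.get(9), Some(&20));
    assert_eq!(map.get(5), None);
    // Each entry costs one 8-byte word instead of a padded (key, pointer) pair.
    assert_eq!(std::mem::size_of_val(&map.entries[0]), 8);
}
```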

Lesson

Pointers carry spare bits on modern CPUs, and using them is a classical density trick. Expect the allocated-bytes win measured in a benchmark to be larger than the RSS win, because real allocators don't return savings proportionally. In memory-safe languages with refcounting or GC, the encapsulation layer is the only thing standing between the trick and pointer-corruption bugs, so test and fuzz the refcount path before shipping.

Last updated · 200 distilled / 1,178 read