CONCEPT Cited by 2 sources
Small-map-as-sorted-vec¶
Represent an associative container whose key count is empirically
small and bounded as a flat sorted Vec<(key, value)>
(a.k.a. flat map, sorted vector map) rather than a balanced-tree
map or hash map. Theoretically worse on insert (O(n) element shift
vs O(log n) tree rebalance), practically faster at small N on modern
CPUs because the entire working set fits in a handful of cache lines,
per-entry allocation overhead is zero, and pointer-chasing is
eliminated. The bound that justifies the choice is empirical, not
theoretical — it comes from the product/schema shape, not the
container's capacity.
When it applies¶
Three conditions must hold simultaneously:
- Small N. Average entry count per instance is small (typically tens; certainly < 1,000). At large N, tree structures' asymptotic wins dominate cache-locality gains.
- Bounded by product shape, not runtime data. The upper bound comes from something that evolves slowly — a schema, an enum, a config table. Not runtime-variable user input.
- Read-dominated or append-dominated hot path. If the workload inserts/removes randomly in the middle, the O(n) shift cost resurfaces. Common fits: deserialization (entries arrive sorted), configuration (loaded once, read many times), fixed-schema metadata.
If any of those fail, use a tree-based map (BTreeMap) or hash map
(HashMap) as usual.
Why it wins at small N¶
- Cache locality. A 64-entry Vec<(u16, u64)> carries ~640 bytes of payload (~1 KiB once each tuple is padded to 16 bytes) — roughly 16 cache lines, contiguous and often block-prefetched. A BTreeMap<u16, u64> of the same size scatters across per-node allocations reached by pointer-chased traversal.
- Zero per-entry allocation. One contiguous allocation for the whole map, vs one allocation per internal tree node.
- Branch predictability. Binary search over a dense array is branch-predictable and SIMD-friendly; tree traversal is not.
- Smaller per-entry footprint. A B-tree node carries slack capacity, child pointers, and an entry count; a Vec<(K, V)> entry is exactly sizeof(K) + sizeof(V) plus alignment padding.
Canonical instances¶
- Figma Multiplayer property map (canonical; see sources/2026-04-21-figma-supporting-faster-file-load-times-memory-optimizations-rust): per-node Map<u16_property_id, u64_pointer> replaced Rust's BTreeMap with a flat sorted Vec<(u16, u64)>; average entry count ~60, schema-bounded < 200; ~25% memory drop on large files, and deserialization got faster despite the asymptotic regression.
- Boost.Container flat_map/flat_set — C++ standard-library-compatible fast-lookup implementation of this pattern; long-established in the C++ community.
- Android system UI uses SparseArray for int → Object maps specifically to avoid autoboxing overhead at small N.
- Rust crates smallmap/indexmap — idiomatic Rust implementations.
Boundary and failure modes¶
- Don't use if inserts/deletes are random and frequent. O(n) shift cost returns. Measure the workload, not just the size.
- Don't extrapolate from microbenchmarks. The cache-locality win depends on real allocator behaviour and CPU cache hierarchy of the target platform; cross-platform (ARM-Linux vs x86-Linux vs macOS) profiles can differ.
- Resizing cost. The Vec reallocates on growth past capacity; a BTreeMap doesn't. Pre-size if possible.
- No index stability across mutation. Inserting anywhere may shift all subsequent entries, invalidating saved indices and references; a BTreeMap does not shift unrelated entries on insert.
Relationship to adjacent techniques¶
- Composes with concepts/tagged-pointer. Once you have a flat Vec<(K, V)>, tagged pointers collapse it further to Vec<packed_u64> if sizeof(K) + sizeof(V_pointer) ≤ 64 bits.
- Structural cousin of struct-of-arrays. Both trade abstraction for contiguous-layout performance; columnar is typically chosen for scan workloads, flat-map for lookup workloads.
- Not the same as a hash table with open addressing — open-addressed hash tables also exploit cache locality but pay collision-resolution cost on lookup; flat maps pay binary-search cost. Open-addressed hashing wins when the key domain is large + unstructured; flat maps win when the key domain is small + structured.
Lesson¶
Asymptotic analysis lies at small N on modern hardware. The theoretically-worse container can be practically-faster when the entire working set fits in cache and allocator overhead dominates algorithmic overhead. Always measure. The bound that matters is the empirical distribution of the workload, not the container's worst-case capacity.
Sibling in the wiki¶
systems/trie-hard is the same lesson at a different substrate.
Cloudflare's systems/pingora-origin clear_internal_headers
replaced HashMap<HeaderName> with a custom trie whose entire node
graph lives in one contiguous memory region and whose edges are
bit-packed into unsigned integers. Cache locality over asymptotic
complexity again: 1.53 µs → 0.93 µs per call, and 1.28 % of CPU saved
across a ~40,000-core fleet (~550 cores). Both cases hinge on a
schema-bounded small N
and a known workload shape; both shipped as custom structures because
general-purpose containers didn't match the target. See
sources/2024-09-10-cloudflare-a-good-day-to-trie-hard.