CONCEPT Cited by 1 source
Base64 as byte buffer (avoid string materialisation)¶
Definition¶
Base64 as byte buffer is the performance technique of reading Base64-encoded data directly from its file-buffer representation — computing sextet-to-byte decodes on demand during the query operation — rather than first materialising a decoded byte array (or, worse, a decoded string) in memory.
The move matters when:
- Base64 appears in a file embedded in a larger text format (JSONL, YAML, TOML, HTTP header) and is naturally already in a char* file buffer.
- The consuming operation queries the decoded bytes sparsely (a Bloom-filter membership test reads one byte per hash function, not the whole buffer).
- The allocation cost of a decoded-byte string dominates the per-byte-decode cost.
Canonical Vercel framing¶
The 2026-04-21 Bloom-filter routing post discloses the technique with a LuaJIT + FFI code snippet:
local ffi = require 'ffi'
local bit = require 'bit'
local band, lshift = bit.band, bit.lshift
local str_byte = string.byte

local b64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

-- 256-entry table mapping a Base64 character code to its 6-bit value,
-- built once at load time.
local decode_table = ffi.new('uint8_t[256]')
for i = 1, #b64 do
  decode_table[str_byte(b64, i)] = i - 1
end

-- `ptr` is an FFI pointer into the raw file buffer; self:iterator yields
-- one (byte_offset, bit_offset) pair per hash function.
function BloomFilter:has(key)
  for byte_offset, bit_offset in self:iterator(key) do
    local sextet = decode_table[ptr[byte_offset]]
    if band(sextet, lshift(1, bit_offset)) == 0 then
      return false
    end
  end
  return true
end
With the framing quote:
"Although the Bloom filter is represented in the file as a Base64 string, we don't actually want to treat it as a string. String operations are expensive, and they're the reason why the previous approach is so slow. The goal of this optimization is precisely to avoid them."
And the mechanism:
"Instead, we can ignore the double quotes and treat the Base64 data as the Bloom filter directly. Then, when checking membership, we can decode each byte as needed."
"This means the speed at which we can create a Bloom filter is bound by file reading, which is orders of magnitude faster than string creation, so we can create very large Bloom filters nearly instantly."
(Source: sources/2026-04-21-vercel-how-we-made-global-routing-faster-with-bloom-filters.)
The decoding logic¶
Every 4 Base64 characters encode 3 bytes (24 bits → 4 × 6-bit sextets). For a membership query that needs bit b of the underlying bit array:
- Which byte of decoded data? byte_index = b / 8 (integer division).
- Which bit within that byte? bit_index = b % 8.
- Which sextets? Since 8 bits never fit inside one 6-bit sextet, each decoded byte always spans 2 adjacent sextets (3 bytes = 4 sextets = 24 bits). Compute the two sextets, read the raw Base64 bytes from the file buffer at the corresponding offsets, look up their 6-bit values in the 256-entry decode table, and OR-shift them to reconstruct the decoded byte.
- AND-mask: (decoded_byte & (1 << bit_index)) != 0.
The code snippet uses a simplified form where each
byte_offset already maps to one Base64 character and
bit_offset picks within the 6-bit sextet — a minor
simplification that holds only if the bit indexing is done in
the 6-bit space rather than the 8-bit decoded space. Either
shape works; the Vercel team chose the simpler one.
Why it wins¶
| Step | String materialisation | Direct buffer query |
|---|---|---|
| File load | Read into a string | Read into a buffer |
| Decode | Allocate (3/4 × size) bytes, fill | no-op |
| Query | Read from decoded byte array | Read from file buffer, decode k bytes |
| Per-query cost | O(1) — already decoded | O(k) sextet decodes (k ≈ number of hash functions) |
| Memory overhead | ~2× (both encoded + decoded in memory) | ~1× (only encoded in memory) |
| GC pressure | High — large string allocation | Minimal — 256-byte decode table + no per-query alloc |
For a 4 MB Base64-encoded filter queried once per request at 23 hash functions:
- String materialisation: ~3 MB allocation per file load. One-time per load but the allocation + GC pressure blocks the event loop (see concepts/event-loop-blocking-single-threaded).
- Direct buffer query: 23 sextet decodes per query. At sub-nanosecond per decode via FFI byte-array lookup, total per-query decode cost is <50 ns.
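As a rough throughput-only comparison (the article's sizes, this note's own break-even formula; it ignores the GC/latency argument, which is the stronger one in the post):

```python
encoded_size = 4 * 1024 * 1024            # Base64 bytes in the file
decoded_size = encoded_size * 3 // 4      # bytes a full decode would allocate
k = 23                                    # hash functions per membership query
sextets_per_query = 2 * k                 # each decoded byte spans 2 sextets

# Queries per file load before eager decoding does less total byte work:
break_even_queries = decoded_size // sextets_per_query   # tens of thousands
```

Even by raw byte-work alone, eager decoding only pays off after tens of thousands of queries against the same load.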
Vercel's Bloom-filter substitution hit both axes: smaller total file size (Bloom filter vs JSON), and smaller per-query cost (direct buffer query vs materialised-string operations). The cumulative effect is why p99 drops 200×, not just because Bloom filters are smaller.
When to apply¶
- File buffer already exists in the runtime. Don't build a byte buffer specifically to avoid string ops — the file-read allocation is just as expensive.
- Query access pattern is sparse. If you need to access every decoded byte, materialise once and reuse. The technique only wins when the reads are sparse enough that the total decode work is less than the materialisation cost.
- Runtime has cheap FFI or typed-array access. LuaJIT's FFI is especially efficient. Node.js Buffer / typed arrays have similar constant-time byte-access properties. Python's bytes objects are fast; str.decode() is slow.
- String ops dominate the runtime's hot path. Vercel's LuaJIT / OpenResty-style runtime is the canonical case; similar techniques apply to Node.js Buffer operations and Rust's byte-slice access.
Related techniques¶
- Zero-copy parsing: read bytes in place from the network buffer; don't materialise intermediate representations. sonic-cpp, simd-json, rapidjson are zero-copy variants of full-tree JSON parsers.
- Cap'n Proto / FlatBuffers: binary formats designed so consumers read fields in place from the buffer without deserialisation.
- Uint8Array.buffer: in Node.js / V8, backing-buffer access without intermediate strings.
- mmap + typed-array view: open a file as memory-mapped, overlay a typed-array view — queries read directly from the OS page cache without any userspace string allocation.
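The mmap + typed-view variant can be sketched in Python (file name and 4-byte payload are stand-ins for a real Base64-bearing file):

```python
import mmap
import os
import tempfile

# Write a stand-in 4-byte Base64 payload to a temp file.
path = os.path.join(tempfile.mkdtemp(), "filter.b64")
with open(path, "wb") as f:
    f.write(b"sf8A")

# Map the file and read one byte straight from the OS page cache:
# no decoded copy, no str, no per-query allocation.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    first = mm[0]    # the int 115 (ord('s')), read in place
```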
Pitfalls¶
- Endianness and alignment assumptions break cross-platform. Network byte order is not assumed when you're just decoding Base64; it does matter when the decoded bytes represent multi-byte integers.
- Off-by-one in sextet-to-byte math. Base64 uses = padding for non-multiple-of-3 input; filter construction should pad to avoid querying past the valid range.
- Decode table setup overhead. Vercel's 256-byte table is set once at load; per-query cost is a single array-index lookup. If the table is set up per query (a common mistake), the technique loses to string materialisation.
- Mutable file buffers. If the routing service ever mutates the file buffer in place (to, say, update a version field), the Bloom filter's bits get corrupted. Treat the buffer as immutable.
- String materialisation elsewhere in the stack. If a logging middleware or tracing span decodes the buffer for observability, the benefit is lost. Keep observability off the hot decode path.
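The padding pitfall above can be guarded with standard Base64 length arithmetic (helper names are this note's own, not from the post):

```python
def decoded_length(b64: bytes) -> int:
    """Number of real bytes encoded by `b64` (accounts for '=' padding)."""
    if not b64:
        return 0
    padding = len(b64) - len(b64.rstrip(b"="))
    return len(b64) * 3 // 4 - padding

def max_queryable_bit(b64: bytes) -> int:
    """Highest valid bit index for a direct-buffer membership test."""
    return decoded_length(b64) * 8 - 1
```

Checking query offsets against max_queryable_bit keeps the direct-buffer reads inside the valid decoded range.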
Seen in¶
- sources/2026-04-21-vercel-how-we-made-global-routing-faster-with-bloom-filters — Canonical wiki instance. Vercel's routing service loads a JSONL file where line 2 is the Base64-encoded Bloom-filter bit array, and the per-request membership test reads sextets directly from the file buffer via a LuaJIT FFI uint8_t[256] decode table — never materialising the decoded byte array. Cited as the load-bearing reason filter construction is "bound by file reading, which is orders of magnitude faster than string creation."