
CONCEPT

Base64 as byte buffer (avoid string materialisation)

Definition

Base64 as byte buffer is the performance technique of reading Base64-encoded data directly from its file-buffer representation — computing sextet-to-byte decodes on demand during the query operation — rather than first materialising a decoded byte array (or, worse, a decoded string) in memory.

The move matters when:

  • Base64 appears in a file embedded in a larger text format (JSONL, YAML, TOML, HTTP header) and is naturally already in a char* file buffer.
  • The consuming operation queries the decoded bytes sparsely (a Bloom-filter membership test reads one byte per hash function, not the whole buffer).
  • The allocation cost of a decoded-byte string dominates the per-byte-decode cost.

Canonical Vercel framing

The 2026-04-21 Bloom-filter routing post discloses the technique with a LuaJIT + FFI code snippet:

local ffi = require 'ffi'
local bit = require 'bit'
local str_byte = string.byte
local band, lshift = bit.band, bit.lshift

-- 256-entry sextet decode table, built once at load time.
local b64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
local decode_table = ffi.new 'uint8_t[256]'
for i = 1, #b64 do
  decode_table[str_byte(b64, i)] = i - 1
end

-- ptr is an FFI pointer into the raw file buffer, positioned at the start
-- of the Base64 payload; self:iterator yields one (byte_offset, bit_offset)
-- pair per hash function.
function BloomFilter:has(key)
  for byte_offset, bit_offset in self:iterator(key) do
    local sextet = decode_table[ptr[byte_offset]]
    if band(sextet, lshift(1, bit_offset)) == 0 then
      return false
    end
  end
  return true
end

With framing quote:

"Although the Bloom filter is represented in the file as a Base64 string, we don't actually want to treat it as a string. String operations are expensive, and they're the reason why the previous approach is so slow. The goal of this optimization is precisely to avoid them."

And the mechanism:

"Instead, we can ignore the double quotes and treat the Base64 data as the Bloom filter directly. Then, when checking membership, we can decode each byte as needed."

"This means the speed at which we can create a Bloom filter is bound by file reading, which is orders of magnitude faster than string creation, so we can create very large Bloom filters nearly instantly."

(Source: sources/2026-04-21-vercel-how-we-made-global-routing-faster-with-bloom-filters.)

The decoding logic

Every 4 Base64 characters encode 3 bytes (24 bits → 4 × 6-bit sextets). For a membership query that needs bit b of the underlying bit array:

  1. Which byte of decoded data? byte_index = b / 8 (integer division).
  2. Which bit within that byte? bit_index = b % 8.
  3. Which sextets? Decoded bytes do not align with sextet boundaries (3 bytes = 4 sextets = 24 bits), so each decoded byte spans exactly two adjacent sextets. Compute which two, read the raw Base64 bytes from the file buffer at the corresponding offsets, look up their 6-bit values in the 256-entry decode table, and OR-shift them together to reconstruct the decoded byte.
  4. AND-mask: (decoded_byte & (1 << bit_index)) != 0.

The code snippet uses a simplified form: each byte_offset maps to one Base64 character, and bit_offset selects a bit within its 6-bit sextet. This holds only if bit indexing is done consistently in the 6-bit space — on both the construction and query sides — rather than in the 8-bit decoded space. Either shape works; the Vercel team chose the simpler one.
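A Python sketch of that simplified 6-bit-space shape (hypothetical `sextet_bit` / `set_sextet_bits` names; the only requirement is that setter and getter agree on the indexing):

```python
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz0123456789+/")
DECODE = {ord(c): v for v, c in enumerate(ALPHABET)}

def sextet_bit(buf: bytes, b: int) -> bool:
    """Query side: bit b in 6-bit space -> one char lookup, one mask."""
    byte_offset, bit_offset = divmod(b, 6)
    return DECODE[buf[byte_offset]] & (1 << bit_offset) != 0

def set_sextet_bits(nbits: int, bits) -> bytes:
    """Construction side: write bits in the same 6-bit space."""
    sextets = [0] * ((nbits + 5) // 6)
    for b in bits:
        byte_offset, bit_offset = divmod(b, 6)
        sextets[byte_offset] |= 1 << bit_offset
    return "".join(ALPHABET[s] for s in sextets).encode()
```

Because a bit never crosses a character boundary here, the query side needs one table lookup instead of two plus an OR-shift.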

Why it wins

| Step | String materialisation | Direct buffer query |
| --- | --- | --- |
| File load | Read into a string | Read into a buffer |
| Decode | Allocate (3/4 × size) bytes and fill | No-op |
| Query | Read from decoded byte array | Read from file buffer, decode k bytes |
| Per-query cost | O(1) — already decoded | O(k) sextet decodes (k ≈ number of hash functions) |
| Memory overhead | ~2× (both encoded + decoded in memory) | ~1× (only encoded in memory) |
| GC pressure | High — large string allocation | Minimal — 256-byte decode table, no per-query allocation |

For a 4 MB Base64-encoded filter queried once per request at 23 hash functions:

  • String materialisation: ~3 MB allocation per file load. One-time per load but the allocation + GC pressure blocks the event loop (see concepts/event-loop-blocking-single-threaded).
  • Direct buffer query: 23 sextet decodes per query. At sub-nanosecond per decode via FFI byte-array lookup, total per-query decode cost is <50 ns.
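The asymmetry can be modelled with a scaled-down Python sketch (sizes and names are illustrative, not Vercel's code): the materialising path decodes the whole payload per load, while the buffer path decodes only the K bytes a query touches — and both must agree on the values read.

```python
import base64
import os

K = 23                               # hash functions per membership query
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz0123456789+/")
DECODE = {ord(c): v for v, c in enumerate(ALPHABET)}

raw = os.urandom(3 * 1024)           # stand-in for the ~3 MB decoded filter
b64 = base64.b64encode(raw)          # the "file buffer" (no '=' padding: 3 | len)

def query_materialised(offsets):
    decoded = base64.b64decode(b64)  # allocates the entire decoded array
    return [decoded[o] for o in offsets]

def query_buffer(offsets):
    out = []
    for o in offsets:                # decode only the bytes we touch
        s, shift = divmod(o * 8, 6)  # first covering sextet + bits to skip
        hi, lo = DECODE[b64[s]], DECODE[b64[s + 1]]
        out.append(((hi << (2 + shift)) | (lo >> (4 - shift))) & 0xFF)
    return out

offsets = [(i * 997) % len(raw) for i in range(K)]
assert query_materialised(offsets) == query_buffer(offsets)
```

The buffer path performs 2 × K table lookups total, independent of filter size, which is the shape behind the "bound by file reading" claim.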

Vercel's Bloom-filter substitution hit both axes: smaller total file size (Bloom filter vs JSON), and smaller per-query cost (direct buffer query vs materialised-string operations). The cumulative effect is why p99 drops 200×, not just because Bloom filters are smaller.

When to apply

  • File buffer already exists in the runtime. Don't build a byte buffer specifically to avoid string ops — the file-read allocation is just as expensive.
  • Query access pattern is sparse. If you need to access every decoded byte, materialise once and reuse. The technique only wins when the reads are sparse enough that the total decode work is less than the materialisation cost.
  • Runtime has cheap FFI or typed-array access. LuaJIT's FFI is especially efficient. Node.js Buffer / typed arrays have similar constant-time byte-access properties. Python's bytes objects are fast to index; decoding them into a str is the slow path.
  • String ops dominate the runtime's hot path. Vercel's LuaJIT / OpenResty-style runtime is the canonical case; similar techniques apply to Node.js Buffer operations and Rust's byte-slice access.
Related techniques

  • Zero-copy parsing: read bytes in place from the network buffer rather than materialising intermediate representations. simdjson, sonic-cpp, and RapidJSON (in-situ mode) apply zero-copy ideas to full-tree JSON parsing.
  • Cap'n Proto / FlatBuffers: binary formats designed so consumers read fields in place from the buffer without deserialisation.
  • Uint8Array.buffer: in Node.js / V8, backing buffer access without intermediate strings.
  • mmap + typed-array view: open a file as memory-mapped, overlay a typed-array view — queries read directly from OS page cache without any userspace string allocation.
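A minimal Python sketch of the mmap variant (a tempfile stands in for the routing file; `view[i]` reads one byte from the OS page cache without copying the file body into a string):

```python
import base64
import mmap
import os
import tempfile

# Write a Base64 payload to disk, then query it in place through an
# mmap view -- no userspace string materialisation of the file body.
payload = base64.b64encode(os.urandom(300))   # 300 % 3 == 0: no '=' padding
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz0123456789+/")
DECODE = {ord(c): v for v, c in enumerate(ALPHABET)}

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        sextet = DECODE[view[0]]   # one in-place byte read, one table lookup
os.remove(path)
```

The same shape works with a typed-array overlay in Node.js or an FFI pointer in LuaJIT; the common ingredient is that queries index the mapped bytes directly.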

Pitfalls

  • Endianness and alignment assumptions break cross-platform. Base64 decoding itself is byte-oriented and endianness-agnostic; endianness starts to matter once the decoded bytes are interpreted as multi-byte integers.
  • Off-by-one in sextet-to-byte math. Base64 has = padding for non-multiple-of-3 input; filter construction should pad to avoid querying past the valid range.
  • Decode table setup overhead. Vercel's 256-byte table is set once at load; per-query cost is a single array-index lookup. If the table is set up per-query (common mistake) the technique loses to string materialisation.
  • Mutable file buffers. If the routing service ever mutates the file buffer in place (to, say, update a version field), the Bloom filter's bits get corrupted. Treat the buffer as immutable.
  • String materialisation elsewhere in the stack. If a logging middleware or tracing span decodes the buffer for observability, the benefit is lost. Keep observability off the hot decode path.
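Two of these pitfalls (per-query table setup, querying past the padded range) can be guarded in one small sketch — hypothetical `BloomView` name, bits indexed in the 6-bit sextet space:

```python
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz0123456789+/")
DECODE = {ord(c): v for v, c in enumerate(ALPHABET)}  # built once, at load

class BloomView:
    """Read-only view over a Base64 buffer; treat the buffer as immutable."""

    def __init__(self, buf: bytes):
        self.buf = buf.rstrip(b"=")      # '=' padding carries no filter bits
        self.nbits = len(self.buf) * 6   # valid query range in sextet space

    def has_bit(self, b: int) -> bool:
        if not 0 <= b < self.nbits:      # guard the off-by-one pitfall
            raise IndexError(b)
        byte_offset, bit_offset = divmod(b, 6)
        return DECODE[self.buf[byte_offset]] & (1 << bit_offset) != 0
```

The decode table lives at module scope, so each query is a bounds check, one index, and one mask — never a table rebuild.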

Seen in

  • sources/2026-04-21-vercel-how-we-made-global-routing-faster-with-bloom-filters — Canonical wiki instance. Vercel's routing service loads a JSONL file where line 2 is the Base64-encoded Bloom-filter bit array, and the per-request membership test reads sextets directly from the file buffer via a LuaJIT FFI uint8_t[256] decode table — never materialising the decoded byte array. Cited as the load-bearing reason filter construction is "bound by file reading, which is orders of magnitude faster than string creation."