JSONL parameters + Base64 payload¶
Pattern¶
Serialise a parameterised binary artefact (a Bloom filter, a trained model, a compact index, a compressed cache) as a two-line JSONL file:
- Line 1: a JSON object containing the parameters needed to interpret the payload — schema version, algorithm parameters, metadata (size, element count, error rates, creation time), anything the consumer needs to know before touching the bytes.
- Line 2: the binary payload, Base64-encoded and wrapped in JSON string quotes (so the whole file is valid JSONL).
The pattern trades a modest (~33 %) size overhead from Base64 encoding for self-description, text-safe embedding, greppability, easy diffing in Git and S3 listings, and no tooling requirement beyond a JSON parser and a Base64 decoder.
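A minimal sketch of a producer and consumer for the two-line layout described above. The function names `dump_artifact` and `load_artifact` are hypothetical, and the parameter object is made up for illustration; only the Python standard library is used.

```python
import base64
import json


def dump_artifact(params: dict, payload: bytes) -> str:
    """Return the two-line JSONL text: parameters first, then the Base64 payload."""
    line1 = json.dumps(params, separators=(",", ":"))
    # json.dumps on a str adds the surrounding quotes, so line 2 is valid JSON.
    line2 = json.dumps(base64.b64encode(payload).decode("ascii"))
    return line1 + "\n" + line2 + "\n"


def load_artifact(text: str) -> tuple[dict, bytes]:
    """Parse the two-line JSONL text back into (params, payload)."""
    line1, line2 = text.splitlines()[:2]
    params = json.loads(line1)
    # validate=True rejects characters outside the Base64 alphabet.
    payload = base64.b64decode(json.loads(line2), validate=True)
    return params, payload
```

Because line 2 is produced by a JSON encoder rather than by string concatenation, the quoting stays correct even if the encoder is later swapped for a URL-safe Base64 variant.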
Canonical Vercel application¶
Vercel's 2026-04-21 blog post canonicalises the pattern for Bloom-filter transport between their build service and routing service:
{"version":"test","bloom":{"n":10,"p":1e-7,"m":336,"k":23,"s":0}}
"0kxC4anU4awVOYSs54vsAL7gBNGK/PrLjKrAJRil64mMxmiig1S+jqyC"
Line 1 declares everything the consumer needs to construct a
query-compatible filter: n = element count, p = target
false-positive rate, m = bit-array size, k = number of
hash functions, s = seed. Line 2 is the bit array,
Base64-encoded, which the consumer can either decode into a byte
array or (more cleverly) treat as a byte buffer directly per
concepts/base64-as-byte-buffer-inline.
(Source: sources/2026-04-21-vercel-how-we-made-global-routing-faster-with-bloom-filters.)
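The example file above can be sanity-checked with a few lines of Python. The sketch below decodes line 2 and compares its length in bits against m, then recomputes m and k from n and p using the standard textbook Bloom-filter sizing formulas. Whether Vercel's build service uses exactly these formulas is an assumption; the point is only that the published numbers are consistent with them.

```python
import base64
import json
import math

LINE1 = '{"version":"test","bloom":{"n":10,"p":1e-7,"m":336,"k":23,"s":0}}'
LINE2 = '"0kxC4anU4awVOYSs54vsAL7gBNGK/PrLjKrAJRil64mMxmiig1S+jqyC"'

bloom = json.loads(LINE1)["bloom"]
bits = base64.b64decode(json.loads(LINE2), validate=True)

# Textbook sizing (assumed, not confirmed by the source):
#   m = ceil(-n * ln(p) / (ln 2)^2),  k = round((m / n) * ln 2)
expected_m = math.ceil(-bloom["n"] * math.log(bloom["p"]) / math.log(2) ** 2)
expected_k = round(expected_m / bloom["n"] * math.log(2))

print(len(bits) * 8, expected_m, expected_k)  # 336 336 23
```

The 56 Base64 characters decode to 42 bytes, i.e. exactly the 336 bits declared in line 1.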
Why JSONL + Base64¶
Over pure binary¶
- Parseable by anyone without vendor-specific libraries.
- Greppable in object-storage listings: aws s3 cp | head -1 | jq surfaces parameters without decoding the payload.
- Self-describing via versioning: a "version":"..." field lets the consumer route to the right parser.
- Easy to add sibling metadata (build hash, creation time, tenant ID) without changing the format.
Over JSON nested¶
- Two-line structure keeps metadata and payload cleanly separable. A consumer that only wants parameters doesn't pay the Base64-decode cost; a consumer that only wants the payload skips line 1.
- Streaming-friendly — line-delimited framing is a natural parser boundary.
- Cheaper framing for the payload. Embedding the payload in line 1's JSON object as a string field would add no escaping overhead (Base64 characters need no JSON escapes), but the parser would have to scan the whole object to find its closing brace, which is slower than reading up to the newline.
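The separability argument can be sketched as two independent readers: one stops after line 1, the other skips line 1 without parsing it. The helper names and the sample contents are hypothetical.

```python
import base64
import io
import json
from typing import BinaryIO


def read_params_only(stream: BinaryIO) -> dict:
    """Parse line 1 and stop; the payload line is never read or decoded."""
    return json.loads(stream.readline())


def read_payload_only(stream: BinaryIO) -> bytes:
    """Skip line 1 without parsing it, then decode line 2."""
    stream.readline()  # discard the parameter line unparsed
    return base64.b64decode(json.loads(stream.readline()))


# Usage with an in-memory file; the contents are made up for illustration.
SAMPLE = b'{"version":"1","bloom":{"m":16}}\n"AP8="\n'
print(read_params_only(io.BytesIO(SAMPLE)))
print(read_payload_only(io.BytesIO(SAMPLE)))
```

With a nested-JSON layout, neither shortcut is available: the parser has to walk the entire object either way.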
Over YAML¶
- Same expressiveness for line 1 as plain JSON.
- YAML's multi-document separator (---) is an alternative, but YAML parsing is slower and more complex; JSONL is simpler.
Over Protobuf / FlatBuffers / MessagePack¶
- No code generation required.
- Text-editable for debugging: you can read the parameters with cat | head -1.
- Protobuf etc. win on size and per-record parse speed, but the Vercel use case has one parameter blob and one payload per file, not a large record stream, so parsing cost is a non-issue.
The Base64 size penalty¶
Base64 encodes 3 bytes of binary as 4 ASCII characters = 33 % size overhead. For Vercel's case:
- Raw bit-array size: ~4 MB (for 1M paths at p = 1e-7).
- Base64-encoded: ~5.3 MB.
- Still 70–80 % smaller than the previous JSON-tree path lookup (which was also text).
The 33 % overhead buys the pattern's benefits: text safety, versioning, the ability to embed the payload in JSONL, and gzip-friendliness (Base64 text carries only about 6 bits of information per 8-bit character, so gzip in HTTP transport recovers much of the overhead).
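The size arithmetic can be checked empirically. The sketch below uses seeded pseudo-random bytes as a stand-in for a dense bit array (an assumption; a real filter's compressibility depends on its fill ratio) and prints the raw, Base64, and gzipped-Base64 sizes.

```python
import base64
import gzip
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
raw = bytes(rng.getrandbits(8) for _ in range(30_000))  # stand-in bit array

encoded = base64.b64encode(raw)
compressed = gzip.compress(encoded)

# Base64 emits 4 characters for every 3 input bytes, rounded up to a full group.
assert len(encoded) == 4 * ((len(raw) + 2) // 3)

print(f"raw        {len(raw)} bytes")
print(f"base64     {len(encoded)} bytes ({len(encoded) / len(raw):.2f}x raw)")
print(f"base64+gz  {len(compressed)} bytes ({len(compressed) / len(raw):.2f}x raw)")
```

On incompressible input, gzip cannot go below the raw size, but it should bring the Base64 text back close to it, which is why the on-wire penalty is smaller than the at-rest one.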
Variations¶
Multi-chunked payload¶
For very large payloads, split into N lines after line 1:
{"version":"1","bloom":{...},"chunks":3,"chunk_size":1048576}
"chunk-1-base64..."
"chunk-2-base64..."
"chunk-3-base64..."
Enables streaming consumption and per-chunk checksums.
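A minimal sketch of the chunked variant, assuming each chunk is Base64-encoded independently and the header carries the "chunks" and "chunk_size" fields shown above. The function names are hypothetical, and per-chunk checksums are left out to keep the example short.

```python
import base64
import json


def dump_chunked(params: dict, payload: bytes, chunk_size: int) -> str:
    """Write a header line followed by one Base64 line per chunk."""
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
    header = dict(params, chunks=len(chunks), chunk_size=chunk_size)
    lines = [json.dumps(header, separators=(",", ":"))]
    lines += [json.dumps(base64.b64encode(c).decode("ascii")) for c in chunks]
    return "\n".join(lines) + "\n"


def iter_chunks(lines):
    """Yield decoded chunks one at a time from an iterable of JSONL lines."""
    it = iter(lines)
    header = json.loads(next(it))
    for _ in range(header["chunks"]):
        yield base64.b64decode(json.loads(next(it)))
```

Because `iter_chunks` is a generator, a consumer can pass it an open file object and process each chunk before the next line has been read.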
Sibling metadata¶
Parameters in line 1 remain minimal; operational metadata sits alongside in the same object.
Versioning via top-level field¶
Let the consumer dispatch to different parser versions
based on schema.
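One way to sketch this dispatch is a table from version string to parser function. The two layouts below (parameters under "bloom" in version 1, under "filter" in version 2) are invented purely to show the mechanism and are not taken from the source.

```python
import base64
import json


def parse_v1(header: dict, payload: bytes) -> dict:
    # Hypothetical v1 layout: parameters nested under "bloom".
    return {"params": header["bloom"], "bits": payload}


def parse_v2(header: dict, payload: bytes) -> dict:
    # Hypothetical v2 layout: same parameters under a renamed "filter" key.
    return {"params": header["filter"], "bits": payload}


PARSERS = {"1": parse_v1, "2": parse_v2}


def load(text: str) -> dict:
    """Read line 1, pick a parser by its "version" field, then decode line 2."""
    line1, line2 = text.splitlines()[:2]
    header = json.loads(line1)
    parser = PARSERS.get(header.get("version"))
    if parser is None:
        raise ValueError(f"unsupported version: {header.get('version')!r}")
    return parser(header, base64.b64decode(json.loads(line2)))
```

Adding a schema version then means adding one function and one table entry, with no change to existing parsers.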
Migration across versions¶
A consumer that is only v1-compatible and observes "version":"v2"
emits a warning and falls back to safe default behaviour
(in Vercel's case: route to the legacy JSON-tree lookup path).
This is how the Vercel roll-out worked: projects that haven't
deployed since the Bloom filter shipped still use the older
JSON path.
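The fallback behaviour might look like the following sketch. The set of supported versions, the logger name, and the `filter_lookup` / `legacy_lookup` callbacks are all hypothetical stand-ins; the source does not describe Vercel's code at this level.

```python
import base64
import json
import logging

log = logging.getLogger("artefact")
SUPPORTED_VERSIONS = {"v1"}  # what this hypothetical consumer understands


def load_filter(text: str):
    """Return (params, bits), or None when the caller should use the legacy path."""
    line1, line2 = text.splitlines()[:2]
    header = json.loads(line1)
    version = header.get("version")
    if version not in SUPPORTED_VERSIONS:
        log.warning("unsupported artefact version %r; falling back", version)
        return None  # the payload line is left undecoded
    return header["bloom"], base64.b64decode(json.loads(line2))


def lookup(path: str, artefact_text: str, filter_lookup, legacy_lookup):
    """Query the filter when the artefact is readable, else the legacy lookup."""
    loaded = load_filter(artefact_text)
    if loaded is None:
        return legacy_lookup(path)
    params, bits = loaded
    return filter_lookup(path, params, bits)
```

The version check happens before the payload is touched, so an incompatible artefact costs one small JSON parse and nothing more.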
When to use¶
- Binary artefact + parameters that are needed to interpret the artefact (Bloom filter, compressed index, encrypted blob + IV).
- Small number of artefacts per file (1 or a small N). For streaming many records, use native binary formats (Parquet, Avro).
- Cross-language producer and consumer — the JSONL + Base64 format has near-universal tool support.
- Object-storage transport — S3, GCS, Azure Blob. The pattern is filesystem-friendly and CDN-cacheable.
- Human-debuggable preferred over maximum compactness — the 33 % Base64 overhead is the tax for readability.
When not to use¶
- Hot serialisation path — tens of thousands of records per second. Use a native binary format.
- Size-critical on-wire — Protobuf + gzip is more compact.
- Binary-only ecosystems (embedded, high-performance networking) — no benefit from text framing.
Anti-patterns¶
- Parameters inside the Base64 payload. Defeats the self-description property. Consumer must decode before learning what it's looking at.
- Mixing JSON and binary at the byte level (e.g. JSON followed by raw bytes without the Base64 wrapping and quotes). Breaks parser assumptions; can't be viewed with cat without terminal corruption.
- Not versioning line 1. Future schema evolution becomes a migration archaeology exercise.
- Decoding the Base64 into a materialised byte array on the hot path. See concepts/base64-as-byte-buffer-inline for the zero-alloc alternative.
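To illustrate the last point, the sketch below reads a single bit straight out of the Base64 text, with no decoded copy of the payload. It is one possible reading of the linked concept, not Vercel's confirmed implementation: the bit numbering (bit 0 is the most significant bit of byte 0) and the name `get_bit` are assumptions.

```python
import base64

ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# 256-entry table: ASCII code of a Base64 character -> its 6-bit value.
SEXTET = [0] * 256
for value, char in enumerate(ALPHABET):
    SEXTET[char] = value


def get_bit(encoded: bytes, i: int) -> int:
    """Return bit i of the payload by indexing the Base64 text directly.

    Each Base64 character carries 6 payload bits, most significant first,
    so bit i lives in character i // 6 at offset i % 6 from the top.
    """
    sextet = SEXTET[encoded[i // 6]]
    return (sextet >> (5 - i % 6)) & 1


# Usage: compare against the conventional decode-then-index approach.
raw = b"\xa5\x0f"
encoded = base64.b64encode(raw)
print([get_bit(encoded, i) for i in range(16)])
```

Each query costs one table lookup and a shift, and the only buffer held in memory is the Base64 text that was read from the file.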
Seen in¶
- sources/2026-04-21-vercel-how-we-made-global-routing-faster-with-bloom-filters
Canonical wiki instance. Vercel's build service emits a
two-line JSONL file per deployment: line 1 is the Bloom-filter
parameters (n, p, m, k, s); line 2 is the Base64-encoded bit array.
Routing service consumes the file, reads parameters from line 1, treats line 2 as a byte buffer per concepts/base64-as-byte-buffer-inline, and queries the filter on every request.