Skip to content

PATTERN Cited by 1 source

JSONL parameters + Base64 payload

Pattern

Serialise a parameterised binary artefact (a Bloom filter, a trained model, a compact index, a compressed cache) as a two-line JSONL file:

  • Line 1: a JSON object containing the parameters needed to interpret the payload — schema version, algorithm parameters, metadata (size, element count, error rates, creation time), anything the consumer needs to know before touching the bytes.
  • Line 2: the binary payload, Base64-encoded and wrapped in JSON string quotes (so the whole file is valid JSONL).

The pattern trades a small (~33 %) size overhead from Base64 encoding for self-description + text-safe embedding + greppability + easy diff-ability in Git / S3 listings + zero tool requirement beyond a JSON parser plus a Base64 decoder.

Canonical Vercel application

Vercel's 2026-04-21 blog post canonicalises the pattern for Bloom-filter transport between their build service and routing service:

{"version":"test","bloom":{"n":10,"p":1e-7,"m":336,"k":23,"s":0}}
"0kxC4anU4awVOYSs54vsAL7gBNGK/PrLjKrAJRil64mMxmiig1S+jqyC"

Line 1 declares everything the consumer needs to construct a query-compatible filter: n = element count, p = target false-positive rate, m = bit-array size, k = number of hash functions, s = seed. Line 2 is the bit array, Base64- encoded, which the consumer can either decode into a byte array or (more cleverly) treat as a byte buffer directly per concepts/base64-as-byte-buffer-inline.

(Source: sources/2026-04-21-vercel-how-we-made-global-routing-faster-with-bloom-filters.)

Why JSONL + Base64

Over pure binary

  • Parseable by anyone without vendor-specific libraries.
  • Greppable in object-storage listingsaws s3 cp | head -1 | jq surfaces parameters without decoding the payload.
  • Self-describing via versioning — a "version":"..." field lets the consumer route to the right parser.
  • Easy to add sibling metadata (build hash, creation time, tenant ID) without changing the format.

Over JSON nested

  • Two-line structure keeps metadata and payload cleanly separable. A consumer that only wants parameters doesn't pay the Base64-decode cost; a consumer that only wants the payload skips line 1.
  • Streaming-friendly — line-delimited framing is a natural parser boundary.
  • Avoids JSON escape-string overhead on the payload. If the binary payload were embedded in line 1's JSON object as a string field, every Base64 character is still just a Base64 character — but the parser must scan the object for the closing brace, which is slower than reading up to the newline.

Over YAML

  • Same expressiveness for line 1 as plain JSON.
  • YAML's multi-document --- separator is an alternative, but YAML parsing is slower and more complex; JSONL is simpler.

Over Protobuf / FlatBuffers / MessagePack

  • No code generation required.
  • Text-editable for debugging — you can read the parameters with cat | head -1.
  • Protobuf etc. win on size and per-record parse speed, but the Vercel use case has one parameter blob and one payload per file — not a large record stream — so parsing cost is a non-issue.

The Base64 size penalty

Base64 encodes 3 bytes of binary as 4 ASCII characters = 33 % size overhead. For Vercel's case:

  • Raw bit-array size: ~4 MB (for 1M paths at p = 1e-7).
  • Base64-encoded: ~5.3 MB.
  • Still 70–80 % smaller than the previous JSON-tree path lookup (which was also text).

The 33 % overhead is absorbed into the pattern's benefits: text-safety, version-ability, embed-in-JSONL-ability, gzip-friendliness (Base64's ~6 bits/char entropy compresses well in HTTP transport).

Variations

Multi-chunked payload

For very large payloads, split into N lines after line 1:

{"version":"1","bloom":{...},"chunks":3,"chunk_size":1048576}
"chunk-1-base64..."
"chunk-2-base64..."
"chunk-3-base64..."

Enables streaming consumption and per-chunk checksums.

Sibling metadata

{"version":"1","bloom":{...},"build_id":"abc","created_at":"2026-04-21T10:00:00Z"}
"base64..."

Parameters in line 1 remain minimal; operational metadata sits alongside in the same object.

Versioning via top-level field

{"schema":"bloom/1.2","params":{...}}
"base64..."

Let the consumer dispatch to different parser versions based on schema.

Migration across versions

Consumer observes "version":"v2", emits a warning if the consumer is only v1-compatible, and falls back to safe default behaviour (in Vercel's case: route to the legacy JSON-tree lookup path). This is how the Vercel roll-out worked — projects that haven't deployed since the Bloom filter shipped still use the older JSON path.

When to use

  • Binary artefact + parameters that are needed to interpret the artefact (Bloom filter, compressed index, encrypted blob + IV).
  • Small number of artefacts per file (1 or a small N). For streaming many records, use native binary formats (Parquet, Avro).
  • Cross-language producer and consumer — the JSONL + Base64 format has near-universal tool support.
  • Object-storage transport — S3, GCS, Azure Blob. The pattern is filesystem-friendly and CDN-cacheable.
  • Human-debuggable preferred over maximum compactness — the 33 % Base64 overhead is the tax for readability.

When not to use

  • Hot serialisation path — tens of thousands of records per second. Use a native binary format.
  • Size-critical on-wire — Protobuf + gzip is more compact.
  • Binary-only ecosystems (embedded, high-performance networking) — no benefit from text framing.

Anti-patterns

  • Parameters inside the Base64 payload. Defeats the self-description property. Consumer must decode before learning what it's looking at.
  • Mixing JSON and binary at the byte level (e.g. JSON followed by raw bytes without the Base64 wrapping and quotes). Breaks parser assumptions; can't be viewed with cat without terminal corruption.
  • Not versioning line 1. Future schema evolution becomes a migration archaeology exercise.
  • Decoding the Base64 into a materialised string on the hot path — see concepts/base64-as-byte-buffer-inline for the zero-alloc alternative.

Seen in

Last updated · 476 distilled / 1,218 read