VERCEL 2026-04-21 Tier 3

Vercel — How we made global routing faster with Bloom filters

Summary

Vercel's global routing service — the single-threaded front door that decides whether to serve, rewrite, or 404 every incoming request to every deployment — had a per-deployment tax proportional to the number of build-output paths. Small sites paid ~1 ms to parse their path-lookup JSON. A long tail of large sites (e-commerce catalogues, documentation sites, dynamic-routing apps) generated 1.5+ megabyte path-lookup files that took ~100 ms at p99 and ~250 ms at p99.9 to parse. Because the routing service is single-threaded, the parsing work blocked the event loop: 250 ms of parsing meant 250 ms of blocked request handling for every request routed during that parse.

The fix replaced the JSON path tree with a Bloom filter. The build service generates the filter from all deployment paths and uploads a JSONL file: line one is a JSON object with the Bloom filter parameters (n, p, m, k, s); line two is the Base64-encoded Bloom-filter bit array. The routing service reads the file and treats the Base64 string as the byte buffer directly — decoding sextets on demand during membership queries rather than copying into a string.
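The two-line layout can be exercised directly against the test vector the post discloses. A minimal Python sketch (illustrative only; the production reader is not Python) that parses line 1 as the parameter header and keeps line 2's Base64 payload as raw bytes rather than a decoded string:

```python
import json

def load_bloom_jsonl(data: bytes):
    # Line 1: JSON header carrying the Bloom parameters (n, p, m, k, s).
    # Line 2: the Base64-encoded bit array, wrapped in a JSON string literal.
    line1, line2 = data.split(b"\n", 1)
    params = json.loads(line1)["bloom"]
    # Strip only the surrounding quotes; keep the Base64 characters as raw
    # bytes so no bit-string is ever materialised.
    payload = line2.strip().strip(b'"')
    return params, payload

# The verbatim test vector from the post:
params, b64 = load_bloom_jsonl(
    b'{"version":"test","bloom":{"n":10,"p":1e-7,"m":336,"k":23,"s":0}}\n'
    b'"0kxC4anU4awVOYSs54vsAL7gBNGK/PrLjKrAJRil64mMxmiig1S+jqyC"'
)
```

One internal consistency check falls out immediately: the payload is 56 Base64 characters, and 56 × 6 = 336 bits = m, so the encoded length exactly covers the declared bit array.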

Measured results: path-lookup p99 dropped from ~100 ms to ~0.5 ms (~200× faster); p99.9 dropped from ~250 ms to ~2.4 ms (~100× faster); total routing-service heap and memory usage dropped by 15 %; TTFB from p75 through p99 across all routed requests improved by 10 %. The Bloom filter is also 70-80 % smaller than the JSON file even when Base64-encoded, shortening deployment upload time and routing-service fetch time.

Key takeaways

  • Bloom filters are the correct shape for path membership because the problem tolerates false positives (extra storage lookup that 404s cheaply) but cannot tolerate false negatives (returning 404 for a real page, an SEO and availability bug) (Source: sources/2026-04-21-vercel-how-we-made-global-routing-faster-with-bloom-filters). Canonicalised as concepts/false-positive-vs-false-negative-asymmetry.

  • The expensive operation was JSON parsing, not lookup. p99 path lookup on a 1.5 MB JSON file was ~100 ms of parse; per-key membership against the parsed tree was fast. Replacing the substrate (tree → Bloom filter) flipped the cost model: construction is a file-read + bit-array allocation, and per-key lookup is a handful of hash computations. This is the load-bearing asymmetry behind the 200× p99 improvement.

  • Single-threaded services compound parse cost. "Given that our routing service is single-threaded, parsing this JSON file also blocks the event loop. This means that for those websites whose path lookups take 250 milliseconds to parse, it literally takes 250 milliseconds longer to serve the website while we wait for the operation to finish." Canonicalised as concepts/event-loop-blocking-single-threaded.

  • Ripple effects on GC and heap. "It turns out that path lookup for those few heavy websites was disproportionately hogging our routing service's memory and CPU, which was making it run more slowly across the board. Once we shipped this improvement to path lookup, the heap size and memory usage of the routing service dropped by 15%. The reduced heap size relieved garbage collection pressure, a primary bottleneck in JSON parsing, so parsing the old path lookup file (which is still done for projects that haven't deployed since we rolled out the Bloom filter) also sped up dramatically at all percentiles." The Bloom-filter rollout helped every site, not just the heavy ones, because the heavy sites had been stealing reactor time from everyone.

  • Two-service Bloom-filter compatibility requires identical-by-construction algorithms. The build service (which generates the filter) and the routing service (which queries it) run in different languages. Vercel implemented matching Bloom-filter logic in both codebases so that a filter generated by one is byte-for-byte query-compatible in the other.

  • JSONL carries parameters + payload in one file. Line 1 is a JSON object with Bloom-filter parameters (n elements, p desired false-positive rate, m bit-array size, k hash functions, s hash seed). Line 2 is the Base64-encoded bit array. This is a specialisation of the general pattern of shipping "header JSON + payload blob" in a single file with line-delimited JSON boundaries. Canonicalised as patterns/jsonl-parameters-plus-base64-payload.

  • Treat Base64 as a byte buffer; never realise the string. The code snippet discloses a LuaJIT ffi.new 'uint8_t[256]' decode table and a membership test that reads bytes directly from the file buffer (ptr[byte_offset]), decodes a sextet, and AND-masks the target bit. "String operations are expensive, and they're the reason why the previous approach is so slow. The goal of this optimization is precisely to avoid them." Canonicalised as concepts/base64-as-byte-buffer-inline.

  • Bloom-filter construction is I/O-bound, not CPU-bound. "This means the speed at which we can create a Bloom filter is bound by file reading, which is orders of magnitude faster than string creation, so we can create very large Bloom filters nearly instantly." The false dichotomy that motivated the old design — "we must spend proportional time on big sites" — disappears once the bit-array representation lets construction ride on raw file-read throughput.

  • Defence against enumeration attacks is a second-order reason the routing service checks membership before fetching storage: "This prevents unnecessary requests to storage and protects against enumeration attacks, where attackers try to discover hidden files by guessing URLs." Canonicalised as concepts/path-enumeration-attack.

Canonical operational numbers

Metric                                    Before (JSON)        After (Bloom filter)   Ratio
Path lookup p50                           < 1 ms               < 1 ms                 ~1×
Path lookup p90                           ~4 ms                < 1 ms                 ~4×
Path lookup p99                           ~100 ms              ~0.5 ms                ~200×
Path lookup p99.9                         ~250 ms              ~2.4 ms                ~100×
Routing-service heap / memory             baseline             baseline − 15 %        —
TTFB p75-p99 (all routes)                 baseline             baseline × 0.9         —
Path-lookup file size (Base64-encoded)    1.5+ MB worst case   70-80 % smaller        ~3-5×

Trigger corpus: "e-commerce sites with large product catalogs; documentation sites with thousands of pages; applications with dynamic routing."

Bloom-filter parameters disclosed

Verbatim example line from a generated file:

{"version":"test","bloom":{"n":10,"p":1e-7,"m":336,"k":23,"s":0}}
"0kxC4anU4awVOYSs54vsAL7gBNGK/PrLjKrAJRil64mMxmiig1S+jqyC"
  • n = number of elements (10 in the test vector; much larger for production path lists).
  • p = target false-positive rate (1e-7 = one false-positive per 10 million negative queries).
  • m = size of the bit array (336 bits for this test vector).
  • k = number of hash functions (23 — the post uses hash multiplication: one cryptographic hash seeded differently k times).
  • s = seed for the first hash function.

The production-scale sizing is not disclosed explicitly, but the 70-80 % file-size reduction from megabyte-scale JSON implies m in the low-megabits range. At p = 1e-7 a Bloom filter costs ≈ 33.6 bits per element, so low-megabits m corresponds to path lists on the order of 10^5 entries for the heaviest sites.
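The disclosed test vector is consistent with the textbook sizing formulas m = ⌈−n ln p / (ln 2)²⌉ and k = round((m/n) ln 2) — an inference, since the post does not state how it sizes the filter:

```python
import math

def bloom_params(n: int, p: float):
    """Standard Bloom-filter sizing: bit-array size m and hash count k for
    n elements at target false-positive rate p. An inference from the test
    vector, not Vercel's disclosed code."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = round(m / n * math.log(2))
    return m, k

# The post's test vector (n=10, p=1e-7) reproduces m=336, k=23 exactly.
m, k = bloom_params(10, 1e-7)
```

Note that k depends only on the bits-per-element ratio m/n, which is fixed by p alone — so k = 23 for any n at p = 1e-7.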

Architecture: the two halves

┌────────────────────────────────┐         ┌────────────────────────────────┐
│      Build service             │  JSONL  │      Routing service           │
│                                │  file   │                                │
│  - Collect all deployment      │  over   │  - Fetch file per cold start   │
│    paths (static assets,       │  S3     │    or per deployment update    │
│    pages, API routes,          │────────▶│  - Treat Base64 as byte        │
│    webpack chunks, Next.js     │         │    buffer (no decode-to-string)│
│    route segments)             │         │  - On request, hash path k     │
│  - Build Bloom filter          │         │    times, AND-mask bits        │
│  - Serialise to JSONL          │         │  - If any 0 → 404 (negative)   │
│  - Upload                      │         │  - If all 1 → proceed to       │
│                                │         │    storage fetch (positive)    │
└────────────────────────────────┘         └────────────────────────────────┘
       offline / async                          online / single-threaded
       languages differ (Go? Rust?)             languages differ (likely Lua/OpenResty
                                                given the LuaJIT + FFI code sample)

The two services implement matching Bloom-filter algorithms — same hash function family, same bit-layout, same parameter interpretation — so that a filter generated on one side is query-compatible on the other. This cross-service-compat burden is the only downside the post names.
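A single-language round-trip makes the compatibility contract concrete. Everything here beyond the JSONL layout is assumption: the post discloses neither the hash family nor the bit order, so SHA-256 with the seed and round index mixed into the input stands in purely for illustration, and the query side decodes the Base64 for brevity where the real routing service reads it in place:

```python
import base64
import hashlib
import json
import math

def indices(path: str, m: int, k: int, s: int):
    # k bit positions from one cryptographic hash seeded differently k times.
    # SHA-256 and this input framing are illustrative assumptions.
    for i in range(k):
        digest = hashlib.sha256(f"{s}:{i}:{path}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % m

def build(paths, p=1e-7, s=0):
    """Build-service half: size the filter, set bits, emit two-line JSONL."""
    n = len(paths)
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = round(m / n * math.log(2))
    bits = bytearray((m + 7) // 8)
    for path in paths:
        for idx in indices(path, m, k, s):
            bits[idx // 8] |= 1 << (7 - idx % 8)  # MSB-first (assumed)
    header = {"version": "sketch", "bloom": {"n": n, "p": p, "m": m, "k": k, "s": s}}
    return json.dumps(header) + "\n" + json.dumps(base64.b64encode(bytes(bits)).decode())

def query(jsonl: str, path: str) -> bool:
    """Routing-service half: any unset bit is a definite 404; all bits set
    means proceed to the storage fetch (false-positive probability ~p)."""
    line1, line2 = jsonl.split("\n", 1)
    b = json.loads(line1)["bloom"]
    bits = base64.b64decode(json.loads(line2))
    return all(bits[i // 8] & (1 << (7 - i % 8))
               for i in indices(path, b["m"], b["k"], b["s"]))

f = build(["/", "/docs", "/api/hello"])
```

Because both halves call the same `indices`, agreement is trivial here; in production that same guarantee has to be re-established by hand across two codebases in different languages, which is exactly the burden the post names.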

Caveats and undisclosed details

  • Languages not named. The code sample is LuaJIT with FFI, suggesting the routing service runs on an OpenResty-style nginx+Lua stack. The build service's language is not disclosed. Vercel does not claim a cross-language Bloom-filter library; both sides were hand-implemented.
  • Hash family not disclosed. The post refers to "a cryptographic hash" seeded k times but does not name it (SHA-256? BLAKE3? xxhash?). The realised false-positive rate is a function of hash quality as well as of m and k.
  • No cold-start vs warm cost data. "Cold" filter construction is claimed "nearly instant"; the specific cold-start latency distribution is not disclosed.
  • False-positive rate at real-world scale undisclosed. p = 1e-7 is the test-vector target; actual production p not disclosed. Every false positive costs a storage roundtrip that 404s — cheap but not free.
  • No comparison against alternative data structures. Cuckoo filters (similar false-positive semantics, support deletion, slightly smaller for same p), xor filters (smaller, no insertion after build), and tries with path compression were not considered in the post.
  • Staged rollout details missing. The post mentions "projects that haven't deployed since we rolled out the Bloom filter" still parse the old JSON, implying a per-project opt-in at build time. No timeline for full rollout or deprecation of the JSON path.
  • No measurement of build-service side. Filter generation cost at the build service is claimed negligible but not measured.
  • The 15 % memory reduction is a global-routing-service number, not per-deployment. Whether the reduction is uniform across deployments or concentrated in the heavy-site ones is not split out.
  • TTFB 10 % improvement is aggregate (p75–p99) across all routed requests. The heavy-site improvement is presumably much larger; the all-request aggregate is diluted by the (majority) small sites that were already fast.

Scope disposition

Tier-3 on-scope decisively on engineering-deep-dive grounds. Vercel is Tier-3 (stricter content filter — product-launch / marketing dominates their blog), but this post is a genuine production-optimisation retrospective with:

  • Mechanism-level disclosure (Bloom-filter parameters, JSONL file layout, LuaJIT FFI snippet, two-service coordination requirement).
  • Canonical operational numbers (200× p99 improvement, 100× p999, 15 % memory reduction, 10 % TTFB improvement — all concrete and measurable).
  • Architectural-trade-off framing (false-positive / false-negative asymmetry, construction cost vs parse cost, single-threaded event-loop blast radius).
  • First-principles reasoning rather than product-feature listicle.

Architecture density well above 60 % of body. Fifth Vercel ingest; opens the Vercel routing-service infrastructure axis — prior Vercel coverage spans SEO / rendering (2024-08-01), agent-reliability (2026-01-08 v0), bot-management (2026-04-21 BotID), runtime / platform (2026-04-21 Bun), knowledge-agent (2026-04-21 Knowledge Agent Template), and chat adapters (2026-04-21 Chat SDK). This ingest opens a seventh axis on Vercel's core edge-routing substrate.

Cross-source continuity

  • Complementary to the 2026-04-21 Bun launch — both ingests are engineering-response posts to user-observed latency problems. Bun post profiles Web-Streams transform overhead in Node.js SSR; this post profiles JSON-parse overhead in the single-threaded routing service. Both disclose garbage-collection as dominant secondary cost.
  • Extends concepts/tail-latency-at-scale with a production instance where p99 parse time on a small fraction of heavy traffic dominated aggregate latency — a canonical "slow-is-failure" datum. The 10 % aggregate TTFB improvement is evidence that tail-latency-on-one-axis really does degrade baseline-latency-on-all-axes under single-threaded scheduling.
  • Adjacent to patterns/trimmed-automaton-predicate-filter and patterns/two-stage-evaluation — both existing wiki patterns reference Bloom filters at the pruning / pre-filter altitude without a canonical concept page. This ingest fills that gap with concepts/bloom-filter and concepts/probabilistic-data-structure as canonical anchors.
  • Opens systems/vercel-routing-service as a named substrate on the wiki. Prior Vercel system coverage named Edge Functions, Functions, Fluid compute, Sandbox, Workflow, AI Gateway, BotID, Chat SDK but never the routing service itself (the tier that decides whether the request reaches any of those).
  • Sibling to sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process at the routing-service altitude: that post measured downstream consequences of Vercel's routing + rendering (Google's crawl-to-render-to-index pipeline); this post measures upstream performance of the same routing tier.
