Netflix Ranker¶
Ranker is "one of the largest and most complex services at Netflix" — it powers "the personalized rows you see on the Netflix homepage" and "runs at an enormous scale" (Source: sources/2026-03-03-netflix-optimizing-recommendation-systems-with-jdks-vector-api).
This page is a stub; Netflix hasn't publicly described Ranker's end-to-end architecture on the TechBlog. What the 2026-03-03 post does document is a single feature inside Ranker — video serendipity scoring — and the optimization journey that took it from 7.5% → ~1% of node CPU.
What Ranker does (from the 2026-03-03 post)¶
- Generates the personalized homepage rows for Netflix subscribers.
- For each candidate title, scores it against the member's viewing history across multiple features; the downstream recommendation logic consumes those feature values.
- The scored feature covered in the post is video serendipity: "How different is this new title from what you've been watching so far?" — max cosine similarity between the candidate embedding and each history-item embedding, subtracted from 1 to give a "novelty" score.
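The serendipity formula above can be illustrated with a minimal Java sketch (class and method names are assumptions for illustration, not Netflix's actual code):

```java
// Sketch of the serendipity formula: 1 - max_i cos(candidate, history_i).
// A high score means the candidate is novel relative to viewing history.
final class SerendipitySketch {

    // Cosine similarity between two D-dimensional embeddings.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Max similarity against every history item, subtracted from 1.
    static double serendipity(double[] candidate, double[][] history) {
        double maxSim = -1.0;
        for (double[] h : history) {
            maxSim = Math.max(maxSim, cosine(candidate, h));
        }
        return 1.0 - maxSim;
    }
}
```

A candidate identical to a history item scores 0 (no novelty); one orthogonal to everything watched scores 1.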
Request shape¶
The post reports an asymmetric request distribution that shaped Ranker's optimization priorities:
- ~98% of requests are single-video (one candidate scored at a time).
- ~2% of requests are large batches (many candidates scored together).
- By total video volume processed, the split is roughly 50:50 between single and batch — the 2% of batch requests each carry many candidates, so they account for half the fleet compute cost.
This makes batch-path optimization worthwhile even though it can't move the p50: it halves the fleet cost, not the median latency.
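A back-of-the-envelope check of that 50:50 volume split (the ~49 candidates-per-batch figure is an illustrative assumption; the post gives only the 98%/2% request mix and the rough 50:50 volume outcome):

```java
// Illustrative arithmetic: 98% single-video requests (1 video each),
// 2% batch requests. If each batch carries ~49 candidates, batch
// requests account for ~half of the total videos scored.
final class RequestMixSketch {
    static double batchShareOfVolume(int requests, double batchFraction,
                                     int candidatesPerBatch) {
        double singleVideos = requests * (1.0 - batchFraction); // 1 video each
        double batchVideos  = requests * batchFraction * candidatesPerBatch;
        return batchVideos / (singleVideos + batchVideos);
    }
}
```

With 100 requests, a 0.02 batch fraction, and 49 candidates per batch, 98 single videos and 98 batch videos give a 0.5 share, matching the post's rough 50:50 split.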
The 2026-03-03 optimization¶
Ranker's serendipity-scoring operator was consuming 7.5% of total CPU on every Ranker node. Netflix drove it down to ~1% via a five-step sequence:
- Flamegraph profiling. CPU flamegraphs surfaced serendipity scoring as a top hotspot (concepts/flamegraph-profiling).
- Batched matrix multiply. Reshape the M×N nested-loop cosine-similarity computation into `C = A × Bᵀ` (patterns/batched-matmul-for-pairwise-similarity).
- Fix memory layout and allocation. Flat `double[]` row-major buffers plus per-thread `ThreadLocal<BufferHolder>` with grow-but-never-shrink buffers (patterns/flat-buffer-threadlocal-reuse).
- Kernel swap: BLAS. Tried netlib-java native BLAS bindings; lost in production to JNI overhead and row/column-major translation costs.
- Kernel swap: JDK Vector API. Pure-Java SIMD via systems/jdk-vector-api with fused multiply-add accumulation; runtime-dispatched with scalar fallback (patterns/runtime-capability-dispatch-pure-java-simd).
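The batched-matmul and memory-layout steps can be sketched in plain Java: all pairwise dot products computed as `C = A × Bᵀ` over flat row-major `double[]` buffers, with a grow-but-never-shrink per-thread scratch buffer. The class and method names are assumptions, and the real kernel runs the inner loop on the JDK Vector API rather than these scalar loops:

```java
// Pairwise similarities as a batched matmul over flat row-major
// buffers, with a per-thread grow-only scratch buffer for the
// M x N result matrix (no per-request allocation on the hot path).
final class BatchedSimilaritySketch {

    private static final ThreadLocal<double[]> SCRATCH =
            ThreadLocal.withInitial(() -> new double[0]);

    private static double[] scratch(int needed) {
        double[] buf = SCRATCH.get();
        if (buf.length < needed) {   // grow, but never shrink
            buf = new double[needed];
            SCRATCH.set(buf);
        }
        return buf;
    }

    /**
     * C = A x B-transpose. A is M x D (candidate embeddings), B is
     * N x D (history embeddings), both flat row-major. Rows are
     * assumed pre-normalized, so each dot product is the cosine
     * similarity. Returns a row-major buffer for C (may be longer
     * than M * N because the scratch buffer is reused).
     */
    static double[] pairwiseDots(double[] a, int m, double[] b, int n, int d) {
        double[] c = scratch(m * n);
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                double dot = 0;
                for (int k = 0; k < d; k++) {
                    dot += a[i * d + k] * b[j * d + k];
                }
                c[i * n + j] = dot;
            }
        }
        return c;
    }

    /** serendipity(i) = 1 - max over j of C[i][j]. */
    static double serendipityForRow(double[] c, int row, int n) {
        double max = -1.0;
        for (int j = 0; j < n; j++) {
            max = Math.max(max, c[row * n + j]);
        }
        return 1.0 - max;
    }
}
```

The scalar inner loop is where the Vector API step slots in: the `k` loop over the embedding dimension becomes SIMD lanes with fused multiply-add accumulation, dispatched at runtime with this scalar code as the fallback.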
Canary + full-rollout results: ~7% drop in CPU utilization, ~12% drop in average latency, ~10% improvement in CPU/RPS.
Stub caveats¶
Not covered on the wiki:
- Ranker's full request pipeline (retrieval / candidate gen / feature compute / ranking model / serving path).
- The ranking model itself (architecture, training, experiment infrastructure).
- How Ranker composes with the other personalization services at Netflix (retrieval, A/B, content signals).
- Per-region deployment, instance counts, fleet CPU absolute numbers.
- How Ranker's features flow from Amber or other feature stores.
Seen in¶
- sources/2026-03-03-netflix-optimizing-recommendation-systems-with-jdks-vector-api — the only Ranker-internal post on the wiki. Covers only the serendipity-scoring hot path and its five-step optimization to the JDK Vector API.
Related¶
- systems/jdk-vector-api — Ranker's chosen SIMD substrate for the batched matmul kernel.
- concepts/vector-embedding — candidate + history items are D-dimensional embeddings.
- concepts/cosine-similarity — the per-pair kernel.
- concepts/flamegraph-profiling — the diagnostic that surfaced serendipity scoring as a top hotspot.
- patterns/batched-matmul-for-pairwise-similarity — the algorithmic reshape.
- patterns/flat-buffer-threadlocal-reuse — the memory-layout enabler.
- patterns/runtime-capability-dispatch-pure-java-simd — the deployment-safety pattern.
- companies/netflix — parent.