Netflix Ranker

Ranker is "one of the largest and most complex services at Netflix" — it powers "the personalized rows you see on the Netflix homepage" and "runs at an enormous scale" (Source: sources/2026-03-03-netflix-optimizing-recommendation-systems-with-jdks-vector-api).

This page is a stub; Netflix hasn't publicly described Ranker's end-to-end architecture on the TechBlog. What the 2026-03-03 post does document is a single feature inside Ranker — video serendipity scoring — and the optimization journey that took it from 7.5% → ~1% of node CPU.

What Ranker does (from the 2026-03-03 post)

  • Generates the personalized homepage rows for Netflix subscribers.
  • For each candidate title, scores it against the member's viewing history across multiple features; the downstream recommendation logic consumes those feature values.
  • The scored feature covered in the post is video serendipity: "How different is this new title from what you've been watching so far?" — max cosine similarity between the candidate embedding and each history-item embedding, subtracted from 1 to give a "novelty" score.
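The serendipity formula above is simple to state in code. A minimal sketch in plain Java (class and method names are ours, not Netflix's; the post describes the math, not this implementation):

```java
// Hypothetical sketch of video serendipity: 1 - max cosine similarity
// between a candidate embedding and every history-item embedding.
public final class Serendipity {

    // Cosine similarity between two equal-length vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Serendipity = 1 - max_i cos(candidate, history_i): highest when the
    // candidate is unlike everything the member has watched so far.
    static double serendipity(double[] candidate, double[][] history) {
        double maxSim = -1.0;
        for (double[] h : history) {
            maxSim = Math.max(maxSim, cosine(candidate, h));
        }
        return 1.0 - maxSim;
    }
}
```

A candidate identical to something already watched scores 0; one orthogonal to the entire history scores 1.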

Request shape

The post reports an asymmetric request distribution that shaped Ranker's optimization priorities:

  • ~98% of requests are single-video (one candidate scored at a time).
  • ~2% of requests are large batches (many candidates scored together).
  • By total video volume processed, the split is roughly 50:50 between single and batch — the 2% of batch requests each carry many candidates, so they account for half the fleet compute cost.

This makes batch-path optimization worthwhile even though it barely moves the p50: single-video requests dominate the request count, but the batch path accounts for roughly half the fleet's compute cost.
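The 98:2 request split and the 50:50 volume split together pin down the average batch size. A back-of-envelope check (the implied figure is our inference, not a number Netflix reports):

```java
// Back-of-envelope on the reported request shape. The 0.98 / 0.02 split
// comes from the post; the implied average batch size is our inference.
public final class RequestShape {

    // If pSingle of requests carry 1 video and pBatch carry k videos each,
    // video volume splits 50:50 exactly when pSingle * 1 == pBatch * k.
    static double impliedBatchSize(double pSingle, double pBatch) {
        return pSingle / pBatch;
    }

    public static void main(String[] args) {
        double k = impliedBatchSize(0.98, 0.02);            // ~49 candidates per batch
        double batchVolume = 0.02 * k / (0.98 + 0.02 * k);  // ~0.5 of all videos
        System.out.println(k + " " + batchVolume);
    }
}
```

So under the post's numbers, the average batch request would need to carry on the order of 49 candidates for the volume split to come out even.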

The 2026-03-03 optimization

Ranker's serendipity-scoring operator was consuming 7.5% of total CPU on every Ranker node. Netflix drove it down to ~1% via a four-step sequence:

  1. Batched matrix multiply. Reshape the M×N nested-loop cosine-similarity computation into C = A × Bᵀ (patterns/batched-matmul-for-pairwise-similarity).
  2. Fix memory layout and allocation. Flat double[] row-major buffers + per-thread ThreadLocal<BufferHolder> with grow-but-never-shrink buffers (patterns/flat-buffer-threadlocal-reuse).
  3. Kernel swap: BLAS. Tried netlib-java backed by native BLAS; lost in production to JNI overhead + row/column-major translation costs.
  4. Kernel swap: JDK Vector API. Pure-Java SIMD via systems/jdk-vector-api with fused multiply-add accumulation; runtime-dispatched with scalar fallback (patterns/runtime-capability-dispatch-pure-java-simd).
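Steps 1 and 2 can be sketched together: all pairwise dot products computed as one C = A × Bᵀ pass over flat row-major buffers, with per-thread scratch space that grows but never shrinks. Shapes and names here are our assumptions, not Netflix's code:

```java
// Sketch of steps 1-2: A holds M candidate embeddings and B holds N history
// embeddings, both as flat row-major double[] of dimension D per row.
// C = A * B^T yields all M x N dot products without nested per-pair calls.
public final class BatchedSimilarity {

    // Per-thread scratch buffer: grow-but-never-shrink, so steady-state
    // requests allocate nothing.
    private static final ThreadLocal<double[]> SCRATCH =
            ThreadLocal.withInitial(() -> new double[0]);

    static double[] scratch(int needed) {
        double[] buf = SCRATCH.get();
        if (buf.length < needed) {   // only ever grows
            buf = new double[needed];
            SCRATCH.set(buf);
        }
        return buf;
    }

    // C[i*n + j] = dot(row i of A, row j of B); everything flat row-major,
    // so the inner loop walks both operands sequentially in memory.
    static double[] matmulTransposed(double[] a, double[] b, int m, int n, int d) {
        double[] c = scratch(m * n);
        for (int i = 0; i < m; i++) {
            int ai = i * d;
            for (int j = 0; j < n; j++) {
                int bj = j * d;
                double dot = 0;
                for (int k = 0; k < d; k++) {
                    dot += a[ai + k] * b[bj + k];
                }
                c[i * n + j] = dot;
            }
        }
        return c;
    }
}
```

The row-major B buffer means no transpose is ever materialized; Bᵀ exists only in the index arithmetic.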

Canary + full-rollout results: ~7% drop in CPU utilization, ~12% drop in average latency, ~10% improvement in CPU/RPS.
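The runtime-dispatch pattern from the final kernel swap can be sketched as follows. The detection mechanism shown (reflective probe) and all names are illustrative; the real SIMD path uses jdk.incubator.vector.DoubleVector with fused multiply-add accumulation, which this sketch only falls back from:

```java
// Sketch of runtime-capability dispatch with scalar fallback: probe for the
// (incubating) Vector API once at startup, and run a scalar kernel when it
// is absent. Probe mechanism and names are our illustration.
public final class DotKernel {

    // True only when jdk.incubator.vector is on the module path.
    private static final boolean VECTOR_API_PRESENT = detectVectorApi();

    private static boolean detectVectorApi() {
        try {
            Class.forName("jdk.incubator.vector.DoubleVector");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    static double dot(double[] a, double[] b) {
        // A real implementation would branch to a DoubleVector-based SIMD
        // kernel when VECTOR_API_PRESENT is true; this sketch always takes
        // the scalar path.
        return dotScalar(a, b);
    }

    // Scalar fallback; Math.fma mirrors the SIMD path's fused multiply-add
    // accumulation, so both paths round identically.
    static double dotScalar(double[] a, double[] b) {
        double acc = 0;
        for (int i = 0; i < a.length; i++) {
            acc = Math.fma(a[i], b[i], acc);
        }
        return acc;
    }
}
```

Dispatching once at class-initialization time keeps the capability check off the hot path.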

Stub caveats

Not covered on the wiki:

  • Ranker's full request pipeline (retrieval / candidate gen / feature compute / ranking model / serving path).
  • The ranking model itself (architecture, training, experiment infrastructure).
  • How Ranker composes with the other personalization services at Netflix (retrieval, A/B, content signals).
  • Per-region deployment, instance counts, fleet CPU absolute numbers.
  • How Ranker's features flow from Amber or other feature stores.
