
CONCEPT

Cosine similarity

Cosine similarity is the cosine of the angle between two vectors:

cos(A, B) = (A · B) / (‖A‖ × ‖B‖)

It measures orientation, not magnitude — two vectors pointing in the same direction are maximally similar (score = 1) regardless of their lengths, opposite-pointing vectors score −1, and orthogonal vectors score 0. For unit-normalised embeddings (‖A‖ = ‖B‖ = 1), cosine similarity collapses to a plain dot product A · B — a single sum-of-products that vector-SIMD hardware executes in a handful of FMA instructions.
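A minimal pure-Python sketch of both forms, using illustrative vectors (no library assumed): the full formula, and the collapsed dot-product form after unit-normalising once.

```python
import math

def cosine(a, b):
    # cos(A, B) = (A · B) / (‖A‖ × ‖B‖)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def normalise(v):
    # Scale to unit length so later comparisons are plain dot products.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine(a, b))  # orientation only: 24/25 = 0.96, regardless of lengths

an, bn = normalise(a), normalise(b)
print(sum(x * y for x, y in zip(an, bn)))  # same 0.96 from a plain dot product
```

In production the normalise step runs once at indexing time, so each query pays only the dot product.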

Why it dominates recommendation / retrieval workloads

  • Embedding models are typically trained so that semantic similarity corresponds to small angles between embeddings (concepts/vector-embedding). Cosine is the native metric.
  • The cheap form (pre-normalise once, then dot-product at query time) is trivially vectorisable and batches into a matrix multiply: a full M × N cosine-similarity sweep is C = A × Bᵀ where A is M × D candidates and B is N × D history items (patterns/batched-matmul-for-pairwise-similarity).
  • Cosine is the default metric for vector-similarity-search engines (FAISS, HNSW, pgvector, S3 Vectors); see concepts/vector-similarity-search.
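The batched form of the second bullet can be sketched in pure Python (a real system would hand C = A × Bᵀ to a BLAS/matmul kernel; the loop here just makes the shape explicit):

```python
import math

def normalise_rows(M):
    # Pre-normalise each row once so the sweep is a plain matrix multiply.
    out = []
    for row in M:
        n = math.sqrt(sum(x * x for x in row))
        out.append([x / n for x in row])
    return out

def cosine_sweep(A, B):
    # C = Â × B̂ᵀ where Â is M × D and B̂ is N × D; C[i][j] = cosine(A[i], B[j]).
    A_hat, B_hat = normalise_rows(A), normalise_rows(B)
    return [[sum(a * b for a, b in zip(ra, rb)) for rb in B_hat] for ra in A_hat]

candidates = [[1.0, 0.0], [0.0, 1.0]]   # M × D
history    = [[1.0, 0.0], [1.0, 1.0]]   # N × D
C = cosine_sweep(candidates, history)   # M × N similarity matrix
```

Here `C[0][1]` is cosine between `[1, 0]` and `[1, 1]`, i.e. 1/√2 ≈ 0.7071.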

Cosine vs inner product vs Euclidean

| Metric | Formula | Use when |
| --- | --- | --- |
| Cosine | A·B / (‖A‖‖B‖) | orientation matters; magnitudes aren't calibrated |
| Inner product | A·B | embedding model trained for IP; magnitudes encode salience |
| Euclidean (L2) | ‖A − B‖ | metric-space distances (clustering) |

A mismatch between the metric the embedding model was trained against and the metric used at query time materially reduces recall, so select the metric the model recommends.
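A small worked contrast (illustrative vectors) shows where the three metrics disagree: two vectors pointing the same way but with different magnitudes tie under cosine, while inner product and Euclidean distance separate them.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

q = [1.0, 0.0]
small, big = [0.5, 0.0], [10.0, 0.0]   # same direction, different magnitudes

print(cosine(q, small), cosine(q, big))        # 1.0 1.0  — cosine ties them
print(dot(q, small), dot(q, big))              # 0.5 10.0 — IP rewards magnitude
print(euclidean(q, small), euclidean(q, big))  # 0.5 9.0  — L2 penalises distance
```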

Seen in

  • sources/2026-03-03-netflix-optimizing-recommendation-systems-with-jdks-vector-api — Netflix Ranker's video serendipity scoring computes a per-candidate max cosine(candidate, history[i]) across the viewing history and returns 1 − max as a "novelty" feature. The naive O(M×N) nested-loop implementation cost 7.5% of total Ranker node CPU; reshaping it into a batched matrix multiply with unit-normalised embeddings and a JDK Vector API FMA kernel drove it to ~1%.
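The novelty feature described in that source can be sketched as follows (a pure-Python illustration of the scoring shape, not Netflix's Java implementation; function and variable names are mine):

```python
import math

def normalise(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def novelty(candidate, history):
    # 1 − max_i cosine(candidate, history[i]); with unit-normalised vectors
    # each cosine is just a dot product.
    c = normalise(candidate)
    best = max(sum(a * b for a, b in zip(c, normalise(h))) for h in history)
    return 1.0 - best

history = [[1.0, 0.0], [0.7, 0.7]]       # embeddings of watched videos
print(novelty([0.0, 1.0], history))      # dissimilar to everything watched → high
print(novelty([1.0, 0.0], history))      # matches a watched item exactly → 0.0
```

The production fix in the source replaces the inner loop over `history` with one batched matrix multiply per candidate block, which is where the CPU savings came from.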