CONCEPT Cited by 3 sources

Vector Similarity Search

Vector similarity search is the retrieval primitive behind semantic search, recommendation, and RAG: given a query vector and a corpus of vectors (usually embeddings of documents / images / audio), return the top-K corpus vectors closest to the query under a chosen distance metric.

(Source: sources/2025-07-16-aws-amazon-s3-vectors-preview-launch)

Channy Yun's framing (AWS, 2025)

"Vector search is an emerging technique used in generative AI applications to find similar data points to given data by comparing their vector representations using distance or similarity metrics."

Distance metrics (as supported at S3 Vectors launch)

Metric         | Formula                 | Typical use
Cosine         | 1 - (a·b) / (‖a‖ ‖b‖)   | Most NLP embeddings (BERT / Titan / OpenAI); direction matters, magnitude doesn't
Euclidean (L2) | ‖a - b‖₂                | Embeddings trained with L2 loss, some image models
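To make the difference concrete, here is a NumPy sketch of the two metrics (illustrative only, not part of any S3 Vectors API): two vectors pointing the same way but with different magnitudes are identical under cosine distance yet far apart under Euclidean, which is why a model's recommended metric matters.

```python
import numpy as np

def cosine_distance(a, b):
    # 1 - (a·b) / (‖a‖ ‖b‖): compares direction only; magnitude cancels out
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    # ‖a - b‖₂: sensitive to magnitude as well as direction
    return np.linalg.norm(a - b)

a = np.array([1.0, 0.0])
b = np.array([3.0, 0.0])  # same direction as a, three times the magnitude

print(cosine_distance(a, b))     # 0.0 — identical under cosine
print(euclidean_distance(a, b))  # 2.0 — far apart under Euclidean
```

Ranking a corpus by one metric and then serving it with the other reorders the neighbours whenever magnitudes vary, which is the mechanism behind the recall loss noted below.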

S3 Vectors exposes Cosine or Euclidean per index, chosen at CreateVectorIndex time. Other common metrics (inner product / dot product, Manhattan, Hamming) are not listed at preview.

"For Distance metric, you can choose either Cosine or Euclidean. When creating vector embeddings, select your embedding model's recommended distance metric for more accurate results."

Using a metric other than the one the embedding model was trained and evaluated with produces materially worse recall.

Query shape

A query specifies:

  1. The query vector (same dimensionality as the index).
  2. top-K — how many nearest neighbours to return.
  3. Optional filter over vector metadata (e.g. genre="scifi").
  4. Optional flags to return distances / metadata with the hits.

Worked S3 Vectors example:

import boto3

s3vectors = boto3.client("s3vectors")

response = s3vectors.query_vectors(
    vectorBucketName="...", indexName="...",  # names elided in the launch post
    queryVector={"float32": embedding},       # floats, same dimensionality as the index
    topK=3, filter={"genre": "scifi"},        # optional metadata filter
    returnDistance=True, returnMetadata=True)

Exact vs Approximate (ANN)

  • Exact k-NN — compare the query to every corpus vector. Recall = 100% by construction; cost scales O(N × dim). Viable up to a few million vectors on SSD, or on GPUs.
  • Approximate NN (ANN) — pre-build an index structure that trades a small amount of recall for orders-of-magnitude lower query cost. Common families:
      • HNSW (hierarchical navigable small world): graph-based, high recall at low latency, RAM-heavy.
      • IVF (inverted file / coarse quantizer + fine scan): cluster centroids + bucket scan, cheaper in memory.
      • Disk-based / quantized (DiskANN, product quantization): trade accuracy for dramatic cost/memory cuts; enables billion-scale search on storage.
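The exact k-NN baseline can be written in a few lines of NumPy (a brute-force sketch of the O(N × dim) scan; it says nothing about how any particular ANN index works):

```python
import numpy as np

def exact_knn(query, corpus, k):
    """Brute-force exact k-NN under cosine distance.

    Compares the query against every corpus row — the O(N × dim) scan
    that ANN indexes exist to avoid at scale.
    """
    # After L2-normalizing both sides, cosine distance is 1 - dot product.
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    dist = 1.0 - c @ q
    top = np.argsort(dist)[:k]  # indices of the k nearest rows, nearest first
    return top, dist[top]

rng = np.random.default_rng(0)
corpus = rng.standard_normal((1000, 64))  # 1,000 synthetic 64-dim "embeddings"
query = rng.standard_normal(64)
idx, d = exact_knn(query, corpus, k=3)
```

An ANN index answers the same top-K question while visiting only a small fraction of the rows, which is where the recall-for-cost trade comes from.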

The S3 Vectors launch post does not disclose which structure is used internally; the "subsecond query performance ... at massive scale" and "a few hundred to billions of records" framing implies a disk-friendly ANN approach. AWS says: "S3 Vectors automatically optimizes the vector data to achieve the best possible price-performance for vector storage."

Filter-with-ANN: pre-filter vs post-filter

When a metadata filter is combined with ANN search, two common strategies exist:

  • Pre-filter: restrict the candidate set to rows matching the filter, then run NN within that set. Guarantees up to K matching hits, but the pre-built ANN index may not apply to an arbitrary subset, so latency can suffer.
  • Post-filter: run ANN ignoring the filter, then drop non-matching hits. Fast, but can return fewer than K results when matches are sparse.

The launch post doesn't disclose which S3 Vectors uses; the example (filter={"genre":"scifi"} with topK=3) simply shows a filter + K.
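The two strategies can be sketched with brute-force NumPy stand-ins (illustrative only; this reflects nothing about S3 Vectors internals, and the `overfetch` factor is a hypothetical knob for the candidate pool size):

```python
import numpy as np

def prefilter_knn(query, corpus, mask, k):
    # Pre-filter: NN search only over rows matching the metadata filter.
    # Returns up to k matching hits, scanning just the filtered subset.
    idx = np.flatnonzero(mask)
    d = np.linalg.norm(corpus[idx] - query, axis=1)
    return idx[np.argsort(d)[:k]]

def postfilter_knn(query, corpus, mask, k, overfetch=4):
    # Post-filter: search everything (brute force stands in for ANN here),
    # then drop non-matching hits — may return fewer than k results.
    d = np.linalg.norm(corpus - query, axis=1)
    candidates = np.argsort(d)[:k * overfetch]
    kept = [i for i in candidates if mask[i]]
    return np.array(kept[:k])

rng = np.random.default_rng(1)
corpus = rng.standard_normal((500, 16))
mask = rng.random(500) < 0.3  # ~30% of rows match the filter
query = rng.standard_normal(16)
pre = prefilter_knn(query, corpus, mask, k=5)
post = postfilter_knn(query, corpus, mask, k=5)
```

With a selective filter, `postfilter_knn` can come up short of K even with over-fetching, which is exactly the failure mode described above.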

Cold vs hot query profiles

Different workloads want different query profiles:

  • Hot, real-time (product recommendations, fraud detection): high QPS, low latency → DRAM/SSD cluster (e.g. OpenSearch Serverless k-NN).
  • Cold, archival (semantic search over historical media, agent memory, RAG over rarely-touched corpora): storage cost dominates → S3-tier (e.g. S3 Vectors).

AWS's launch positioning explicitly separates these — and provides an export path from S3 Vectors → OpenSearch to move hot subsets into the low-latency tier. See patterns/cold-to-hot-vector-tiering and concepts/hybrid-vector-tiering.
