CONCEPT Cited by 3 sources
Vector Similarity Search¶
Vector similarity search is the retrieval primitive behind semantic search, recommendation, and RAG: given a query vector and a corpus of vectors (usually embeddings of documents / images / audio), return the top-K corpus vectors closest to the query under a chosen distance metric.
(Source: sources/2025-07-16-aws-amazon-s3-vectors-preview-launch)
Channy Yun's framing (AWS, 2025)¶
"Vector search is an emerging technique used in generative AI applications to find similar data points to given data by comparing their vector representations using distance or similarity metrics."
Distance metrics (as supported at S3 Vectors launch)¶
| Metric | Formula | Typical use |
|---|---|---|
| Cosine | 1 - (a·b) / (‖a‖ ‖b‖) | Most NLP embeddings (BERT / Titan / OpenAI) — direction matters, magnitude doesn't |
| Euclidean (L2) | ‖a - b‖₂ | Embeddings trained with L2 loss, some image models |
S3 Vectors exposes Cosine or Euclidean per index, chosen at
CreateVectorIndex time. Other common metrics (inner product / dot
product, Manhattan, Hamming) are not listed at preview.
"For Distance metric, you can choose either Cosine or Euclidean. When creating vector embeddings, select your embedding model's recommended distance metric for more accurate results."
Using the wrong metric for a model produces materially worse recall.
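The two launch metrics can rank the same corpus differently when vector magnitudes vary, which is why the model's recommended metric matters. A minimal NumPy sketch (the vectors here are illustrative, not real embeddings):

```python
import numpy as np

def cosine_distance(a, b):
    # 1 - (a·b) / (‖a‖ ‖b‖): compares direction only, ignores magnitude
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    # ‖a - b‖₂: sensitive to both direction and magnitude
    return np.linalg.norm(a - b)

query = np.array([1.0, 0.0])
near_direction = np.array([10.0, 0.1])  # same direction, large magnitude
near_position = np.array([1.0, 0.5])    # close in space, different angle

# Cosine ranks the direction-aligned vector first...
assert cosine_distance(query, near_direction) < cosine_distance(query, near_position)
# ...while Euclidean ranks the spatially closer one first.
assert euclidean_distance(query, near_position) < euclidean_distance(query, near_direction)
```

Querying an index with the metric its embedding model wasn't trained for effectively scores neighbours on the wrong axis, which is the recall degradation the quote warns about.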
Query shape¶
A query specifies:
- The query vector (same dimensionality as the index).
- top-K — how many nearest neighbours to return.
- Optional filter over vector metadata (e.g. genre="scifi").
- Optional flags to return distances / metadata with the hits.
Worked S3 Vectors example:
```python
s3vectors.query_vectors(
    vectorBucketName="...", indexName="...",
    queryVector={"float32": embedding},
    topK=3, filter={"genre": "scifi"},
    returnDistance=True, returnMetadata=True)
```
Exact vs Approximate (ANN)¶
- Exact k-NN — compare the query to every corpus vector. Recall = 100% by construction; cost scales O(N × dim). Viable up to a few million vectors on SSD, or on GPUs.
- Approximate NN (ANN) — pre-build an index structure (HNSW graph, IVF clusters, disk-based variants) that trades a small amount of recall for orders-of-magnitude lower query cost.
- HNSW (hierarchical navigable small world): graph-based, high recall at low latency, RAM-heavy.
- IVF (inverted file / coarse-quantizer + fine-scan): cluster centroids + bucket scan, cheaper memory.
- Disk-based / quantized (DiskANN, product quantization): trade accuracy for dramatic cost/memory cuts, enables billion-scale on storage.
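The exact-search cost model above can be sketched in a few lines of NumPy: one pass over all N corpus vectors, O(N × dim) per query. This is an illustration of brute-force k-NN, not S3 Vectors' internal structure (which the launch post does not disclose):

```python
import numpy as np

def exact_knn(query, corpus, k, metric="cosine"):
    """Brute-force top-K: score every corpus vector, O(N x dim) per query."""
    if metric == "cosine":
        q = query / np.linalg.norm(query)
        c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
        dists = 1.0 - c @ q                   # cosine distance per row
    else:
        dists = np.linalg.norm(corpus - query, axis=1)  # Euclidean per row
    top = np.argpartition(dists, k)[:k]       # O(N) partial selection of k smallest
    return top[np.argsort(dists[top])]        # indices of the k nearest, sorted

rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 128)).astype(np.float32)

# A lightly perturbed copy of row 42 should find row 42 as its nearest neighbour.
hits = exact_knn(corpus[42] + 0.01, corpus, k=3)
assert hits[0] == 42
```

Recall is 100% by construction; ANN indexes exist precisely because this full scan becomes the bottleneck as N grows past a few million.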
The S3 Vectors launch post does not disclose which structure is used internally; the "subsecond query performance ... at massive scale" and "a few hundred to billions of records" framing implies a disk-friendly ANN approach. AWS says: "S3 Vectors automatically optimizes the vector data to achieve the best possible price-performance for vector storage."
Filter-with-ANN: pre-filter vs post-filter¶
When a metadata filter is combined with ANN search, two common strategies exist:
- Pre-filter: restrict the candidate set to rows matching the filter, then do NN within that set. High recall, bad latency if the filtered subset is tiny.
- Post-filter: do ANN ignoring the filter, then drop non-matching hits. Fast, can return fewer than K.
The launch post doesn't disclose which strategy S3 Vectors uses; the example
(filter={"genre":"scifi"} with topK=3) simply shows a filter + K.
Cold vs hot query profiles¶
Different workloads want different query profiles:
- Hot, real-time (product recommendations, fraud detection): high QPS, low latency → DRAM/SSD cluster (e.g. OpenSearch Serverless k-NN).
- Cold, archival (semantic search over historical media, agent memory, RAG over rarely-touched corpora): storage cost dominates → S3-tier (e.g. S3 Vectors).
AWS's launch positioning explicitly separates these — and provides an export path from S3 Vectors → OpenSearch to move hot subsets into the low-latency tier. See patterns/cold-to-hot-vector-tiering and concepts/hybrid-vector-tiering.
Seen in¶
- sources/2025-07-16-aws-amazon-s3-vectors-preview-launch — Cosine / Euclidean per-index; top-K with metadata filter; explicit positioning against "continuous low-latency search facilities" and mapping to the cold/hot split. S3 Vectors, Bedrock Knowledge Bases, OpenSearch Serverless k-NN all touched as vector-search engines.
- sources/2026-01-06-expedia-powering-vector-embedding-capabilities — similarity search + hybrid search (similarity combined with attribute / metadata predicates, e.g. price < 100, category = electronics) named as the two query surfaces of Expedia's Embedding Store. Post articulates the canonical index trade-off — "the choice of index type depends on factors such as dataset size and the balance between speed and accuracy required" — without naming the structure; pre-filter-vs-post-filter semantics not disclosed.
- sources/2026-04-21-figma-the-infrastructure-behind-ai-search-in-figma — Production k-NN over billions of CLIP embeddings on OpenSearch k-NN; memory-bound cost structure (second-biggest cost after frame enumeration); mitigations include vector quantization and metadata-filterable k-NN for faceted search (frame name / file / project / team / org). Multimodal single-index query surface thanks to CLIP (systems/clip-embedding-model) — same index serves text-query and image-query.