CONCEPT Cited by 5 sources
Vector Embedding¶
A vector embedding is a dense numerical representation of a piece
of unstructured data — text, image, audio, video, or document — produced
by an embedding model, such that semantically similar inputs map
to vectors that are close under a chosen distance metric
(cosine, Euclidean, dot-product). The embedding is a fixed-length
array of floats (commonly float32), with length (dimensionality)
determined by the model (e.g. 1024 for Amazon Titan Text Embeddings V2
at its default config).
(Source: sources/2025-07-16-aws-amazon-s3-vectors-preview-launch)
Canonical framing (Channy Yun, AWS, 2025)¶
"Vectors are numerical representation of unstructured data created from embedding models. You use embedding models to generate vector embeddings of your data and store them in S3 Vectors to perform semantic searches."
"Vector search is an emerging technique used in generative AI applications to find similar data points to given data by comparing their vector representations using distance or similarity metrics."
What the embedding enables¶
- Semantic search — find documents about the same concept, not just those containing the same keywords.
- Retrieval-Augmented Generation (RAG) — given a user query, embed it, find nearest-neighbour embeddings in a corpus, retrieve those documents, feed them as context to an LLM.
- Recommendation / similarity — "more like this" over items without a hand-curated similarity function.
- Clustering / deduplication — group near-duplicate content.
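The semantic-search and RAG-retrieval steps above can be sketched end to end with toy vectors — a minimal hand-rolled nearest-neighbour search, assuming embeddings already exist (a real system would obtain them from an embedding model such as Titan V2; the corpus vectors here are hand-picked stand-ins):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical pre-computed corpus embeddings; a real corpus would hold
# model-produced vectors at the model's fixed dimensionality.
corpus = {
    "doc-about-cats":   [0.9, 0.1, 0.0],
    "doc-about-dogs":   [0.8, 0.3, 0.1],
    "doc-about-stocks": [0.0, 0.1, 0.9],
}

def search(query_vec: list[float], k: int = 2) -> list[str]:
    """Rank corpus documents by cosine similarity to the query embedding."""
    ranked = sorted(corpus,
                    key=lambda d: cosine_similarity(query_vec, corpus[d]),
                    reverse=True)
    return ranked[:k]

# A query embedding landing near the "cats" region of the toy space.
print(search([1.0, 0.0, 0.0]))  # ['doc-about-cats', 'doc-about-dogs']
```

In a RAG pipeline the returned document IDs would be resolved to their texts and passed to the LLM as context; the ranking step itself is exactly this.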
Dimensionality¶
Each model produces vectors at a fixed dimensionality. All vectors stored in a single S3 Vectors index must share dimensionality — this is the model's output shape, pinned at index creation time.
Common dimensionalities: 384 (MiniLM), 768 (BERT-base, Titan
V1), 1024–1536 (OpenAI text-embedding-3-small, Titan V2,
Cohere), 3072 (OpenAI text-embedding-3-large).
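The pinned-dimensionality invariant can be sketched as a toy index (S3 Vectors enforces this server-side; the class and its names here are illustrative only):

```python
class ToyVectorIndex:
    """Toy index illustrating the invariant: dimensionality is pinned at
    index creation, and every stored vector must match it."""

    def __init__(self, dimension: int):
        self.dimension = dimension   # the embedding model's output shape
        self.vectors: dict[str, list[float]] = {}

    def put(self, key: str, vec: list[float]) -> None:
        if len(vec) != self.dimension:
            raise ValueError(f"expected {self.dimension} dims, got {len(vec)}")
        self.vectors[key] = vec

index = ToyVectorIndex(dimension=1024)   # e.g. Titan V2 default
index.put("doc-1", [0.0] * 1024)         # accepted
try:
    index.put("doc-2", [0.0] * 768)      # a BERT-base-shaped vector: rejected
except ValueError as e:
    print("rejected:", e)
```

Switching embedding models to one with a different output shape therefore means re-embedding the corpus into a new index, not mixing shapes in the old one.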
Byte cost¶
The launch post flags the structural cost issue: for text-heavy
corpora like code or PDFs, "the vectors themselves were often more
bytes than the data being indexed" — a 4 KB document can produce a
1024-dim float32 embedding that is itself 4 KB (1024 × 4 bytes).
This is the motivation for storage-tier pricing: vectors need cheap
bulk storage as much as, or more than, their source data.
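The arithmetic behind the asymmetry is just dimensionality × bytes per float:

```python
def embedding_bytes(dimensions: int, bytes_per_float: int = 4) -> int:
    """Bytes needed to store one embedding (float32 by default),
    excluding keys and metadata."""
    return dimensions * bytes_per_float

doc_bytes = 4 * 1024                # a 4 KB source document
vec_bytes = embedding_bytes(1024)   # Titan V2 default dimensionality
print(vec_bytes)                    # 4096 — as many bytes as the document itself
```

At 3072 dimensions (text-embedding-3-large) the same document's embedding is 12 KB — three times the source — before any index overhead.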
(Source: sources/2026-04-07-allthingsdistributed-s3-files-and-the-changing-face-of-s3)
Pairing with distance metrics¶
Embedding models are trained against a specific distance metric (most often cosine or inner-product). Querying with the wrong metric can materially reduce recall:
"When creating vector embeddings, select your embedding model's recommended distance metric for more accurate results."
See concepts/vector-similarity-search for metric choices.
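Why the pairing matters can be shown with toy 2-D vectors (hand-picked, not model output): cosine ignores magnitude while dot product rewards it, so the two metrics can disagree on which neighbour is nearest when vectors are not unit-normalized.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot(a, b) / (norm_a * norm_b)

query = [1.0, 0.0]
long_off_angle = [5.0, 5.0]    # large magnitude, 45 degrees off the query
short_aligned  = [0.9, 0.05]   # small magnitude, nearly aligned

# Dot product favours the big vector; cosine favours the aligned one.
print(dot(query, long_off_angle) > dot(query, short_aligned))        # True
print(cosine(query, long_off_angle) < cosine(query, short_aligned))  # True
```

A model trained for inner-product retrieval encodes signal in magnitude; querying its vectors under cosine discards that signal, which is why the recommended metric should be used.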
Summed-attribute embeddings in sequence modeling¶
Recommendation systems often build a per-action embedding by
summing embeddings of that action's attributes rather than
allocating a token per attribute. Airbnb's destination recommender
sums embedding(city) + embedding(region) + embedding(days-to-today)
to get a single per-action token that a transformer attention layer
then aggregates across the sequence. Summation (vs concatenation)
keeps dimensionality fixed and shares gradients across attributes,
effectively letting the model learn joint attribute geometry.
(Source: sources/2026-03-12-airbnb-destination-recommendation-transformer)
See concepts/user-action-as-token for the full sequence-modeling framing this composition serves.
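The composition can be sketched with hypothetical attribute lookup tables (in training these would be learned embedding layers receiving gradients through the shared sum; the values here are placeholders):

```python
D = 4  # shared embedding dimensionality (toy size; real models use hundreds)

# Hypothetical learned lookup tables, one per attribute.
city_emb   = {"paris":  [0.1, 0.2, 0.0, 0.3]}
region_emb = {"europe": [0.0, 0.1, 0.1, 0.0]}
days_emb   = {30:       [0.2, 0.0, 0.1, 0.1]}

def action_token(city: str, region: str, days_to_today: int) -> list[float]:
    """Per-action token = elementwise sum of its attribute embeddings."""
    parts = [city_emb[city], region_emb[region], days_emb[days_to_today]]
    return [sum(vals) for vals in zip(*parts)]

token = action_token("paris", "europe", 30)
print(len(token))  # still D (4), where concatenation would give 3 * D
```

Because every attribute contributes to the same D coordinates, the transformer's attention operates over one token per action regardless of how many attributes describe it.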
Multimodal: text and images in one space¶
Some embedding models are multimodal — a single model embeds multiple input types (text + image, or text + audio) into the same vector space such that semantically matched pairs (a caption and its image) embed closely. The architectural implication is large: one vector index serves queries in either modality, no routing or translation layer needed.
OpenAI CLIP is the canonical open-source multimodal text+image model. Figma AI Search uses CLIP precisely for this property — users query by screenshot, by frame-selection-rendered-to-screenshot, or by text, and all three hit the same OpenSearch k-NN index.
"The model can take multiple forms of inputs (image and text) and output embeddings that are in the same space. This means that an embedding for the string 'cat' will be numerically similar to the embedding above, even though the first was generated with an image as input." (Figma Engineering, 2026-04-21)
Text-only vs image-only embedding models cannot be substituted for multimodal models here — each would produce a vector in its own space, and cross-modal nearest-neighbour queries would be meaningless.
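The "one index, either modality" property can be illustrated with stand-in encoders (in CLIP these are the trained image and text towers; the vectors below are hand-picked solely to mimic the shared geometry):

```python
import math

# Stand-in encoders mapping BOTH modalities into the same toy 3-dim space.
def embed_text(text: str) -> list[float]:
    return {"cat": [0.9, 0.1, 0.0], "stock chart": [0.0, 0.1, 0.9]}[text]

def embed_image(image_id: str) -> list[float]:
    return {"cat.png":   [0.85, 0.15, 0.05],
            "chart.png": [0.05, 0.10, 0.88]}[image_id]

# One index of image embeddings, queried by either modality.
index = {name: embed_image(name) for name in ("cat.png", "chart.png")}

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec: list[float]) -> str:
    return max(index, key=lambda k: cos(query_vec, index[k]))

print(nearest(embed_text("cat")))          # text query hits cat.png
print(nearest(embed_image("chart.png")))   # image query hits chart.png
```

Two single-modality models would place "cat" and cat.png in unrelated spaces, and the `nearest` call across them would return noise — which is the substitution failure described above.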
Image vs JSON-text embeddings — a Figma datapoint¶
Figma initially tried embedding a textual JSON representation of the user's Figma-layer selection rather than rendering it to an image first. Image-derived embeddings produced better results and let screenshot-based queries share the same code path, so the JSON route was dropped (Source: sources/2026-04-21-figma-the-infrastructure-behind-ai-search-in-figma). For multimodal models like CLIP trained heavily on image inputs, the "render to an image first" preprocessing is a stronger signal than any structural textual proxy.
Seen in¶
- sources/2025-07-16-aws-amazon-s3-vectors-preview-launch — defines embeddings; worked Titan V2 example over movie-plot texts showing `bedrock.invoke_model(modelId="amazon.titan-embed-text-v2:0", ...)` returning an embedding then stored via `s3vectors.put_vectors`.
- sources/2026-04-07-allthingsdistributed-s3-files-and-the-changing-face-of-s3 — contributes the "vectors can be more bytes than the indexed data" cost-asymmetry framing.
- sources/2026-03-12-airbnb-destination-recommendation-transformer — per-user-action embeddings built by summing `city + region + days-to-today` attribute embeddings, fed to a transformer for destination prediction.
- sources/2026-01-06-expedia-powering-vector-embedding-capabilities — embeddings as the payload of first-class embedding collections registered in systems/feast with their producing model / version and associated service; platform-level framing that treats embeddings as infra-managed artifacts with lineage, not transient per-workload floats.
- sources/2026-04-21-figma-the-infrastructure-behind-ai-search-in-figma — CLIP named as the multimodal canonical; image-vs-JSON experiment outcome; batched CLIP inference on SageMaker; vectors slimmed from OpenSearch `_source` and re-fetched from DynamoDB on update (patterns/source-field-slimming-with-external-refetch).
- sources/2025-12-18-mongodb-token-count-based-batching-faster-cheaper-embedding-inference — embedding inference at serving time, not just the output. Voyage AI by MongoDB distinguishes queries vs documents as two distinct serving problems with different optimal batching regimes; query-side embedding inference is memory-bound on GPU and wins from token-count batching + padding removal on vLLM. Headline production result: 50 % GPU-inference-latency reduction with 3× fewer GPUs on voyage-3-large.