CONCEPT Cited by 1 source
Embedding Collection¶
Definition¶
An embedding collection is the organizational unit of a vector database: a named, schema-pinned bucket of vector embeddings that share
- dimensionality (fixed by the producing embedding model),
- distance metric (cosine / Euclidean / inner product — also determined by the producing model),
- index structure (HNSW / IVF / DiskANN / flat — determined at collection-creation time),
- and, in platform-grade deployments, metadata pinning the collection to a specific producing model / model version, consuming service, and schema.
(Source: sources/2026-01-06-expedia-powering-vector-embedding-capabilities)
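The properties above can be pictured as an immutable record. The following is a minimal sketch, not the schema of any particular vector database: `CollectionSpec`, its field names, the allowed-value sets, and the example values are all hypothetical and chosen only to mirror the list in the definition.

```python
from dataclasses import dataclass, FrozenInstanceError

# Hypothetical vocabularies mirroring the definition above.
ALLOWED_METRICS = {"cosine", "euclidean", "inner_product"}
ALLOWED_INDEXES = {"hnsw", "ivf", "diskann", "flat"}


@dataclass(frozen=True)  # frozen: the pinned properties cannot change in place
class CollectionSpec:
    name: str
    dimensionality: int
    distance_metric: str
    index_type: str
    model: str
    model_version: str
    associated_service: str

    def __post_init__(self):
        # Reject definitions that could never hold comparable vectors.
        if self.dimensionality <= 0:
            raise ValueError("dimensionality must be positive")
        if self.distance_metric not in ALLOWED_METRICS:
            raise ValueError(f"unknown distance metric: {self.distance_metric}")
        if self.index_type not in ALLOWED_INDEXES:
            raise ValueError(f"unknown index type: {self.index_type}")


# Example values are illustrative only.
spec = CollectionSpec(
    name="hotel-descriptions-v1",
    dimensionality=768,
    distance_metric="cosine",
    index_type="hnsw",
    model="example-encoder",
    model_version="1.0",
    associated_service="search-ranking",
)
```

Freezing the record is one way to express that the pinned properties are fixed at creation time; a change to any of them would be modelled as a new `CollectionSpec`.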
Why the collection is a first-class unit¶
Vectors on their own are opaque floats. The collection is the boundary at which "these vectors are comparable to each other and meaningful to this consumer" becomes enforceable:
- Comparability. Similarity search only makes sense between vectors produced by the same model version under the same distance metric. The collection pins both, so a query vector and the corpus it's queried against always agree.
- Governance and discovery. When different teams want to know whether a corpus-of-interest already has embeddings, the collection is the searchable unit — keyed by (service, model, version, schema).
- Evolution without breakage. Changing embedding model, dimensionality, distance metric, or index is a new collection — not an in-place upgrade. Two versions coexist; consumers cut over when ready.
- Lifecycle primitive. Creation, backfill, re-index, deletion, and retention are all collection-scoped operations.
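The comparability point can be sketched as a guard that runs before any search. This is an illustrative assumption about how a platform might enforce it, not a description of Expedia's service; `Collection`, `IncompatibleQueryError`, and `check_query` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    name: str
    model: str
    model_version: str
    dimensionality: int


class IncompatibleQueryError(ValueError):
    """Raised when a query vector is not comparable to a collection."""


def check_query(collection, query_model, query_model_version, query_vector):
    # Same dimensionality is not enough: two versions of a model can emit
    # vectors of equal length that live in different embedding spaces.
    if (query_model, query_model_version) != (collection.model, collection.model_version):
        raise IncompatibleQueryError(
            f"{collection.name} holds {collection.model}@{collection.model_version} "
            f"vectors, got a query from {query_model}@{query_model_version}"
        )
    if len(query_vector) != collection.dimensionality:
        raise IncompatibleQueryError(
            f"{collection.name} expects {collection.dimensionality} dimensions, "
            f"got {len(query_vector)}"
        )


# Two versions coexist as separate collections rather than one upgraded in place.
v1 = Collection("reviews-v1", "encoder-a", "1.0", 4)
v2 = Collection("reviews-v2", "encoder-a", "2.0", 4)

check_query(v1, "encoder-a", "1.0", [0.1, 0.2, 0.3, 0.4])  # accepted
```

A query embedded with `encoder-a` version `2.0` would pass a dimension check against `reviews-v1` but is rejected here, which is the silent-failure case the collection boundary exists to prevent.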
Platform realization: Expedia¶
Expedia's Embedding Store Service uses systems/feast to register each collection with structured metadata: the associated service (the system that generates or consumes the embeddings) and the embedding model that produced them. The source names three payoffs:
- Data consistency — "the collection definition guarantees that all embeddings in a collection are linked to consistent metadata, such as the model and service they are associated with."
- Search and discoverability — "users can easily locate collections based on components of its metadata, such as a specific model or version."
- Version management — "multiple versions of the same dataset, tailored to different needs and scenarios, can be created based on various factors such as different embedding models or model versions, modifying the indexing algorithms to suit various use cases or modifying the schema."
(Source: sources/2026-01-06-expedia-powering-vector-embedding-capabilities)
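The discoverability payoff amounts to filtering collections by parts of their metadata. The sketch below uses a plain in-memory catalog to illustrate the idea; it does not use or imitate the Feast API, and `CollectionRecord`, `CollectionCatalog`, and the example entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionRecord:
    name: str
    service: str
    model: str
    model_version: str


class CollectionCatalog:
    """In-memory stand-in for a metadata registry (illustrative only)."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        # Names are unique handles; a changed definition gets a new name.
        if record.name in self._records:
            raise ValueError(f"collection already registered: {record.name}")
        self._records[record.name] = record

    def find(self, *, service=None, model=None, model_version=None):
        # A criterion left as None matches every record.
        criteria = {"service": service, "model": model, "model_version": model_version}
        return [
            record
            for record in self._records.values()
            if all(v is None or getattr(record, k) == v for k, v in criteria.items())
        ]


catalog = CollectionCatalog()
catalog.register(CollectionRecord("reviews-v1", "search", "encoder-a", "1.0"))
catalog.register(CollectionRecord("reviews-v2", "search", "encoder-a", "2.0"))
catalog.register(CollectionRecord("photos-v1", "media", "encoder-b", "1.0"))

by_model = catalog.find(model="encoder-a")
by_version = catalog.find(model="encoder-a", model_version="2.0")
```

Here `by_model` holds both `reviews` collections and `by_version` holds only `reviews-v2`, which is how two versions of the same dataset stay individually addressable.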
Typical collection metadata¶
Platform-grade collections carry:
| Field | Role |
|---|---|
| `name` | Human-addressable handle |
| `model` + `model_version` | Pins comparability and distance metric |
| `dimensionality` | Validated on every write |
| `distance_metric` | Cosine / Euclidean / dot product |
| `index_type` | HNSW / IVF / DiskANN / flat (trade off recall vs cost) |
| `schema` | Additional columns the vectors carry for hybrid search filters |
| `associated_service` | Consumer / producer for discovery + ownership |
| `created_at`, `version` | Lineage for restore / rollback |
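A write-time check against the `dimensionality` and `schema` fields might look like the sketch below. Validating the schema columns on write is an assumption made for illustration (the source only names dimensionality validation in this table), and `validate_write` and the example schema are hypothetical.

```python
import math


def validate_write(dimensionality, schema, vector, attributes):
    """Raise if a row does not match the collection's pinned shape.

    schema maps column name to the Python type expected for that column.
    """
    if len(vector) != dimensionality:
        raise ValueError(f"expected {dimensionality} dimensions, got {len(vector)}")
    if not all(math.isfinite(x) for x in vector):
        raise ValueError("vector contains NaN or infinite values")

    missing = schema.keys() - attributes.keys()
    extra = attributes.keys() - schema.keys()
    if missing or extra:
        raise ValueError(
            f"schema mismatch: missing={sorted(missing)}, unexpected={sorted(extra)}"
        )
    for column, expected_type in schema.items():
        if not isinstance(attributes[column], expected_type):
            raise TypeError(f"column {column!r} must be {expected_type.__name__}")


# Illustrative schema: the filterable columns stored next to each vector.
schema = {"city": str, "star_rating": int}
validate_write(3, schema, [0.1, 0.2, 0.3], {"city": "Lisbon", "star_rating": 4})
```

Rejecting malformed rows at the collection boundary keeps every stored vector searchable and every schema column usable as a filter.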
Why this is a concept and not just a data-model detail¶
The collection structurally determines:
- What vectors are comparable — preventing cross-model queries that would silently give bad recall.
- What hybrid filters are possible — the schema defines the attributes a hybrid search can filter on.
- How migrations are staged — new collection + backfill from the offline store + dual serving during rollout + cutover.
- Who owns what — the `associated_service` field is the ownership anchor in a multi-team ML platform.
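The staged migration described in the list above can be sketched as a router that maps a stable alias to a physical collection. This is one possible reading, assuming that "dual serving" means writes go to both collections while reads stay on the old one until cutover; the backfill step is elided, and `MigrationRouter` and the collection names are hypothetical.

```python
class MigrationRouter:
    """Maps a stable alias to a physical collection during a staged migration."""

    def __init__(self, alias, active):
        self.alias = alias
        self.active = active      # collection currently serving reads
        self.candidate = None     # new collection being filled, if any

    def stage(self, candidate):
        # Step 1: the new model/metric/index gets its own collection.
        if self.candidate is not None:
            raise RuntimeError("a migration is already in progress")
        self.candidate = candidate

    def targets_for_write(self):
        # Step 2 (assumed dual-write rollout): new rows land in both collections.
        if self.candidate is None:
            return [self.active]
        return [self.active, self.candidate]

    def target_for_read(self):
        # Reads never mix collections, so results stay comparable.
        return self.active

    def cutover(self):
        # Step 3: flip the alias; return the old collection for later cleanup.
        if self.candidate is None:
            raise RuntimeError("no staged collection to cut over to")
        previous = self.active
        self.active, self.candidate = self.candidate, None
        return previous


router = MigrationRouter(alias="reviews", active="reviews-v1")
router.stage("reviews-v2")
during_rollout = router.targets_for_write()
retired = router.cutover()
```

Consumers address `reviews` throughout; only the router knows which versioned collection is live, so the old collection can be kept for rollback or deleted under its own retention policy.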
Related¶
- concepts/vector-embedding — the primitive stored in a collection.
- concepts/vector-similarity-search — the retrieval primitive the collection indexes for.
- concepts/hybrid-search — requires the collection's schema.
- concepts/feature-store — adjacent organizational unit (feature view) in feature-store systems; the embedding collection is its analogue for vectors.
- systems/expedia-embedding-store — canonical platform realization.
- systems/feast — the metadata substrate Expedia uses.
Seen in¶
- sources/2026-01-06-expedia-powering-vector-embedding-capabilities — introduces the collection as the Expedia Embedding Store's unit of organization + versioning + discoverability, backed by Feast metadata.