Expedia — Powering Vector Embedding Capabilities¶
Summary¶
Expedia Group's ML Platform team describes the Embedding Store
Service — a centralized vector-embedding platform exposing
standardized APIs for creating collections, loading embeddings,
and running similarity / hybrid searches across Expedia's many ML
workloads (recommendation, semantic search, and similar). The
service's metadata layer is Feast — the
open-source feature store — repurposed
here to register embedding
collections (the organizational unit of the vector DB),
annotate them with the associated service and embedding
model / version, and make them discoverable. Storage is
explicitly split into two tiers following Feast's online/offline
duality: an online store acting as the vector DB for
interactive low-latency similarity search, and an offline store
holding the full historical dataset for analytics, experimentation,
model training, and backup, with a documented restore path from
offline → online gated on creation date, time range, or arbitrary
SQL. Three ingestion modes coexist — batch via Feast
materialization over Spark pulling from offline sources, a
real-time Insert API for small batches / streaming, and
on-the-fly generation where the service itself calls named
models to produce embeddings. Writes land simultaneously in
both online and offline stores. Search surfaces: similarity
search (top-K under a distance metric via an index) and
hybrid search (similarity + metadata / attribute filters such
as price < 100 or category = electronics). No production
numbers (QPS, p50/p99, vector counts, dimensionality, distance
metrics, specific vector DB named, offline-store engine named, or
named Expedia consumer workloads) — the post is a platform-design
overview, not a retrospective.
Key takeaways¶
- Centralized embedding platform as a piece of ML infrastructure distinct from the feature store itself. Expedia surfaces the centralized embedding platform — vector-DB backed, metadata-governed, standardized APIs — as a shared substrate for multiple ML experiences, the same organizational win a feature store delivers but for unstructured-data embeddings rather than numeric/categorical features. Stated benefits: "reduced development time and acceleration of development and iteration of different ML experiences", "standardized APIs", "discoverability and management of embeddings". (Source: this post, "Summary and moving forward".)
- Feast is used here as a metadata-and-orchestration layer for embeddings, not only for features. Feast registers each collection with the associated service (consuming / producing system) and the model / model version that produced its vectors. This is a non-obvious extension of systems/feast — Feast's canonical pitch is feature definitions for training / serving; Expedia demonstrates the same declarative / adapter-ecosystem property works cleanly for embedding collections. The named payoffs are collection-level: data consistency (embeddings always paired with the model + service metadata that produced them), search / discoverability across ML teams, and multi-version management (different models, index settings, or schemas coexisting for the same service). (Source: body, "Leveraging the Feast feature store for metadata management and discoverability".)
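The post does not disclose how these fields are modeled inside Feast, so the shape of the registry can only be sketched. A minimal illustration of the collection-level metadata it describes (service, model, model version) and the discoverability it buys — all names here are hypothetical, not Expedia's or Feast's:

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical sketch of collection-level metadata as the post describes it:
# each embedding collection is pinned to the service and model/version that
# produced it, and is discoverable by those fields. Field and class names
# are illustrative; the post does not disclose the actual Feast schema.

@dataclass(frozen=True)
class CollectionMetadata:
    name: str            # the collection: organizational unit of the vector DB
    service: str         # consuming / producing system
    model: str           # embedding model that produced the vectors
    model_version: str   # pins the collection to a single model version
    dimension: int

class MetadataRegistry:
    """Registers collections and makes them searchable by service/model."""

    def __init__(self) -> None:
        self._collections: dict[str, CollectionMetadata] = {}

    def register(self, meta: CollectionMetadata) -> None:
        self._collections[meta.name] = meta

    def find(self, *, service: str | None = None, model: str | None = None):
        return [
            m for m in self._collections.values()
            if (service is None or m.service == service)
            and (model is None or m.model == model)
        ]

registry = MetadataRegistry()
# Two versions of a conceptually-same dataset coexist as separate collections.
registry.register(CollectionMetadata("hotels-v1", "search", "minilm", "1", 384))
registry.register(CollectionMetadata("hotels-v2", "search", "minilm", "2", 384))
hits = registry.find(service="search", model="minilm")
```

Keying the registry on the collection name while filtering on service/model is what gives teams the "find before you re-embed" path the post emphasizes.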
- Online/offline storage duality directly transplanted from feature-store design to embedding-store design. The online store is the vector DB (interactive similarity search, most recent data, optimized for real-time retrieval — use cases named: recommendation systems, semantic search). The offline store is the historical dataset repository (batch analytics, experimentation, model training, backup; "complete historical record of embeddings and their associated metadata"). Crucially, the two tiers are bridged: "The seamless integration between the online and offline stores allows users to restore data from the offline store to the online store whenever needed" — restore granularity is by creation date, time range, or "more complex SQL queries". (Source: body, "online store / offline store" sections.)
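The restore path can be sketched in a few lines. This is an assumption-laden illustration — in-memory stands-in for both tiers, with a predicate standing in for the post's "more complex SQL queries"; no storage engine is named in the post:

```python
from datetime import datetime

# Hypothetical sketch of the offline -> online restore path: records are
# selected from the historical tier by creation date / time range (or an
# arbitrary predicate, standing in for SQL) and loaded into the vector DB.
offline_store = [
    {"id": "a", "vector": [0.1, 0.2], "created": datetime(2024, 1, 5)},
    {"id": "b", "vector": [0.3, 0.4], "created": datetime(2024, 2, 10)},
    {"id": "c", "vector": [0.5, 0.6], "created": datetime(2024, 3, 1)},
]

def restore(online: dict, start: datetime, end: datetime, predicate=None):
    """Copy offline records created in [start, end) into the online store."""
    for rec in offline_store:
        if start <= rec["created"] < end and (predicate is None or predicate(rec)):
            online[rec["id"]] = rec["vector"]
    return online

# Restore only embeddings created before March 2024.
online_store = restore({}, datetime(2024, 1, 1), datetime(2024, 3, 1))
```

The same selection mechanism covers backfills, re-indexes, and model switches, which is why the dual-write discipline below matters.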
- Dual-write to online + offline on every ingest, regardless of mode. "Regardless of the method chosen to load data, the service ensures that all embeddings are stored simultaneously in both the online and offline storage systems, providing robust access for various use cases." The dual-write discipline is what makes the offline store usable as a restore source for backfills / re-indexes / model switches, and what makes training consistent with serving by construction. (Source: body, "Generating and inserting embeddings from features".)
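The invariant is simple to state in code. A minimal sketch, assuming an upsert-style online tier and an append-only offline tier (the post specifies neither the consistency model nor the storage engines):

```python
# Hypothetical sketch of the dual-write discipline: every ingest lands in
# both tiers, so the offline store is a complete restore source and a
# training-consistent historical record by construction.
class EmbeddingStore:
    def __init__(self):
        self.online = {}    # vector DB tier: latest data, low-latency search
        self.offline = []   # historical tier: append-only record of every write

    def insert(self, key: str, vector: list, metadata: dict) -> None:
        record = {"id": key, "vector": vector, **metadata}
        self.online[key] = vector      # serving path (upsert)
        self.offline.append(record)    # analytics / training / backup path

store = EmbeddingStore()
store.insert("hotel-42", [0.1, 0.9], {"model": "minilm", "version": "2"})
store.insert("hotel-42", [0.2, 0.8], {"model": "minilm", "version": "3"})
# online keeps only the latest vector; offline keeps the full history
```

Whether the two writes are synchronous or reconciled asynchronously is exactly the consistency question the caveats section flags as undisclosed.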
- Three ingestion modes — batch via Feast materialization on Spark, an Insert API for small / real-time batches, and on-the-fly model invocation. (1) Batch ingestion, "for large volumes of embeddings generated through feature engineering processes", uses Feast materialization "with a Spark-based process to efficiently load data from one or more offline sources"; this is the embedding-platform equivalent of Feast's established feature-materialization lane. (2) The Insert API handles small batches or real-time-produced embeddings. (3) On-the-fly embedding generation — "for scenarios where embedding generation needs to be offloaded, the Embedding Store Service can generate embeddings dynamically by calling specific models to generate embeddings on the fly" — i.e. the service itself is the embedding-inference entry point for simple / occasional producers, collapsing "embedding-generation service + embedding-store service" for callers who don't want to own their own inference path. See patterns/embedding-ingestion-modes. (Source: body, "Generating and inserting embeddings from features".)
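The three modes differ only in where vectors originate, which a single dispatch sketch makes concrete. Everything here is illustrative: `embed_with_model` is a placeholder for the named-model inference the service performs, and the batch path merely simulates a materialization pull:

```python
# Hypothetical sketch of the three ingestion modes behind one entry point.
# None of these function or field names come from the post.

def embed_with_model(model_name: str, text: str) -> list:
    # Placeholder for calling a named embedding model on the fly.
    return [float(len(text)), float(len(model_name))]

def ingest(store: dict, mode: str, payload) -> None:
    if mode == "batch":           # Feast materialization over Spark (simulated)
        for rec in payload:       # payload: records pulled from offline sources
            store[rec["id"]] = rec["vector"]
    elif mode == "insert":        # real-time Insert API: small batches / streams
        for rec in payload:       # payload: caller-produced vectors
            store[rec["id"]] = rec["vector"]
    elif mode == "on_the_fly":    # the service itself calls the model
        for rec in payload:       # payload: raw items, not vectors
            store[rec["id"]] = embed_with_model(rec["model"], rec["text"])
    else:
        raise ValueError(f"unknown ingestion mode: {mode}")

store = {}
ingest(store, "insert", [{"id": "x", "vector": [1.0, 2.0]}])
ingest(store, "on_the_fly", [{"id": "y", "model": "minilm", "text": "sea view"}])
```

In the real service, each branch would end in the dual-write described above rather than a bare dict assignment.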
- Similarity search + hybrid search as the two query surfaces. Similarity search is the standard top-K nearest-neighbor primitive, with the post calling out that index choice trades "speed and accuracy" against dataset size (consistent with the exact-vs-ANN framing on the wiki's concepts/vector-similarity-search page, but without naming the structure). Hybrid search here = vector similarity combined with attribute / metadata filters — examples given: "price < 100", "category = electronics". This is a different "hybrid" than hybrid retrieval of BM25 + dense vectors: same word, different axis (attribute filter vs lexical index). The two coexist on this wiki as distinct concepts — see concepts/hybrid-search for the filter-plus-vector shape and concepts/hybrid-retrieval-bm25-vectors for the keyword-plus-vector shape. (Source: body, "Search capabilities".)
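The filter-plus-vector shape reduces to "restrict the candidate pool, then rank by similarity". A brute-force sketch using the post's own example filters — the data, distance metric, and pre-filter ordering are assumptions, since the post discloses none of them:

```python
import math

# Hypothetical sketch of the two query surfaces: top-K similarity search,
# and hybrid search = the same ranking restricted by an attribute filter
# (the post's examples: price < 100, category = electronics).
ITEMS = [
    {"id": "tv",    "vector": [1.0, 0.0], "price": 250, "category": "electronics"},
    {"id": "radio", "vector": [0.9, 0.1], "price": 40,  "category": "electronics"},
    {"id": "sofa",  "vector": [0.0, 1.0], "price": 90,  "category": "furniture"},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def search(query, k=2, filter_fn=None):
    """Brute-force top-K; a production vector DB would use an index,
    trading speed against accuracy as the dataset grows."""
    pool = [i for i in ITEMS if filter_fn is None or filter_fn(i)]
    pool.sort(key=lambda i: cosine(query, i["vector"]), reverse=True)
    return [i["id"] for i in pool[:k]]

top = search([1.0, 0.0])                                          # similarity
cheap = search([1.0, 0.0], filter_fn=lambda i: i["price"] < 100)  # hybrid
```

This sketch pre-filters before ranking; whether Expedia's service pre-filters or post-filters is one of the undisclosed details listed below.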
- Metadata as a first-class discoverability and versioning primitive for embeddings. The post names three benefits that flow from attaching structured metadata (service, model, version) to each collection: (a) data consistency — a collection's definition pins the model + service it was produced against, so embeddings in a collection are never mixed across model versions; (b) search and discoverability — teams find existing collections by model / version / service rather than re-embedding the same corpus; (c) version management — multiple versions of a conceptually-same dataset can coexist (different embedding models, index algorithms, schemas). Explicitly framed as a pre-requisite for safe experimentation and evolution without losing lineage. (Source: body, "Leveraging the Feast feature store for metadata management and discoverability".)
Systems / concepts / patterns introduced or extended¶
- Introduces systems/expedia-embedding-store, concepts/embedding-collection, concepts/hybrid-search, patterns/centralized-embedding-platform, patterns/embedding-ingestion-modes, patterns/dual-write-online-offline.
- Extends systems/feast (new Seen-in: Expedia embedding-store metadata layer; non-feature use-case), concepts/feature-store (Seen-in: online/offline duality transplanted to embeddings), concepts/vector-embedding (Seen-in: Expedia collection-level metadata framing), concepts/vector-similarity-search (Seen-in: similarity + hybrid-filter query surface), companies/expedia.
Operational numbers¶
None disclosed. The post does not state:
- Vector DB engine (Milvus / Qdrant / Weaviate / Pinecone / OpenSearch k-NN / in-house — unnamed).
- Offline store engine (S3 + Parquet / Iceberg / Delta / warehouse — unnamed).
- Embedding models in use or consumer model count.
- Vector counts, dimensionality, distance metrics supported.
- QPS, p50 / p99 search latency, ingestion throughput, or cost.
- Consumer workloads / products on Expedia's side.
- Index type (HNSW / IVF / DiskANN / other).
- Pre-filter vs post-filter semantics for hybrid search.
- Team / ownership structure, or operational cadence.
Caveats and limits of the post¶
- Platform-design overview, not production retrospective. No incident narrative, no scaling trade-off discussion, no numbers. Reads as a capabilities summary — the "Summary and moving forward" framing confirms this is an introductory post rather than a lessons-learned post.
- "Hybrid search" is overloaded. The post's hybrid search (similarity + metadata filter) is distinct from the other common "hybrid" in the literature (BM25 + dense vector fusion). Both are legitimate hybrids; this wiki tracks them as separate concepts.
- Feast extension is described but not instrumented. The post states that Feast is used to manage embedding-collection metadata, but doesn't describe how the embedding-specific fields (model version, index type, distance metric, dimensionality) are modeled in Feast's schema — whether via feature-view extension, tagging, or a custom registry layer is not disclosed.
- Dual-write consistency model unspecified. "Stored simultaneously in both online and offline stores" does not say whether this is synchronous 2PC-style, asynchronous best-effort with eventual reconciliation, or a write-to-offline-first-then-materialize-to-online pipeline. The restore-from-offline path works in any of these — but the failure modes differ.
- No named consumers. Dash (Dropbox) is named in this wiki's feature-store coverage as the canonical consumer; Expedia's equivalent ("the service associated with a collection") exists only as Feast metadata, and no actual Expedia product is mentioned.
- Tier 3 substance check. Passes the AGENTS.md Tier 3 filter — the post describes production ML-platform infrastructure (vector DB + feature-store metadata + online/offline storage + batch-streaming ingestion + similarity / hybrid search). Not pure ML research; the content is serving-infra design.
Raw file¶
raw/expedia/2026-01-06-powering-vector-embedding-capabilities-622a01e5.md