Pinterest Vector Database as a Service
Pinterest's Vector Database as a Service is an internal platform that lets any team stand up a production vector index with a simple JSON config, so LLM features across the company (table search, Analytics Agent query search, AI Table Documentation) stop reinventing custom indexes, ad-hoc ingestion jobs, and brittle retrieval logic.
(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)
Architecture
Three off-the-shelf pieces plus a platform glue layer:
- AWS OpenSearch — the vector-plus-metadata search substrate. Chosen after evaluation for Pinterest's internal-productivity use cases.
- Hive tables — the source of truth for vectorized datasets (the embeddings + their metadata live as Hive rows).
- Airflow — runs the index-creation and ingestion DAGs, including daily incremental updates.
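As a rough illustration of the Hive-to-OpenSearch ingestion step, the sketch below turns rows (as they might be read from a Hive table) into the newline-delimited body that OpenSearch's `_bulk` API expects. The function name, the `asset_id` column, and the `table_search_v1` index name are hypothetical; the source does not describe how Pinterest's DAGs actually serialize rows.

```python
import json
from typing import Iterable, Iterator


def rows_to_bulk_lines(
    rows: Iterable[dict], index_name: str, id_field: str = "asset_id"
) -> Iterator[str]:
    """Yield NDJSON lines for the _bulk API: an action line, then a document line, per row."""
    for row in rows:
        doc = dict(row)              # copy so the caller's row is not mutated
        doc_id = doc.pop(id_field)   # hypothetical stable key from the Hive row
        # The "index" action replaces any existing document with the same _id,
        # so re-running a daily incremental load does not create duplicates.
        yield json.dumps({"index": {"_index": index_name, "_id": doc_id}})
        yield json.dumps(doc)


# Illustrative rows; real embeddings would have far more dimensions.
rows = [
    {"asset_id": "t1", "embedding": [0.1, 0.2], "tier": 1},
    {"asset_id": "t2", "embedding": [0.3, 0.4], "tier": 2},
]

# The _bulk body must end with a trailing newline.
bulk_body = "\n".join(rows_to_bulk_lines(rows, "table_search_v1")) + "\n"
```

Keying each document on a stable identifier is one simple way to make daily incremental runs idempotent; whether the platform does this is an assumption.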
Contract
Teams define a vector index via a simple JSON schema specifying:
- the index alias,
- the vector field dimensionality (e.g. 1536, which must match the output size of the embedding model),
- source Hive table mappings.
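The source does not publish the actual schema, so the following is only a guess at what such a config and a minimal validation pass could look like. Every key name (`index_alias`, `vector_dimension`, `source_table`, and the nested columns) and the Hive table name are assumptions made for illustration.

```python
import json

# Hypothetical top-level keys, mirroring the three items listed above.
REQUIRED_KEYS = {"index_alias", "vector_dimension", "source_table"}


def validate_index_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the config passed."""
    errors = []
    missing = REQUIRED_KEYS - set(config)
    if missing:
        errors.append(f"missing keys: {sorted(missing)}")
    if "vector_dimension" in config:
        dim = config["vector_dimension"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(dim, bool) or not isinstance(dim, int) or dim <= 0:
            errors.append("vector_dimension must be a positive integer")
    if "source_table" in config and not isinstance(config["source_table"], dict):
        errors.append("source_table must be an object")
    return errors


# An illustrative config, not Pinterest's real schema.
config = json.loads("""
{
  "index_alias": "table_search",
  "vector_dimension": 1536,
  "source_table": {
    "hive_table": "example_db.table_embeddings",
    "id_column": "table_name",
    "vector_column": "embedding",
    "metadata_columns": ["tier", "owner"]
  }
}
""")

problems = validate_index_config(config)
```

Returning a list of problems rather than raising on the first one lets a validation task report every issue in a single run.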
An Airflow workflow then:
- validates the config,
- creates the OpenSearch index,
- publishes metadata so other teams can discover and reuse the knowledge base.
Result: "zero to a production-grade vector index in days instead of weeks" — without teams having to solve embedding, ingestion, and monitoring from scratch.
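To make the index-creation step concrete, here is a minimal sketch of how a validated config might be translated into an OpenSearch index definition. The `knn_vector` field type, its `dimension` parameter, and the `index.knn` setting come from OpenSearch's k-NN plugin; the function name and the metadata field names are hypothetical, and the platform's real mapping logic is not described in the source.

```python
def build_index_body(
    vector_field: str, dimension: int, metadata_fields: dict[str, str]
) -> dict:
    """Build the settings and mappings for a k-NN enabled OpenSearch index."""
    # The vector field stores the embedding; its dimension is fixed at index creation.
    properties = {vector_field: {"type": "knn_vector", "dimension": dimension}}
    # Metadata fields are mapped with ordinary OpenSearch types so they can be filtered on.
    for name, os_type in metadata_fields.items():
        properties[name] = {"type": os_type}
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN search on this index
        "mappings": {"properties": properties},
    }


# Hypothetical field names for a table-search index.
index_body = build_index_body(
    vector_field="embedding",
    dimension=1536,
    metadata_fields={"tier": "integer", "owner": "keyword"},
)
```

A dictionary like `index_body` would be sent as the body of a create-index request. Because the dimension is fixed in the mapping at creation time, it is worth validating in the config before the index exists.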
Scale
- Serves millions of embeddings across tables, queries, column descriptions, and documentation.
- Daily incremental updates as new data assets and queries are created.
- Supports hybrid patterns combining semantic similarity (vector distance) with traditional metadata filters, e.g. "tables semantically similar to user_actions that are Tier 1 and contain impression data." (concepts/hybrid-retrieval-bm25-vectors)
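The hybrid pattern in the last bullet could be expressed as an OpenSearch k-NN query with a metadata filter attached, roughly as sketched here. The query shape follows OpenSearch's documented filtered k-NN search; the field names (`embedding`, `tier`, `contains_impression_data`) are hypothetical, and the query vector stands in for the embedding of `user_actions`.

```python
from typing import Any, Sequence


def build_hybrid_query(
    query_vector: Sequence[float],
    k: int,
    filters: dict[str, Any],
    vector_field: str = "embedding",
) -> dict:
    """Build a k-NN query whose candidates are restricted by exact-match metadata filters."""
    return {
        "size": k,
        "query": {
            "knn": {
                vector_field: {
                    "vector": list(query_vector),
                    "k": k,
                    # Each filter becomes a term clause; all of them must match.
                    "filter": {
                        "bool": {
                            "must": [
                                {"term": {field: value}}
                                for field, value in filters.items()
                            ]
                        }
                    },
                }
            }
        },
    }


# "Tables similar to user_actions that are Tier 1 and contain impression data."
query = build_hybrid_query(
    query_vector=[0.12, -0.03, 0.44, 0.08],  # placeholder; real vectors match the index dimension
    k=5,
    filters={"tier": 1, "contains_impression_data": True},
)
```

Placing the filter inside the `knn` clause asks the engine to apply it during the nearest-neighbor search rather than after it, which helps avoid returning fewer than `k` results when the filter is selective. Support for this depends on the OpenSearch version and k-NN engine in use.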
Internal customers
- Analytics Agent — table index + query-description index (the unified context-intent embedding index).
- AI Table Documentation — indexes glossary terms + column descriptions for search-based propagation to undocumented columns.
- Other unnamed "table search, Text-to-SQL, AI documentation" teams at Pinterest.
Canonical wiki pattern
This is the canonical wiki instance of patterns/internal-vector-db-as-service — a platform-engineering move that replaces N one-off vector stacks with one shared substrate.
Seen in
- sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql — canonical wiki introduction.