
Pinterest Vector Database as a Service

Pinterest's Vector Database as a Service is an internal platform that lets any team stand up a production vector index from a simple JSON config. It exists so that LLM features across the company (table search, Analytics Agent query search, AI Table Documentation) stop reinventing custom indexes, ad-hoc ingestion jobs, and brittle retrieval logic.

(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)

Architecture

Three off-the-shelf pieces plus a platform glue layer:

AWS OpenSearch — the vector-plus-metadata search substrate. Chosen after evaluation for Pinterest's internal-productivity use cases.
Hive tables — the source of truth for vectorized datasets (the embeddings + their metadata live as Hive rows).
Airflow — runs the index-creation and ingestion DAGs, including daily incremental updates.

Contract

Teams define a vector index via a simple JSON schema specifying:

the index alias,
the vector field dimensionality (e.g. 1536, matching the size of the embeddings being indexed),
source Hive table mappings.
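
To make the contract concrete, here is a minimal sketch of what such a config might look like and how it could be parsed. The source does not publish the actual schema, so every key name (`index_alias`, `vector_dimension`, `source`, `hive_table`, `column_mappings`), the `VectorIndexConfig` class, and the example values are assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorIndexConfig:
    """Hypothetical in-memory form of a team's index config."""
    index_alias: str        # name other teams use to reach the index
    vector_dimension: int   # must match the embedding size, e.g. 1536
    hive_table: str         # source-of-truth Hive table
    column_mappings: dict   # Hive column name -> index field name

    @classmethod
    def from_json(cls, raw: str) -> "VectorIndexConfig":
        data = json.loads(raw)
        source = data["source"]
        return cls(
            index_alias=data["index_alias"],
            vector_dimension=int(data["vector_dimension"]),
            hive_table=source["hive_table"],
            column_mappings=dict(source["column_mappings"]),
        )


# Illustrative config only; every key and value here is made up.
EXAMPLE_CONFIG_JSON = """
{
  "index_alias": "table_search",
  "vector_dimension": 1536,
  "source": {
    "hive_table": "example_db.table_embeddings",
    "column_mappings": {
      "table_name": "id",
      "embedding": "embedding",
      "tier": "tier"
    }
  }
}
"""

config = VectorIndexConfig.from_json(EXAMPLE_CONFIG_JSON)
print(config.index_alias, config.vector_dimension, config.hive_table)
```

Keeping the config this small is the point of the contract: a team declares what to index and where it lives, and the platform owns everything else.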

An Airflow workflow then:

validates the config,
creates the OpenSearch index,
publishes metadata so other teams can discover and reuse the knowledge base.
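
The three workflow steps above can be sketched as plain Python functions. This is an illustration under stated assumptions, not Pinterest's implementation: the function names, the config keys, and the shape of the discovery record are hypothetical, and no Airflow operator or OpenSearch client is called. The index body does use the `knn_vector` field type and `index.knn` setting from OpenSearch's k-NN plugin.

```python
def validate_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    alias = config.get("index_alias")
    if not isinstance(alias, str) or not alias:
        errors.append("index_alias must be a non-empty string")
    dim = config.get("vector_dimension")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(dim, int) or isinstance(dim, bool) or dim <= 0:
        errors.append("vector_dimension must be a positive integer")
    hive_table = config.get("hive_table")
    if not isinstance(hive_table, str) or not hive_table:
        errors.append("hive_table must be a non-empty string")
    return errors


def build_index_body(config: dict) -> dict:
    """Build an index definition using the OpenSearch k-NN mapping."""
    properties = {
        "embedding": {
            "type": "knn_vector",
            "dimension": config["vector_dimension"],
        },
    }
    # Metadata fields become keyword fields so they can serve as filters.
    for field in config.get("metadata_fields", []):
        properties[field] = {"type": "keyword"}
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {"properties": properties},
    }


def build_discovery_record(config: dict) -> dict:
    """Hypothetical metadata record that other teams could browse."""
    return {
        "index_alias": config["index_alias"],
        "source_table": config["hive_table"],
        "vector_dimension": config["vector_dimension"],
        "description": config.get("description", ""),
    }


def run_workflow(config: dict) -> dict:
    """Validate first, then produce the index body and discovery record."""
    errors = validate_config(config)
    if errors:
        raise ValueError("invalid index config: " + "; ".join(errors))
    return {
        "index_body": build_index_body(config),
        "discovery_record": build_discovery_record(config),
    }


example = {
    "index_alias": "table_search",
    "vector_dimension": 1536,
    "hive_table": "example_db.table_embeddings",
    "metadata_fields": ["tier"],
}
result = run_workflow(example)
```

In a real deployment each function would presumably map to its own task in the DAG, so that a bad config fails before any index is created.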

Result: "zero to a production-grade vector index in days instead of weeks" — without teams having to solve embedding, ingestion, and monitoring from scratch.

Scale

Serves millions of embeddings across tables, queries, column descriptions, and documentation.
Daily incremental updates as new data assets and queries are created.
Supports hybrid patterns combining semantic similarity (vector distance) with traditional metadata filters — e.g. "tables semantically similar to user_actions that are Tier 1 and contain impression data." (concepts/hybrid-retrieval-bm25-vectors)
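
The hybrid pattern in the last item can be sketched as an OpenSearch k-NN query with a metadata filter attached. The field names (`embedding`, `tier`, `contains_impression_data`) and the helper `build_hybrid_query` are assumptions for illustration; the source does not show the actual queries. The query shape itself follows the filtered k-NN search syntax of the OpenSearch k-NN plugin, which depends on the engine configured for the index.

```python
def build_hybrid_query(query_vector, k, filters=None, vector_field="embedding"):
    """Build a k-NN query body, optionally restricted by exact-match filters.

    query_vector: embedding of the search input (e.g. the user_actions table).
    filters: mapping of metadata field -> required value (hypothetical names).
    """
    knn_clause = {"vector": list(query_vector), "k": k}
    if filters:
        # Each filter becomes a term clause; all of them must match.
        knn_clause["filter"] = {
            "bool": {
                "must": [
                    {"term": {field: value}}
                    for field, value in filters.items()
                ]
            }
        }
    return {"size": k, "query": {"knn": {vector_field: knn_clause}}}


# Toy 3-dimensional vector; a real query would use the full embedding.
query = build_hybrid_query(
    query_vector=[0.12, -0.07, 0.33],
    k=5,
    filters={"tier": 1, "contains_impression_data": True},
)
```

The resulting dictionary would be passed as the body of a search request against the index alias, so that similarity ranking and metadata constraints are applied in one call.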

Internal customers

Analytics Agent — table index + query-description index (the unified context-intent embedding index).
AI Table Documentation — indexes glossary terms + column descriptions for search-based propagation to undocumented columns.
Other unnamed "table search, Text-to-SQL, AI documentation" teams at Pinterest.

Canonical wiki pattern

This is the canonical wiki instance of patterns/internal-vector-db-as-service — a platform-engineering move that replaces N one-off vector stacks with one shared substrate.

Seen in

sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql — canonical wiki introduction.

systems/pinterest-analytics-agent — primary customer.
systems/opensearch
systems/apache-airflow
concepts/vector-embedding
patterns/internal-vector-db-as-service