SYSTEM Cited by 1 source

Pinterest Vector Database as a Service

Pinterest's Vector Database as a Service is an internal platform that lets any team stand up a production vector index with a simple JSON config, so LLM features across the company (table search, Analytics Agent query search, AI Table Documentation) stop reinventing custom indexes, ad-hoc ingestion jobs, and brittle retrieval logic.

(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)

Architecture

Three off-the-shelf pieces plus a platform glue layer:

  • AWS OpenSearch — the vector-plus-metadata search substrate. Chosen after evaluation for Pinterest's internal-productivity use cases.
  • Hive tables — the source of truth for vectorized datasets (the embeddings + their metadata live as Hive rows).
  • Airflow — runs the index-creation and ingestion DAGs, including daily incremental updates.
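
To make the daily incremental ingestion concrete, here is a minimal sketch of how one day's Hive rows could be turned into an OpenSearch `_bulk` request body. The row shape, the `dt` partition column, the `tier` metadata field, and the index name are all hypothetical; the source does not describe the actual ingestion code, and in practice this logic would run inside an Airflow task.

```python
import json

def to_bulk_payload(rows, index_name, ds):
    """Turn the rows of one date partition into an OpenSearch _bulk body (NDJSON)."""
    lines = []
    for row in rows:
        if row["dt"] != ds:  # keep only the partition being ingested (hypothetical `dt` column)
            continue
        # Reusing the row id as the document _id means a rerun overwrites
        # existing documents instead of duplicating them.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": row["id"]}}))
        lines.append(json.dumps({"embedding": row["embedding"], "tier": row["tier"]}))
    # The _bulk API expects newline-delimited JSON ending with a newline.
    return ("\n".join(lines) + "\n") if lines else ""

# Toy rows standing in for a Hive table; real embeddings would be much longer.
rows = [
    {"id": "t1", "dt": "2026-03-05", "embedding": [0.1, 0.2], "tier": 1},
    {"id": "t2", "dt": "2026-03-06", "embedding": [0.3, 0.4], "tier": 2},
]
payload = to_bulk_payload(rows, "table_search_v1", "2026-03-06")
```

Keying documents by a stable id is one simple way to keep reruns of the same day idempotent; whether Pinterest's DAGs work this way is not stated in the source.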

Contract

Teams define a vector index via a simple JSON schema specifying:

  • the index alias,
  • the vector field dimensionality (e.g. 1536, matching the output size of the embedding model),
  • source Hive table mappings.
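
A config in this spirit might look like the sketch below. The key names (`index_alias`, `vector_field`, `source`, and the column names) are hypothetical; the source only states that the schema covers the alias, the vector dimensionality, and the Hive table mappings. The validator is likewise an illustration of the kind of check a platform could run, not Pinterest's actual logic.

```python
import json

# Hypothetical config; the field names are illustrative, not Pinterest's real schema.
CONFIG_JSON = """
{
  "index_alias": "table_search_v1",
  "vector_field": {"name": "embedding", "dimension": 1536},
  "source": {
    "hive_table": "example_db.table_embeddings",
    "id_column": "table_id",
    "vector_column": "embedding",
    "metadata_columns": ["tier", "owner"]
  }
}
"""

def validate_config(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means the config is usable."""
    errors = []
    alias = cfg.get("index_alias")
    if not isinstance(alias, str) or not alias:
        errors.append("index_alias must be a non-empty string")
    dim = cfg.get("vector_field", {}).get("dimension")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(dim, int) or isinstance(dim, bool) or dim <= 0:
        errors.append("vector_field.dimension must be a positive integer")
    source = cfg.get("source", {})
    for key in ("hive_table", "id_column", "vector_column"):
        if not source.get(key):
            errors.append(f"source.{key} is required")
    return errors

config = json.loads(CONFIG_JSON)
problems = validate_config(config)
```

Returning a list of problems rather than raising on the first one lets a team fix every issue in the config in a single pass.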

An Airflow workflow then:

  1. validates the config,
  2. creates the OpenSearch index,
  3. publishes metadata so other teams can discover and reuse the knowledge base.
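
The three steps above can be sketched as plain Python functions. This is an illustration under stated assumptions: `FakeCluster` and `FakeCatalog` are in-memory stand-ins invented for the example, the flat config keys are hypothetical, and a real deployment would run each step as an Airflow task against an OpenSearch client. The index body uses OpenSearch's `knn_vector` field type with the `index.knn` setting enabled.

```python
from dataclasses import dataclass, field

@dataclass
class FakeCluster:
    """In-memory stand-in for an OpenSearch cluster (illustration only)."""
    indices: dict = field(default_factory=dict)

@dataclass
class FakeCatalog:
    """In-memory stand-in for wherever index metadata gets published."""
    entries: dict = field(default_factory=dict)

def validate(cfg: dict) -> None:
    # Step 1: reject obviously broken configs before touching the cluster.
    if not cfg.get("index_alias"):
        raise ValueError("index_alias is required")
    dim = cfg.get("dimension")
    if not isinstance(dim, int) or isinstance(dim, bool) or dim <= 0:
        raise ValueError("dimension must be a positive integer")
    if not cfg.get("hive_table"):
        raise ValueError("hive_table is required")

def build_index_body(cfg: dict) -> dict:
    # k-NN indexes map the vector field as `knn_vector` with a fixed dimension.
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embedding": {"type": "knn_vector", "dimension": cfg["dimension"]}
            }
        },
    }

def create_index(cluster: FakeCluster, cfg: dict) -> None:
    # Step 2: simplified here by keying the index directly on the alias.
    cluster.indices[cfg["index_alias"]] = build_index_body(cfg)

def publish_metadata(catalog: FakeCatalog, cfg: dict) -> None:
    # Step 3: record enough for another team to discover and reuse the index.
    catalog.entries[cfg["index_alias"]] = {
        "hive_table": cfg["hive_table"],
        "dimension": cfg["dimension"],
    }

def run_workflow(cfg: dict, cluster: FakeCluster, catalog: FakeCatalog) -> None:
    validate(cfg)
    create_index(cluster, cfg)
    publish_metadata(catalog, cfg)

cluster, catalog = FakeCluster(), FakeCatalog()
cfg = {"index_alias": "table_search_v1", "dimension": 1536,
       "hive_table": "example_db.table_embeddings"}
run_workflow(cfg, cluster, catalog)
```

Running validation first means a bad config fails before any index exists, so there is nothing to clean up.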

Result: "zero to a production-grade vector index in days instead of weeks" — without teams having to solve embedding, ingestion, and monitoring from scratch.

Scale

  • Serves millions of embeddings across tables, queries, column descriptions, and documentation.
  • Daily incremental updates as new data assets and queries are created.
  • Supports hybrid patterns combining semantic similarity (vector distance) with traditional metadata filters — e.g. "tables semantically similar to user_actions that are Tier 1 and contain impression data." (concepts/hybrid-retrieval-bm25-vectors)
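
The hybrid pattern in the last bullet can be sketched as an OpenSearch k-NN query with a metadata filter attached. The field names (`embedding`, `tier`, `contains_impression_data`) are hypothetical, the query vector is a toy stand-in for the embedding of `user_actions`, and placing `filter` inside the `knn` clause assumes an OpenSearch version and k-NN engine that support filtered k-NN search. The source does not show Pinterest's actual queries.

```python
def build_hybrid_query(query_vector, k, filters, vector_field="embedding"):
    """Build a k-NN search body restricted by exact-match metadata filters."""
    # One `term` clause per metadata constraint; all of them must match.
    filter_clauses = [{"term": {name: value}} for name, value in filters.items()]
    return {
        "size": k,
        "query": {
            "knn": {
                vector_field: {
                    "vector": list(query_vector),
                    "k": k,
                    # Restricts the nearest-neighbour search to matching documents.
                    "filter": {"bool": {"must": filter_clauses}},
                }
            }
        },
    }

# Toy 4-dimensional vector; a real one would have the index's full dimensionality.
query_body = build_hybrid_query(
    query_vector=[0.1, 0.2, 0.3, 0.4],
    k=5,
    filters={"tier": 1, "contains_impression_data": True},
)
```

The resulting dictionary would be sent as the body of a search request against the index alias. Filtering inside the k-NN clause, rather than after it, avoids returning fewer than `k` results when many of the nearest neighbours fail the metadata constraints.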

Internal customers

Canonical wiki pattern

This is the canonical wiki instance of patterns/internal-vector-db-as-service — a platform-engineering move that replaces N one-off vector stacks with one shared substrate.

Seen in
