
CONCEPT Cited by 1 source

Unified context-intent embedding

Unified context-intent embeddings are Pinterest's named contribution to production Text-to-SQL: a single embedding space that indexes natural-language descriptions of the business question each historical SQL query was designed to answer, rather than table names or table descriptions.

(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)

What "unified context-intent" means

Two pieces unify:

  • Context — Pinterest-specific domain signals injected into the representation before embedding: table + column descriptions, glossary terms (g_advertiser_id / adv_id both mapping to advertiser_id), metric definitions, data-quality caveats.
  • Intent — the business question the query was designed to answer, extracted by an LLM via SQL-to-text transformation with three outputs per query (summary / analytical questions / detailed breakdown).
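Below is a minimal sketch of how the two halves might be combined into the single text that gets embedded. It assumes the SQL-to-text outputs and the domain signals are already available as plain strings. The class, field, and function names (QueryIntent, QueryContext, GLOSSARY, build_embedding_text) are hypothetical, and the one-line-per-signal layout is an assumption rather than Pinterest's documented format. Only the glossary aliases come from the text above.

```python
from dataclasses import dataclass, field

# Hypothetical glossary: alias column names -> canonical name (aliases taken from the text above).
GLOSSARY = {"g_advertiser_id": "advertiser_id", "adv_id": "advertiser_id"}

@dataclass
class QueryIntent:
    """The three SQL-to-text outputs described above (field names are assumptions)."""
    summary: str
    analytical_questions: list[str]
    detailed_breakdown: str

@dataclass
class QueryContext:
    """Domain signals attached before embedding (hypothetical field names)."""
    table_descriptions: dict[str, str] = field(default_factory=dict)
    column_names: list[str] = field(default_factory=list)
    metric_definitions: dict[str, str] = field(default_factory=dict)
    caveats: list[str] = field(default_factory=list)

def canonicalize(column: str) -> str:
    """Map a glossary alias to its canonical column name; unknown names pass through."""
    return GLOSSARY.get(column, column)

def build_embedding_text(intent: QueryIntent, ctx: QueryContext) -> str:
    """Concatenate intent and context into one document to be embedded as a single vector."""
    lines = [f"Summary: {intent.summary}"]
    lines += [f"Question: {q}" for q in intent.analytical_questions]
    lines.append(f"Details: {intent.detailed_breakdown}")
    for table, desc in ctx.table_descriptions.items():
        lines.append(f"Table {table}: {desc}")
    # Deduplicate after canonicalizing, so aliases collapse to one name.
    cols = sorted({canonicalize(c) for c in ctx.column_names})
    if cols:
        lines.append("Columns: " + ", ".join(cols))
    for metric, definition in ctx.metric_definitions.items():
        lines.append(f"Metric {metric}: {definition}")
    lines += [f"Caveat: {c}" for c in ctx.caveats]
    return "\n".join(lines)
```

Canonicalizing before embedding means a query that used g_advertiser_id and one that used adv_id contribute the same token, advertiser_id, to their documents.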

At query time, the user's question is embedded into the same shared space, so it can be compared directly against these combined vectors.
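The sketch below shows what sharing one space buys. It scores a user question against indexed query documents using cosine similarity. A real system would call a learned embedding model. Here a bag-of-words Counter stands in for one, an assumption made so the example runs without dependencies. The names embed, cosine, and top_k are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: counts of lowercase word tokens."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())  # missing tokens count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(question: str, index: dict[str, Counter], k: int = 3) -> list[tuple[str, float]]:
    """Embed the user question into the same space and rank indexed queries by similarity."""
    q = embed(question)
    scored = [(query_id, cosine(q, vec)) for query_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Take an index built as {"q1": embed("engagement rate for organic pins by country"), "q2": embed("ad spend by advertiser per day")}. Calling top_k("What's the engagement rate for organic Pins by country?", index, k=1) ranks q1 first, because the question shares most of its tokens with that description. The comparison only works because both sides pass through the same embed function.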

Why it works when table-description RAG doesn't

Traditional RAG over table descriptions fails because:

  • Question wording doesn't match table description wording.
  • Multiple tables match semantically but only one has the right join pattern for the business question.
  • Company-specific metric conventions aren't in any description (e.g. at Pinterest, "engagement rate" means specific action types divided by impressions).

Unified context-intent embeddings sidestep all three: the index is keyed by what past queries answered, not by what tables look like. A user asking "What's the engagement rate for organic Pins by country?" matches a historical query's description regardless of the tables that query used.

The key design trick: the SQL-to-text step produces explicit "analytical questions this query could help answer", so user questions are matched question-to-question rather than question-to-table-description.
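Question-to-question matching can be sketched as follows. Each generated analytical question is indexed as its own entry that points back to its source query. Each query is then scored by its best-matching question. Jaccard overlap of word sets is a toy stand-in for embedding similarity, and every name here is hypothetical. The source does not say whether questions are indexed separately or embedded together.

```python
import re
from collections import defaultdict

def tokens(text: str) -> set[str]:
    """Toy stand-in for an embedding: the set of lowercase word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap score in [0, 1]; used here in place of vector similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0

def build_question_index(queries: dict[str, list[str]]) -> list[tuple[str, set[str]]]:
    """One index entry per (query_id, analytical question) pair."""
    return [(qid, tokens(q)) for qid, questions in queries.items() for q in questions]

def match_queries(user_question: str, index: list[tuple[str, set[str]]], k: int = 3) -> list[tuple[str, float]]:
    """Score each historical query by its best-matching analytical question."""
    user = tokens(user_question)
    best: dict[str, float] = defaultdict(float)
    for qid, question_tokens in index:
        best[qid] = max(best[qid], jaccard(user, question_tokens))
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)[:k]
```

Taking the maximum over a query's questions lets one strong phrasing match even when the query's other generated questions are unrelated to the user's wording.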

Generalization: strip specifics, keep semantics

Descriptions are kept deliberately generalizable: the LLM strips temporal specifics (exact dates, individual IDs) while preserving business-meaningful values (metric types, entity categories). A query originally written for "October 2024 keyword performance" generalizes to match future questions about "ad CPC by keyword" regardless of date range.
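The source describes an LLM doing this stripping. The sketch below swaps in a few regular expressions purely to show the intended effect: drop dates and long numeric IDs, keep metric and entity words. The rules and the generalize name are assumptions, and the approach is far cruder than an LLM rewrite.

```python
import re

MONTHS = r"(?:January|February|March|April|May|June|July|August|September|October|November|December)"

# Hypothetical regex rules standing in for the LLM's generalization step.
RULES = [
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),            # ISO dates such as 2024-10-01
    re.compile(rf"\b{MONTHS}\s+\d{{4}}\b", re.I),    # month-year such as October 2024
    re.compile(r"\b(?:id\s*)?\d{6,}\b", re.I),       # long numeric IDs, optionally prefixed by "id"
]

def generalize(description: str) -> str:
    """Remove temporal specifics and individual IDs, then tidy whitespace."""
    text = description
    for pattern in RULES:
        text = pattern.sub("", text)
    return re.sub(r"\s{2,}", " ", text).strip()
```

Applied to "October 2024 keyword performance: CPC by keyword for advertiser 123456789", the month, year, and advertiser ID are removed while "CPC" and "keyword" survive. Those surviving terms are what later questions about ad CPC by keyword can match on.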

Seen in
