PATTERN Cited by 1 source
SQL-to-intent encoding pipeline¶
SQL-to-intent encoding is Pinterest's three-step pipeline that converts raw historical SQL queries into a semantically searchable intent index.
(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)
The three steps¶
Step 1 — Domain context injection¶
Before any LLM reads the SQL, inject Pinterest-specific semantic context alongside the raw query:
- Table and column descriptions from PinCat to add business context.
- Standardized glossary terms — e.g. advertiser_id maps to g_advertiser_id in one table and adv_id in another.
- Metric definitions — e.g. "engaged user" means specific action types.
- Domain expertise — data quality caveats, recommended date ranges, Pinterest-specific conventions.
Without this step, the LLM sees only raw table and column names and misses the business meaning behind them.
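A minimal sketch of the context bundle this step assembles, before any LLM call. All names here (ContextBundle, the field names, the example values) are illustrative, not Pinterest's actual PinCat schema:

```python
from dataclasses import dataclass, field

@dataclass
class ContextBundle:
    """Raw SQL plus the domain context injected before any LLM reads it."""
    sql: str
    table_descriptions: dict[str, str] = field(default_factory=dict)
    glossary: dict[str, list[str]] = field(default_factory=dict)  # canonical term -> per-table aliases
    metric_definitions: dict[str, str] = field(default_factory=dict)
    domain_notes: list[str] = field(default_factory=list)  # caveats, date-range guidance

    def render(self) -> str:
        """Flatten the bundle into one prompt-ready text block."""
        parts = ["-- SQL --", self.sql, "-- TABLE DESCRIPTIONS --"]
        parts += [f"{t}: {d}" for t, d in self.table_descriptions.items()]
        parts.append("-- GLOSSARY --")
        parts += [f"{term} (aliases: {', '.join(a)})" for term, a in self.glossary.items()]
        parts.append("-- METRIC DEFINITIONS --")
        parts += [f"{m}: {d}" for m, d in self.metric_definitions.items()]
        parts.append("-- DOMAIN NOTES --")
        parts += self.domain_notes
        return "\n".join(parts)

bundle = ContextBundle(
    sql="SELECT g_advertiser_id, COUNT(*) FROM ads_events GROUP BY 1",
    table_descriptions={"ads_events": "One row per ad interaction event."},
    glossary={"advertiser_id": ["g_advertiser_id", "adv_id"]},
    metric_definitions={"engaged user": "user with at least one qualifying action type"},
    domain_notes=["ads_events is complete only after a multi-day landing delay."],
)
print(bundle.render())
```

The point of the single `render()` output is that the downstream LLM sees business meaning (glossary, metrics, caveats) side by side with the raw SQL, rather than bare table and column names.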
Step 2 — SQL to text¶
An LLM transforms the SQL + context bundle into a structured natural-language description, producing three complementary outputs per query (see concepts/sql-to-text-transformation):
- High-level summary — captures business purpose + domain.
- Analytical questions — explicit list of questions this query could help answer. The load-bearing output.
- Detailed breakdown — column definitions, transformation logic, filters, business purpose in plain English.
Two design choices make this effective at scale:
- The analytical questions slot creates the question-to-question bridge future user questions match against.
- Descriptions are kept deliberately generalizable — strip temporal specifics (exact dates, individual IDs), preserve business-meaningful values (metric types, entity categories).
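A sketch of how the three-slot output might be requested and parsed. The prompt wording, section labels, and parser are hypothetical (the source does not publish Pinterest's actual prompt); the LLM call is stubbed with a canned response:

```python
PROMPT_TEMPLATE = """You are describing a historical SQL query for semantic search.
Given the SQL and domain context below, produce three sections:

SUMMARY: one paragraph on the business purpose and domain.
QUESTIONS: a bullet list of analytical questions this query could answer.
BREAKDOWN: column definitions, transformation logic, and filters in plain English.

Generalize: drop exact dates and individual IDs, but keep business-meaningful
values such as metric types and entity categories.

{context_bundle}
"""

def parse_description(llm_output: str) -> dict[str, str]:
    """Split the LLM's response into the three complementary slots."""
    slots: dict[str, str] = {}
    current = None
    for line in llm_output.splitlines():
        header = line.split(":", 1)[0].strip()
        if header in ("SUMMARY", "QUESTIONS", "BREAKDOWN"):
            current = header.lower()
            slots[current] = line.split(":", 1)[1].strip()
        elif current:
            slots[current] += "\n" + line
    return slots

# Stubbed LLM response, illustrating the generalization rule: no dates, no IDs.
fake_llm_output = """SUMMARY: Daily engaged-user counts per advertiser.
QUESTIONS:
- How many engaged users did each advertiser reach?
- Which advertisers drive the most engagement?
BREAKDOWN: Groups ad events by advertiser and counts distinct engaged users."""
slots = parse_description(fake_llm_output)
```

The questions slot is stored separately because it is what future user questions are matched against in Step 3.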
Step 3 — Text to embedding¶
Embed the natural-language description (or its per-slot components) into vectors in a unified context-intent embedding space. This enables intent-based retrieval: at query time, embed the user's natural-language question the same way and find historical queries that answered similar analytical questions, regardless of keyword overlap.
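The retrieval mechanics can be sketched with a toy embedding. The bag-of-words encoder below is a deliberate stand-in for the real learned text encoder (which the source does not specify); what carries over is the pattern: embed the stored questions and the user's question the same way, then rank by cosine similarity:

```python
import numpy as np

# Toy stand-in for a learned text encoder: prefix-matched bag of words over
# a tiny shared vocabulary. Only the retrieval mechanics are realistic.
VOCAB = ["advertiser", "engaged", "user", "count", "revenue", "campaign"]

def embed(text: str) -> np.ndarray:
    """Map text to a unit vector (zero vector if no vocabulary overlap)."""
    tokens = text.lower().split()
    v = np.array([sum(t.startswith(w) for t in tokens) for w in VOCAB], float)
    norm = np.linalg.norm(v)
    return v / norm if norm else v

# Index: one embedding per historical query, built from its
# analytical-questions slot (ids are illustrative).
index = {
    "q1": embed("how many engaged users did each advertiser reach"),
    "q2": embed("what is the revenue per campaign"),
}

def retrieve(question: str) -> str:
    """Return the historical query whose intent is most similar."""
    qv = embed(question)
    return max(index, key=lambda qid: float(index[qid] @ qv))

best = retrieve("advertiser engaged user counts")
```

Because both sides of the match are question-shaped text in the same space, the user's phrasing finds q1 even without matching any table or column name from the original SQL.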
When to reach for this pattern¶
Use this pattern when:
- You have a large corpus of historical domain-specific artifacts (SQL queries here, but the same shape applies to runbooks, incident tickets, investigation notes, etc.).
- Users naturally phrase requests in the vocabulary of business questions, not in the technical vocabulary of the underlying systems' metadata.
- You want to reuse validated solutions rather than re-derive them.
Do not reach for this pattern when:
- Your artifact corpus is too small to amortize the SQL-to-text generation cost.
- Artifacts are already written in user-vocabulary (a simple embedding without the translation step suffices).
- You lack the domain-context infrastructure (PinCat-equivalent) needed for meaningful context injection — you'll get garbage summaries.
Seen in¶
- sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql — canonical wiki instance: Pinterest's Analytics Agent query-description index.
Related¶
- patterns/analytical-intent-retrieval — the consumer of this pipeline's output.
- patterns/governance-tier-ranking-fusion — paired with the retrieval step.
- concepts/unified-context-intent-embedding
- concepts/query-history-knowledge-base