
PATTERN

SQL-to-intent encoding pipeline

SQL-to-intent encoding is Pinterest's three-step pipeline that converts raw historical SQL queries into a semantically searchable intent index.

(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)

The three steps

Step 1 — Domain context injection

Before any LLM reads the SQL, inject Pinterest-specific semantic context alongside the raw query:

  • Table and column descriptions from PinCat to add business context.
  • Standardized glossary terms — e.g. advertiser_id maps to g_advertiser_id in one table and adv_id in another.
  • Metric definitions — e.g. "engaged user" means specific action types.
  • Domain expertise — data quality caveats, recommended date ranges, Pinterest-specific conventions.

Without this step, the LLM sees only raw table and column names and misses the business meaning behind them.
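A minimal sketch of the injection step: gather table/column docs, glossary mappings, and metric definitions, and prepend them to the raw SQL. The catalog, glossary, and metric contents below are illustrative stand-ins for PinCat data, not Pinterest's actual schema.

```python
# Step 1 sketch: attach domain context to a raw SQL query before any LLM
# reads it. CATALOG / GLOSSARY / METRICS are toy stand-ins for PinCat.

CATALOG = {
    "ads_daily": {
        "description": "Daily advertiser-level spend and delivery metrics.",
        "columns": {"g_advertiser_id": "Canonical advertiser identifier."},
    },
}

GLOSSARY = {
    # One business term can map to different physical columns per table.
    "advertiser_id": {"ads_daily": "g_advertiser_id", "ads_events": "adv_id"},
}

METRICS = {
    "engaged user": "A user who performed at least one qualifying action type.",
}


def build_context_bundle(sql: str, tables: list) -> str:
    """Concatenate domain context with the raw SQL for the LLM prompt."""
    parts = ["-- Domain context --"]
    for table in tables:
        meta = CATALOG.get(table, {})
        parts.append(f"Table {table}: {meta.get('description', 'no description')}")
        for col, doc in meta.get("columns", {}).items():
            parts.append(f"  column {col}: {doc}")
    for term, mapping in GLOSSARY.items():
        for table in tables:
            if table in mapping:
                parts.append(f"Glossary: '{term}' is column {mapping[table]} in {table}")
    for metric, definition in METRICS.items():
        parts.append(f"Metric '{metric}': {definition}")
    parts.append("-- Raw SQL --")
    parts.append(sql)
    return "\n".join(parts)


bundle = build_context_bundle(
    "SELECT g_advertiser_id, SUM(spend) FROM ads_daily GROUP BY 1",
    tables=["ads_daily"],
)
print(bundle)
```

With this bundle in place, the LLM in Step 2 sees business meaning (glossary terms, metric definitions, caveats) next to the physical names, instead of bare identifiers.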

Step 2 — SQL to text

An LLM transforms the SQL + context bundle into a structured natural-language description, producing three complementary outputs per query (see concepts/sql-to-text-transformation):

  • High-level summary — captures business purpose + domain.
  • Analytical questions — explicit list of questions this query could help answer. The load-bearing output.
  • Detailed breakdown — column definitions, transformation logic, filters, business purpose in plain English.

Two design choices make this effective at scale:

  1. The analytical questions slot creates the question-to-question bridge that future user questions match against.
  2. Descriptions are kept deliberately generalizable — strip temporal specifics (exact dates, individual IDs), preserve business-meaningful values (metric types, entity categories).
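The generalization in design choice 2 can be sketched as a normalization pass over the generated description. The regexes below are assumptions for illustration; the actual rules would be tuned to the corpus.

```python
# Design-choice-2 sketch: strip temporal specifics and individual IDs from a
# description while leaving business-meaningful terms (metric types, entity
# categories) untouched. Illustrative regexes, not Pinterest's actual rules.
import re


def generalize(description: str) -> str:
    out = re.sub(r"\b\d{4}-\d{2}-\d{2}\b", "<date>", description)  # exact dates
    out = re.sub(r"\badvertiser \d+\b", "advertiser <id>", out)    # individual IDs
    return out


result = generalize(
    "Daily engaged-user counts for advertiser 82731 between 2024-01-01 and 2024-03-31."
)
print(result)
# → Daily engaged-user counts for advertiser <id> between <date> and <date>.
```

The metric type ("engaged-user") survives generalization; the date range and advertiser ID do not, so the description matches any future question about the same analysis regardless of when or for whom it was run.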

Step 3 — Text to embedding

Embed the natural-language description (or its per-slot components) into vectors in a unified context-intent embedding space. This enables intent-based retrieval: at query time, embed the user's natural-language question the same way and find historical queries that answered similar analytical questions, regardless of keyword overlap.
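The retrieval mechanics can be sketched with toy vectors. `embed` below is a bag-of-words stand-in for a real embedding model; only the index-then-match flow is the point.

```python
# Step 3 sketch: embed each description slot at index time, embed the user's
# question the same way at query time, and rank by cosine similarity.
# `embed` is a toy bag-of-words stand-in for a real embedding model.
import math
from collections import Counter


def embed(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Index of analytical questions generated in Step 2 (toy examples).
index = {
    "q1": embed("which advertisers spent the most last week"),
    "q2": embed("how many engaged users did each campaign reach"),
}

query_vec = embed("top spending advertisers this week")
best = max(index, key=lambda k: cosine(query_vec, index[k]))
print(best)
# → q1
```

Note that the match here rides on shared intent vocabulary ("advertisers", "week"), not on any overlap with the SQL's table or column names; a real embedding model makes this robust to paraphrase as well.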

When to reach for this pattern

Use this pattern when:

  • You have a large corpus of historical domain-specific artifacts (SQL queries here, but the same shape applies to runbooks, incident tickets, investigation notes, etc.).
  • Users naturally phrase their requests in the business-question vocabulary of the artifacts, not in the technical vocabulary of the underlying systems' metadata.
  • You want to reuse validated solutions rather than re-derive them.

Do not reach for this pattern when:

  • Your artifact corpus is too small to amortize the SQL-to-text generation cost.
  • Artifacts are already written in user-vocabulary (a simple embedding without the translation step suffices).
  • You lack the domain-context infrastructure (PinCat-equivalent) needed for meaningful context injection — you'll get garbage summaries.
