CONCEPT Cited by 1 source
SQL-to-text transformation¶
SQL-to-text transformation is the LLM step that converts a raw SQL query into a structured natural-language description of the business question the query was designed to answer, so it can be embedded for semantic retrieval.
It is the core of Pinterest's three-step encoding pipeline for the Analytics Agent.
(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)
Three-output structure¶
Pinterest's LLM does not produce a single one-line summary. It produces three complementary outputs per query:
- High-level summary — captures business purpose + domain.
- Analytical questions — an explicit list of questions this query could help answer. This is the load-bearing output: it creates a direct bridge between future user questions and indexed queries.
- Detailed breakdown — column definitions, transformation logic, filters applied, business purpose in plain English.
Worked example¶
Input SQL (ads performance):
SELECT keyword,
SUM(impressions) AS total_impressions,
SUM(revenue) / NULLIF(SUM(IF(is_first_conversion, clicks, 0)), 0) AS cpc,
(SUM(revenue) / NULLIF(SUM(IF(is_first_conversion, impressions, 0)), 0)) * 1000 AS cpm
FROM ads.keyword_performance
WHERE dt BETWEEN '2024-10-01' AND '2024-10-31'
AND advertiser_id = 12345
AND keyword IS NOT NULL
GROUP BY keyword
ORDER BY total_impressions DESC
Outputs:
- Summary: "Extracts ad performance metrics — total impressions, CPC, and CPM by keyword for a specific advertiser. CPC and CPM are calculated based on first-conversion events, focusing on ad effectiveness in acquiring new customers."
- Analytical questions:
- What are the top-performing keywords by impressions for a given advertiser?
- How cost-effective are ad campaigns based on CPC and CPM for different keywords?
- Detailed breakdown: column definitions, transformation logic (CPC derived from first-conversion revenue / first-conversion clicks), filters applied, business purpose.
Two design choices that make it work at scale¶
- Analytical questions as the question-to-question bridge. Future user questions match against the "analytical questions" slot of past queries directly, not against summary text or table descriptions.
- Deliberately generalizable descriptions. The LLM strips temporal specifics (exact dates, individual IDs) while preserving business- meaningful values (metric types, entity categories). "A query originally written for 'October 2024 keyword performance' generalizes to match future questions about 'ad CPC by keyword' regardless of date range."
Domain-context injection is a prerequisite¶
Before SQL-to-text runs, domain context must be injected alongside the raw SQL: table + column descriptions, glossary terms, metric definitions, domain expertise. Without this, the LLM sees only raw table and column names and misses the business meaning.