Skip to content

CONCEPT Cited by 1 source

SQL-to-text transformation

SQL-to-text transformation is the LLM step that converts a raw SQL query into a structured natural-language description of the business question the query was designed to answer, so it can be embedded for semantic retrieval.

It is the core of Pinterest's three-step encoding pipeline for the Analytics Agent.

(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)

Three-output structure

Pinterest's LLM does not produce a single one-line summary. It produces three complementary outputs per query:

  1. High-level summary — captures business purpose + domain.
  2. Analytical questions — an explicit list of questions this query could help answer. This is the load-bearing output: it creates a direct bridge between future user questions and indexed queries.
  3. Detailed breakdown — column definitions, transformation logic, filters applied, business purpose in plain English.

Worked example

Input SQL (ads performance):

SELECT keyword,
       SUM(impressions) AS total_impressions,
       SUM(revenue) / NULLIF(SUM(IF(is_first_conversion, clicks, 0)), 0) AS cpc,
       (SUM(revenue) / NULLIF(SUM(IF(is_first_conversion, impressions, 0)), 0)) * 1000 AS cpm
FROM ads.keyword_performance
WHERE dt BETWEEN '2024-10-01' AND '2024-10-31'
  AND advertiser_id = 12345
  AND keyword IS NOT NULL
GROUP BY keyword
ORDER BY total_impressions DESC

Outputs:

  • Summary: "Extracts ad performance metrics — total impressions, CPC, and CPM by keyword for a specific advertiser. CPC and CPM are calculated based on first-conversion events, focusing on ad effectiveness in acquiring new customers."
  • Analytical questions:
  • What are the top-performing keywords by impressions for a given advertiser?
  • How cost-effective are ad campaigns based on CPC and CPM for different keywords?
  • Detailed breakdown: column definitions, transformation logic (CPC derived from first-conversion revenue / first-conversion clicks), filters applied, business purpose.

Two design choices that make it work at scale

  1. Analytical questions as the question-to-question bridge. Future user questions match against the "analytical questions" slot of past queries directly, not against summary text or table descriptions.
  2. Deliberately generalizable descriptions. The LLM strips temporal specifics (exact dates, individual IDs) while preserving business- meaningful values (metric types, entity categories). "A query originally written for 'October 2024 keyword performance' generalizes to match future questions about 'ad CPC by keyword' regardless of date range."

Domain-context injection is a prerequisite

Before SQL-to-text runs, domain context must be injected alongside the raw SQL: table + column descriptions, glossary terms, metric definitions, domain expertise. Without this, the LLM sees only raw table and column names and misses the business meaning.

Seen in

Last updated · 319 distilled / 1,201 read