Pinterest Analytics Agent¶
The Pinterest Analytics Agent is an LLM-driven system that helps Pinterest's 2,500+ analysts discover tables, find reusable queries, and generate validated SQL from natural-language questions. Two months after launch it covered 40% of the analyst population (target 50% by year-end); it is the #1 agent at Pinterest, with 10× the usage of the next most-used agent.
(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)
Four-layer architecture¶
- Agent Orchestration Layer. An LLM with Pinterest-specific prompts classifies tasks (documentation lookup, table discovery, query discovery, Text-to-SQL, execution) and decides which tools to call in which order.
- MCP Integration Layer. MCP tools expose a unified interface to: table search (backed by systems/pinterest-vector-db-service + systems/pinterest-pincat), query search (over the query-description index built from the unified context-intent embedding pipeline), knowledge search (internal docs), and systems/presto execution with EXPLAIN validation.
- Context Layer. The knowledge foundation: PinCat schemas + tiers, vector indexes of tables and queries, expert-curated docs + metric definitions, and usage patterns from query logs.
- Execution Layer. Presto for validated SQL with EXPLAIN-before-EXECUTE, tight LIMITs (default 100 rows), and bounded error-recovery retries.
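The orchestration layer's classify-then-route behavior can be sketched as a simple task-to-tool-plan mapping. This is a minimal illustration, not Pinterest's implementation: the task names come from the text above, but the tool names and plan ordering are assumptions.

```python
# Hypothetical sketch of the Agent Orchestration Layer's routing:
# an LLM classifier labels the task, and the label selects an ordered
# tool plan over the MCP tools. Tool identifiers are illustrative.
TOOL_PLANS = {
    "documentation_lookup": ["knowledge_search"],
    "table_discovery":      ["table_search"],
    "query_discovery":      ["query_search"],
    "text_to_sql":          ["table_search", "query_search",
                             "knowledge_search", "generate_sql",
                             "explain_validate", "execute"],
    "execution":            ["explain_validate", "execute"],
}

def plan_tools(task_type: str) -> list[str]:
    # Fall back to documentation lookup for unrecognized task labels.
    return TOOL_PLANS.get(task_type, ["knowledge_search"])
```

Note that Text-to-SQL is the longest plan: it retrieves context from all three search tools before generation, matching the end-to-end flow described later in this note.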
Core design principles¶
Asset-first¶
"The agent should surface existing, trusted assets — tables, curated queries, dashboards, metric definitions — before generating new SQL." Today the agent is asset-first for tables and queries; the roadmap extends to dashboards, datasets, metric definitions, curated query libraries, and workflow artifacts. (concepts/asset-first-agent-design)
Governance-aware ranking¶
Retrieval candidates are not ranked by semantic similarity alone. A Tier-1 table with active ownership and fresh data ranks higher than a semantically similar but deprecated or undocumented alternative. (concepts/governance-aware-ranking, patterns/governance-tier-ranking-fusion)
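One way to picture this fusion is a score that starts from semantic similarity and adds or subtracts governance signals. The weights, field names, and thresholds below are illustrative assumptions, not Pinterest's actual formula; the point is that a deprecated table is penalized enough to fall below a slightly less similar but trusted one.

```python
# Hypothetical governance-aware ranking: fuse vector-search similarity
# with tier, ownership, freshness, and deprecation signals.
from dataclasses import dataclass

@dataclass
class TableCandidate:
    name: str
    similarity: float      # cosine similarity from vector search, 0..1
    tier: int              # 1 = most trusted
    has_owner: bool
    days_since_update: int
    deprecated: bool

def governance_score(c: TableCandidate) -> float:
    score = c.similarity
    score += {1: 0.30, 2: 0.15, 3: 0.05}.get(c.tier, 0.0)
    if c.has_owner:
        score += 0.10
    if c.days_since_update <= 7:
        score += 0.10                 # reward fresh data
    if c.deprecated:
        score -= 0.50                 # push deprecated tables down hard
    return score

def rank(candidates: list[TableCandidate]) -> list[TableCandidate]:
    return sorted(candidates, key=governance_score, reverse=True)
```

With these example weights, a Tier-1, owned, fresh table at similarity 0.80 scores 1.30 and outranks a deprecated, unowned Tier-3 table at similarity 0.85 (0.40), which is exactly the inversion the principle calls for.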
Validation-first SQL generation¶
Four disciplines gate every generated query:
- Schema grounding — references only tables/columns in PinCat.
- Column-profile-aware filters — filter values must match actual data, e.g. 'WEB' not 'web' (avoids the "looks right but returns nothing" failure mode).
- Reuse validated patterns — join keys + filters are pulled from historical-query extraction, not reinvented.
- EXPLAIN before EXECUTE — bounded retry on failure, default LIMIT 100.
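The execution-side discipline can be sketched as a small wrapper: cap the result size, validate the plan with EXPLAIN, and only then execute, with a bounded repair loop on failure. This is a sketch under stated assumptions: `run` stands in for a Presto client and `repair` for an LLM repair step; both are hypothetical callables, not Pinterest APIs.

```python
# Minimal sketch of EXPLAIN-before-EXECUTE with a default LIMIT and
# bounded retries. `run(sql)` and `repair(sql, error)` are assumed
# callables (Presto client / LLM fixer), not real Pinterest interfaces.
import re

DEFAULT_LIMIT = 100
MAX_RETRIES = 2

def enforce_limit(sql: str, limit: int = DEFAULT_LIMIT) -> str:
    # Append a LIMIT only if the query does not already cap its output.
    if re.search(r"\bLIMIT\s+\d+\s*$", sql, re.IGNORECASE):
        return sql
    return f"{sql.rstrip().rstrip(';')} LIMIT {limit}"

def validated_execute(sql: str, run, repair):
    sql = enforce_limit(sql)
    for attempt in range(MAX_RETRIES + 1):
        try:
            run(f"EXPLAIN {sql}")       # validate the plan first
            return run(sql)             # execute only if EXPLAIN passed
        except Exception as err:
            if attempt == MAX_RETRIES:
                raise                   # bounded: give up after retries
            sql = enforce_limit(repair(sql, err))
```

The retry bound matters: without it, an LLM repair loop on a fundamentally wrong query could burn cluster time indefinitely.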
Conflict-resolution hierarchy¶
When multiple context sources disagree:
- Expert-curated documentation (metric definitions, canonical guides).
- Schema metadata from PinCat (authoritative for structure).
- Query patterns (validated against schemas before use).
- General knowledge (fallback).
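The hierarchy above amounts to a priority-ordered lookup: walk the sources in order and return the first answer found. A minimal sketch, assuming each source exposes a lookup function (the source names and example answers are illustrative, not from Pinterest):

```python
# Sketch of priority-ordered conflict resolution: the first context
# source that answers wins; general knowledge is the fallback.
def resolve(question, sources):
    """sources: list of (name, lookup_fn) in descending priority."""
    for name, lookup in sources:
        answer = lookup(question)
        if answer is not None:
            return name, answer
    return "general-knowledge", None

# Hypothetical sources, ordered per the hierarchy above.
priority = [
    ("expert-docs",    lambda q: {"weekly retention": "cohort-based definition"}.get(q)),
    ("pincat-schema",  lambda q: None),            # nothing for this term
    ("query-patterns", lambda q: "historical join pattern"),
]
```

Because expert-curated docs sit first, a metric term they define never falls through to a possibly conflicting historical query pattern.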
End-to-end query flow¶
User asks "Show me weekly retention for new users in the US over the past three months." The agent:
- Classifies the task as Text-to-SQL.
- Retrieves context in parallel: table search + ranking using the knowledge base; relevant historical queries via unified context-intent embeddings; table metadata from PinCat (tiers, owners, freshness); metric-definition / retention docs.
- Generates SQL with strict validation: schema-grounded, filter-value-correct, pattern-reusing, EXPLAIN-validated, retry-bounded, LIMIT-capped.
- Returns results with transparency: the SQL it ran, tables + date ranges used, source references (schemas, queries, docs), confidence indicators + warnings (suspicious joins, empty results).
Adoption¶
- 40% of Pinterest's analyst population within two months of launch. Target 50% by year-end.
- #1 agent at Pinterest, 10× the usage of the next most-used agent.
- MCP tools for table + query search rank among Pinterest's most popular internal tools.
Evaluation¶
An internal benchmarking framework measures: (1) finding the correct tables for an analytical question, (2) generating correct SQL. Early results: table discovery meets expectations; SQL generation has room for improvement. Hardest cases:
- Complex analytical logic — multi-step calculations and window functions that require chaining multiple reasoning steps.
- Ambiguous business terms not yet captured in documentation.
- Cross-domain queries surfacing conflicting join patterns or metric definitions.
- Schema evolution — recently deprecated tables whose patterns still appear in the index.
Mitigations: human review, EXPLAIN validation, continuous index updates, expanded test coverage with SME-verified answers, real-user-interaction regressions.
Seen in¶
- sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql — the launch / architecture post.
Related¶
- systems/pinterest-pincat — the governance catalog backing table search.
- systems/pinterest-vector-db-service — the vector substrate for both table and query indexes.
- systems/pinterest-querybook — where analyst queries are written (and where the Analytics Agent will embed in the future).
- patterns/sql-to-intent-encoding-pipeline — the query-history encoding pipeline.
- patterns/analytical-intent-retrieval — intent-based retrieval from user question → past queries.
- patterns/retrieve-then-rank-llm — the general retrieve-then-rank shape the Analytics Agent instantiates.