

Pinterest Analytics Agent

The Pinterest Analytics Agent is an LLM-driven system that helps Pinterest's 2,500+ analysts discover tables, find reusable queries, and generate validated SQL from natural-language questions. Two months after launch it covered 40% of the analyst population (target 50% by year-end); it is the #1 agent at Pinterest, with 10× the usage of the next most-used agent.

(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)

Four-layer architecture

  1. Agent Orchestration Layer. An LLM with Pinterest-specific prompts classifies tasks (documentation lookup, table discovery, query discovery, Text-to-SQL, execution) and decides which tools to call in which order.
  2. MCP Integration Layer. MCP tools expose a unified interface to: table search (backed by systems/pinterest-vector-db-service + systems/pinterest-pincat), query search (over the query-description index built from the unified context-intent embedding pipeline), knowledge search (internal docs), and systems/presto execution with EXPLAIN validation.
  3. Context Layer. The knowledge foundation: PinCat schemas + tiers, vector indexes of tables and queries, expert-curated docs + metric definitions, and usage patterns from query logs.
  4. Execution Layer. Presto for validated SQL with EXPLAIN-before-EXECUTE, tight LIMITs (default 100 rows), and bounded error-recovery retries.
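The layering can be sketched as plain dispatch code. This is a minimal, hypothetical sketch — the real orchestration layer is LLM-driven, and the tool names and classifier below are illustrative stand-ins, not Pinterest's actual interfaces:

```python
from dataclasses import dataclass
from typing import Callable, List

# MCP Integration Layer: each capability sits behind a uniform callable.
@dataclass
class McpTool:
    name: str
    run: Callable[[str], List[str]]

# Stub retrievers standing in for table / query / knowledge search.
def table_search(q: str) -> List[str]:     return [f"table-hit:{q}"]
def query_search(q: str) -> List[str]:     return [f"query-hit:{q}"]
def knowledge_search(q: str) -> List[str]: return [f"doc-hit:{q}"]

TOOLS = {
    "table_discovery": McpTool("table_search", table_search),
    "query_discovery": McpTool("query_search", query_search),
    "documentation":   McpTool("knowledge_search", knowledge_search),
}

def classify(question: str) -> str:
    # Stand-in for the LLM task classifier in the orchestration layer.
    q = question.lower()
    if "doc" in q:
        return "documentation"
    if "query" in q:
        return "query_discovery"
    return "table_discovery"

def orchestrate(question: str) -> List[str]:
    # Orchestration Layer: classify the task, then call the matching tool.
    return TOOLS[classify(question)].run(question)
```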

Core design principles

Asset-first

"The agent should surface existing, trusted assets — tables, curated queries, dashboards, metric definitions — before generating new SQL." Today the agent is asset-first for tables and queries; the roadmap extends to dashboards, datasets, metric definitions, curated query libraries, and workflow artifacts. (concepts/asset-first-agent-design)
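The asset-first principle reduces to a simple fallback order: only generate SQL when no trusted asset matches. A minimal sketch, with a hypothetical `asset_index` lookup standing in for the real retrieval stack:

```python
def answer(question, asset_index, generate_sql):
    """Asset-first dispatch: surface existing, trusted assets first;
    fall back to SQL generation only when nothing matches."""
    hits = asset_index.get(question, [])
    if hits:
        return {"kind": "asset", "results": hits}
    return {"kind": "generated_sql", "results": [generate_sql(question)]}
```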

Governance-aware ranking

Retrieval candidates are not ranked by semantic similarity alone. A Tier-1 table with active ownership and fresh data ranks higher than a semantically similar but deprecated or undocumented alternative. (concepts/governance-aware-ranking, patterns/governance-tier-ranking-fusion)
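One way to implement this is additive score fusion over semantic similarity and governance signals. The weights below are illustrative assumptions (the actual fusion formula is not published), but they reproduce the stated behavior: a Tier-1, owned, fresh table beats a slightly more similar deprecated one:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    table: str
    semantic: float    # embedding similarity, 0..1
    tier: int          # 1 = most trusted governance tier
    owned: bool        # has an active owner
    fresh: bool        # data recently landed
    deprecated: bool

def fused_score(c: Candidate) -> float:
    # Hypothetical weights: boost governance quality, punish deprecation.
    score = c.semantic
    score += {1: 0.30, 2: 0.15}.get(c.tier, 0.0)
    score += 0.10 if c.owned else 0.0
    score += 0.10 if c.fresh else 0.0
    score -= 0.50 if c.deprecated else 0.0
    return score

def rank(cands: List[Candidate]) -> List[Candidate]:
    return sorted(cands, key=fused_score, reverse=True)
```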

Validation-first SQL generation

Four disciplines gate every generated query:

  • Schema grounding — references only tables/columns in PinCat.
  • Column-profile-aware filters — filter values must match actual data, e.g. 'WEB' not 'web' (avoids the "looks right but returns nothing" failure mode).
  • Reuse validated patterns — join keys + filters are pulled from historical-query extraction, not reinvented.
  • EXPLAIN before EXECUTE, bounded retry on failure, default LIMIT 100.
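The last gate — EXPLAIN before EXECUTE, bounded retries, a default LIMIT — can be sketched as a small wrapper. `explain`, `execute`, and `repair` are hypothetical callables (the repair step would be LLM-driven in practice), and the LIMIT check is deliberately naive:

```python
def run_validated(sql, explain, execute, repair=lambda s, e: s,
                  max_retries=2, limit=100):
    """Validation-first execution: EXPLAIN gates every query before it
    runs; failures get a bounded number of repair-and-retry attempts."""
    # Cap result size unless the query already has a LIMIT (naive check).
    if "limit" not in sql.lower():
        sql = f"{sql} LIMIT {limit}"
    last_err = None
    for _ in range(max_retries + 1):
        try:
            explain(sql)           # cheap plan check: bad refs fail here
            return execute(sql)
        except Exception as e:
            last_err = e
            sql = repair(sql, e)   # hypothetical error-recovery step
    raise RuntimeError(f"validation failed after retries: {last_err}")
```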

Conflict-resolution hierarchy

When multiple context sources disagree:

  1. Expert-curated documentation (metric definitions, canonical guides).
  2. Schema metadata from PinCat (authoritative for structure).
  3. Query patterns (validated against schemas before use).
  4. General knowledge (fallback).
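The hierarchy above amounts to a fixed priority order over sources. A minimal sketch, with source names chosen for illustration:

```python
# Highest-priority source first; "general" is the fallback.
PRIORITY = ["expert_docs", "schema_metadata", "query_patterns", "general"]

def resolve(answers: dict):
    """Given candidate answers keyed by source, return the answer from
    the highest-priority source that produced one."""
    for source in PRIORITY:
        if answers.get(source) is not None:
            return source, answers[source]
    raise LookupError("no source produced an answer")
```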

End-to-end query flow

User asks "Show me weekly retention for new users in the US over the past three months." The agent:

  1. Classifies the task as Text-to-SQL.
  2. Retrieves context in parallel: table search + ranking using the knowledge base; relevant historical queries via unified context-intent embeddings; table metadata from PinCat (tiers, owners, freshness); metric-definition / retention docs.
  3. Generates SQL with strict validation: schema-grounded, filter-value-correct, pattern-reusing, EXPLAIN-validated, retry-bounded, LIMIT-capped.
  4. Returns results with transparency: the SQL it ran, tables + date ranges used, source references (schemas, queries, docs), confidence indicators + warnings (suspicious joins, empty results).
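The parallel retrieval in step 2 can be sketched with a thread pool. Everything here is illustrative — the retriever stubs and their payloads are invented, and the real system fans out to the MCP tools rather than local lambdas:

```python
from concurrent.futures import ThreadPoolExecutor

def retrieve_context(question: str) -> dict:
    """Fan out the four context retrievals concurrently and gather the
    results under one context dict (retriever stubs are hypothetical)."""
    retrievers = {
        "tables":   lambda: ["user_signups_t1"],
        "queries":  lambda: ["weekly_retention.sql"],
        "metadata": lambda: {"user_signups_t1": {"tier": 1, "fresh": True}},
        "docs":     lambda: ["retention: active in week N after signup"],
    }
    with ThreadPoolExecutor(max_workers=len(retrievers)) as pool:
        futures = {name: pool.submit(fn) for name, fn in retrievers.items()}
        return {name: f.result() for name, f in futures.items()}
```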

Adoption

  • 40% of Pinterest's analyst population within two months of launch. Target 50% by year-end.
  • #1 agent at Pinterest, 10× the usage of the next most-used agent.
  • MCP tools for table + query search rank among Pinterest's most popular internal tools.

Evaluation

An internal benchmarking framework measures: (1) finding the correct tables for an analytical question, (2) generating correct SQL. Early results: table discovery meets expectations; SQL generation has room for improvement. Hardest cases:

  • Complex analytical logic — multi-step calculations and window functions that require chaining multiple reasoning steps.
  • Ambiguous business terms not yet captured in documentation.
  • Cross-domain queries surfacing conflicting join patterns or metric definitions.
  • Schema evolution — recently deprecated tables whose patterns still appear in the index.

Mitigations: human review, EXPLAIN validation, continuous index updates, expanded test coverage with SME-verified answers, and regression tests built from real user interactions.
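The first benchmark axis — did the agent find the correct tables — can be scored with a simple recall metric. This is a generic sketch, not Pinterest's internal framework:

```python
from typing import Callable, List, Set, Tuple

def table_recall(expected: Set[str], predicted: Set[str]) -> float:
    """Fraction of the expected tables the agent actually surfaced."""
    if not expected:
        return 1.0
    return len(expected & predicted) / len(expected)

def evaluate(cases: List[Tuple[str, Set[str]]],
             agent: Callable[[str], Set[str]]) -> float:
    """Mean table recall over (question, expected_tables) benchmark cases."""
    scores = [table_recall(exp, agent(q)) for q, exp in cases]
    return sum(scores) / len(scores)
```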
