Skip to content

PATTERN Cited by 1 source

EXPLAIN-before-EXECUTE validation

EXPLAIN-before-EXECUTE validation is the safety pattern for LLM-generated SQL: before running a generated query against the real data, send it to the query engine's EXPLAIN endpoint. If EXPLAIN fails (unknown table, unknown column, type error, invalid join), feed the error back to the LLM with a bounded retry budget. Only execute once EXPLAIN succeeds — and even then, enforce a conservative LIMIT by default.

(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)

The loop

    generate SQL ────► EXPLAIN ────► succeeds? ──yes──► EXECUTE (with LIMIT)
          ▲              │
          │              │ fails
          │              ▼
          │          error + SQL + retry budget
          │              │
          │              ▼
          └─── iterate (within bounded retries) ───

What the validation catches

Pinterest names four classes of failure the Analytics Agent guards against:

  1. Non-existent tables / columns — hallucinated identifiers.
  2. Invalid joins — join keys that don't resolve or have incompatible types.
  3. Syntax errors — malformed generated SQL.
  4. Semantic-but-not-syntactic errors — notably the "looks right but returns nothing" failure mode when filter values don't match actual data (e.g. 'web' lowercase vs 'WEB' in storage). Pinterest mitigates this specific class with column profiling data at generation time, so the LLM knows the actual value distribution before emitting filters.

Why EXPLAIN is the right gate

  • Cheap. EXPLAIN runs the planner, not the query — negligible cost vs. executing a bad query that scans terabytes.
  • Catches structural errors deterministically. Schema validation is not probabilistic — a column either exists or it doesn't.
  • Provides actionable feedback. Query-engine error messages are well-formatted and point to specific tokens; easy to feed back into the LLM.

Bounded retry

Pinterest explicitly calls out "iterates with fixes up to a bounded retry limit." Unbounded retry on an LLM that can't reason about the schema produces infinite loops or degradation into invention. The bound is a safety property, not a quality degradation.

The default LIMIT 100 companion

Even once SQL passes EXPLAIN, Pinterest enforces "a conservative LIMIT (100 rows or fewer) by default." EXPLAIN validates structure; LIMIT bounds blast radius when the query is structurally valid but semantically wrong (e.g. missing a WHERE clause on a billion- row fact table). The two together form a defense-in-depth ladder:

  1. EXPLAIN catches most structural errors.
  2. LIMIT bounds the cost of remaining semantic errors.
  3. Users see a small-row result and can say "that's wrong" cheaply.

Returns with transparency

On success, Pinterest's agent surfaces: the SQL that ran, the tables + date ranges used, source references (schemas, past queries, docs), and confidence indicators / warnings (suspicious joins, empty results). Transparency is a complement to validation, not a replacement: users still need to see what actually ran.

Seen in

Last updated · 319 distilled / 1,201 read