GOOGLE 2024-08-24 Tier 1

Google Research: SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL

Summary

Google's research paper (published 2024-08-24, surfaced on Hacker News at 308 points) describes a piped data-flow syntax added natively to GoogleSQL, the SQL dialect shared across BigQuery, F1, Spanner, and other Google query engines. Rather than propose yet another replacement query language, the authors extend SQL from within: the pipe operator |> lets a query start with a FROM clause and then apply named transformations (WHERE, AGGREGATE, SELECT, ORDER BY, JOIN, EXTEND, ...) in top-to-bottom data-flow order, matching the order in which the query is logically evaluated and in which engineers reason about it. Classic SQL's fixed SELECT … FROM … WHERE … GROUP BY … HAVING … ORDER BY clause order, in which SELECT is written first even though it is evaluated after FROM, WHERE, and GROUP BY, is preserved: pipe syntax sits alongside it, not in place of it. The core claim is ecosystem-scale: extending the entrenched language lets users, tools, drivers, ORMs, existing queries, and migration paths keep working, whereas no prior from-scratch replacement has gained enough adoption to displace SQL.

Key takeaways

SQL's design problems are real and widely observed — hard to learn, hard to read, hard to compose, hard to extend — but language replacement has been the wrong solution class for decades because adoption cost dominates language merit. The paper reframes the question from "what should replace SQL?" to "what can we add inside SQL to fix the problems without breaking the ecosystem?" This is the canonical wiki instance of concepts/language-extension-vs-replacement (Source: sources/2024-08-24-google-pipe-syntax-in-sql).
Pipe operator |> = data-flow ordering applied to SQL. Inspired by shell pipelines, dataframe APIs (pandas, Spark, Polars), LINQ, and KQL/PRQL/Malloy, the syntax lets any query start with FROM table and then be transformed step by step with |> WHERE …, |> AGGREGATE sum(x) GROUP BY y, |> SELECT …, |> ORDER BY …, |> JOIN …. Each stage is a table-in / table-out transformation, and stages compose in the order written, which is the exact property that patterns/pipe-syntax-query-language generalizes (Source: sources/2024-08-24-google-pipe-syntax-in-sql).
Incremental adoption is the load-bearing feature, not syntax aesthetics. Because pipe syntax ships as an additive extension to GoogleSQL, every existing query keeps working, every existing tool (BI dashboards, ETL jobs, ORMs, CLIs, logging pipelines) keeps parsing, and users adopt it query by query with no migration, training campaign, or compatibility flag. This is the structural property that language-replacement candidates (SQL++, Malloy, PRQL, EdgeQL, …) cannot offer — they require users to switch stacks (Source: sources/2024-08-24-google-pipe-syntax-in-sql).
Shared dialect across Google's query engines. The paper positions pipe syntax as a feature of systems/googlesql — the SQL dialect used by BigQuery, F1, Spanner, and other Google data systems — not a feature of any single engine. Adding the extension to the dialect propagates across the whole Google query-engine fleet; the extension is a spec-level change, not a per-engine feature flag (Source: sources/2024-08-24-google-pipe-syntax-in-sql).
The |> stages are named SQL operators, not new semantics. Each pipe stage corresponds to an existing SQL concept (WHERE, SELECT, GROUP BY/AGGREGATE, ORDER BY, JOIN, UNION, EXTEND for column addition, etc.). The paper's claim is that reordering and explicitly sequencing already-known operators is what makes SQL easier to learn, read, and extend: there is no new evaluation model, no new type system, and no new planner, only a new surface syntax feeding the same planner (Source: sources/2024-08-24-google-pipe-syntax-in-sql).
Language replacement has been tried repeatedly and has not worked. The authors explicitly name the failure pattern: "New language adoption is a significant obstacle for users, and none of the potential replacements have been successful enough to displace SQL." The paper is a deliberate argument against further from-scratch attempts and for in-place dialect evolution (Source: sources/2024-08-24-google-pipe-syntax-in-sql).
Ecosystem gravity as design force. The paper's core meta-lesson is that an entrenched language's ecosystem (users, tools, drivers, training materials, integrations, existing code) is a larger design constraint than the language's aesthetics. Extending SQL preserves that ecosystem in full; a replacement has to rebuild each piece from scratch before any adoption can begin. The same reasoning applies to other long-lived languages where "the grammar is the API" (wire protocols, configuration languages, markup languages) and is formalized in concepts/language-extension-vs-replacement (Source: sources/2024-08-24-google-pipe-syntax-in-sql).
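
The "new surface syntax, same planner" takeaway above can be illustrated with a minimal sketch. It is not the paper's method or GoogleSQL's implementation: the helper name `desugar`, the stage tuples, and the `orders` table are all assumptions made for this example, and SQLite stands in for a real engine only because it ships with Python. Each pipe stage wraps the query built so far in one more classic subquery, so an unmodified classic-SQL engine executes the result.

```python
import sqlite3

def desugar(source, stages):
    """Rewrite a FROM source plus ordered pipe stages into nested classic SQL.

    Hypothetical helper for illustration only. Each stage wraps the query
    built so far as a subquery, so stages apply in the order written.
    """
    sql = f"SELECT * FROM {source}"
    for kind, *args in stages:
        if kind == "WHERE":
            sql = f"SELECT * FROM ({sql}) WHERE {args[0]}"
        elif kind == "EXTEND":
            sql = f"SELECT *, {args[0]} FROM ({sql})"
        elif kind == "SELECT":
            sql = f"SELECT {args[0]} FROM ({sql})"
        elif kind == "AGGREGATE":
            aggs, keys = args
            sql = f"SELECT {keys}, {aggs} FROM ({sql}) GROUP BY {keys}"
        elif kind == "ORDER BY":
            sql = f"SELECT * FROM ({sql}) ORDER BY {args[0]}"
        else:
            raise ValueError(f"unsupported stage: {kind}")
    return sql

# Made-up sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 10), ("east", 30), ("west", 5), ("west", 50), ("north", 2)],
)

# Stages listed in data-flow order: filter, aggregate, filter again, sort.
stages = [
    ("WHERE", "amount >= 5"),
    ("AGGREGATE", "SUM(amount) AS total", "region"),
    ("WHERE", "total >= 40"),  # a filter applied after aggregation
    ("ORDER BY", "total DESC"),
]

sql = desugar("orders", stages)
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

The point of the sketch is the design choice, not the rewriting technique: because every stage maps onto an operator the engine already understands, the pipe form and the classic form can coexist in the same system.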

Systems / concepts / patterns extracted

systems/googlesql (new): Google's cross-engine SQL dialect used by BigQuery, F1, Spanner, and other Google query systems; the dialect that receives |> as a native extension.
concepts/language-extension-vs-replacement (new): when an entrenched language has real design problems but its ecosystem gravity keeps defeating replacement candidates, extending the language in place beats proposing a successor. Pipe syntax in GoogleSQL is the canonical production example; TypeScript extending JavaScript and C++ extending C are adjacent historical precedents.
patterns/pipe-syntax-query-language (new): the shape itself, which composes queries from a source table via ordered |> transformations, each a named existing operator. Applied to SQL by GoogleSQL; earlier instances in shell pipelines, KQL / PRQL / Malloy, dataframe APIs (pandas, Spark, Polars), and LINQ.
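
The pattern's table-in / table-out shape can be sketched outside SQL as plain function composition. This is an illustrative model only, assuming rows are Python dicts; the stage constructors (`where`, `extend`, `aggregate`, `order_by`) and the `pipe` helper are hypothetical names chosen to echo the operators listed above, not an API from the paper or from any library.

```python
from collections import defaultdict
from functools import reduce

# Each constructor returns a stage: a function from a list of rows to a list of rows.

def where(pred):
    return lambda rows: [r for r in rows if pred(r)]

def extend(name, fn):
    # Add one computed column and keep all existing columns.
    return lambda rows: [{**r, name: fn(r)} for r in rows]

def aggregate(key, name, fn):
    def stage(rows):
        groups = defaultdict(list)
        for r in rows:
            groups[r[key]].append(r)
        return [{key: k, name: fn(g)} for k, g in groups.items()]
    return stage

def order_by(key, desc=False):
    return lambda rows: sorted(rows, key=lambda r: r[key], reverse=desc)

def pipe(rows, *stages):
    # Apply stages left to right, feeding each stage's output to the next.
    return reduce(lambda acc, stage: stage(acc), stages, rows)

# Made-up sample data.
items = [
    {"category": "a", "price": 2, "qty": 3},
    {"category": "b", "price": 5, "qty": 1},
    {"category": "a", "price": 1, "qty": 4},
    {"category": "b", "price": 10, "qty": 0},
]

result = pipe(
    items,
    where(lambda r: r["qty"] > 0),
    extend("revenue", lambda r: r["price"] * r["qty"]),
    aggregate("category", "total", lambda g: sum(r["revenue"] for r in g)),
    order_by("total", desc=True),
)
print(result)
```

Because every stage has the same rows-in / rows-out type, stages can be added, removed, or reordered without restructuring the rest of the query, which is the composability property the pattern names.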

Caveats

Abstract-level capture only. The raw article captures only the research.google abstract, not the full paper PDF. Concrete syntax examples, the full grammar, planner integration details, and any benchmark/user-study numbers are in the paper itself, not in the ingested text. Wiki claims here are scoped to what the abstract states explicitly.
No quantitative outcome disclosed in the abstract. No adoption numbers inside Google, no reported learner-time-to-proficiency delta, no performance numbers. The paper's argument is qualitative and ecosystem-structural.
GoogleSQL only. Whether the extension propagates to ANSI SQL or to non-Google engines (Postgres, MySQL, Snowflake, DuckDB, ...) is an open downstream question; the abstract does not address it. As of the HN discussion (2024-08-24), the feature is positioned as internal to the GoogleSQL dialect.
No incident / production-retrospective content. This is a language-design paper, not an engineering-blog-style production post-mortem. Ingested narrowly on the language-design / query-engine DX axis rather than given the full distributed-systems deep-dive treatment.

Links

Paper page: https://research.google/pubs/sql-has-problems-we-can-fix-them-pipe-syntax-in-sql/
HN discussion (308 points): https://news.ycombinator.com/item?id=41338877
Raw: raw/google/2024-08-24-pipe-syntax-in-sql-808ad7ec.md