
PATTERN Cited by 1 source

Pipe-syntax query language

Pipe-syntax query language is a query-language surface shape where every query is composed as an ordered sequence of transformations over a source table, each transformation introduced by a pipe operator (e.g. |> in GoogleSQL, | in shell / KQL / PRQL, . in dataframe APIs). Every stage is table-in / table-out, and stages execute in the order they are written — the surface order matches the evaluation order, which matches how an engineer reasons about the transformation.

This is the shape the 2024-08-24 GoogleSQL Pipe Syntax in SQL paper adds to standard SQL as an additive extension. It is not new semantics — each pipe stage is a named existing operator (WHERE, SELECT, AGGREGATE, ORDER BY, JOIN, EXTEND, ...) — it is a reordering + explicit sequencing of operators the engine already knows how to run.

The shape

FROM orders
|> WHERE status = 'paid'
|> JOIN customers USING (customer_id)
|> AGGREGATE sum(amount) AS revenue
   GROUP BY customer_id
|> ORDER BY revenue DESC
|> SELECT customer_id, revenue
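Read as a program, each |> stage is a function from table to table, applied in the order written. The Python sketch below runs the same pipeline over in-memory rows to make that model concrete. The helpers (pipe, where, join_using, aggregate_sum, order_by, select) and the sample rows are hypothetical illustrations, not GoogleSQL APIs, and the join is a simplified inner join on a single key column.

```python
from functools import reduce
from typing import Any, Callable

# Hypothetical helpers for illustration only; not a GoogleSQL API.
Row = dict[str, Any]
Table = list[Row]
Stage = Callable[[Table], Table]  # every stage is table-in / table-out


def pipe(source: Table, *stages: Stage) -> Table:
    """Apply stages strictly in the order written, like successive |> stages."""
    return reduce(lambda table, stage: stage(table), stages, source)


def where(predicate: Callable[[Row], bool]) -> Stage:
    return lambda table: [row for row in table if predicate(row)]


def join_using(right: Table, key: str) -> Stage:
    # Simplified inner join on one shared key; on other name clashes the right side wins.
    def run(table: Table) -> Table:
        return [{**left_row, **right_row}
                for left_row in table for right_row in right
                if left_row[key] == right_row[key]]
    return run


def aggregate_sum(value: str, alias: str, group_by: str) -> Stage:
    # Output schema: the grouping column followed by the aggregate.
    def run(table: Table) -> Table:
        totals: dict[Any, Any] = {}
        for row in table:
            totals[row[group_by]] = totals.get(row[group_by], 0) + row[value]
        return [{group_by: key, alias: total} for key, total in totals.items()]
    return run


def order_by(column: str, descending: bool = False) -> Stage:
    return lambda table: sorted(table, key=lambda row: row[column], reverse=descending)


def select(*columns: str) -> Stage:
    return lambda table: [{c: row[c] for c in columns} for row in table]


# Hypothetical sample data.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 30, "status": "paid"},
    {"order_id": 2, "customer_id": "c2", "amount": 50, "status": "paid"},
    {"order_id": 3, "customer_id": "c1", "amount": 40, "status": "paid"},
    {"order_id": 4, "customer_id": "c2", "amount": 99, "status": "refunded"},
]
customers = [{"customer_id": "c1", "name": "Ada"}, {"customer_id": "c2", "name": "Lin"}]

# Same stages, same order as the pipe query.
result = pipe(
    orders,
    where(lambda row: row["status"] == "paid"),
    join_using(customers, "customer_id"),
    aggregate_sum("amount", "revenue", group_by="customer_id"),
    order_by("revenue", descending=True),
    select("customer_id", "revenue"),
)
```

Because each helper sees only the previous stage's output, adding, removing, or commenting out one argument to pipe is a local edit.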

Compare classic SQL's fixed clause order, in which the same query is written in an order that no longer matches its evaluation order:

SELECT customer_id, SUM(amount) AS revenue
  FROM orders
  JOIN customers USING (customer_id)
 WHERE status = 'paid'
 GROUP BY customer_id
 ORDER BY revenue DESC

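Classic SQL's SELECT appears first but is evaluated near the end (after FROM, JOIN, WHERE, and GROUP BY; only ORDER BY comes later, which is why ORDER BY can refer to the revenue alias); FROM, JOIN, and WHERE appear after the SELECT list but are evaluated first; and GROUP BY is written after the SELECT list even though grouping has to happen before the SELECT list's aggregates can be computed. Pipe syntax removes this inversion.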
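One way to see the mismatch is to sort the classic query's clauses by SQL's logical processing order. The list used here is the commonly taught order for a single SELECT block; it describes semantics, not the physical plan (an optimizer may, for example, push filters below joins), and dialects differ on where DISTINCT, window functions, or QUALIFY fit. The names LOGICAL_ORDER and evaluation_order are illustrative.

```python
# Illustrative, commonly taught logical processing order for one SELECT block.
# Semantic order only: the optimizer may execute work in a different physical order.
LOGICAL_ORDER = ["FROM", "JOIN", "WHERE", "GROUP BY", "HAVING", "SELECT", "ORDER BY", "LIMIT"]


def evaluation_order(surface_clauses: list[str]) -> list[str]:
    """Return a classic query's clauses in the order they are logically evaluated."""
    return sorted(surface_clauses, key=LOGICAL_ORDER.index)


# Clauses of the classic query, in the order they are written.
classic_surface = ["SELECT", "FROM", "JOIN", "WHERE", "GROUP BY", "ORDER BY"]
classic_evaluation = evaluation_order(classic_surface)

for clause in classic_surface:
    print(f"{clause:<9} written #{classic_surface.index(clause) + 1}, "
          f"evaluated #{classic_evaluation.index(clause) + 1}")
```

Run on the classic query, SELECT is written first but evaluated fifth, and ORDER BY is evaluated after it, which is why ORDER BY revenue can use the alias defined in the SELECT list.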

Why it's more learnable / readable / composable

  • Surface order = evaluation order. A reader scans top-to-bottom and sees the pipeline the engine runs. No mental reorder from the clause grammar to the query plan.
  • Locally composable. Adding, removing, commenting out, or reordering a stage is a localized edit, with no interaction with clause-position rules. This locality is a large part of why shell pipelines, pandas chains, Spark DataFrames, Polars, KQL, PRQL, Malloy, and LINQ all converged on this shape for exploratory / ad-hoc querying.
  • Typed stage boundaries. Each |> stage has a well-defined input schema (the prior stage's output) and output schema — so IDEs and linters can provide column completions per stage, and errors can localize to the offending stage rather than to the whole query.
  • Stage-level factoring. Saving / reusing a prefix of the pipeline as a CTE, view, or named intermediate is syntactically trivial: split at the |> boundary.
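The per-stage schema point can be sketched as a checker that threads a schema through the stages and reports the first stage that references an unknown column. The stage classes (Where, Join, Aggregate, OrderBy, Select), StageError, and check are hypothetical and are not how GoogleSQL's analyzer is structured; the sketch tracks column names only, whereas a real analyzer also tracks types.

```python
from dataclasses import dataclass
from typing import Sequence, Union

# Hypothetical stage model for illustration; tracks column names, not types.
Schema = tuple[str, ...]


class StageError(Exception):
    def __init__(self, index: int, stage: str, message: str) -> None:
        super().__init__(f"stage {index} ({stage}): {message}")
        self.index = index  # 1-based position of the offending stage


@dataclass(frozen=True)
class Where:
    columns: Schema  # columns the predicate reads


@dataclass(frozen=True)
class Join:
    right_schema: Schema
    using: str  # assumed to exist on the right side


@dataclass(frozen=True)
class Aggregate:
    inputs: Schema  # columns read by the aggregate expressions
    outputs: Schema  # aggregate aliases, e.g. ("revenue",)
    group_by: Schema


@dataclass(frozen=True)
class OrderBy:
    columns: Schema


@dataclass(frozen=True)
class Select:
    columns: Schema


Stage = Union[Where, Join, Aggregate, OrderBy, Select]


def _require(schema: Schema, columns: Sequence[str], index: int, stage: Stage) -> None:
    missing = [c for c in columns if c not in schema]
    if missing:
        raise StageError(index, type(stage).__name__,
                         f"unknown column(s) {missing}; input is {list(schema)}")


def output_schema(stage: Stage, schema: Schema, index: int) -> Schema:
    """Validate one stage against its input schema and return its output schema."""
    if isinstance(stage, Join):
        _require(schema, (stage.using,), index, stage)
        return schema + tuple(c for c in stage.right_schema if c != stage.using)
    if isinstance(stage, Aggregate):
        _require(schema, stage.inputs + stage.group_by, index, stage)
        return stage.group_by + stage.outputs
    if isinstance(stage, Select):
        _require(schema, stage.columns, index, stage)
        return stage.columns
    # Where and OrderBy read columns but pass their input schema through unchanged.
    _require(schema, stage.columns, index, stage)
    return schema


def check(source: Schema, stages: Sequence[Stage]) -> list[Schema]:
    """Return the schema after each stage; raise StageError at the first bad stage."""
    schemas, schema = [], source
    for index, stage in enumerate(stages, start=1):
        schema = output_schema(stage, schema, index)
        schemas.append(schema)
    return schemas


ORDERS: Schema = ("order_id", "customer_id", "amount", "status")
CUSTOMERS: Schema = ("customer_id", "name")

pipeline: list[Stage] = [
    Where(columns=("status",)),
    Join(right_schema=CUSTOMERS, using="customer_id"),
    Aggregate(inputs=("amount",), outputs=("revenue",), group_by=("customer_id",)),
    OrderBy(columns=("revenue",)),
    Select(columns=("customer_id", "revenue")),
]
schemas = check(ORDERS, pipeline)

# A prefix is itself a pipeline with a known output schema: split at any |> boundary.
prefix_schema = check(ORDERS, pipeline[:2])[-1]

# Referencing "name" after AGGREGATE is reported at that stage, not query-wide.
try:
    check(ORDERS, pipeline[:3] + [Select(columns=("name", "revenue"))])
    error_stage = None
except StageError as err:
    error_stage = err.index
```

The failing pipeline is rejected at stage 4 with the schema that stage actually receives (customer_id, revenue), which is the kind of stage-localized error the bullet describes.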

Why adding it to SQL (not replacing SQL) matters

This is the crucial architectural point of the GoogleSQL paper, and it is why pipe syntax succeeded there while from-scratch SQL replacements (Malloy, PRQL, SQL++, EdgeQL, ...) have struggled. The extension:

  • sits alongside classic SQL, not instead of it;
  • every existing query keeps parsing, and every tool, driver, ORM, and BI dashboard keeps working;
  • adoption is query-by-query — no migration project;
  • the planner / optimizer / engine is unchanged — pipe syntax desugars to the same logical plan the engine already builds.
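A rough textual analogy for "desugars to the same logical plan": each pipe stage below becomes an ordinary SELECT over the previous stage's result, so an engine with no |> support (here SQLite, through Python's built-in sqlite3 module) can run it. This is an illustrative sketch, not GoogleSQL's implementation; the stage-list format and the desugar function are hypothetical. Because SQL does not guarantee that an ORDER BY inside a subquery survives an enclosing SELECT, the check compares row sets rather than row order; a production implementation would need to handle ordering explicitly.

```python
import sqlite3

# Hypothetical representation of the pipe query: (stage keyword, stage body).
PIPE_QUERY = [
    ("FROM", "orders"),
    ("WHERE", "status = 'paid'"),
    ("JOIN", "customers USING (customer_id)"),
    ("AGGREGATE", ("SUM(amount) AS revenue", "customer_id")),  # (aggregates, GROUP BY)
    ("ORDER BY", "revenue DESC"),
    ("SELECT", "customer_id, revenue"),
]

CLASSIC = """
SELECT customer_id, SUM(amount) AS revenue
  FROM orders
  JOIN customers USING (customer_id)
 WHERE status = 'paid'
 GROUP BY customer_id
 ORDER BY revenue DESC
"""


def desugar(stages: list[tuple]) -> str:
    """Hypothetical rewrite: wrap each stage around the previous one as a nested SELECT."""
    kind, table = stages[0]
    if kind != "FROM":
        raise ValueError("a pipe query must start with FROM")
    sql = f"SELECT * FROM {table}"
    for i, (kind, body) in enumerate(stages[1:], start=1):
        source = f"({sql}) AS s{i}"  # previous stage's output as a derived table
        if kind == "WHERE":
            sql = f"SELECT * FROM {source} WHERE {body}"
        elif kind == "JOIN":
            sql = f"SELECT * FROM {source} JOIN {body}"
        elif kind == "AGGREGATE":
            aggregates, group_by = body
            sql = f"SELECT {group_by}, {aggregates} FROM {source} GROUP BY {group_by}"
        elif kind == "ORDER BY":
            sql = f"SELECT * FROM {source} ORDER BY {body}"
        elif kind == "SELECT":
            sql = f"SELECT {body} FROM {source}"
        else:
            raise ValueError(f"unsupported stage: {kind}")
    return sql


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer_id TEXT, amount INTEGER, status TEXT);
CREATE TABLE customers (customer_id TEXT, name TEXT);
INSERT INTO orders VALUES (1, 'c1', 30, 'paid'), (2, 'c2', 50, 'paid'),
                          (3, 'c1', 40, 'paid'), (4, 'c2', 99, 'refunded');
INSERT INTO customers VALUES ('c1', 'Ada'), ('c2', 'Lin');
""")

classic_rows = conn.execute(CLASSIC).fetchall()
pipe_rows = conn.execute(desugar(PIPE_QUERY)).fetchall()
```

Each nesting level in the generated SQL corresponds to one |> stage and uses only operators the engine already supports.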

This property is the subject of concepts/language-extension-vs-replacement. The GoogleSQL pipe paper is this wiki's canonical production example of extending an entrenched language instead of trying to replace it.

Precedents and neighbors

  • Unix shell pipelines (grep | awk | sort | uniq -c) — the original production data-flow syntax, and arguably the most widely used querying idiom outside SQL.
  • Dataframe APIs — pandas (df.query(...).groupby(...).agg(...)), Spark DataFrames, Polars, R's dplyr / tidyverse. Every exploratory data-analysis ecosystem converged on method-chaining pipelines.
  • KQL (Kusto Query Language) — Microsoft's log/telemetry query language; | pipes throughout. Heavily used in Azure observability tooling.
  • PRQL — "Pipelined Relational Query Language"; a from-scratch SQL replacement built around | that compiles to SQL. Some DuckDB / BigQuery frontends use it, but it is not the entrenched language.
  • Malloy — Google/Looker's SQL alternative for BI; semantic-layer + pipe-style composition. Same trade-off: better shape, smaller ecosystem.
  • LINQ — .NET's query-over-collections DSL (C#, VB.NET); a method-chaining shape that query providers translate to SQL, or that runs directly over IEnumerable / IObservable.

GoogleSQL's contribution is not the shape — it's the shape grafted onto entrenched SQL with zero ecosystem migration.

Tradeoffs

  • + Readability / learnability. Surface = evaluation = reasoning order.
  • + Incremental adoption. In an extension-style deployment (GoogleSQL's path), users adopt query-by-query.
  • + Tool-friendly. Stage-level types + completions + linting.
  • − Dual surface in the same language. SQL code now comes in two shapes, and teams have to decide when to use each. Long-term stylistic drift is a real risk (some engineers write only classic SQL, others only pipe).
  • − Doesn't fix semantic issues. Pipe syntax is a parser + desugaring change; NULL semantics, three-valued logic, aggregate edge cases, join-cardinality surprises all remain. Extension reach has a boundary (see concepts/language-extension-vs-replacement).
  • − Standardization lag. A single-vendor extension propagates only inside that vendor's dialect family unless the wider SQL standard picks it up.
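A small illustration of a semantic issue that a syntax change leaves in place: three-valued logic in a filter. It uses SQLite through Python's built-in sqlite3 module as a stand-in engine (SQLite has no |> support), and the table and rows are hypothetical. A |> WHERE stage evaluates an ordinary SQL predicate, so it would drop the NULL row in the same way.

```python
import sqlite3

# Hypothetical table; SQLite stands in for any engine with standard NULL semantics.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, status TEXT);
INSERT INTO orders VALUES (1, 'paid'), (2, 'refunded'), (3, NULL);
""")


def ids(sql: str) -> list:
    return [row[0] for row in conn.execute(sql)]


# Intended as "every order that is not paid", but NULL <> 'paid' evaluates to
# NULL (unknown), and WHERE keeps only rows where the predicate is TRUE.
not_paid = ids("SELECT order_id FROM orders WHERE status <> 'paid' ORDER BY order_id")

# Handling NULL explicitly restores the intended result.
not_paid_or_unknown = ids(
    "SELECT order_id FROM orders WHERE status IS NULL OR status <> 'paid' ORDER BY order_id"
)
```

Order 3 disappears from the first result no matter which surface syntax wrote the filter; only the predicate change brings it back.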

Structural preconditions (for retrofitting onto an existing language)

  • Parser tolerance to an additive operator. The host language's grammar has to accept |> (or equivalent) without ambiguity against existing productions.
  • Engine semantics already cover the operator. Each pipe stage has to map cleanly to an operator the planner already knows — otherwise the extension isn't really additive.
  • One authoritative dialect spec, so that a single grammar change ships to every engine that consumes the spec. GoogleSQL's shared-dialect-across-BigQuery/F1/Spanner property is the lever that lets the pipe-syntax change propagate at all.
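The parser-tolerance precondition can be illustrated with a toy lexer. This sketch is hypothetical and is not GoogleSQL's lexer: it tries multi-character operators before their one-character prefixes, so |> becomes a single token while | (bitwise OR in GoogleSQL) and || (string concatenation) lex as before. In typical SQL grammars no operand can begin with >, so outside string literals and comments the pair | > never occurred in a valid classic query, and reserving |> as a token does not change the meaning of any existing query.

```python
import re

# Hypothetical toy lexer for a SQL-like language; not any engine's real lexer.
# Multi-character operators come before their one-character prefixes: Python's
# regex alternation takes the first alternative that matches, so this ordering
# makes the lexer prefer the longest operator ("maximal munch").
OPERATORS = ["|>", "||", "<=", ">=", "<>", "|", "<", ">", "=", "*", ",", "(", ")"]
TOKEN = re.compile(
    r"\s*(?:(?P<op>" + "|".join(re.escape(op) for op in OPERATORS) + r")"
    r"|(?P<str>'[^']*')"         # single-quoted string literal, lexed as one unit
    r"|(?P<num>\d+)"
    r"|(?P<word>[A-Za-z_]\w*))"  # keywords and identifiers
)


def tokenize(sql: str) -> list[str]:
    """Split a SQL-like string into tokens, treating |> as a single operator."""
    tokens: list[str] = []
    sql = sql.rstrip()
    pos = 0
    while pos < len(sql):
        match = TOKEN.match(sql, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}: {sql[pos:pos + 10]!r}")
        tokens.append(match.group(match.lastgroup))
        pos = match.end()
    return tokens


pipe_tokens = tokenize("FROM orders |> WHERE flags | 4 > 0")
concat_tokens = tokenize("SELECT a || b FROM t")
literal_tokens = tokenize("WHERE s = 'a|>b'")
```

The third call shows why string literals are lexed as a unit: the characters |> inside a quoted string never reach the operator rule.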
