PATTERN Cited by 1 source

Pipe-syntax query language¶

Pipe-syntax query language is a query-language surface shape where every query is composed as an ordered sequence of transformations over a source table, each transformation introduced by a pipe operator (e.g. |> in GoogleSQL, | in shell / KQL / PRQL, . in dataframe APIs). Every stage is table-in / table-out, and stages execute in the order they are written — the surface order matches the evaluation order, which matches how an engineer reasons about the transformation.

This is the shape the 2024-08-24 GoogleSQL Pipe Syntax in SQL paper adds to standard SQL as an additive extension. It is not new semantics — each pipe stage is a named existing operator (WHERE, SELECT, AGGREGATE, ORDER BY, JOIN, EXTEND, ...) — it is a reordering + explicit sequencing of operators the engine already knows how to run.

The shape¶

FROM orders
|> WHERE status = 'paid'
|> JOIN customers USING (customer_id)
|> AGGREGATE sum(amount) AS revenue
   GROUP BY customer_id
|> ORDER BY revenue DESC
|> SELECT customer_id, revenue

Compare classic SQL's clause-fixed form, which reverses surface and evaluation order for the same query:

SELECT customer_id, SUM(amount) AS revenue
  FROM orders
  JOIN customers USING (customer_id)
 WHERE status = 'paid'
 GROUP BY customer_id
 ORDER BY revenue DESC

Classic SQL's SELECT appears first but is evaluated near the end, after FROM, JOIN, WHERE, and GROUP BY; FROM and WHERE appear mid-query but are evaluated first; and the aggregate SUM(amount) is written before the GROUP BY that defines its groups. Pipe syntax removes this inversion.
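The correspondence between written order and evaluation order can be sketched in plain Python. This is a minimal illustration, not GoogleSQL itself: the sample rows and the function names `pipeline_revenue` and `classic_revenue` are hypothetical, and SQLite stands in for an engine running the classic form. Each Python stage mirrors one `|>` stage of the query above, in the same order.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the example above.
orders = [
    {"customer_id": 1, "status": "paid", "amount": 30.0},
    {"customer_id": 1, "status": "paid", "amount": 20.0},
    {"customer_id": 2, "status": "paid", "amount": 70.0},
    {"customer_id": 2, "status": "open", "amount": 99.0},
]
customers = [{"customer_id": 1}, {"customer_id": 2}]


def pipeline_revenue(orders, customers):
    """Evaluate the query stage by stage, in the order the pipe form writes it."""
    # |> WHERE status = 'paid'
    rows = [o for o in orders if o["status"] == "paid"]
    # |> JOIN customers USING (customer_id)  (inner join as a nested loop)
    rows = [
        {**o, **c}
        for o in rows
        for c in customers
        if o["customer_id"] == c["customer_id"]
    ]
    # |> AGGREGATE SUM(amount) AS revenue GROUP BY customer_id
    totals = {}
    for r in rows:
        totals[r["customer_id"]] = totals.get(r["customer_id"], 0.0) + r["amount"]
    # |> ORDER BY revenue DESC
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    # |> SELECT customer_id, revenue
    return [(customer_id, revenue) for customer_id, revenue in ranked]


def classic_revenue(orders, customers):
    """Run the clause-ordered form of the same query in an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (customer_id INTEGER, status TEXT, amount REAL)")
    con.execute("CREATE TABLE customers (customer_id INTEGER)")
    con.executemany(
        "INSERT INTO orders VALUES (:customer_id, :status, :amount)", orders
    )
    con.executemany("INSERT INTO customers VALUES (:customer_id)", customers)
    result = con.execute(
        "SELECT customer_id, SUM(amount) AS revenue "
        "FROM orders JOIN customers USING (customer_id) "
        "WHERE status = 'paid' "
        "GROUP BY customer_id ORDER BY revenue DESC"
    ).fetchall()
    con.close()
    return result
```

On this sample data both functions return the same rows. The Python version reads top to bottom in the order the work happens, whereas the SQL string has to be mentally reordered to see that.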

Why it's more learnable / readable / composable¶

Surface order = evaluation order. A reader scans top-to-bottom and sees the pipeline the engine runs. No mental reorder from the clause grammar to the query plan.
Locally composable. Adding, removing, commenting, or reordering a stage is a localized edit — no interaction with clause-position rules. This is why shell pipelines, pandas chains, Spark DataFrames, Polars, KQL, PRQL, Malloy, and LINQ all converged on this shape for exploratory / ad-hoc querying.
Typed stage boundaries. Each |> stage has a well-defined input schema (the prior stage's output) and output schema — so IDEs and linters can provide column completions per stage, and errors can localize to the offending stage rather than to the whole query.
Stage-level factoring. Saving / reusing a prefix of the pipeline as a CTE, view, or named intermediate is syntactically trivial: split at the |> boundary.
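The "typed stage boundaries" point can be sketched as a small schema checker. This is a hypothetical illustration, not how any GoogleSQL tool is implemented: the helpers `where`, `aggregate`, `select`, and `check` are invented for the example, and a schema is simplified to a set of column names. Each stage maps an input schema to an output schema, so an unknown column is reported against the stage that referenced it.

```python
class StageError(Exception):
    """Raised when a stage references a column its input schema lacks."""

    def __init__(self, index, label, message):
        super().__init__(f"stage {index} ({label}): {message}")
        self.index = index
        self.label = label


def require(schema, columns):
    # Fail if any referenced column is missing from the incoming schema.
    missing = sorted(set(columns) - set(schema))
    if missing:
        raise KeyError(", ".join(missing))


def where(*columns):
    def apply(schema):
        require(schema, columns)
        return set(schema)  # filtering keeps the schema unchanged
    return ("WHERE", apply)


def aggregate(output, source, group_by):
    def apply(schema):
        require(schema, [source, *group_by])
        return {*group_by, output}  # only grouping keys and the aggregate survive
    return ("AGGREGATE", apply)


def select(*columns):
    def apply(schema):
        require(schema, columns)
        return set(columns)
    return ("SELECT", apply)


def check(source_schema, stages):
    """Return the schema visible after each stage, or raise StageError."""
    schemas = []
    schema = set(source_schema)
    for index, (label, apply) in enumerate(stages, start=1):
        try:
            schema = apply(schema)
        except KeyError as exc:
            raise StageError(index, label, f"unknown column(s): {exc.args[0]}") from None
        schemas.append(schema)
    return schemas


orders_schema = {"customer_id", "status", "amount"}
stages = [
    where("status"),
    aggregate("revenue", "amount", ["customer_id"]),
    select("customer_id", "revenue"),
]
```

The per-stage schemas returned by `check(orders_schema, stages)` are what an editor could use for column completion at each boundary. Replacing the last stage with `select("status")` raises a `StageError` for stage 3, because `status` was dropped by the aggregate. Checking a prefix such as `stages[:2]` works unchanged, which is the stage-level factoring property in miniature.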

Why adding it to SQL (not replacing SQL) matters¶

This is the crucial architectural point of the GoogleSQL paper, and the reason pipe syntax could be adopted there where alternative query languages (Malloy, PRQL, SQL++, EdgeQL, ...) have struggled to displace SQL. The extension:

sits alongside classic SQL, not instead of it;
keeps every existing query, tool, driver, ORM, and BI dashboard parsing;
is adopted query by query, with no migration project;
leaves the planner / optimizer / engine unchanged: pipe syntax desugars to the same logical plan the engine already builds.
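The desugaring claim can be illustrated with a deliberately naive rewriter. This is a sketch under stated assumptions, not the GoogleSQL implementation: the `(operator, argument)` stage encoding and the `desugar` and `run` functions are hypothetical, only four operators are handled, and SQLite stands in for an unmodified engine. Each stage wraps the query so far in a subquery, so the output is ordinary classic SQL.

```python
import sqlite3


def desugar(source, stages):
    """Rewrite (operator, argument) stages into one nested classic-SQL string."""
    sql = f"SELECT * FROM {source}"
    for op, arg in stages:
        if op == "WHERE":
            sql = f"SELECT * FROM ({sql}) WHERE {arg}"
        elif op == "AGGREGATE":
            exprs, group_by = arg
            sql = f"SELECT {group_by}, {exprs} FROM ({sql}) GROUP BY {group_by}"
        elif op == "ORDER BY":
            # Caveat: ordering inside a nested subquery is not guaranteed to
            # survive, so this sketch is only safe with ORDER BY as the last stage.
            sql = f"SELECT * FROM ({sql}) ORDER BY {arg}"
        elif op == "SELECT":
            sql = f"SELECT {arg} FROM ({sql})"
        else:
            raise ValueError(f"unsupported stage: {op}")
    return sql


def run(sql, rows):
    """Execute a query against a throwaway in-memory orders table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (customer_id INTEGER, status TEXT, amount REAL)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    out = con.execute(sql).fetchall()
    con.close()
    return out


# Hypothetical sample rows.
rows = [(1, "paid", 30.0), (1, "paid", 20.0), (2, "paid", 70.0), (2, "open", 99.0)]

piped = desugar(
    "orders",
    [
        ("WHERE", "status = 'paid'"),
        ("AGGREGATE", ("SUM(amount) AS revenue", "customer_id")),
        ("ORDER BY", "revenue DESC"),
    ],
)

classic = (
    "SELECT customer_id, SUM(amount) AS revenue FROM orders "
    "WHERE status = 'paid' GROUP BY customer_id ORDER BY revenue DESC"
)
```

Running `run(piped, rows)` and `run(classic, rows)` gives the same rows on this data. That shows equal results, not equal plans; whether the nested form optimizes to the same logical plan depends on the engine.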

This property is the subject of concepts/language-extension-vs-replacement. The GoogleSQL pipe paper is the canonical wiki production instance of extending an entrenched language instead of trying to replace it.

Precedents and neighbors¶

Unix shell pipelines (grep | awk | sort | uniq -c): the original production data-flow syntax, and arguably the most widely used querying idiom outside SQL.
Dataframe APIs: pandas (df.query(...).groupby(...).agg(...)), Spark DataFrames, Polars, R's dplyr / tidyverse. The major exploratory data-analysis ecosystems converged on method-chaining or pipe-operator pipelines.
KQL (Kusto Query Language): Microsoft's log/telemetry query language, with | pipes throughout. Heavily used in Azure observability tooling.
PRQL: "Pipelined Relational Query Language"; a from-scratch language built around | pipelines that compiles to SQL. Used by some DuckDB / BigQuery frontends, but it is not the entrenched language.
Malloy: Google/Looker's SQL alternative for BI, with semantic-layer, pipe-style composition. Same trade-off: better shape, smaller ecosystem.
LINQ: C#'s query-over-collections DSL; a method-chaining shape that a query provider translates to SQL, or that runs directly over IEnumerable / IObservable.

GoogleSQL's contribution is not the shape — it's the shape grafted onto entrenched SQL with zero ecosystem migration.

Tradeoffs¶

+ Readability / learnability. Surface = evaluation = reasoning order.
+ Incremental adoption. In an extension-style deployment (GoogleSQL's path), users adopt query-by-query.
+ Tool-friendly. Stage-level types + completions + linting.
− Dual surface in the same language. SQL code now comes in two shapes, and teams have to decide when to use each. Long-term stylistic drift is a real risk (some engineers write only classic SQL, others only pipe).
− Doesn't fix semantic issues. Pipe syntax is a parser + desugaring change; NULL semantics, three-valued logic, aggregate edge cases, join-cardinality surprises all remain. Extension reach has a boundary (see concepts/language-extension-vs-replacement).
− Standardization lag. A single-vendor extension propagates only inside that vendor's dialect family unless the wider SQL standard picks it up.

Structural preconditions (for retrofitting onto an existing language)¶

Parser tolerance to an additive operator. The host language's grammar has to accept |> (or equivalent) without ambiguity against existing productions.
Engine semantics already cover the operator. Each pipe stage has to map cleanly to an operator the planner already knows — otherwise the extension isn't really additive.
One authoritative dialect spec. With a single spec, one grammar change ships across every engine that consumes it. GoogleSQL's shared dialect across BigQuery, F1, and Spanner is the lever that lets the pipe-syntax change propagate at all.
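The first precondition, an additive operator that does not collide with existing productions, can be illustrated at the lexical level. The sketch below is a hypothetical helper, not part of any real parser: `split_stages` cuts a query at `|>` only when the token appears outside a single-quoted string literal. It assumes SQL-style doubled quotes (`''`) as the only escape and ignores comments, backticks, and double-quoted identifiers, which a real grammar would also have to handle.

```python
def split_stages(query):
    """Split a pipe-syntax query at |> tokens that are not inside '...' literals."""
    stages = []
    current = []
    in_string = False
    i = 0
    while i < len(query):
        ch = query[i]
        if ch == "'":
            if in_string and query[i + 1:i + 2] == "'":
                # A doubled quote inside a literal is an escaped quote, not a terminator.
                current.append("''")
                i += 2
                continue
            in_string = not in_string
        if not in_string and query.startswith("|>", i):
            # Stage boundary: close the current stage and skip the two-character token.
            stages.append("".join(current).strip())
            current = []
            i += 2
            continue
        current.append(ch)
        i += 1
    stages.append("".join(current).strip())
    return stages
```

For example, `split_stages("FROM orders |> WHERE note = 'a |> b' |> SELECT note")` yields three stages and leaves the `|>` inside the literal untouched. Splitting at stage boundaries like this is also the mechanical basis for saving or reusing a pipeline prefix.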

Seen in¶

sources/2024-08-24-google-pipe-syntax-in-sql — Google Research adds |> to GoogleSQL; the pipe-stage list is the named SQL operator set (WHERE, SELECT, AGGREGATE, ORDER BY, JOIN, EXTEND, ...). Positioned explicitly as an extension, not a replacement. Canonical wiki production instance of the pattern and of concepts/language-extension-vs-replacement.

systems/googlesql: the dialect that receives pipe syntax as a native extension.
concepts/language-extension-vs-replacement: why extending SQL beats replacing it.
patterns/query-language-as-agent-tool: adjacent pattern for exposing query languages to LLM agents; pipe-style syntax is easier to generate incrementally and easier to reason about stage-by-stage.
patterns/intent-preserving-query-translation: when porting queries between languages / engines, translate by intent stage-by-stage, which is easier against a pipe-shaped surface than against a fixed-clause-order one.
concepts/query-shape: the schema-level view of queries as parameterized shapes; orthogonal to pipe syntax but complementary in a reactive-cache / invalidation context.