PATTERN Cited by 1 source
Pipe-syntax query language¶
Pipe-syntax query language is a query-language surface shape where
every query is composed as an ordered sequence of transformations
over a source table, each transformation introduced by a pipe
operator (e.g. |> in GoogleSQL, | in shell / KQL / PRQL, . in
dataframe APIs). Every stage is table-in / table-out, and stages
execute in the order they are written — the surface order matches the
evaluation order, which matches how an engineer reasons about the
transformation.
This is the shape that the 2024-08-24 GoogleSQL paper,
Pipe Syntax in SQL, adds to standard SQL as an additive
extension. It is not new
semantics — each pipe stage is a named existing operator (WHERE,
SELECT, AGGREGATE, ORDER BY, JOIN, EXTEND, ...) — it is a
reordering + explicit sequencing of operators the engine already
knows how to run.
The shape¶
FROM orders
|> WHERE status = 'paid'
|> JOIN customers USING (customer_id)
|> AGGREGATE SUM(amount) AS revenue
   GROUP BY customer_id
|> ORDER BY revenue DESC
|> SELECT customer_id, revenue
Compare classic SQL's fixed clause order, in which the written order and the evaluation order diverge for the same query:
SELECT customer_id, SUM(amount) AS revenue
FROM orders
JOIN customers USING (customer_id)
WHERE status = 'paid'
GROUP BY customer_id
ORDER BY revenue DESC
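Classic SQL's SELECT appears first but is evaluated near the end,
after FROM, JOIN, WHERE, and GROUP BY; only ORDER BY runs after it.
FROM and JOIN appear mid-query but are evaluated first, and the filter
on status has to be written after the JOIN even though it can logically
apply before it. Pipe syntax removes this inversion.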
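The table-in / table-out idea can be sketched outside SQL as well. The following minimal Python example is an illustration only: the stage helpers (`where`, `aggregate_sum`, `order_by_desc`, `select`, `pipe`) and the sample rows are hypothetical names invented here, and the JOIN stage is left out for brevity. Each stage takes a list of row dicts and returns a new list, and the stages run in exactly the order they are written.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample data; column names mirror the example query.
ORDERS = [
    {"customer_id": 1, "status": "paid", "amount": 30},
    {"customer_id": 2, "status": "paid", "amount": 50},
    {"customer_id": 1, "status": "refunded", "amount": 99},
    {"customer_id": 1, "status": "paid", "amount": 40},
]

def where(pred):
    """Stage: keep only the rows for which pred(row) is true."""
    return lambda rows: [r for r in rows if pred(r)]

def aggregate_sum(column, alias, group_by):
    """Stage: sum `column` per `group_by` key, emitting (key, alias) rows."""
    def stage(rows):
        totals = defaultdict(int)
        for r in rows:
            totals[r[group_by]] += r[column]
        return [{group_by: key, alias: total} for key, total in totals.items()]
    return stage

def order_by_desc(column):
    """Stage: sort rows by `column`, largest first."""
    return lambda rows: sorted(rows, key=lambda r: r[column], reverse=True)

def select(*columns):
    """Stage: project each row down to the named columns."""
    return lambda rows: [{c: r[c] for c in columns} for r in rows]

def pipe(source, *stages):
    """Apply stages left to right; every stage is table-in / table-out."""
    return reduce(lambda table, stage: stage(table), stages, source)

result = pipe(
    ORDERS,
    where(lambda r: r["status"] == "paid"),
    aggregate_sum("amount", "revenue", group_by="customer_id"),
    order_by_desc("revenue"),
    select("customer_id", "revenue"),
)
print(result)
```

Reading the `pipe(...)` call top to bottom gives the same sequence of steps the code performs, which is the property the pipe form of the SQL query has and the classic form lacks.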
Why it's more learnable / readable / composable¶
- Surface order = evaluation order. A reader scans top-to-bottom and sees the pipeline the engine runs. No mental reorder from the clause grammar to the query plan.
- Locally composable. Adding, removing, commenting, or reordering a stage is a localized edit — no interaction with clause-position rules. This is why shell pipelines, pandas chains, Spark DataFrames, Polars, KQL, PRQL, Malloy, and LINQ all converged on this shape for exploratory / ad-hoc querying.
- Typed stage boundaries. Each |> stage has a well-defined input schema (the prior stage's output) and output schema, so IDEs and linters can provide column completions per stage, and errors can localize to the offending stage rather than to the whole query.
- Stage-level factoring. Saving / reusing a prefix of the pipeline as a CTE, view, or named intermediate is syntactically trivial: split at the |> boundary.
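A rough sketch of how stage-level schemas make errors local, under stated assumptions: the `(operator, reads, writes)` stage description, `check_pipeline`, and `StageError` are hypothetical names made up for this illustration and do not correspond to any real linter or to GoogleSQL internals. The checker carries the column list from one stage to the next and reports the index of the first stage that references a column its input does not have.

```python
class StageError(Exception):
    """Carries the 1-based index of the stage whose column reference failed."""

    def __init__(self, stage_index, message):
        super().__init__(f"stage {stage_index}: {message}")
        self.stage_index = stage_index

def check_pipeline(source_schema, stages):
    """Return the column list after the source and after every stage.

    Each stage is a hypothetical (operator, reads, writes) triple:
      reads  = columns the stage references from its input
      writes = the stage's output columns, or None to pass the input through
    """
    schema = list(source_schema)
    schemas = [schema]
    for index, (op, reads, writes) in enumerate(stages, start=1):
        missing = [c for c in reads if c not in schema]
        if missing:
            raise StageError(
                index, f"{op} references unknown column(s) {missing}; available: {schema}"
            )
        schema = list(writes) if writes is not None else schema
        schemas.append(schema)
    return schemas

orders = ["order_id", "customer_id", "status", "amount"]
stages = [
    ("WHERE", ["status"], None),
    ("AGGREGATE", ["amount", "customer_id"], ["customer_id", "revenue"]),
    ("ORDER BY", ["revenue"], None),
    ("SELECT", ["customer_id", "revenue"], ["customer_id", "revenue"]),
]
schemas = check_pipeline(orders, stages)

# Filtering on `status` after the aggregate fails at stage 3, not "somewhere in the query".
bad = stages[:2] + [("WHERE", ["status"], None)]
try:
    check_pipeline(orders, bad)
    failed_at = None
except StageError as err:
    failed_at = err.stage_index
```

The same per-stage column lists are what a completion engine would offer at each |> boundary, and `stages[:2]` is a reusable prefix in the sense of stage-level factoring.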
Why adding it to SQL (not replacing SQL) matters¶
This is the crucial architectural point of the GoogleSQL paper and why pipe syntax succeeded there where every from-scratch SQL replacement (Malloy, PRQL, SQL++, EdgeQL, ...) has struggled. The extension:
- sits alongside classic SQL, not instead of it;
- every existing query, tool, driver, ORM, BI dashboard keeps parsing;
- adoption is query-by-query — no migration project;
- the planner / optimizer / engine is unchanged — pipe syntax desugars to the same logical plan the engine already builds.
This property is the subject of concepts/language-extension-vs-replacement. The GoogleSQL pipe paper is the canonical wiki production instance of extending an entrenched language instead of trying to replace it.
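To illustrate what "desugars to the same logical plan" can mean in practice, here is a toy Python sketch that rewrites a list of pipe stages into nested classic SQL and runs both forms against an in-memory SQLite database. This is an assumption-laden illustration, not how GoogleSQL implements the feature: the source only states that pipe syntax maps onto operators the engine already has. The `desugar` function, the `(operator, argument)` stage encoding, and the sample tables are all hypothetical.

```python
import sqlite3

def desugar(stages):
    """Translate (operator, argument) pipe stages into nested classic SQL.

    Toy rule: every stage wraps the query built so far as a subquery.
    """
    op, arg = stages[0]
    if op != "FROM":
        raise ValueError("pipeline must start with FROM")
    sql = f"SELECT * FROM {arg}"
    for op, arg in stages[1:]:
        if op == "WHERE":
            sql = f"SELECT * FROM ({sql}) WHERE {arg}"
        elif op == "JOIN":
            sql = f"SELECT * FROM ({sql}) JOIN {arg}"
        elif op == "AGGREGATE":
            exprs, keys = arg
            sql = f"SELECT {keys}, {exprs} FROM ({sql}) GROUP BY {keys}"
        elif op == "ORDER BY":
            sql = f"SELECT * FROM ({sql}) ORDER BY {arg}"
        elif op == "SELECT":
            sql = f"SELECT {arg} FROM ({sql})"
        else:
            raise ValueError(f"unsupported stage: {op}")
    return sql

# Hypothetical sample tables matching the example query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, status TEXT, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 'paid', 30), (2, 'paid', 50),
                              (1, 'refunded', 99), (1, 'paid', 40);
""")

PIPE = [
    ("FROM", "orders"),
    ("WHERE", "status = 'paid'"),
    ("JOIN", "customers USING (customer_id)"),
    ("AGGREGATE", ("SUM(amount) AS revenue", "customer_id")),
    ("ORDER BY", "revenue DESC"),
    ("SELECT", "customer_id, revenue"),
]

CLASSIC = """
    SELECT customer_id, SUM(amount) AS revenue
    FROM orders
    JOIN customers USING (customer_id)
    WHERE status = 'paid'
    GROUP BY customer_id
    ORDER BY revenue DESC
"""

pipe_rows = conn.execute(desugar(PIPE)).fetchall()
classic_rows = conn.execute(CLASSIC).fetchall()

# Compare as sets of rows: an ORDER BY inside a subquery is not guaranteed
# to survive an outer SELECT, so this toy rewrite makes no ordering promise.
same_rows = sorted(pipe_rows) == sorted(classic_rows)
print(same_rows)
```

Nothing in the engine changes in this sketch: the pipe form is turned into SQL that SQLite already understands, which is the sense in which the extension is additive.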
Precedents and neighbors¶
- Unix shell pipelines (grep | awk | sort | uniq -c): the original production data-flow syntax, and arguably the most widely used querying idiom outside SQL.
- Dataframe APIs: pandas (df.query(...).groupby(...).agg(...)), Spark DataFrames, Polars, R's dplyr / tidyverse. Every exploratory data-analysis ecosystem converged on method-chaining pipelines.
- KQL (Kusto Query Language): Microsoft's log/telemetry query language; | pipes throughout. Heavily used in Azure observability tooling.
- PRQL: "Pipelined Relational Query Language"; a from-scratch SQL replacement built around |. Used by some DuckDB / BigQuery frontends, but it is not the entrenched language.
- Malloy: Google/Looker's SQL alternative for BI; semantic-layer + pipe-style composition. Same trade-off: better shape, smaller ecosystem.
- LINQ: C#'s query-over-collections DSL; method-chaining shape that compiles to SQL (or IEnumerable / IObservable).
GoogleSQL's contribution is not the shape — it's the shape grafted onto entrenched SQL with zero ecosystem migration.
Tradeoffs¶
- + Readability / learnability. Surface = evaluation = reasoning order.
- + Incremental adoption. In an extension-style deployment (GoogleSQL's path), users adopt query-by-query.
- + Tool-friendly. Stage-level types + completions + linting.
- − Dual surface in the same language. SQL code now comes in two shapes, and teams have to decide when to use each. Long-term stylistic drift is a real risk (some engineers write only classic SQL, others only pipe).
- − Doesn't fix semantic issues. Pipe syntax is a parser + desugaring change; NULL semantics, three-valued logic, aggregate edge cases, join-cardinality surprises all remain. Extension reach has a boundary (see concepts/language-extension-vs-replacement).
- − Standardization lag. A single-vendor extension propagates only inside that vendor's dialect family unless the wider SQL standard picks it up.
Structural preconditions (for retrofitting onto an existing language)¶
- Parser tolerance to an additive operator. The host language's grammar has to accept |> (or equivalent) without ambiguity against existing productions.
- Engine semantics already cover the operator. Each pipe stage has to map cleanly to an operator the planner already knows; otherwise the extension isn't really additive.
- One authoritative dialect spec. So a single grammar change ships across the engines that consume the spec. GoogleSQL's shared-dialect-across-BigQuery/F1/Spanner property is the lever that lets the pipe-syntax change propagate at all.
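The parser-tolerance precondition can be made concrete with a small sketch. A real implementation would add |> to the grammar itself. The hypothetical `split_stages` helper below only shows the kind of ambiguity that has to be ruled out: a |> inside a string literal must not be read as a stage boundary. It handles single-quoted literals only and ignores comments and other quoting styles.

```python
def split_stages(query):
    """Split a pipe-syntax query into stages at top-level |> tokens.

    Toy tokenizer: tracks single-quoted string literals only. A doubled
    quote ('') inside a literal toggles the flag twice, so it stays correct.
    """
    stages, current, in_string, i = [], [], False, 0
    while i < len(query):
        ch = query[i]
        if ch == "'":
            in_string = not in_string
            current.append(ch)
            i += 1
        elif not in_string and query.startswith("|>", i):
            # A pipe operator outside any literal closes the current stage.
            stages.append("".join(current).strip())
            current = []
            i += 2
        else:
            current.append(ch)
            i += 1
    stages.append("".join(current).strip())
    return stages

parts = split_stages("FROM orders |> WHERE note = 'a |> b' |> SELECT note")
print(parts)
```

Because every element of `parts` is a complete stage, a prefix such as `parts[:2]` is also the natural place to cut when factoring part of a pipeline into a named intermediate.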
Seen in¶
- sources/2024-08-24-google-pipe-syntax-in-sql: Google Research adds |> to GoogleSQL; the pipe-stage list is the named SQL operator set (WHERE, SELECT, AGGREGATE, ORDER BY, JOIN, EXTEND, ...). Positioned explicitly as an extension, not a replacement. Canonical wiki production instance of the pattern and of concepts/language-extension-vs-replacement.
Related¶
- systems/googlesql: the dialect that receives pipe syntax as a native extension.
- concepts/language-extension-vs-replacement: why extending SQL beats replacing it.
- patterns/query-language-as-agent-tool: adjacent pattern for exposing query languages to LLM agents; pipe-style syntax is easier to generate incrementally and easier to reason about stage-by-stage.
- patterns/intent-preserving-query-translation: when porting queries between languages / engines, translate by intent stage-by-stage, which is easier against a pipe-shaped surface than against a fixed-clause-order one.
- concepts/query-shape: the schema-level view of queries as parameterized shapes; orthogonal to pipe syntax but complementary in a reactive-cache / invalidation context.