Skip to content

CONCEPT Cited by 1 source

PromQL-to-SQL translation

A translation layer that accepts PromQL queries from existing dashboards / alerting rules / Grafana panels and executes them as SQL against a columnar-object-storage lakehouse store (Delta Lake / Iceberg tables) rather than an in-memory TSDB.

The primary motivation is interface stability: engineers keep writing PromQL, Grafana dashboards continue to work unmodified, but the underlying storage substrate shifts from a TSDB to a lakehouse — with dramatically different cost / cardinality / scaling properties.

See concepts/critical-user-journey for the design-primitive framing: when the user-facing interface is the load-bearing constraint, everything underneath it can be swapped.

What the translation layer does

  • Parse PromQL expressions (instant vectors, range vectors, aggregation operators, rate / irate / increase, histogram quantile, label matchers, subqueries).
  • Rewrite into SQL against the Delta / Iceberg table schema representing metrics as columns: timestamp, metric_name, labels (typed map or flattened columns), value.
  • Optimise for the columnar-store execution model — predicate pushdown on metric name + label equality, time- range partitioning, Z-order clustering on high-cardinality labels.
  • Execute via the lakehouse's SQL engine (Photon / Spark SQL / Trino / Databricks SQL Warehouse).
  • Return results in the PromQL response format expected by Grafana's data source API.

Where fidelity gets hard

PromQL has semantics that don't trivially round-trip to SQL:

  • Range-vector semanticsrate(x[5m]) requires same-series sample pairs within a rolling window; expressing this in SQL means window functions partitioned by label set.
  • rate() on partial data — PromQL extrapolates across gaps in a defined way; the SQL equivalent must replicate that behaviour.
  • Histogram quantile arithmetic — requires bucket-bounded interpolation that SQL doesn't express natively.
  • Staleness markers — PromQL has a notion of "stale" (a series was scraped but no sample arrived); preserving this in a lakehouse store requires explicit staleness tracking.
  • Instant-vector join semantics — PromQL's on() / ignoring() label-matching vector joins need careful translation.

A faithful translator tracks these; cost: a lot of care, and a real engineering investment.

Why it's worth it

Absent translation, moving observability data to a lakehouse would force engineers to rewrite every dashboard and alert rule in SQL — a migration so costly that it blocks the substrate change. With translation, the substrate change is invisible to users, and the cost / cardinality / joinability benefits of lakehouse storage become available.

Seen in

Last updated · 451 distilled / 1,324 read