PATTERN Cited by 1 source
Dual-tier observability (TSDB + lakehouse)¶
At hyperscale, neither a TSDB alone nor a lakehouse alone is adequate for an observability platform:
- TSDB alone — scales on object storage for cold data (see concepts/tiered-storage-hot-warm-cold), but its in-memory and on-disk tiers still scale with active cardinality, forcing either (a) a heavy aggregation tier that drops debugging dimensions (see patterns/aggregation-shield-for-tsdb-cardinality) or (b) an unbounded cost curve.
- Lakehouse alone — can store arbitrary-cardinality raw data cheaply in object storage, but query latency is in the minutes range, not real-time. Alerting rules, live dashboards, and interactive PromQL-style queries need the TSDB's in-memory index to meet SLOs.
The solution is to run both tiers simultaneously, each tuned to its strengths, and unify them at the user-facing metric-semantics layer.
Shape¶
                  Aggregation shield (drops high-card labels)
                                      │
    Applications ── emit once ────────┤
                                      │
                                      ├──▶ TSDB (aggregated, real-time)
                                      │        ~seconds freshness
                                      │        alerting + live dashboards
                                      │
                                      └──▶ Lakehouse (raw, cheap, high-card)
                                               ~minutes freshness
                                               deep troubleshooting + analytics
Both tiers receive data from the same single emission interface (see concepts/unified-metric-semantics); users don't know which tier serves a given query, because the query router chooses based on query shape (cardinality, time range, aggregation level).
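A minimal sketch of what that single emission interface could look like. All names here (MetricEmitter, tsdb_write, lakehouse_append) and the shielded-label set are illustrative assumptions; the source does not publish Pantheon's or Hydra's actual write API:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Sample:
    name: str
    labels: dict[str, str]
    value: float
    ts: float = field(default_factory=time.time)

# Hypothetical label names the aggregation shield strips on the TSDB path.
HIGH_CARD_LABELS = {"request_id", "user_id", "pod_uid"}

class MetricEmitter:
    """One emit() call fans out to both tiers; callers never choose a tier."""

    def __init__(self, tsdb_write, lakehouse_append):
        self._tsdb_write = tsdb_write              # real-time, aggregated path
        self._lakehouse_append = lakehouse_append  # raw, high-cardinality path

    def emit(self, sample: Sample) -> None:
        # Lakehouse keeps every label: raw, cheap, queryable in ~minutes.
        self._lakehouse_append(sample)
        # TSDB path passes through the aggregation shield first, so the
        # active-series count stays bounded by the aggregation rules.
        shielded = Sample(
            name=sample.name,
            labels={k: v for k, v in sample.labels.items()
                    if k not in HIGH_CARD_LABELS},
            value=sample.value,
            ts=sample.ts,
        )
        self._tsdb_write(shielded)

# Usage: print stands in for the two tier writers.
emitter = MetricEmitter(tsdb_write=print, lakehouse_append=print)
emitter.emit(Sample("http_requests_total",
                    {"service": "api", "request_id": "abc123"}, 1.0))
```

The point of the sketch is the contract, not the plumbing: applications emit once, and the platform owns both the fan-out and the shielding, so no caller ever targets a tier directly.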
Division of responsibilities¶
| Property | TSDB tier (Pantheon) | Lakehouse tier (Hydra) |
|---|---|---|
| Freshness | Real-time (sub-second) | ~5 minutes end-to-end |
| Cardinality ceiling | Bounded by aggregation rules | Unbounded — columnar scan |
| Query latency | Milliseconds (PromQL) | Seconds to minutes (SQL) |
| Storage cost | Higher (in-memory heavy) | ~50× cheaper (columnar + object) |
| Primary workloads | Alerting + live dashboards | Incident triage + analytics |
| Join with other data | Not supported natively | Native (Unity Catalog) |
Why unified metric semantics is load-bearing¶
The architectural discipline that makes the dual-tier split work is that users never think about which tier serves their query. They write metric_name{labels...} in PromQL or SQL, and the platform routes the query for them. Without this discipline, the dual-tier split becomes a user-facing tax (which tool do I use? which tier has my data? which query language?) that would block adoption. See concepts/unified-metric-semantics.
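To make "the platform routes" concrete, here is a hedged sketch of a routing heuristic driven by query shape. The thresholds, names, and shielded-label set are invented for illustration; the source does not document the actual router logic:

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    TSDB = "tsdb"            # Pantheon: real-time, aggregated
    LAKEHOUSE = "lakehouse"  # Hydra: raw, high-cardinality

@dataclass
class QueryShape:
    labels: set[str]        # label names the query selects or groups by
    range_hours: float      # time range the query covers
    needs_raw_labels: bool  # explicitly requests pre-aggregation detail

# Assumed label names the aggregation shield drops from the TSDB path.
SHIELDED_LABELS = {"request_id", "user_id", "pod_uid"}

def route(q: QueryShape) -> Tier:
    # Any reference to a shielded label can only be answered from raw data.
    if q.needs_raw_labels or (q.labels & SHIELDED_LABELS):
        return Tier.LAKEHOUSE
    # Long lookbacks scan cheaply in columnar storage; short, fresh windows
    # need the TSDB's in-memory index to meet latency SLOs.
    if q.range_hours > 24:
        return Tier.LAKEHOUSE
    return Tier.TSDB

# A live-dashboard query stays on the TSDB; a triage query with a
# high-cardinality label falls through to the lakehouse.
assert route(QueryShape({"service"}, 1, False)) is Tier.TSDB
assert route(QueryShape({"service", "request_id"}, 1, False)) is Tier.LAKEHOUSE
```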
Translation layers¶
The user-facing query surface for the lakehouse tier is not SQL-only — a PromQL-to-SQL translation layer lets Grafana dashboards and alert rules run unmodified against the lakehouse. This means the dual-tier split is completely invisible to the user's dashboard layer. See patterns/promql-to-sql-over-delta-tables.
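As a rough illustration of the translation idea (not Databricks' actual translator), the sketch below rewrites a flat PromQL instant selector into SQL over a hypothetical metrics.samples table with a labels map column. The real layer covers the full PromQL grammar, range selectors, and rate semantics:

```python
import re

# Matches a simple instant selector: metric_name{label="value", ...}
SELECTOR = re.compile(r'^(?P<name>\w+)\{(?P<matchers>[^}]*)\}$')

def promql_selector_to_sql(expr: str, table: str = "metrics.samples") -> str:
    m = SELECTOR.match(expr)
    if not m:
        raise ValueError(f"unsupported selector: {expr!r}")
    clauses = [f"metric_name = '{m.group('name')}'"]
    for matcher in filter(None, m.group("matchers").split(",")):
        label, value = matcher.split("=", 1)
        label = label.strip()
        value = value.strip().strip('"')
        # Assumes labels are stored in a MAP column named `labels`.
        clauses.append(f"labels['{label}'] = '{value}'")
    return (f"SELECT ts, value FROM {table}\n"
            f"WHERE {' AND '.join(clauses)}\n"
            f"ORDER BY ts")

print(promql_selector_to_sql('http_requests_total{service="api"}'))
# SELECT ts, value FROM metrics.samples
# WHERE metric_name = 'http_requests_total' AND labels['service'] = 'api'
# ORDER BY ts
```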
CUJ-first design¶
The dual-tier split is motivated by Critical User Journeys: the user-facing CUJs (live dashboard, alert, incident triage, analytics join) are each served optimally by a different tier, so the architecture splits to match rather than compromising on a single tier.
Seen in¶
- sources/2026-05-05-databricks-10-trillion-samples-a-day-scaling-beyond-traditional-monitoring — canonical instance. Pantheon (TSDB, aggregated, real-time, 5B active timeseries) + Hydra (lakehouse, raw, 20B active timeseries, ~5 min freshness, ~50× cheaper) unified at emission + query interface. "A key design principle of Hydra is that engineers should not need to understand our ingestion architecture."
Related¶
- systems/pantheon — the TSDB tier
- systems/hydra — the lakehouse tier
- systems/thanos — Pantheon's upstream
- systems/delta-lake — Hydra's storage
- systems/grafana — unified query UI
- concepts/metric-cardinality
- concepts/lakehouse-native-observability
- concepts/unified-metric-semantics
- concepts/promql-to-sql-translation
- concepts/critical-user-journey
- patterns/aggregation-shield-for-tsdb-cardinality
- patterns/promql-to-sql-over-delta-tables