Skip to content

SYSTEM Cited by 1 source

Lightstep

Lightstep is a distributed-tracing / observability SaaS (acquired by ServiceNow in 2021). It exposes "streams" — saved queries over trace data that downstream dashboards subscribe to — which can be panelled alongside metric panels in Grafana for side-by-side comparison.

Stub page created for cross-referencing from ; Lightstep's full architecture is not covered here.

Seen in

  • — Zalando built their ES 7→8 Blue/Green migration monitoring on Lightstep streams + Grafana dashboards to compare the old and new clusters side-by-side per endpoint: latency, error rate, CPU, memory, and — crucially for data-sync verification — index sizes and per-cluster deltas. Data-sync convergence (snapshot restore + shadow intake + stream reset) was verified by watching the index-size delta shrink to zero over the shadow window.
  • sources/2025-12-16-zalando-the-day-our-own-queries-dosed-us-inside-zalando-search — canonical wiki instance of Lightstep trace-exploration notebooks as the zebra-hunting tool on a self-inflicted-DoS incident. After first-line metric-altitude hypotheses were exhausted, "an exploratory analysis of traces in a Lightstep notebook detected an unusual traffic pattern from one of our internal applications. Further investigation revealed that the application was sending 50 times more queries than usual, and it matched the incident timeline exactly." The wider lesson is that trace-altitude per-caller cross-sections surface failures that metric-altitude aggregate dashboards hide — see concepts/zebra-not-horse-heuristic.
Last updated · 542 distilled / 1,571 read