Skip to content

CONCEPT Cited by 1 source

KLL quantile sketch

Definition

A KLL quantile sketch is a streaming probabilistic data structure that produces rank-error-bounded quantile / percentile estimates over a stream of numeric values, using bounded memory regardless of input size. It is part of the Apache DataSketches family.

The name comes from the algorithm's authors (Karnin, Lang, and Liberty, 2016). Given a stream of N numeric values:

  • Exact PERCENTILE(col, 0.99) requires a global sort over all N values (O(N) memory, full cluster shuffle for distributed input).
  • A KLL sketch summarises the stream in configurable bounded memory and answers quantile queries with typical 1–2% relative rank error.
  • A single sketch supports extracting multiple quantiles in one pass — e.g. P50 / P90 / P99 from one sketch.

Why KLL specifically

KLL sketches provide:

  • Rank-error guarantees (not value-error) — the returned percentile is within a bounded rank of the true percentile.
  • Mergeability — KLL sketches form a mergeable sketch semigroup; per- partition or per-hour sketches can be merged into a global sketch losslessly (modulo the error envelope).
  • Bounded memory that does not grow with N — the key property that distinguishes sketches from sampling.

This is distinct from the sibling DDSketch family (Datadog lineage), which provides relative-error quantiles instead of rank-error quantiles. Both are in use; KLL is the DataSketches-native choice.

Canonical use cases

Databricks' 2026-04-29 post lists:

  • Latency monitoring — P50 / P90 / P99 of response_time_ms across billions of events.
  • Capacity planning — quantiles of per-request CPU / memory cost.
  • Anomaly detection — quantile-based thresholds on streaming metrics.

The workflow that makes KLL more than just "a faster percentile" is the ETL+dashboard split (see patterns/precomputed-sketch-column-in-delta-table): build one KLL sketch per partition per hour during ETL, store as a BLOB column in Delta, and merge on read when a dashboard asks for P50/P90/P99 over an arbitrary time range.

The API shape

Databricks exposes KLL via (Source: sources/2026-04-29-databricks-approximate-answers-exact-decisions-new-sketch-functions-for-analytics):

  • kll_sketch_agg_double — aggregate-function form; build a sketch over a column in a GROUP BY or scan.
  • kll_get_quantile_bigint(sketch, ARRAY(0.5, 0.9, 0.99)) — extract multiple quantiles in one call from one sketch.

Because the sketch is mergeable, the same sketch built per hour can be scanned over any time window and merged at query time.

Seen in

  • sources/2026-04-29-databricks-approximate-answers-exact-decisions-new-sketch-functions-for-analytics — Databricks SQL / DataFrame / Structured Streaming expose KLL sketches as a first-class aggregate. Canonical framing: "built to answer quantile questions. They let you replace this sort while using the same bounded memory, whether you process a thousand values or a trillion." The example in the post is P50 / P90 / P99 for a dashboard refreshed every 5 minutes — where exact PERCENTILE requires rescanning billions of rows per refresh, KLL lets you merge precomputed sketches in milliseconds.
Last updated · 438 distilled / 1,268 read