CONCEPT Cited by 1 source

KLL quantile sketch¶

Definition¶

A KLL quantile sketch is a streaming probabilistic data structure that produces rank-error-bounded quantile / percentile estimates over a stream of numeric values, using bounded memory regardless of input size. It is part of the Apache DataSketches family.

The name comes from the algorithm's authors (Karnin, Lang, and Liberty, 2016). Given a stream of N numeric values:

Exact PERCENTILE(col, 0.99) requires a global sort over all N values (O(N) memory, full cluster shuffle for distributed input).
A KLL sketch summarises the stream in configurable bounded memory and answers quantile queries with typical 1–2% relative rank error.
A single sketch supports extracting multiple quantiles in one pass — e.g. P50 / P90 / P99 from one sketch.

Why KLL specifically¶

KLL sketches provide:

Rank-error guarantees (not value-error) — the returned percentile is within a bounded rank of the true percentile.
Mergeability — KLL sketches form a mergeable sketch semigroup; per- partition or per-hour sketches can be merged into a global sketch losslessly (modulo the error envelope).
Bounded memory that does not grow with N — the key property that distinguishes sketches from sampling.

This is distinct from the sibling DDSketch family (Datadog lineage), which provides relative-error quantiles instead of rank-error quantiles. Both are in use; KLL is the DataSketches-native choice.

Canonical use cases¶

Databricks' 2026-04-29 post lists:

Latency monitoring — P50 / P90 / P99 of response_time_ms across billions of events.
Capacity planning — quantiles of per-request CPU / memory cost.
Anomaly detection — quantile-based thresholds on streaming metrics.

The workflow that makes KLL more than just "a faster percentile" is the ETL+dashboard split (see patterns/precomputed-sketch-column-in-delta-table): build one KLL sketch per partition per hour during ETL, store as a BLOB column in Delta, and merge on read when a dashboard asks for P50/P90/P99 over an arbitrary time range.

The API shape¶

Databricks exposes KLL via (Source: sources/2026-04-29-databricks-approximate-answers-exact-decisions-new-sketch-functions-for-analytics):

kll_sketch_agg_double — aggregate-function form; build a sketch over a column in a GROUP BY or scan.
kll_get_quantile_bigint(sketch, ARRAY(0.5, 0.9, 0.99)) — extract multiple quantiles in one call from one sketch.

Because the sketch is mergeable, the same sketch built per hour can be scanned over any time window and merged at query time.

Seen in¶

sources/2026-04-29-databricks-approximate-answers-exact-decisions-new-sketch-functions-for-analytics — Databricks SQL / DataFrame / Structured Streaming expose KLL sketches as a first-class aggregate. Canonical framing: "built to answer quantile questions. They let you replace this sort while using the same bounded memory, whether you process a thousand values or a trillion." The example in the post is P50 / P90 / P99 for a dashboard refreshed every 5 minutes — where exact PERCENTILE requires rescanning billions of rows per refresh, KLL lets you merge precomputed sketches in milliseconds.

concepts/ddsketch-error-bounded-percentile — sibling quantile-sketch family with relative-error (not rank-error) guarantees; Datadog / PlanetScale Insights lineage.
concepts/mergeable-sketch — the property that makes KLL composable across partitions and time windows.
concepts/sketch-as-mysql-binary-column — parallel serialised-sketch-as-column pattern in OLTP engines.
concepts/decision-support-vs-audit-query — the framing that justifies trading 1–2% error for orders-of-magnitude compute reduction.
systems/apache-datasketches — the underlying library.
systems/databricks — first Databricks ingest exposing KLL as a SQL aggregate.