CONCEPT Cited by 1 source
Decision-support vs audit query¶
Definition¶
A decision-support query is an analytical query whose answer only needs to be good enough to drive the same decision as the exact answer. An audit query, by contrast, has external consumers (auditors, regulators, billing systems, customers) that require bit-exact precision. Classifying a query as one or the other determines whether it is safe to trade exactness for compute — the prerequisite for using probabilistic data structures at all.
The canonical articulation comes from Databricks' 2026-04-29 post:
"Many analytical questions are decision-support, not audit. If knowing '~4.7M unique users ±1%' leads to the same decision as '4,712,389 unique users,' the approximate answer at a fraction of the cost is strictly better."
(Source: sources/2026-04-29-databricks-approximate-answers-exact-decisions-new-sketch-functions-for-analytics)
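One way to make the "same decision" condition concrete is to check whether the whole error band around the estimate falls on one side of the decision threshold. The sketch below is illustrative only: the function name `same_decision` and the threshold values are hypothetical, and it treats ±1% as a hard bound, whereas sketch error bounds are normally stated at some confidence level.

```python
def same_decision(estimate: float, rel_error: float, threshold: float) -> bool:
    """Return True if every value in the error band lands on the same side of threshold."""
    low = estimate * (1.0 - rel_error)
    high = estimate * (1.0 + rel_error)
    # The decision is stable when the threshold lies outside [low, high].
    return threshold < low or threshold > high


# ~4.7M unique users at ±1% spans roughly 4.653M to 4.747M.
print(same_decision(4_700_000, 0.01, threshold=4_000_000))  # True: threshold is below the band
print(same_decision(4_700_000, 0.01, threshold=4_720_000))  # False: threshold is inside the band
```

When the check returns False, the approximate answer cannot be trusted to drive the decision, and the query should either run exactly or use a tighter sketch.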
The test¶
A query is a decision-support query if all of the following hold:
- The consumer of the result is a human reading a dashboard or a model reading a feature.
- The decision made off the result is coarse enough that 1–2% error would not change it.
- The result is not a legal or contractual commitment.
- The result is not directly surfaced to an external party (customer, auditor, regulator).
A query is an audit query if any of the following holds:
- The result is a line item in a financial report.
- The result drives regulatory reporting (GDPR deletion counts, SEC filings, compliance dashboards).
- The result is a billing line presented to a customer.
- The result is a legal artefact (discovery, reconciliation).
- Precision is part of the SLA.
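The two checklists above can be read as a small classification routine. The following is a minimal sketch under stated assumptions: `QueryProfile`, its field names, and the rule that unclear cases fall back to "audit" are all hypothetical choices made for illustration and do not come from the Databricks post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryProfile:
    # Hypothetical fields; each one condenses bullets from the two checklists.
    externally_surfaced: bool       # shown to a customer, auditor, or regulator
    legal_or_contractual: bool      # financial line item, billing line, legal artefact
    precision_in_sla: bool          # exactness is promised in an SLA
    decision_tolerates_error: bool  # a 1-2% error would not change the decision


def classify(q: QueryProfile) -> str:
    """Return "audit" if any audit condition holds, otherwise "decision-support"
    when the decision tolerates error."""
    if q.externally_surfaced or q.legal_or_contractual or q.precision_in_sla:
        return "audit"
    if q.decision_tolerates_error:
        return "decision-support"
    # Assumption: when in doubt, stay exact, since wrongly approximating
    # an audit query is the costlier mistake.
    return "audit"


dashboard_tile = QueryProfile(False, False, False, True)
invoice_line = QueryProfile(True, True, False, False)
print(classify(dashboard_tile))  # decision-support
print(classify(invoice_line))    # audit
```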
Why the distinction matters architecturally¶
The distinction determines which storage primitives apply:
| Query class | Storage primitive | Query cost |
|---|---|---|
| Decision-support | Precomputed sketches (KLL / Theta / Top-K / Tuple) | merge sketches on read — milliseconds |
| Audit | Raw event log + exact GROUP BY + nightly reconciliation | full scan — minutes to hours |
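The "merge sketches on read" entry in the table depends on sketches being mergeable: a sketch of the union can be computed from per-partition sketches without touching raw rows. The code below is a simplified K-minimum-values (KMV) distinct-count sketch written for illustration. It is related in spirit to Theta sketches but is not the Databricks or Apache DataSketches implementation, and the names `build_sketch`, `merge`, and `estimate` are hypothetical.

```python
import hashlib
import heapq

K = 1024             # sketch size; relative error shrinks roughly as 1/sqrt(K)
HASH_SPACE = 2 ** 64


def h64(item: str) -> int:
    """Deterministic 64-bit hash of an item."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_sketch(items, k: int = K) -> list[int]:
    """Keep the k smallest distinct hash values, sorted ascending."""
    return heapq.nsmallest(k, {h64(x) for x in items})


def merge(a: list[int], b: list[int], k: int = K) -> list[int]:
    """Merging only needs the two sketches, never the raw items."""
    return heapq.nsmallest(k, set(a) | set(b))


def estimate(sketch: list[int], k: int = K) -> float:
    """Estimate the number of distinct items summarised by the sketch."""
    if len(sketch) < k:
        return float(len(sketch))  # fewer than k distinct hashes: the count is exact
    kth_smallest = sketch[-1]
    return (k - 1) * HASH_SPACE / kth_smallest


# Two daily partitions with overlapping users (100,000 distinct in total).
day1 = [f"user-{i}" for i in range(60_000)]
day2 = [f"user-{i}" for i in range(40_000, 100_000)]
merged = merge(build_sketch(day1), build_sketch(day2))
print(round(estimate(merged)))  # close to 100000, not exactly equal
```

Each daily sketch holds at most 1,024 integers regardless of how many users the day saw, which is why merging on read is cheap compared with rescanning the raw events.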
Databricks' framing (Source: sources/2026-04-29-databricks-approximate-answers-exact-decisions-new-sketch-functions-for-analytics):
"When to use sketches: Dashboards, trend analysis, monitoring, marketing attribution — any query where approximate answers are acceptable. When to stay exact: Financial auditing, compliance reporting, or any use case where regulatory or business requirements demand precise values."
The architectural move is to serve the dashboard from sketches while the raw data stays on disk for when an auditor asks:
"The raw data is still there when the auditors ask. For everything else, a 1% error margin and a 1000x speedup is a welcome trade-off."
The antipattern: treating every query as audit¶
The common failure mode is to treat all analytics as audit-grade because some analytics is. This leads to:
- Dashboards that take minutes to refresh because they scan raw data to get percentiles.
- ETL pipelines that shuffle billions of user IDs across the cluster nightly to produce a unique-user count that feeds a single dashboard tile.
- Multi-minute latencies on high-cardinality top-K queries that back a product's "trending now" feature.
These workloads are decision-support queries dressed up as audit queries. The fix is not to make exact queries faster (they're fundamentally bound by full-scan / global-sort cost) — it's to classify the query correctly and switch to a sketch.
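A toy example shows why exact distinct counts cannot be made cheap by pre-aggregation: per-day exact counts do not combine into a multi-day count, so the exact query must bring the full ID sets together. The user IDs below are made up for illustration.

```python
day1 = {"u1", "u2", "u3"}
day2 = {"u2", "u3", "u4"}

# Summing per-day exact counts double-counts returning users.
sum_of_daily_counts = len(day1) + len(day2)

# The exact two-day answer needs the union of the raw ID sets.
exact_two_day_count = len(day1 | day2)

print(sum_of_daily_counts)   # 6
print(exact_two_day_count)   # 4
```

At billions of IDs, that union is the nightly cluster-wide shuffle described above, whereas a mergeable sketch combines fixed-size summaries instead.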
The antipattern on the other side¶
Equally bad: treating an audit query as decision-support because "1% is close enough". This produces:
- Revenue reports that reconcile with the books only to within ±1%.
- Billing invoices that don't add up.
- Compliance reports where the numbers drift from the source of truth.
The Databricks post is explicit about this: "financial auditing, compliance reporting" must stay exact.
Seen in¶
- sources/2026-04-29-databricks-approximate-answers-exact-decisions-new-sketch-functions-for-analytics — canonical articulation of the dichotomy and its architectural consequence (sketches for decision-support, exact for audit). The post uses the "~4.7M unique users ±1% vs. 4,712,389" example as the intuition pump.
Related¶
- concepts/probabilistic-data-structure — the family that decision-support queries can use.
- concepts/mergeable-sketch — the property that makes sketch-backed dashboards viable.
- concepts/kll-quantile-sketch
- concepts/theta-sketch
- concepts/approximate-top-k-sketch
- concepts/tuple-sketch
- concepts/oltp-vs-olap — adjacent classification of query shape.
- patterns/precomputed-sketch-column-in-delta-table — the concrete pattern for serving decision-support queries.