SYSTEM Cited by 1 source

Arctic Wolf security telemetry table

Definition

The Arctic Wolf security telemetry table is Arctic Wolf's production Delta Lake table of 3.8+ PB, stored as a Unity Catalog managed table. It ingests 1+ trillion events per day and is queried by threat hunters who depend on fresh data to detect active attacks. It is the largest single-table production case study disclosed on the wiki for Liquid Clustering migration, and it validates the partition-to-Liquid conversion shape on a security-critical workload at petabyte scale.

The 2026-06-01 Databricks "Debunking 8 data layout myths" post canonicalises the case study verbatim:

"Arctic Wolf runs a 3.8+ PB security telemetry table ingesting 1+ trillion events per day, where threat hunters depend on fresh data to detect active attacks. After migrating from partitioning to Liquid Clustering on Unity Catalog managed tables with Predictive Optimization, Arctic Wolf saw:

  • 90-day queries drop from 51 seconds to 6.6 seconds
  • File count dropped from 4M to 2M
  • Data freshness improved from hours to minutes"

The post links to a separate detailed case study at "Arctic Wolf's Liquid Clustering Architecture: Tuned to Petabyte-Scale" which is not yet ingested on the wiki.

Operational envelope (disclosed)

Property                  Value
Table size                3.8+ PB
Daily event ingest        1+ trillion events/day
Workload class            Security telemetry / threat hunting
Storage substrate         Unity Catalog managed tables
Layout (post-migration)   Liquid Clustering with Predictive Optimization
Layout (pre-migration)    Hive partitioning

Migration outcomes

Metric                 Pre-migration   Post-migration   Improvement
90-day query latency   51 seconds      6.6 seconds      7.7× faster
File count             4 million       2 million        −50%
Data freshness         Hours           Minutes          Order-of-magnitude faster

The 7.7× query speedup on 90-day queries is the headline number for the post's case for partition→Liquid migration at petabyte scale. Threat hunters running 90-day queries on 3.8 PB of telemetry data went from 51-second wall clock to 6.6-second wall clock — fast enough for interactive iteration, which is structurally important for live attack investigation.

The file-count drop (4M → 2M) is direct evidence of the over-partitioning tax the partitioned layout was paying. Half the files; better per-file size; less metadata-listing overhead. The 2026-06-01 source places this in context with the broader 75%+ over-partitioning rate observation.
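The derived figures in the table above can be checked directly from the quoted numbers. The short sketch below uses only the four values stated in the source; the variable names are illustrative.

```python
# Figures quoted in the source post.
before_latency_s = 51.0
after_latency_s = 6.6
before_files = 4_000_000
after_files = 2_000_000

# Speedup is the ratio of old to new latency.
speedup = before_latency_s / after_latency_s

# Percentage reductions relative to the pre-migration value.
latency_reduction_pct = (1 - after_latency_s / before_latency_s) * 100
file_reduction_pct = (1 - after_files / before_files) * 100

print(f"speedup: {speedup:.1f}x")                           # speedup: 7.7x
print(f"latency reduction: {latency_reduction_pct:.0f}%")   # latency reduction: 87%
print(f"file count reduction: {file_reduction_pct:.0f}%")   # file count reduction: 50%
```

Put differently, a 7.7× speedup removes roughly 87% of the wall-clock time of each 90-day query.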

The data-freshness improvement (hours → minutes) reflects the combined benefit of the substrate properties:

  • Liquid Clustering's incremental-on-write layout maintenance (patterns/incremental-clustering-on-write) means new writes are immediately well-clustered.
  • Predictive Optimization handles compaction / VACUUM / stats collection automatically, removing the operational lag between ingest and query-readiness.
  • Unity Catalog managed tables provide the substrate that composes both.

Architecture composition

The Arctic Wolf case study is a worked example of the four-substrate-property combination:

┌─────────────────────────────────────────────────────────┐
│ Unity Catalog managed table                             │
│  - Substrate ownership of layout / optimization         │
│  - Default-on Predictive Optimization                   │
│  - Single source of truth for governance                │
└─────────────────────────────────────────────────────────┘
                           │ contains
┌─────────────────────────────────────────────────────────┐
│ Delta Lake table format                                 │
│  - Per-file min/max statistics in transaction log       │
│  - File-level data skipping                             │
│  - Atomic transaction semantics                         │
└─────────────────────────────────────────────────────────┘
                           │ uses
┌─────────────────────────────────────────────────────────┐
│ Liquid Clustering layout                                │
│  - Multi-dimensional clustering on filter columns       │
│  - Incremental clustering on write                      │
│  - Cardinality-flexible (handles high-card columns)     │
└─────────────────────────────────────────────────────────┘
                           │ maintained by
┌─────────────────────────────────────────────────────────┐
│ Predictive Optimization                                 │
│  - Auto-OPTIMIZE / VACUUM / stats collection            │
│  - Workload-aware scheduling                            │
│  - Default-on for managed tables                        │
└─────────────────────────────────────────────────────────┘

Attribution between the four substrate properties is not disclosed in the source — the headline numbers reflect the combined benefit of the four working together, not Liquid Clustering alone.

Why threat-hunting workloads make this case interesting

Security telemetry is structurally filter-heavy: every threat-hunting query is "events matching X across Y time range". The filter columns are typically high-cardinality (user_id, host_id, ip_address, hash, etc.), which is exactly the case Liquid Clustering is designed for and Hive partitioning fails on.

The pre-migration partitioned layout almost certainly couldn't use the high-cardinality filter columns as partition keys (would have produced billions of tiny files), so queries on those columns scanned wide swaths of partitions. Liquid Clustering on the high-cardinality filter columns produces tight per-file min/max ranges and effective file-level data skipping on each predicate.
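The effect of tight per-file min/max ranges can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not Arctic Wolf's layout or Liquid Clustering's actual algorithm: the host_id values are synthetic, the file count is arbitrary, and a plain sort on one column stands in for clustering. It only shows why a point filter on a high-cardinality column can skip most files once each file covers a narrow value range.

```python
import random

def file_stats(files):
    """Per-file (min, max) of the filter column, like the stats kept in a table's log."""
    return [(min(f), max(f)) for f in files]

def files_to_read(stats, value):
    """Count files whose [min, max] range could contain the value; the rest are skipped."""
    return sum(1 for lo, hi in stats if lo <= value <= hi)

def chunk(rows, n_files):
    """Split rows into n_files equally sized 'files'."""
    size = len(rows) // n_files
    return [rows[i * size:(i + 1) * size] for i in range(n_files)]

random.seed(0)
# Hypothetical high-cardinality host_id values in arrival order.
rows = [random.randrange(1_000_000) for _ in range(100_000)]
n_files = 100

unclustered = chunk(rows, n_files)        # arrival order: ids scattered across every file
clustered = chunk(sorted(rows), n_files)  # sorted stand-in: each file holds a narrow id range

target = sorted(rows)[len(rows) // 2]     # filter value: WHERE host_id = target
unclustered_read = files_to_read(file_stats(unclustered), target)
clustered_read = files_to_read(file_stats(clustered), target)

print("files read without clustering:", unclustered_read)
print("files read with clustering:", clustered_read)
```

In the unclustered case nearly every file's min/max range spans the target, so nearly every file must be opened; in the sorted case the target falls inside one file's range, or two when it sits on a file boundary.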

Workload as time-pressure benchmark

The threat-hunting workload imposes a structural time pressure:

"threat hunters depend on fresh data to detect active attacks"

The hours → minutes data freshness improvement is operationally load-bearing — not a dashboard nicety. Detection latency on active attacks compounds with attacker dwell time; faster query freshness reduces the window in which attackers can act undetected.

Caveats

  • Attribution between substrate properties not disclosed. The 7.7× speedup reflects Liquid Clustering + Predictive Optimization + UC managed tables + Delta Lake transaction log working together. The contribution of each is not separately quantified.
  • Pre-migration partitioning column not disclosed. What Arctic Wolf was previously partitioned on, and why that choice produced the 4M files / 51s queries, is not in this source. (Possibly disclosed in the linked Arctic Wolf-specific blog post.)
  • Liquid Clustering keys not disclosed. What columns Arctic Wolf chose to cluster on (and whether CLUSTER BY AUTO was used) is not disclosed.
  • Migration mechanism not disclosed. Whether Arctic Wolf used REPLACE TABLE, dual writes + cutover, or in-place Liquid Conversion is not disclosed.
  • Concurrency / cost numbers not disclosed. No QPS / number of concurrent threat hunters / compute cost / storage cost data points beyond table size + ingest rate.
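Because neither the clustering keys nor the migration mechanism is disclosed, the following is only an orientation sketch of one of the options named above (a copy into a new clustered table), not a description of what Arctic Wolf did. The table names and the clustering columns (event_time, host_id) are hypothetical, and the helper function is invented for illustration; the statements it emits follow Databricks SQL syntax for CLUSTER BY, Predictive Optimization and OPTIMIZE.

```python
def liquid_migration_sql(source, target, cluster_cols):
    """Build Databricks SQL statements for a copy-based move to Liquid Clustering.

    source, target and cluster_cols are supplied by the caller; nothing here
    reflects Arctic Wolf's actual (undisclosed) table names or keys.
    """
    cols = ", ".join(cluster_cols)
    return [
        # Copy the partitioned table into a new table clustered on the chosen keys.
        f"CREATE OR REPLACE TABLE {target} CLUSTER BY ({cols}) AS SELECT * FROM {source}",
        # Let the platform schedule maintenance on the new managed table.
        f"ALTER TABLE {target} ENABLE PREDICTIVE OPTIMIZATION",
        # Trigger clustering of data that is already written.
        f"OPTIMIZE {target}",
    ]

statements = liquid_migration_sql(
    source="main.security.telemetry_partitioned",  # hypothetical name
    target="main.security.telemetry_liquid",       # hypothetical name
    cluster_cols=["event_time", "host_id"],        # hypothetical keys
)
for stmt in statements:
    print(stmt + ";")
```

A full copy is the simplest route to reason about, but at 3.8+ PB its compute cost would be substantial, which is one reason the undisclosed choice between a copy, dual writes with cutover, and in-place conversion matters.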

Seen in
