Husky¶
Husky is Datadog's distributed observability event store, layered over cloud object storage (S3 / GCS / Azure Blob) with the query system acting as a cache over that storage. It ingests trillions of events per day and targets the observability workload pattern: write-heavy with few-to-no updates of existing rows, and queries biased toward young data but with occasional analytical scans over billions of events.
Datadog has published a three-part series: introducing Husky, ingestion deep-dive, and the compaction post ingested here (Source: sources/2025-01-29-datadog-husky-efficient-compaction-at-datadog-scale).
Data model¶
- Event → row in a table. Tables are partitioned into time-window buckets.
- Rows within a (table, time-window) bucket live in fragments — immutable objects in object storage.
- Each fragment uses a custom columnar format loosely modeled on systems/apache-parquet but purpose-built for observability data: column headers are stored inline (one row group per column header) so readers can stream-discover columns, row-group size is fixed and uniform across the columns of a fragment, and an embedded skip list records column positions. Fragments can carry hundreds of thousands (to millions) of columns after compaction. See concepts/columnar-storage-format.
- Each fragment is sorted by a sorting schema = zero or more user-chosen columns + timestamp, maintained as an invariant by both the ingest and compaction paths. Typical log schema: service, status, env, timestamp.
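As an illustration, the sort invariant amounts to ordering rows by the composite row key before a fragment is written. A minimal sketch, assuming the typical log schema above (the event data is invented):

```python
from operator import itemgetter

# Hypothetical events; the sorting schema here is
# (service, status, env, timestamp), matching the typical log schema.
events = [
    {"service": "writer", "status": "ok", "env": "prod", "timestamp": 3},
    {"service": "reader", "status": "ok", "env": "prod", "timestamp": 1},
    {"service": "reader", "status": "err", "env": "prod", "timestamp": 2},
]

sort_key = itemgetter("service", "status", "env", "timestamp")
fragment_rows = sorted(events, key=sort_key)

# min/max row keys are recorded in fragment metadata for range pruning.
min_key, max_key = sort_key(fragment_rows[0]), sort_key(fragment_rows[-1])
```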
Metadata repository¶
Fragment metadata (including the per-column trimmed-regex pruning predicate and the fragment's min/max row keys) lives in a systems/foundationdb-backed metadata repository. FDB transactions give Husky the atomic "swap old fragment set for new fragment" guarantee that compaction needs — queries see the table state either entirely before or entirely after a compaction, never a half-applied one.
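A toy sketch of the swap invariant, with a plain lock standing in for an FDB transaction; the class and method names are hypothetical, not Husky's API:

```python
import threading

class MetadataRepo:
    """Toy stand-in for the FDB-backed repository: fragment sets per
    (table, time-window) bucket, swapped atomically under a lock.
    Husky gets the same guarantee from FoundationDB transactions."""

    def __init__(self):
        self._lock = threading.Lock()
        self._fragments = {}  # bucket -> set of live fragment ids

    def commit_compaction(self, bucket, inputs, outputs):
        # Queries observe the bucket either entirely before or entirely
        # after this swap, never a half-applied mix of old and new.
        with self._lock:
            live = self._fragments.get(bucket, set())
            if not inputs <= live:
                raise RuntimeError("inputs already compacted away")
            self._fragments[bucket] = (live - inputs) | outputs

    def snapshot(self, bucket):
        with self._lock:
            return frozenset(self._fragments.get(bucket, set()))

repo = MetadataRepo()
repo._fragments["bucket"] = {"f1", "f2", "f3"}   # state before compaction
repo.commit_compaction("bucket", {"f1", "f2"}, {"f4"})
```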
Write path¶
- Writers buffer events per tenant, cap buffer age, flush one fragment per buffer to object storage.
- Writers sort the events by the sorting-schema row key, write the fragment, and record min/max row keys + per-sort-column pruning FSA in the fragment header (and metadata).
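The flush triggers above can be sketched as follows; the thresholds and names are invented for illustration, since the post gives no numeric values:

```python
import time

class TenantBuffer:
    """Illustrative per-tenant write buffer: flush once the oldest
    buffered event exceeds max_age_s, or the buffer reaches
    max_events. Both thresholds are made up for this sketch."""

    def __init__(self, max_age_s=5.0, max_events=10_000):
        self.max_age_s, self.max_events = max_age_s, max_events
        self.events, self.first_at = [], None

    def append(self, event, now=None):
        now = time.monotonic() if now is None else now
        if not self.events:
            self.first_at = now  # start the buffer-age clock
        self.events.append(event)

    def should_flush(self, now=None):
        now = time.monotonic() if now is None else now
        return bool(self.events) and (
            len(self.events) >= self.max_events
            or now - self.first_at >= self.max_age_s
        )
```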
Compaction (the 2025 post's subject)¶
Compaction is Husky's central cost lever: query cost is proportional to (fragments fetched from object storage) × (events scanned per fragment), so compaction attacks both terms.
Two cooperating compaction strategies (hybrid LSM — see concepts/lsm-compaction):
- Size-tiered compaction. New small fragments enter at size class 0. Within a class, a batch is k-way merged into the next class — eventually producing ~1M-row fragments. Compaction is lazy: Husky waits until "a few dozen fragments" are ready to merge, which Datadog reports is an order-of-magnitude cheaper in CPU + object-store API calls vs. naïve eager compaction.
- Locality compaction. Sits above size-tiering. Treats fragments as nodes in exponentially-growing levels L0..Ln; for each level, overlapping row-key ranges are k-way merged into disjoint outputs and promoted once the level's row-limit is exceeded. Higher levels therefore hold narrower row-key ranges per fragment → more prunable.
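The lazy size-tiered promotion can be sketched as below; the batch threshold of 24 stands in for the post's "a few dozen", and the real class boundaries are not published:

```python
from collections import defaultdict

BATCH = 24  # stand-in for "a few dozen" fragments per merge

class SizeTiered:
    """Toy scheduler for the lazy size-tiered strategy: class 0 collects
    fresh fragments; once a class accumulates BATCH fragments they are
    k-way merged into a single fragment one class up."""

    def __init__(self):
        self.classes = defaultdict(list)  # class -> list of row counts

    def add(self, rows, cls=0):
        self.classes[cls].append(rows)
        while len(self.classes[cls]) >= BATCH:
            batch = self.classes[cls][:BATCH]
            self.classes[cls] = self.classes[cls][BATCH:]
            # Merged output carries the combined rows into the next class.
            self.add(sum(batch), cls + 1)
```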
Both strategies use the same streaming k-way merge primitive: one GET per input fragment, streamed in parallel, one column × one row group at a time, with bounded memory (row-group size × input count) and saturated CPU. See patterns/streaming-k-way-merge. Row-group size is chosen adaptively against the heaviest input column (logs can carry 75-KiB message values; with millions of events per fragment, a single column's row group could exceed 1 GiB without this bound).
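An in-memory analogue of the merge primitive, using Python's heapq.merge, which buffers only one pending row per input and so loosely mirrors the one-row-group-per-input memory bound. Fragments here are plain sorted lists, not object-storage streams:

```python
import heapq

def k_way_merge(fragment_streams, key):
    """Merge already-sorted fragment streams into one sorted output.
    heapq.merge holds one pending row per input stream, the in-memory
    analogue of Husky's bounded row-group buffering."""
    return heapq.merge(*fragment_streams, key=key)

# Invented rows keyed by (service, timestamp); each input is pre-sorted.
frag_a = [("reader", 1), ("writer", 5)]
frag_b = [("compactor", 2), ("reader", 3)]

merged = list(k_way_merge([iter(frag_a), iter(frag_b)],
                          key=lambda r: (r[0], r[1])))
```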
Query path: fragment pruning¶
For every candidate fragment in the (table, time-window) buckets a query intersects, Husky evaluates the fragment's stored per-sort-column regex (a mechanical projection of a "trimmed" finite-state automaton built at fragment-write time over all values in the column) against the query's predicates. If the regex cannot match a predicate's value, the fragment is skipped without a GET. This is far more precise than a min/max row-key filter, particularly for predicates on non-leading sort columns.
Worked example from the post: a fragment whose service column contains compactor, reader, writer stores the regex ^(compactor|reader|writer)$. A query for service:metadata is skipped — even though metadata sorts lexicographically inside [compactor, writer], so a range-only filter would wrongly scan the fragment. See patterns/trimmed-automaton-predicate-filter, concepts/fragment-pruning.
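The example can be checked directly, using an ordinary anchored regex as a stand-in for the trimmed FSA's regex projection (can_skip is a hypothetical helper, not Husky's API):

```python
import re

# Regex a fragment writer might store for its service column: an
# anchored alternation over the column's distinct values.
fragment_regex = re.compile(r"^(compactor|reader|writer)$")

def can_skip(query_value):
    """True if the fragment provably contains no row matching the
    predicate value, so it can be pruned without a GET."""
    return fragment_regex.fullmatch(query_value) is None

# service:metadata is pruned without fetching the fragment...
assert can_skip("metadata")
# ...even though a min/max range check over [compactor, writer]
# would have admitted it:
assert "compactor" <= "metadata" <= "writer"
```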
Reported scale / outcomes¶
- Ingest: trillions of events per day.
- Compaction throughput: "thousands of fragments and dozens of GB of data every second".
- Locality-compaction rollout impact: 30% reduction in the query-worker fleet — Husky's largest cost centre — configured conservatively to use no more CPU than size-tiering alone.
- "These savings get larger every year as the usage of our system grows."
Not covered here (deferred)¶
- Query-execution details: latency-hiding over object storage, avoiding re-execution of the same query.
- Ingestion internals (covered in the earlier deep-dive).
- Numeric thresholds for size-tier boundaries, level row-limits, row-group adaptive sizing heuristic.
Seen in¶
- sources/2025-01-29-datadog-husky-efficient-compaction-at-datadog-scale — the compaction article: columnar fragment format, hybrid size-tiered + locality LSM, trimmed-FSA regex pruning, FoundationDB atomic fragment swap, 30% query-fleet cost reduction from locality compaction.