Skip to content

CONCEPT Cited by 1 source

ClickHouse TTL policy

Definition

A ClickHouse TTL policy is a table- or column-level clause that declaratively specifies when rows or columns should be automatically deleted or moved to a different storage tier based on a time expression. The policy is evaluated in the background by the MergeTree merge process — there is no scheduled sweeper or external job.

Canonical declaration from the 2025-12-09 Redpanda IoT pipeline post:

ALTER TABLE telemetry_events
    MODIFY TTL timestamp + INTERVAL 12 MONTH DELETE;

Verbatim: "ClickHouse also supports native TTL policies, which allow the automatic deletion or movement of old data. For instance, if you wanted to delete rows older than 12 months, you would use this statement." (Source: sources/2025-12-09-redpanda-streaming-iot-and-event-data-into-snowflake-and-clickhouse)

What TTL can do

  • DELETE (default) — drop rows whose TTL expression is in the past. When combined with time- partitioning, this becomes whole-partition drops — O(1) rather than row-level scan.
  • TO DISK 'cold' / TO VOLUME 'archive' — move rows from one storage tier to another (hot SSD → cold S3) without deleting them. The foundation of hot-cold compression tiering when combined with per-column codec choices.
  • Column-level TTL — expire individual columns (e.g. PII) earlier than the row itself. <column> TTL <expression> replaces the column with its default value at expiry.
  • Aggregation TTLGROUP BY … TTL <expression> lets older data roll up into coarser-grained aggregates rather than being deleted entirely.

Why TTL instead of app-side deletion

  • Declarative — retention is part of the schema, not a cron job that can be forgotten or misconfigured.
  • Integrated with the merge process — no separate scan path; TTL evaluation piggybacks on the background work MergeTree is doing anyway.
  • No DELETE statement cost — ClickHouse's ALTER TABLE ... DELETE is an expensive mutation; TTL avoids it by scheduling data loss at merge time on a whole-part basis.
  • Survives operator turnover — the policy lives in the table definition; a new operator inherits the retention contract automatically.

Composes with

Caveats

  • TTL is lazy, not scheduled. Expired data is physically removed only when the next merge happens on its part. Storage reclamation lag can be minutes to hours.
  • OPTIMIZE TABLE … FINAL can force immediate TTL evaluation but is expensive and blocks other work.
  • TTL-based DELETE is not compliance-grade erasure. For GDPR / deletion-certificate workloads, you may need explicit ALTER TABLE … DELETE WHERE + validation.
  • Row-level TTL on a non-partitioned table does scan all parts; the full speedup requires partition alignment.

Seen in

Last updated · 470 distilled / 1,213 read