Time-partitioned MergeTree for time series¶
Problem¶
Store high-volume time-series data (IoT telemetry, access logs, audit events, CI/CD metrics) in a columnar analytical store with:
- Fast queries across months/years of history — sub-second analytics on billions of rows;
- Efficient retention management — drop old data without scanning it;
- Optional cold-tier archival — move aged data to object storage without re-ingesting;
- Efficient compression — high compression ratios on aged data; low CPU cost on hot data.
Pattern¶
Use the ClickHouse MergeTree engine with PARTITION BY
toYYYYMM(timestamp) (or toDate(timestamp)) as the table
shape for time-series data, then compose:
- ORDER BY (entity_id, timestamp) for granule-level skipping on per-entity time-range queries;
- TTL timestamp + INTERVAL N MONTH DELETE for automatic eviction;
- TTL … TO VOLUME 'cold' for automatic tier migration;
- Column-level CODEC(LZ4) on hot partitions / CODEC(ZSTD) on cold for per-tier compression choice;
- ALTER TABLE … DETACH PARTITION as the manual archival escape hatch for partitions outside the TTL policy.
Canonical shape from the 2025-12-09 Redpanda IoT pipeline post:
CREATE TABLE telemetry_events (
timestamp DateTime,
device_id String,
value Float64 CODEC(ZSTD)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (device_id, timestamp);
ALTER TABLE telemetry_events
MODIFY TTL timestamp + INTERVAL 12 MONTH DELETE;
ALTER TABLE telemetry_events
DETACH PARTITION '202501'; -- optional manual archival
(Source: sources/2025-12-09-redpanda-streaming-iot-and-event-data-into-snowflake-and-clickhouse)
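The canonical shape above shows only the DELETE TTL and the manual detach. A fuller sketch composing the tier-move TTL and per-column codecs in one DDL is below — hypothetical, not from the source post: it assumes a storage policy named 'tiered' with a 'cold' volume is already defined in the server's storage_configuration, and the table name is illustrative.

```sql
-- Hypothetical composition of the pattern's pieces in one table.
-- Assumes storage_configuration defines policy 'tiered' with a 'cold' volume.
CREATE TABLE telemetry_events_tiered (
    timestamp DateTime CODEC(DoubleDelta, LZ4),   -- cheap codec for the hot path
    device_id String,
    value Float64 CODEC(Gorilla, ZSTD)            -- aggressive codec for floats
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (device_id, timestamp)
TTL timestamp + INTERVAL 3 MONTH TO VOLUME 'cold',  -- tier migration
    timestamp + INTERVAL 12 MONTH DELETE            -- eventual eviction
SETTINGS storage_policy = 'tiered';
```

The two TTL clauses are independent: parts move to the cold volume at three months and are dropped at twelve, both applied lazily during background merges.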
Why it works¶
- Time is naturally unbounded, so time-partitioning scales with the data: the partition count is bounded only by the retention horizon divided by partition granularity.
- Time predicates are the most common query shape on time-series data — partition pruning + granule skipping compose for "recent time window on specific entity" queries answered in single-digit milliseconds.
- Retention and archival align with partitions — dropping the oldest month is an O(1) metadata operation, not a row scan.
- Compression ratio varies over the data's age — hot rows get queried a lot (lightweight codec), cold rows rarely (aggressive codec). The pattern accommodates both by allowing codec change at the tier-move step.
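To make the "partition pruning + granule skipping" point concrete, here is the typical query shape this layout serves — a recent time window on one entity (the device id and window are placeholders):

```sql
-- The toYYYYMM-derived partition predicate prunes whole partitions;
-- the (device_id, timestamp) primary key skips granules within them.
SELECT
    toStartOfHour(timestamp) AS hour,
    avg(value) AS avg_value
FROM telemetry_events
WHERE device_id = 'sensor-42'                 -- hypothetical entity
  AND timestamp >= now() - INTERVAL 7 DAY     -- recent window
GROUP BY hour
ORDER BY hour;
```

Running the same statement under EXPLAIN indexes = 1 shows which partitions and granules survive pruning.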
Choosing partition granularity¶
- Monthly (toYYYYMM) — retention in months-to-years; dashboards with monthly rollups; lower partition count (faster background merges). Default choice for telemetry.
- Daily (toDate) — retention in days-to-weeks; dashboards with "last 24 h" queries; archival cadence is daily. Default choice for logs.
- Hourly (toStartOfHour) — rarely worth it; partition metadata overhead dominates.
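A daily-partition variant for the log case might look like the following (table and columns are illustrative, not from the source):

```sql
-- Hypothetical daily-partitioned log table: short retention, TTL in days,
-- and DROP PARTITION operates at day granularity.
CREATE TABLE access_logs (
    timestamp DateTime,
    path String,
    status UInt16
) ENGINE = MergeTree()
PARTITION BY toDate(timestamp)
ORDER BY (path, timestamp)
TTL timestamp + INTERVAL 30 DAY DELETE;
```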
Composes with¶
- patterns/clickhouse-plus-snowflake-dual-storage-tier — if the same event stream also lands in Snowflake for real-time analytics, ClickHouse's time-partitioned MergeTree is the long-term archive half of that dual-tier pattern.
Caveats¶
- Partition choice is load-bearing and irreversible without rewriting the table. Get it right at table creation.
- ORDER BY is the second load-bearing choice. Without an entity prefix, per-entity queries fall back to a full scan within each partition.
- Inserts must already be time-ordered or nearly so — out-of-order inserts trigger merges that eat into ingest throughput.
- Cold-tier query requires tooling — detached partitions on S3 need an S3 table function or external Iceberg layer to be queryable without re-attach.
- TTL is lazy — expired rows persist until next merge. Reclamation is eventual, not instant.
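As a sketch of the cold-tier tooling caveat: if a detached partition has been exported to S3 as Parquet, ClickHouse's s3 table function can query it in place without re-attaching. The bucket path is a placeholder; credentials are assumed to come from the server's named-collection or IAM configuration.

```sql
-- Hypothetical query over an archived partition exported as Parquet.
SELECT device_id, avg(value) AS avg_value
FROM s3(
    'https://my-bucket.s3.amazonaws.com/archive/telemetry/202501/*.parquet',
    'Parquet'
)
GROUP BY device_id;
```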
Seen in¶
- sources/2025-12-09-redpanda-streaming-iot-and-event-data-into-snowflake-and-clickhouse — canonical wiki introduction. Redpanda IoT-pipeline tutorial post names time-partitioned MergeTree as the canonical ClickHouse schema for IoT telemetry, audit logs, and long-term event storage, with TTL + detach + codec composition.