Time-partitioned MergeTree for time series¶
Problem¶
Store high-volume time-series data (IoT telemetry, access logs, audit events, CI/CD metrics) in a columnar analytical store with:
- Fast queries across months/years of history — sub-second analytics on billions of rows;
- Efficient retention management — drop old data without scanning it;
- Optional cold-tier archival — move aged data to object storage without re-ingesting;
- Efficient compression — high compression ratios on aged data; low CPU cost on hot data.
Pattern¶
Use the ClickHouse MergeTree engine with PARTITION BY
toYYYYMM(timestamp) (or toDate(timestamp)) as the table
shape for time-series data, then compose:
- ORDER BY (entity_id, timestamp) for granule-level skipping on per-entity time-range queries;
- TTL timestamp + INTERVAL N MONTH DELETE for automatic eviction;
- TTL … TO VOLUME 'cold' for automatic tier migration;
- Column-level CODEC(LZ4) on hot partitions / CODEC(ZSTD) on cold for per-tier compression choice;
- ALTER TABLE … DETACH PARTITION as the manual archival escape hatch for partitions outside the TTL policy.
Canonical shape from the 2025-12-09 Redpanda IoT pipeline post:
CREATE TABLE telemetry_events (
timestamp DateTime,
device_id String,
value Float64 CODEC(ZSTD)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (device_id, timestamp);
ALTER TABLE telemetry_events
MODIFY TTL timestamp + INTERVAL 12 MONTH DELETE;
ALTER TABLE telemetry_events
DETACH PARTITION '202501'; -- optional manual archival
(Source: sources/2025-12-09-redpanda-streaming-iot-and-event-data-into-snowflake-and-clickhouse)
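The canonical shape above shows only the DELETE TTL and the manual detach. A fuller sketch composing the tier-move TTL and per-column codecs in one DDL is below — hypothetical, not from the source post: it assumes a storage policy named 'tiered' with a 'cold' volume is already defined in the server's storage_configuration, and the table name is illustrative.

```sql
-- Hypothetical composition of the pattern's pieces in one table.
-- Assumes storage_configuration defines policy 'tiered' with a 'cold' volume.
CREATE TABLE telemetry_events_tiered (
    timestamp DateTime CODEC(DoubleDelta, LZ4),   -- cheap codec for the hot path
    device_id String,
    value Float64 CODEC(Gorilla, ZSTD)            -- aggressive codec for floats
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (device_id, timestamp)
TTL timestamp + INTERVAL 3 MONTH TO VOLUME 'cold',  -- tier migration
    timestamp + INTERVAL 12 MONTH DELETE            -- eventual eviction
SETTINGS storage_policy = 'tiered';
```

The two TTL clauses are independent: parts move to the cold volume at three months and are dropped at twelve, both applied lazily during background merges.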
Why it works¶
- Time is naturally unbounded, so time-partitioning scales with the data: the partition count is bounded only by the retention horizon divided by partition granularity.
- Time predicates are the most common query shape on time-series data — partition pruning + granule skipping compose for "recent time window on specific entity" queries answered in single-digit milliseconds.
- Retention and archival align with partitions — dropping the oldest month is an O(1) metadata operation, not a row scan.
- Compression ratio varies over the data's age — hot rows get queried a lot (lightweight codec), cold rows rarely (aggressive codec). The pattern accommodates both by allowing codec change at the tier-move step.
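To make the "partition pruning + granule skipping" point concrete, here is the typical query shape this layout serves — a recent time window on one entity (the device id and window are placeholders):

```sql
-- The toYYYYMM-derived partition predicate prunes whole partitions;
-- the (device_id, timestamp) primary key skips granules within them.
SELECT
    toStartOfHour(timestamp) AS hour,
    avg(value) AS avg_value
FROM telemetry_events
WHERE device_id = 'sensor-42'                 -- hypothetical entity
  AND timestamp >= now() - INTERVAL 7 DAY     -- recent window
GROUP BY hour
ORDER BY hour;
```

Running the same statement under EXPLAIN indexes = 1 shows which partitions and granules survive pruning.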
Choosing partition granularity¶
- Monthly (toYYYYMM) — retention in months-to-years; dashboards with monthly rollups; lower partition count (faster background merges). Default choice for telemetry.
- Daily (toDate) — retention in days-to-weeks; dashboards with "last 24 h" queries; archival cadence is daily. Default choice for logs.
- Hourly (toStartOfHour) — rarely worth it; partition metadata overhead dominates.
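A daily-partition variant for the log case might look like the following (table and columns are illustrative, not from the source):

```sql
-- Hypothetical daily-partitioned log table: short retention, TTL in days,
-- and DROP PARTITION operates at day granularity.
CREATE TABLE access_logs (
    timestamp DateTime,
    path String,
    status UInt16
) ENGINE = MergeTree()
PARTITION BY toDate(timestamp)
ORDER BY (path, timestamp)
TTL timestamp + INTERVAL 30 DAY DELETE;
```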
Composes with¶
- patterns/clickhouse-plus-snowflake-dual-storage-tier — if the same event stream also lands in Snowflake for real-time analytics, ClickHouse's time-partitioned MergeTree is the long-term archive half of that dual-tier pattern.
Caveats¶
- Partition choice is load-bearing and irreversible without rewriting the table. Get it right at table creation.
- ORDER BY is the second load-bearing choice. Without an entity prefix, per-entity queries fall back to a full scan within each partition.
- Inserts must already be time-ordered or nearly so — out-of-order inserts trigger merges that eat into ingest throughput.
- Cold-tier query requires tooling — detached partitions on S3 need an S3 table function or external Iceberg layer to be queryable without re-attach.
- TTL is lazy — expired rows persist until next merge. Reclamation is eventual, not instant.
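As a sketch of the cold-tier tooling caveat: if a detached partition has been exported to S3 as Parquet, ClickHouse's s3 table function can query it in place without re-attaching. The bucket path is a placeholder; credentials are assumed to come from the server's named-collection or IAM configuration.

```sql
-- Hypothetical query over an archived partition exported as Parquet.
SELECT device_id, avg(value) AS avg_value
FROM s3(
    'https://my-bucket.s3.amazonaws.com/archive/telemetry/202501/*.parquet',
    'Parquet'
)
GROUP BY device_id;
```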
Seen in¶
- sources/2025-12-09-redpanda-streaming-iot-and-event-data-into-snowflake-and-clickhouse — canonical wiki introduction. Redpanda IoT-pipeline tutorial post names time-partitioned MergeTree as the canonical ClickHouse schema for IoT telemetry, audit logs, and long-term event storage, with TTL + detach + codec composition.