
PATTERN Cited by 1 source

Time-partitioned MergeTree for time series

Problem

Store high-volume time-series data (IoT telemetry, access logs, audit events, CI/CD metrics) in a columnar analytical store with:

  • Fast queries across months/years of history — sub-second analytics on billions of rows;
  • Efficient retention management — drop old data without scanning it;
  • Optional cold-tier archival — move aged data to object storage without re-ingesting;
  • Efficient compression — high compression ratios on aged data; low CPU cost on hot data.

Pattern

Use the ClickHouse MergeTree engine with PARTITION BY toYYYYMM(timestamp) (or toDate(timestamp)) as the table shape for time-series data, then compose:

  • ORDER BY (entity_id, timestamp) for granule-level skipping on per-entity time-range queries;
  • TTL timestamp + INTERVAL N MONTH DELETE for automatic eviction;
  • TTL … TO VOLUME 'cold' for automatic tier migration;
  • Column-level CODEC(LZ4) on hot partitions / CODEC(ZSTD) on cold for per-tier compression choice;
  • ALTER TABLE … DETACH PARTITION as the manual archival escape hatch for partitions outside the TTL policy.

Canonical shape from the 2025-12-09 Redpanda IoT pipeline post:

CREATE TABLE telemetry_events (
    timestamp DateTime,
    device_id String,
    value Float64 CODEC(ZSTD)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (device_id, timestamp);

ALTER TABLE telemetry_events
    MODIFY TTL timestamp + INTERVAL 12 MONTH DELETE;

ALTER TABLE telemetry_events
    DETACH PARTITION '202501';  -- optional manual archival

(Source: sources/2025-12-09-redpanda-streaming-iot-and-event-data-into-snowflake-and-clickhouse)
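The TTL … TO VOLUME 'cold' and per-tier codec bullets above are not shown in the canonical shape. A tiered variant might look like the sketch below — the storage policy name 'hot_cold' and its volumes 'hot' and 'cold' are assumptions, and must already exist in the server's storage configuration for this DDL to succeed:

```sql
-- Sketch of a tiered-storage variant. Assumes a storage policy named
-- 'hot_cold' with a local-disk volume 'hot' and an object-storage
-- volume 'cold' is defined in the ClickHouse server configuration.
CREATE TABLE telemetry_events_tiered (
    timestamp DateTime,
    device_id String,
    value Float64 CODEC(ZSTD)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (device_id, timestamp)
TTL timestamp + INTERVAL 3 MONTH TO VOLUME 'cold',  -- tier migration
    timestamp + INTERVAL 12 MONTH DELETE            -- eviction
SETTINGS storage_policy = 'hot_cold';
```

Both TTL clauses are evaluated during background merges, so migration and deletion happen per-part as data ages.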

Why it works

  • Time is naturally unbounded, so time-partitioning scales with the data: the partition count is bounded only by the retention horizon divided by partition granularity.
  • Time predicates are the most common query shape on time-series data — partition pruning + granule skipping compose for "recent time window on specific entity" queries answered in single-digit milliseconds.
  • Retention and archival align with partitions — dropping the oldest month is an O(1) metadata operation, not a row scan.
  • The optimal compression trade-off varies with the data's age — hot rows are queried often (lightweight codec), cold rows rarely (aggressive codec). The pattern accommodates both by allowing a codec change at the tier-move step.
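The pruning-plus-skipping composition above can be seen in the most common query shape. This is a hedged sketch against the canonical table; the device id 'sensor-42' is a hypothetical value:

```sql
-- "Recent time window on a specific entity": the toYYYYMM partition key
-- prunes every partition except the current month's, and the
-- (device_id, timestamp) ORDER BY lets the sparse primary index skip
-- granules that don't contain this device.
SELECT
    toStartOfHour(timestamp) AS hour,
    avg(value) AS avg_value
FROM telemetry_events
WHERE device_id = 'sensor-42'                      -- hypothetical device id
  AND timestamp >= now() - INTERVAL 24 HOUR
GROUP BY hour
ORDER BY hour;
```

Prepending EXPLAIN to the query shows the selected partitions and granule counts, which is a quick way to confirm pruning is actually happening.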

Choosing partition granularity

  • Monthly (toYYYYMM) — retention in months-to-years; dashboards with monthly rollups; lower partition count (faster background merges). Default choice for telemetry.
  • Daily (toDate) — retention in days-to-weeks; dashboards with "last 24 h" queries; archival cadence is daily. Default choice for logs.
  • Hourly (toStartOfHour) — rarely worth it; partition metadata overhead dominates.
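For the daily case, the only structural change is the partition expression and a shorter TTL. A sketch for log-style data follows; the table and column names are illustrative, not from the source:

```sql
-- Daily-partitioned variant for log-style data (hypothetical schema):
-- toDate() gives one partition per day, matching a daily archival cadence
-- and short retention measured in days-to-weeks.
CREATE TABLE access_logs (
    timestamp DateTime,
    host String,
    path String,
    status UInt16
) ENGINE = MergeTree()
PARTITION BY toDate(timestamp)
ORDER BY (host, timestamp)
TTL timestamp + INTERVAL 30 DAY DELETE;
```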

Composes with

Caveats

  • Partition choice is load-bearing and irreversible without rewriting the table. Get it right at table creation.
  • ORDER BY is the second load-bearing choice. Without an entity prefix, per-entity queries fall back to a full scan within each surviving partition.
  • Inserts must arrive time-ordered, or nearly so — out-of-order inserts trigger extra background merges that eat into ingest throughput.
  • Cold-tier query requires tooling — detached partitions on S3 need an S3 table function or external Iceberg layer to be queryable without re-attach.
  • TTL is lazy — expired rows persist until next merge. Reclamation is eventual, not instant.
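When the lazy-TTL caveat bites (e.g. reclaiming disk ahead of a quota), expiry can be forced. Both statements below are standard ClickHouse commands, but they rewrite data parts and are expensive on large tables, so they are an escape hatch rather than routine maintenance:

```sql
-- Re-evaluate the TTL expression against existing parts
-- (needed after a MODIFY TTL to apply the new policy to old data):
ALTER TABLE telemetry_events MATERIALIZE TTL;

-- Force a merge, which drops rows already past their TTL:
OPTIMIZE TABLE telemetry_events FINAL;
```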

Seen in
