CONCEPT Cited by 1 source

Iceberg partition cardinality

Iceberg partition cardinality is the number of distinct partitions produced by an Iceberg table's partition spec. High cardinality directly multiplies the minimum file count: every partition that receives data produces at least one file per flush cycle, no matter how large the flush threshold is set.

The multiplication problem

If the partition spec is (hour(timestamp)), every hour that contains data becomes its own partition, so you produce at least 1 file per hour. If you switch to (minute(timestamp)) or add a high-cardinality field (e.g. user_id), the number of partitions, and with it the minimum file count, explodes. That undoes the benefit of raising the flush interval to get large files (Source: sources/2026-06-23-redpanda-bridge-queries-in-redpanda-sql).
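The effect can be made concrete with a small sketch. It is not Redpanda or Iceberg code: partition_key and min_files are hypothetical helpers, the record stream is synthetic (one record per minute for a day, spread over 10 made-up user ids), and the model assumes the best case where everything is written in a single flush, so the result is only a lower bound on the file count.

```python
from datetime import datetime, timedelta, timezone

# Format strings that truncate a timestamp the way a day/hour/minute
# time transform would (illustrative only, not Iceberg's implementation).
TRUNCATE = {
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%dT%H",
    "minute": "%Y-%m-%dT%H:%M",
}

def partition_key(ts, granularity, extra=None):
    """Return the partition a record lands in: truncated time plus an optional extra field."""
    return (ts.strftime(TRUNCATE[granularity]), extra)

def min_files(records, granularity, by_user=False):
    """Lower bound on files for one flush: one file per distinct partition touched."""
    keys = {
        partition_key(ts, granularity, user_id if by_user else None)
        for ts, user_id in records
    }
    return len(keys)

# Synthetic data: one record per minute for 24 hours, 10 hypothetical user ids.
start = datetime(2026, 1, 1, tzinfo=timezone.utc)
records = [(start + timedelta(minutes=i), i % 10) for i in range(24 * 60)]

print(min_files(records, "day"))                 # 1
print(min_files(records, "hour"))                # 24
print(min_files(records, "minute"))              # 1440
print(min_files(records, "hour", by_user=True))  # 240
```

Adding user_id to the hourly spec multiplies the lower bound by the number of distinct users seen in each hour, which is why even a modest extra field can dominate the file count.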

Practical guidance

  • Default for Redpanda Iceberg Topics: (hour(redpanda.timestamp)), which produces at least 24 files/day when data arrives in every hour.
  • For workloads optimizing file size with bridge queries: (day(redpanda.timestamp)), which produces at least 1 file/day and leaves the translator flush threshold as the dominant file-size control.
  • Avoid: minute-level or high-cardinality field partitioning for streaming sinks where file count matters.
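To see why the daily spec lets the flush threshold dominate, here is a rough back-of-the-envelope model. It rests on assumptions not in the source: data is spread evenly across partitions, the flush threshold is a size in MB, and a file can hold no more than one partition's share of the data. The function name and the numbers are hypothetical.

```python
def max_avg_file_mb(daily_volume_mb, partitions_per_day, flush_threshold_mb):
    """Rough ceiling on average file size under the assumptions above.

    A file is capped either by the data available in its partition or by
    the flush threshold, whichever is smaller.
    """
    per_partition_mb = daily_volume_mb / partitions_per_day
    return min(per_partition_mb, flush_threshold_mb)

# Hypothetical workload: 2400 MB/day with a 512 MB flush threshold.
DAILY_MB = 2400
THRESHOLD_MB = 512

for name, partitions in [("day", 1), ("hour", 24), ("minute", 24 * 60)]:
    ceiling = max_avg_file_mb(DAILY_MB, partitions, THRESHOLD_MB)
    print(f"{name}: at most about {ceiling:.1f} MB per file")
```

In this model only the daily spec reaches the 512 MB threshold. With hourly partitions the files top out near 100 MB, and with minute-level partitions under 2 MB, regardless of how high the threshold is raised.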
