CONCEPT Cited by 1 source
Iceberg partition cardinality
Iceberg partition cardinality refers to the number of distinct partitions produced by an Iceberg table's partition spec. High cardinality directly multiplies the minimum file count: every partition that receives data during a flush cycle produces at least one file, regardless of how large the flush threshold is set.
The multiplication problem
If the partition spec is (hour(timestamp)), each hour of data is its own partition, so you produce a minimum of 1 file per hour. If you switch to (minute(timestamp)), that floor rises to as many as 60 files per hour, and adding a high-cardinality field (e.g. user_id) multiplies it again by the number of distinct values seen in each time bucket. The file count explodes, undoing the benefit of raising the flush interval to get large files (Source: sources/2026-06-23-redpanda-bridge-queries-in-redpanda-sql).
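The multiplication can be made concrete with a small arithmetic sketch. It assumes data arrives continuously, so every time bucket is populated, and that each populated partition emits exactly one file, which is the best case. The function name `min_files_per_day` and its parameters are hypothetical, chosen for illustration; they are not part of any Redpanda or Iceberg API.

```python
# Time buckets per day for the granularities discussed in the text.
BUCKETS_PER_DAY = {"day": 1, "hour": 24, "minute": 24 * 60}


def min_files_per_day(granularity: str, extra_cardinality: int = 1) -> int:
    """Lower bound on files written per day for a partition spec.

    granularity: time bucket of the spec ("day", "hour", or "minute").
    extra_cardinality: distinct values of any additional partition field
        (e.g. user_id) seen in each time bucket; 1 means no extra field.

    Assumption: every bucket receives data and each populated partition
    emits exactly one file, so this is a floor, not an estimate.
    """
    if extra_cardinality < 1:
        raise ValueError("extra_cardinality must be at least 1")
    return BUCKETS_PER_DAY[granularity] * extra_cardinality


if __name__ == "__main__":
    print("hour:", min_files_per_day("hour"))
    print("minute:", min_files_per_day("minute"))
    print("hour + 1,000 user_ids:", min_files_per_day("hour", 1_000))
```

Under these assumptions the floor depends only on the partition spec. The flush threshold does not appear in the calculation at all, which is why raising it cannot reduce the count.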
Practical guidance
- Default for Redpanda Iceberg Topics: (hour(redpanda.timestamp)), which produces 24 files/day minimum.
- For workloads optimizing file size with bridge queries: (day(redpanda.timestamp)), which produces 1 file/day minimum per partition, allowing the translator flush threshold to be the dominant file-size control.
- Avoid: minute-level or high-cardinality field partitioning for streaming sinks where file count matters.
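The difference between the hourly default and daily partitioning can be sketched with a deliberately simplified model. It assumes a constant ingest rate, flushes at a fixed interval, and one file per partition touched by each flush, and it ignores alignment with partition boundaries. This is not a description of how the Redpanda translator actually schedules flushes, and `avg_file_size` and its parameters are hypothetical names.

```python
import math


def avg_file_size(bytes_per_hour: int, flush_interval_hours: int,
                  partition_hours: int) -> float:
    """Average file size per flush under a simplified model.

    bytes_per_hour: assumed constant ingest rate.
    flush_interval_hours: how much data accumulates before each flush.
    partition_hours: width of one time partition (1 = hour, 24 = day).
    """
    data_per_flush = bytes_per_hour * flush_interval_hours
    # A flush spanning several partitions must write one file into each.
    partitions_touched = max(1, math.ceil(flush_interval_hours / partition_hours))
    return data_per_flush / partitions_touched


if __name__ == "__main__":
    rate = 100  # arbitrary units per hour, for illustration only
    for flush in (1, 6, 24):
        hourly = avg_file_size(rate, flush, partition_hours=1)
        daily = avg_file_size(rate, flush, partition_hours=24)
        print(f"flush={flush}h hourly_spec={hourly} daily_spec={daily}")
```

In this model the hourly spec caps each file at one hour of data no matter how long the flush interval is, while the daily spec lets file size grow with the flush interval up to a full day of data.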
Seen in
- systems/redpanda-iceberg-topics: configurable via rpk topic alter-config --set redpanda.iceberg.partition.spec=...
- sources/2026-06-23-redpanda-bridge-queries-in-redpanda-sql: primary source