PATTERN Cited by 1 source

Low partition cardinality for large files

Keep the Iceberg partition spec coarse (day or hour granularity, low-cardinality fields) so that the flush threshold, not the partition count, controls file size. High partition cardinality multiplies the minimum number of files written per flush cycle, which negates the benefit of raising the flush interval.

Pattern shape

Each partition that receives data produces at least one file per flush cycle. The partition spec therefore sets a floor on the number of files written per day:

- Partition spec = (hour(timestamp)) → 24 partitions/day → at least 24 files/day
- Partition spec = (day(timestamp)) → 1 partition/day → at least 1 file/day
- Partition spec = (minute(timestamp)) → 1,440 partitions/day → at least 1,440 files/day
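The floor above can be checked with a small simulation. The Python sketch below takes one day of evenly spaced record timestamps and buckets each one as whole units elapsed since the Unix epoch. This is the arithmetic Iceberg's `day` and `hour` transforms use; the minute bucket is computed the same way for comparison. The sketch then counts the distinct partitions, which is the smallest number of files that day's data can land in. The helper names (`partition_key`, `min_files_for_day`) and the one-record-per-second workload are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Width of each partition bucket in seconds (illustrative granularities).
BUCKET_SECONDS = {"day": 86_400, "hour": 3_600, "minute": 60}


def partition_key(ts: datetime, granularity: str) -> int:
    """Whole buckets elapsed since the Unix epoch, as Iceberg's day/hour transforms compute."""
    elapsed = int((ts - EPOCH).total_seconds())
    return elapsed // BUCKET_SECONDS[granularity]


def min_files_for_day(granularity: str, day: datetime, step_seconds: int = 1) -> int:
    """Count distinct partitions touched by one day of evenly spaced records.

    Every touched partition needs at least one data file, so this is the
    floor on files written for the day, however large the flush threshold is.
    """
    keys = set()
    ts = day
    end = day + timedelta(days=1)
    while ts < end:
        keys.add(partition_key(ts, granularity))
        ts += timedelta(seconds=step_seconds)
    return len(keys)


if __name__ == "__main__":
    day = datetime(2026, 6, 23, tzinfo=timezone.utc)  # a UTC-midnight-aligned day
    for g in ("day", "hour", "minute"):
        print(f"{g:>6}: {min_files_for_day(g, day)} files/day minimum")
```

Raising the flush threshold cannot push the result below these counts. Only a coarser spec lowers the floor.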

For streaming workloads that use bridge queries and need to optimize file size:

- Use (day(redpanda.timestamp)) as the partition spec
- Let datalake_translator_flush_bytes (the size threshold) be the dominant file-size control
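The Python sketch below models one flush cycle to show why a coarse spec hands control to the size threshold. It rests on simplifying assumptions and is not Redpanda's actual translator logic:

- Throughput is steady.
- Bucket boundaries align with the cycle, and the flush interval is either shorter than one bucket or a whole multiple of it.
- A new file is cut each time a partition's buffered bytes reach `flush_bytes`, and any remainder is written out when the cycle ends.

`estimate_cycle`, `CycleEstimate`, the throughput, and the threshold values are hypothetical. `flush_bytes` only stands in for `datalake_translator_flush_bytes`.

```python
import math
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class CycleEstimate:
    partitions: int        # partitions that receive data during one flush cycle
    files: int             # data files written during the cycle
    avg_file_bytes: float  # mean size of those files


def estimate_cycle(throughput_bps: int, flush_interval_s: int,
                   bucket_s: int, flush_bytes: int) -> CycleEstimate:
    """Toy model of one flush cycle (hypothetical, not Redpanda's translator).

    Assumes the flush interval is shorter than one bucket or a whole multiple
    of it, so data splits evenly across the partitions active in the cycle.
    """
    partitions = math.ceil(flush_interval_s / bucket_s)
    # Each active partition holds at most one bucket's worth of data per cycle.
    bytes_per_partition = throughput_bps * min(flush_interval_s, bucket_s)
    # A file is cut per flush_bytes of data; any non-empty partition writes at least one.
    files_per_partition = max(1, math.ceil(bytes_per_partition / flush_bytes))
    files = partitions * files_per_partition
    total_bytes = throughput_bps * flush_interval_s
    return CycleEstimate(partitions, files, total_bytes / files)


if __name__ == "__main__":
    throughput = 256 * 1024   # hypothetical 256 KiB/s of incoming data
    flush_bytes = 128 * MIB   # hypothetical stand-in for datalake_translator_flush_bytes
    for interval in (600, 3600):  # raise the flush interval from 10 to 60 minutes
        for name, bucket in (("day", 86_400), ("minute", 60)):
            e = estimate_cycle(throughput, interval, bucket, flush_bytes)
            print(f"interval={interval:>4}s {name:>6}: {e.files:>3} files, "
                  f"avg {e.avg_file_bytes / MIB:.1f} MiB")
```

In this toy model, raising the interval from 10 to 60 minutes grows day-partitioned files from 75 MiB to 112.5 MiB, approaching the 128 MiB threshold. Minute-partitioned files stay at 15 MiB because each partition only ever holds one minute of data.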

When to apply

- Streaming-to-Iceberg pipelines that use bridge queries or a delayed flush cadence
- Workloads whose queries filter on time ranges that coarse partitions still serve
- Any Iceberg table showing small-file-problem symptoms caused by partition explosion

Seen in

- systems/redpanda-iceberg-topics: configurable per topic via rpk topic alter-config <topic> --set redpanda.iceberg.partition.spec="(day(redpanda.timestamp))"
- sources/2026-06-23-redpanda-bridge-queries-in-redpanda-sql: primary source