PATTERN Cited by 1 source
Low partition cardinality for large files¶
Keep the Iceberg partition spec coarse (day or hour granularity, low-cardinality fields) so that the flush threshold — not the partition count — controls file size. High partition cardinality multiplies the minimum file count per flush cycle, negating the benefit of raising the flush interval.
Pattern shape¶
Each partition produces at least one file per flush cycle, so the partition spec sets a floor on the daily file count:
- Partition spec = (hour(timestamp)) → 24 partitions/day → 24 files/day minimum
- Partition spec = (day(timestamp)) → 1 partition/day → 1 file/day minimum
- Partition spec = (minute(timestamp)) → 1,440 partitions/day → 1,440 files/day minimum
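The arithmetic above can be sketched in a few lines of Python. This is an illustration only: the function names and the 10 GiB/day ingest volume are hypothetical, and the model assumes every time partition receives data and writes exactly its minimum of one file.

```python
# Time partitions created per day by each transform (from the list above).
PARTITIONS_PER_DAY = {"day": 1, "hour": 24, "minute": 24 * 60}


def min_files_per_day(granularity: str) -> int:
    """Lower bound on files per day: one file per time partition."""
    return PARTITIONS_PER_DAY[granularity]


def max_avg_file_bytes(bytes_per_day: float, granularity: str) -> float:
    """Upper bound on average file size when each partition writes one file."""
    return bytes_per_day / min_files_per_day(granularity)


if __name__ == "__main__":
    GIB = 1024 ** 3
    MIB = 1024 ** 2
    ingest = 10 * GIB  # hypothetical daily ingest volume
    for granularity in ("day", "hour", "minute"):
        files = min_files_per_day(granularity)
        size_mib = max_avg_file_bytes(ingest, granularity) / MIB
        print(f"{granularity:>6}: >= {files:>4} files/day, avg file <= {size_mib:.1f} MiB")
```

Under these assumptions the same daily volume that could land in a single 10 GiB file with day partitioning is capped at roughly 7 MiB per file with minute partitioning, regardless of how the flush settings are tuned.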
For streaming workloads optimizing file size with bridge queries:
- Use (day(redpanda.timestamp)) as partition spec
- Let datalake_translator_flush_bytes (size threshold) be the dominant file-size control
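To see why a coarse spec lets the size threshold dominate, the toy model below estimates file size for a single flush cycle. It is a simplification and not Redpanda's actual translator logic: it assumes a constant ingest rate, a flush that fires at whichever comes first (a hypothetical flush_interval_s or the flush_bytes threshold), and buffered data split evenly across the time partitions the cycle spans. All function and parameter names are hypothetical.

```python
import math

# Length of one time partition, in seconds, for each transform.
PARTITION_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


def partitions_touched(window_s: float, granularity: str) -> int:
    """Time partitions spanned by one flush window (ignores boundary alignment)."""
    return max(1, math.ceil(window_s / PARTITION_SECONDS[granularity]))


def approx_file_bytes(rate_bytes_per_s: float, flush_interval_s: float,
                      flush_bytes: float, granularity: str) -> float:
    """Approximate size of each file written in one flush cycle (toy model)."""
    # The cycle ends at the interval or when the size threshold fills.
    window_s = min(flush_interval_s, flush_bytes / rate_bytes_per_s)
    buffered = rate_bytes_per_s * window_s
    # Each touched partition writes at least one file, splitting the buffer.
    return buffered / partitions_touched(window_s, granularity)


if __name__ == "__main__":
    MIB = 1024 ** 2
    for granularity in ("day", "hour", "minute"):
        size = approx_file_bytes(1 * MIB, 3600, 512 * MIB, granularity)
        print(f"{granularity:>6}: ~{size / MIB:.1f} MiB per file")
```

With these made-up numbers (1 MiB/s ingest, a one-hour interval, a 512 MiB threshold), day and hour partitioning both yield files at the full threshold size, while minute partitioning splits the same buffer across nine partitions, so the threshold no longer determines file size.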
When to apply¶
- Streaming-to-Iceberg pipelines using bridge queries or delayed flush cadence
- Workloads where query patterns use time-range filters that coarse partitions still serve
- Any Iceberg table experiencing small-file-problem symptoms from partition explosion
Seen in¶
- systems/redpanda-iceberg-topics: configurable via rpk topic alter-config --set redpanda.iceberg.partition.spec="(day(redpanda.timestamp))"
- sources/2026-06-23-redpanda-bridge-queries-in-redpanda-sql: primary source