SYSTEM Cited by 1 source
Apache Hudi¶
Apache Hudi (Hadoop Upserts Deletes and Incrementals) is one of the three canonical open table formats alongside systems/apache-iceberg and systems/delta-lake — all layered over systems/apache-parquet on object storage to provide transactional row-level inserts, updates, deletes, and schema evolution on top of immutable objects.
Minimum viable framing for this wiki: same structural role as Iceberg — see concepts/open-table-format for the shared shape. Hudi's distinguishing emphasis is near-real-time upsert ingestion with two on-disk storage modes: Copy-on-Write (concepts/copy-on-write-merge) and Merge-on-Read (read-time merge of a base file + a log of delta records).
Seen in¶
- sources/2024-07-29-aws-amazons-exabyte-scale-migration-from-apache-spark-to-ray-on-ec2 — named alongside Iceberg and Delta Lake as the open table formats systems/deltacat (and therefore the Amazon BDT Flash Compactor architecture) is designed to extend to.
Related¶
- systems/apache-iceberg, systems/delta-lake — peer OTFs.
- systems/apache-parquet — base data-file format.
- concepts/open-table-format — the category these formats occupy.
- concepts/copy-on-write-merge — one of Hudi's two storage-mode strategies.
- concepts/change-data-capture — Hudi's canonical upstream workload is CDC from an OLTP source.