Daft

Daft (github.com/Eventual-Inc/Daft) is a distributed DataFrame library, written in Python and Rust, optimised for multimodal and columnar data on cloud object storage. It ships its own Rust-based Parquet and I/O stack and integrates with Ray (systems/ray) as a distributed runtime. Its reader for Parquet and delimited-text files on S3 outperforms both PyArrow and S3Fs on this class of workload, as the benchmark figures below show.

Benchmarked performance (Amazon BDT, 2024)

Joint work with Amazon's Business Data Technologies (BDT) team produced a 24% improvement in production cost efficiency for Ray-based compaction when Daft replaced the prior I/O stack. Microbenchmarks from the same source:

| Operation | Daft vs PyArrow | Daft vs S3Fs |
|---|---|---|
| Median single-column read | −55% | −91% |
| Median full-file read | −19% | −77% |

(Source: sources/2024-07-29-aws-amazons-exabyte-scale-migration-from-apache-spark-to-ray-on-ec2)
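The percentages above can be easier to compare as speedup factors. The sketch below does that conversion, assuming each figure denotes a reduction in median read time relative to the named baseline. The source entry gives only the percentages, so that reading is an assumption, and the helper name `speedup_from_reduction` is hypothetical.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in read time into a speedup factor.

    A reduction of r% means the new time is (1 - r/100) of the old time,
    so the speedup is 1 / (1 - r/100).
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in the range [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# Reductions from the table above, keyed by (operation, baseline).
REDUCTIONS = {
    ("single-column read", "PyArrow"): 55,
    ("single-column read", "S3Fs"): 91,
    ("full-file read", "PyArrow"): 19,
    ("full-file read", "S3Fs"): 77,
}

if __name__ == "__main__":
    for (operation, baseline), pct in REDUCTIONS.items():
        factor = speedup_from_reduction(pct)
        print(f"{operation} vs {baseline}: {factor:.2f}x")
```

Under that assumption, the single-column figures correspond to roughly 2.2x (PyArrow) and 11.1x (S3Fs), and the full-file figures to roughly 1.2x and 4.3x.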
