# general
On a whim I decided to run Daft on the Polars TPC-H benchmark repo and was initially horrified 🥲.
• Then I realized there was a bug in Polars' Parquet file generation code!
• Polars was somehow generating extremely fragmented Parquet files with tiny row groups: https://github.com/pola-rs/tpch/issues/123
• And it seems our Daft Parquet reader reeeeally sucked at reading these fragmented row groups. That's probably a bug or a missing optimization that we need to fix.

---

But anyhow, I ran this on my M2 MacBook Air and the results actually look really good! I think we've accidentally made a really fast local data engine that can also run distributed…