# general
s
Hey there! Loved the overall concept of Daft. 🙂 To completely move away from Spark and maintain data lakes using Daft alone, there is a need for the following: https://iceberg.apache.org/docs/1.5.1/maintenance/ Is there any way we can do this?
k
These look like Iceberg-specific table metadata operations; we do not have specific plans to support them at this moment. Would the Iceberg Java API (from your link) or PyIceberg suit your needs for that?
j
It does seem like most of these operations are fairly cheap metadata-only ops! With the exception of "Compact data files", I’m guessing that PyIceberg would be able to support many of them from a single machine, without needing a cluster…
✅ 1
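For context on why these ops are cheap: snapshot expiration, for example, is essentially a filter over the table's snapshot metadata list. A minimal sketch of that logic in plain Python (the `Snapshot` shape and the retention policy here are illustrative, not the PyIceberg API — real Iceberg snapshots carry manifest lists, summaries, parent ids, and so on):

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    # Illustrative stand-in for an Iceberg snapshot metadata record.
    snapshot_id: int
    timestamp_ms: int

def expired_snapshots(snapshots, now_ms, max_age_ms, min_to_keep=1):
    """Return snapshots older than the retention window, always keeping
    at least `min_to_keep` of the most recent ones."""
    by_age = sorted(snapshots, key=lambda s: s.timestamp_ms, reverse=True)
    # Everything beyond the protected most-recent slice is a candidate.
    return [s for s in by_age[min_to_keep:]
            if now_ms - s.timestamp_ms > max_age_ms]

day = 24 * 60 * 60 * 1000
snaps = [Snapshot(1, 0), Snapshot(2, 5 * day), Snapshot(3, 9 * day)]
old = expired_snapshots(snaps, now_ms=10 * day, max_age_ms=7 * day)
print([s.snapshot_id for s in old])  # snapshot 1 is past the 7-day window
```

Since this touches only metadata (plus deleting the files the expired snapshots exclusively reference), it runs comfortably on a single machine, which is why PyIceberg is a reasonable fit.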
s
Yes. If "Compact data files" is supported, Daft can be used to develop mature data engineering pipelines without Spark.
j
Would you like to draft an issue for us to support "Compact data files"? 🙂
Also wondering if you were able to find a solution for your other maintenance needs in PyIceberg?