# daft-dev
Question for the daft team: when using daft.read_iceberg(…) to read a Parquet-format Iceberg table that has a partition spec and is catalogued in a Hive Metastore, will daft create a dataframe partitioned similarly to the source table? I got the error below, which suggests that daft is confused by the table’s partition spec.

(I’m using ray 2.8.1, Python 3.8, pyiceberg 0.4.0, pyarrow 15.0.0, thrift 0.16.0, getdaft 0.3.2, pandas 1.5.3)

The offending code is:

    df = daft.read_iceberg(tbl_iceberg)
    print(f"num_daft_partitions: {df.num_partitions()}")

Traceback (most recent call last):
  File "huggingface_ts.py", line 756, in <module>
    print(f"num_daft_partitions: {df.num_partitions()}")
  File "/tmp/ray/session_2024-09-11_10-50-57_989820_8/runtime_resources/pip/ccd6546b9db45e126cc6cf4dab015ec053a14fb1/virtualenv/lib/python3.8/site-packages/daft/dataframe/dataframe.py", line 194, in num_partitions
    return self.__builder.optimize().to_physical_plan_scheduler(daft_execution_config).num_partitions()
  File "/tmp/ray/session_2024-09-11_10-50-57_989820_8/runtime_resources/pip/ccd6546b9db45e126cc6cf4dab015ec053a14fb1/virtualenv/lib/python3.8/site-packages/daft/logical/builder.py", line 67, in to_physical_plan_scheduler
    return PhysicalPlanScheduler.from_logical_plan_builder(
  File "/tmp/ray/session_2024-09-11_10-50-57_989820_8/runtime_resources/pip/ccd6546b9db45e126cc6cf4dab015ec053a14fb1/virtualenv/lib/python3.8/site-packages/daft/plan_scheduler/physical_plan_scheduler.py", line 30, in from_logical_plan_builder
    scheduler = _PhysicalPlanScheduler.from_logical_plan_builder(builder._builder, daft_execution_config)
  File "/tmp/ray/session_2024-09-11_10-50-57_989820_8/runtime_resources/pip/ccd6546b9db45e126cc6cf4dab015ec053a14fb1/virtualenv/lib/python3.8/site-packages/daft/iceberg/iceberg_scan.py", line 173, in to_scan_tasks
    pspec = self._iceberg_record_to_partition_spec(self._table.specs()[file.spec_id], file.partition)
KeyError: None
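
For what it's worth, the last frame reads like a plain dict lookup where file.spec_id is None. Below is a minimal sketch of that failure mode using stand-in objects, not the real pyiceberg or daft classes. The helper name lookup_spec and the fallback to a default spec ID are my own assumptions for illustration; I don't know whether daft handles it this way or whether this is the actual cause.

```python
from types import SimpleNamespace

# Stand-ins for table.specs() and a scanned data file.
# Hypothetical objects for illustration, not real pyiceberg classes.
specs = {0: "PartitionSpec(id=0)"}
data_file = SimpleNamespace(spec_id=None, partition={"day": "2024-09-11"})


def lookup_spec(specs, data_file, default_spec_id=None):
    """Return the spec for a data file, optionally falling back to a default spec ID."""
    spec_id = data_file.spec_id
    if spec_id is None and default_spec_id is not None:
        # Assumed fallback, not daft's confirmed behaviour.
        spec_id = default_spec_id
    return specs[spec_id]


# Without a fallback, a None spec_id produces the same error as in the traceback.
try:
    lookup_spec(specs, data_file)
except KeyError as err:
    error_repr = repr(err)

# With the assumed fallback, the lookup succeeds.
fallback_spec = lookup_spec(specs, data_file, default_spec_id=0)
```

If that is what's happening, the data files returned by the scan may carry no spec ID under pyiceberg 0.4.0, so checking file.spec_id on the planned files would be a quick way to confirm.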