jay
09/25/2024, 12:53 AM
daft.set_execution_config(parquet_split_row_groups_max_files=100)
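A minimal sketch of how that call might sit in a script, in case it helps. The wrapper name `enable_parquet_splitting` is made up for illustration; only `daft.set_execution_config` and the `parquet_split_row_groups_max_files` flag come from the line above. It assumes the setting is applied before the DataFrame is built.

```python
def enable_parquet_splitting(max_files: int = 100) -> bool:
    """Raise the file-count limit up to which Daft tries to split Parquet reads.

    Hypothetical helper: returns False (and changes nothing) if Daft is not installed.
    """
    try:
        import daft
    except ImportError:
        return False
    # Same call as in the message above; only the wrapper around it is invented.
    daft.set_execution_config(parquet_split_row_groups_max_files=max_files)
    return True


if __name__ == "__main__":
    if enable_parquet_splitting(100):
        print("limit raised to 100 files")
    else:
        print("daft is not installed")
```

Call it before `daft.read_parquet(...)`. If your Daft version exposes `df.num_partitions()`, comparing the value with and without the setting is one way to check whether it took effect.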
You can use this flag to increase the number of files for which Daft will attempt to split reads. This is especially useful if you have larger Parquet files (e.g. your 10G ones). It will increase the time Daft takes to generate the query plan, but you should see the total number of partitions increase, since Daft will split each file into multiple partitions.
Kyle
09/25/2024, 12:55 AM
jay
09/25/2024, 12:58 AM
jay
09/25/2024, 12:58 AM
Kyle
09/25/2024, 1:04 AM
jay
09/25/2024, 1:04 AM
Kyle
09/25/2024, 1:07 AM
jay
09/25/2024, 1:08 AM
Kyle
09/25/2024, 1:09 AM