# general
j
@Kyle BTW quick follow-up:
daft.set_execution_config(parquet_split_row_groups_max_files=100)
You can use this flag to increase the number of files for which Daft will attempt to split reads. This is especially useful if you have larger Parquet files (e.g. your 10G ones). It will increase the time Daft takes to generate the query plan, but you should see the total number of partitions increase, and Daft will split each file into multiple partitions
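A minimal sketch of how the flag from the snippet above would be applied in practice. The `data/*.parquet` path is a hypothetical placeholder, and this is a config sketch rather than a runnable benchmark:

```python
import daft

# Raise the cap on how many files Daft will attempt to split
# into multiple partitions by row group (the chat mentions a
# default of 10, which is low for thousands of files).
daft.set_execution_config(parquet_split_row_groups_max_files=100)

# Hypothetical path: a glob of large Parquet files.
df = daft.read_parquet("data/*.parquet")

# With splitting enabled for more files, the query plan should
# contain more partitions than the raw file count.
print(df.num_partitions())
```

Note that the config must be set before the `read_parquet` call, since splitting happens at plan-generation time.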
k
Great, thanks!!
j
Yeah it currently defaults to 10….
which maybe we should consider increasing
k
Cool! I think my bigger problem is that it's a little hard to understand which config param does what, and which scenarios it can handle when we tweak those parameters. For my scale of thousands of files, I think the default threshold is unlikely to be sufficient.
j
Yeah the config parameters are going to be more of a crutch. Ideally query engines should be able to do more intelligent things here 😕
k
Makes sense! 😄 My fiddling with the parameters never worked out well haha
j
Yeah we’ll think about it a bit more too. With some of the upcoming architectural changes to Daft this should hopefully be much less of an issue as we move towards a more streaming execution model
k
Nice! Looking forward haha