If I find that my script is reading the parquet files in units that are too small, e.g. the parquet is 1 GB but the reads/writes are coming through in KB (I'm guessing due to the auto streaming?), how can I adjust it back to a bigger chunk? I'm getting many errors due to throttling on the number of requests 🥲
jay
09/10/2024, 4:43 PM
Daft should perform coalescing of your read sizes into larger chunks of about a few MB by default!
How are you checking that it is reading kb at a time?
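For reference, a minimal sketch of the execution-config knobs that I believe control this coalescing of small scan work into larger chunks (parameter names should be verified against your Daft version; the bucket/path is a placeholder):

```python
import daft

# Assumption: scan_tasks_min_size_bytes / scan_tasks_max_size_bytes are the
# execution-config settings that control how small scan tasks get coalesced
# into larger ones before reading. Verify against your Daft version.
daft.set_execution_config(
    scan_tasks_min_size_bytes=96 * 1024 * 1024,   # coalesce tasks smaller than ~96 MB
    scan_tasks_max_size_bytes=384 * 1024 * 1024,  # stop coalescing once tasks reach ~384 MB
)

# Hypothetical path, for illustration only.
df = daft.read_parquet("s3://my-bucket/my-1gb-file.parquet")
df.show()
```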
Kyle
09/10/2024, 11:46 PM
I had to reach out to another team running the S3 service, and they said that my chunks were too small and were in KB. Not too sure how they checked that. Which parameters in the execution config (or S3 config) would influence this?
jay
09/11/2024, 12:11 AM
Are you running against AWS S3 or your own S3 service? The Daft readers are optimized against AWS, so they might require a bit of tuning when running against your own service!
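A hedged sketch of what that tuning can look like when pointing Daft at a non-AWS S3 service while also softening request-rate throttling. The endpoint and path are placeholders, and the parameter names should be checked against your Daft version:

```python
import daft
from daft.io import IOConfig, S3Config

io_config = IOConfig(
    s3=S3Config(
        endpoint_url="https://s3.internal.example.com",  # hypothetical internal S3 endpoint
        region_name="us-east-1",
        max_connections=8,              # fewer concurrent connections -> fewer simultaneous requests
        num_tries=10,                   # retry throttled requests more times
        retry_initial_backoff_ms=1000,  # back off longer between retries
    )
)

# Hypothetical path, for illustration only.
df = daft.read_parquet("s3://my-bucket/my-1gb-file.parquet", io_config=io_config)
df.show()
```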
Kyle
09/11/2024, 12:16 AM
I see.. thank you!
Kyle
09/11/2024, 12:19 AM
They recommended that I specify multipart_chunksize for S3. Would there be an equivalent config in Daft?