Gotcha, yeah this is a really large workload. Perhaps we should set up some time to chat about it? There are a few things we aren't doing yet that we'd need in order to handle workloads like this well:
• Splitting these 10G files into smaller partitions (we currently have a limit of 10 files for scan splitting because of metadata-fetching overheads); there's a rough sketch of a possible stopgap after this list.
• When performing shuffles (joins, sorts, groupbys, repartitions), a really large Ray graph gets constructed. We're in the process of designing a new shuffle system to handle these larger shuffles.
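In the meantime, something like this might help as a stopgap for the first point. It's a rough sketch, untested on your data: the path and partition count are placeholders, and I'm assuming `into_partitions` as the no-shuffle way to split things up further.
```
import daft

# Assuming you're already on the Ray runner; otherwise this points Daft at Ray.
daft.context.set_runner_ray()

# Placeholder path, point this at your actual dataset.
df = daft.read_parquet("s3://your-bucket/your-dataset/*.parquet")

# Until we do the scan splitting natively, manually splitting into more
# partitions keeps each one to a more manageable slice of those 10G files.
# into_partitions splits without triggering a full shuffle; 512 is just an
# example, tune it to your cluster's memory.
df = df.into_partitions(512)

df.show()
```
It won't change how the initial scan itself gets partitioned, but it should keep downstream partitions smaller until the native splitting work lands. It also won't help with the shuffle graph size, which is what the new shuffle system is for.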
LMK what timezone you’re operating out of? I’d love to get some of the Daft team involved and learn more about your workload to see if we can help stabilize it.