Gotcha, yeah this is a really large workload. Perhaps we should set up some time to chat about it? There are a few things we aren't doing yet that we'd need in order to handle workloads like this well:
• Splitting these 10G files into smaller partitions (we currently have a limit of 10 files for scan splitting because of metadata-fetching overheads); there's a rough sketch of a possible stopgap after this list.
• When performing shuffles (joins, sorts, groupbys, repartitions), a really large Ray graph gets constructed. We're in the process of designing a new shuffle system to handle these larger shuffles.
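In the meantime, something like this might help as a stopgap for the first point. It's a rough sketch, untested on your data: the path and partition count are placeholders, and I'm assuming `into_partitions` as the no-shuffle way to split things up further.
```
import daft

# Assuming you're already on the Ray runner; otherwise this points Daft at Ray.
daft.context.set_runner_ray()

# Placeholder path, point this at your actual dataset.
df = daft.read_parquet("s3://your-bucket/your-dataset/*.parquet")

# Until we do the scan splitting natively, manually splitting into more
# partitions keeps each one to a more manageable slice of those 10G files.
# into_partitions splits without triggering a full shuffle; 512 is just an
# example, tune it to your cluster's memory.
df = df.into_partitions(512)

df.show()
```
It won't change how the initial scan itself gets partitioned, but it should keep downstream partitions smaller until the native splitting work lands. It also won't help with the shuffle graph size, which is what the new shuffle system is for.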
LMK what timezone you’re operating out of? I’d love to get some of the Daft team involved and learn more about your workload to see if we can help stabilize it.