# general
k
I'm currently running a Daft job on Ray, and it seems the job largely finished after an hour. However, there are still a few stragglers running the final writing step, and after an extra hour the results folder is still the same size and has the same number of parquets. Any idea why that may be the case?
j
Hi! Can you share how you know there are some stragglers?
k
There are 200+ hashjoin-writefile tasks, and 8 of them have just been running for the past 2 hours with nothing new in the logs or in the results folder
The rest of the maps, reduces, aggregates, etc. all completed within the first hour of the 3 hours the job has been running
The writing task is supposed to just write out the IDs of the rows that are kept after the joins and filtering, so I wouldn't expect it to be a big table
Within the results folder most of the parquets are well formed, except for three that are still suffixed with a UUID and are either empty or half the size of the rest of the parquets
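A quick way to spot those unfinished outputs is to scan the results folder for anything that doesn't end in `.parquet` (a minimal sketch, assuming completed files end in `.parquet` while in-flight temp files carry a trailing UUID after that extension, as described above; the folder path is a placeholder):

```python
from pathlib import Path

def find_incomplete(results_dir: str) -> list[Path]:
    """Return files that look like unfinished writes.

    Assumption: finished parquets are named `*.parquet`, while half-written
    temp files have an extra UUID appended after the extension, so they no
    longer end in `.parquet`.
    """
    return [
        p
        for p in Path(results_dir).iterdir()
        if p.is_file() and not p.name.endswith(".parquet")
    ]
```

This only flags candidates by filename; whether a flagged file is truly truncated still has to be checked by size or by attempting to read it.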
j
Could you share your plan? You can get it by running df.explain(show_all=True). I'll try to reproduce it. My guess is that the writer (we currently use PyArrow) is being unreliable. What version of PyArrow are you using?
k
In between, the df gets materialized, so I didn't try to run explain on the final df. The manual fix I applied was to kill the straggler workers, and eventually the job finished
j
Hmm, interesting. When you say stragglers, do you mean Ray tasks that you're looking at on the dashboard?
k
Yes, I mean Ray tasks that are taking way longer than they should, with no errors, and don't complete