@jay 30 workers, and the original data size is 2TB with about a billion rows.
Haven't printed the plan yet, but there are filters, a dedup, and an anti-join.
All print statements get through, including the final df count after the filters, dedup, and anti-join. It only crashes during the write, and nothing shows up in the folder it's writing to. The output path is a local mount that all nodes can access; reads and writes work fine with the same script on smaller test datasets in the GB range.
Object store memory also spikes to crazy levels, around 200%, and when that happens it sometimes dies and sometimes survives. Writing just the IDs of the final table consistently works, but writing the full table does not. Raising the object store memory % makes individual workers OOM more often.
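For the object store point, a minimal sketch of where that knob lives, assuming the cluster is Ray underneath (which the object store / driver / worker terminology suggests); the value is a placeholder, not a recommendation:

```python
# Assumption: Ray is the underlying cluster. The object store size is normally
# fixed per node at startup, e.g. `ray start --object-store-memory=<bytes>`,
# or via ray.init() for a locally started instance. Placeholder value below.
import ray

ray.init(object_store_memory=64 * 1024**3)  # 64 GiB; only applies when Ray is started by this call
```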
I didn't specify the number of partitions or the number of parquet files, so it should be the defaults.
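To make that concrete, here is a minimal sketch of the pipeline shape with the partitioning set explicitly before the write. This is an assumption-heavy illustration: it's written in Daft-style syntax (the engine isn't confirmed above), and the paths, column names, filter conditions, and partition count are all placeholders; the explicit repartition() is exactly the thing the real script does not do.

```python
# Hypothetical sketch only -- engine, paths, columns, and partition count are assumptions.
import daft

df = daft.read_parquet("/mnt/shared/input/")          # ~2 TB, ~1B rows (placeholder path)
exclude = daft.read_parquet("/mnt/shared/exclude/")   # keys to anti-join away (placeholder)

df = (
    df.where(df["some_col"].not_null())               # the filters
      .distinct()                                     # the dedup
      .join(exclude, on="id", how="anti")             # the anti-join
)

print(df.count_rows())                                # this step succeeds on the full data

df = df.repartition(512)                              # placeholder count; not set in the real script
df.write_parquet("/mnt/shared/output/")               # this is where it crashes
```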
From the logs, it looks like the driver ended the process.