Chuanlei Ni
07/22/2024, 3:10 AM

Sammy Sidhu
07/22/2024, 8:09 PM
reduce_pipeline and reduce_and_fanout require spread strategies and a list[inputs] (one for each input partition), which gives the scheduler a hint that it should spread the function invocations across the cluster, since they are reduces. The default behavior is to schedule the function as close as possible to the data, but in the case of a reduce we do not want that.
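A minimal sketch of that idea (not Daft's actual implementation; the partition contents and reduce body are placeholders): a reduce-style task declared with Ray's SPREAD scheduling strategy, handed one input per upstream partition.

```python
import ray

ray.init(ignore_reinit_error=True)

# SPREAD asks Ray to distribute invocations across the cluster instead of
# packing them next to the input data (the default locality-aware behavior).
@ray.remote(scheduling_strategy="SPREAD")
def reduce_pipeline(inputs: list) -> list:
    # `inputs` holds one object ref per input partition. Refs nested inside
    # a list are not auto-resolved by Ray, so fetch them explicitly here.
    partitions = ray.get(inputs)
    merged = []
    for partition in partitions:
        merged.extend(partition)
    return sorted(merged)

# One object ref per upstream partition.
partition_refs = [ray.put(list(range(i, i + 3))) for i in range(4)]
result = ray.get(reduce_pipeline.remote(partition_refs))
```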
single_partition_pipeline and fanout_pipeline typically take in the number of args that the op requires (independent of the number of partitions), and these are scheduled where the data lives.
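For contrast, a minimal sketch of the single-partition case (assumed op and argument names): the task takes exactly the arguments the op needs, and under Ray's default scheduling it tends to run on the node that already holds its input.

```python
import ray

ray.init(ignore_reinit_error=True)

# Default scheduling strategy: Ray prefers placing the task near its
# top-level ObjectRef arguments, i.e. where the partition already lives.
@ray.remote
def single_partition_pipeline(partition: list, multiplier: int) -> list:
    # Argument count is fixed by the op, not by how many partitions exist.
    return [x * multiplier for x in partition]

partition_ref = ray.put([1, 2, 3])
out = ray.get(single_partition_pipeline.remote(partition_ref, 10))  # [10, 20, 30]
```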
The reason they may have the same impl but different names is to aid profiling: Ray's profiler captures functions at the name level!
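A minimal sketch of that naming trick (the shared helper and bodies below are made up): two remote functions reuse one implementation but keep distinct names, so Ray's profiler/timeline attributes their time separately.

```python
import ray


def _build_partitions(partition: list) -> list:
    # Shared implementation; illustrative only.
    return sorted(partition)


# Thin wrappers with distinct names: since Ray groups profiling data by
# function name, each pipeline type shows up as its own entry.
@ray.remote
def single_partition_pipeline(partition: list) -> list:
    return _build_partitions(partition)


@ray.remote
def fanout_pipeline(partition: list) -> list:
    return _build_partitions(partition)
```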