# daft-dev
c
hey, I find it a little confusing that single_partition_pipeline vs fanout_pipeline and reduce_pipeline vs reduce_and_fanout are implemented identically. Why do we have different function names? Waiting for your reply.
s
Hi! @Chuanlei Ni There are a couple of factors! `reduce_pipeline` and `reduce_and_fanout` require `spread` strategies and a `list[inputs]` (one for each input partition), which hints to the scheduler that it should spread the function invocations across the cluster, since they are reduces. The default behavior is to schedule the function as close as possible to the data, but in the case of a reduce we do not want that.
`single_partition_pipeline` and `fanout_pipeline` typically take in the number of args that the op requires (independent of the number of partitions), and these are scheduled where the data lives. The reason why they may have the same impl but different names is to aid profiling. Ray's profiler captures functions at the name level!
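
To make the profiling point concrete, here's a rough sketch (not the actual Daft code). It assumes a hypothetical shared helper `_run_ops` and uses Python's built-in `cProfile` as a stand-in for Ray's profiler, just to show that two wrappers with the same body still show up as separate entries because stats are keyed by function name:

```python
import cProfile
import pstats


def _run_ops(partition, ops):
    # Hypothetical shared implementation: apply each op to the partition in order.
    for op in ops:
        partition = op(partition)
    return partition


def single_partition_pipeline(partition, ops):
    # Same body as fanout_pipeline; only the name differs.
    return _run_ops(partition, ops)


def fanout_pipeline(partition, ops):
    # Same body as single_partition_pipeline; only the name differs.
    return _run_ops(partition, ops)


def profiled_call_counts():
    """Profile a few calls and return {function name: total call count}."""
    profiler = cProfile.Profile()
    profiler.enable()
    single_partition_pipeline([3, 1, 2], [sorted])
    fanout_pipeline([3, 1, 2], [sorted])
    fanout_pipeline([5, 4], [sorted])
    profiler.disable()

    stats = pstats.Stats(profiler)
    # stats.stats maps (filename, lineno, funcname) -> (cc, nc, tt, ct, callers);
    # index 1 (nc) is the total number of calls.
    return {func[2]: entry[1] for func, entry in stats.stats.items()}


if __name__ == "__main__":
    counts = profiled_call_counts()
    for name in ("single_partition_pipeline", "fanout_pipeline", "_run_ops"):
        print(name, counts[name])
```

If both call sites used one shared name, the profile would only show the combined `_run_ops` row and you couldn't tell which kind of pipeline the time went to.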