Chuanlei Ni
07/25/2024, 1:31 AMdef to_partition_tasks(self, psets: dict[str, list[PartitionT]]) -> physical_plan.MaterializedPhysicalPlan:
return physical_plan.materialize(self._scheduler.to_partition_tasks(psets))
could you please walk through this code snippet? i stuck on this logic.... help me @Colin Ho @jayjay
07/25/2024, 1:54 AMInProgressPhysicalPlan
2. MaterializedPhysicalPlan
InProgressPhysicalPlan
is a generator that can emit:
• `None`: indicating that the plan is waiting on more work to be done before it can proceed
• `PartitionTaskBuilder`: indicating that this is a pipeline of instructions that can be further appended to (allowing us to do fusing of operations such as Project -> Filter -> …)
• `PartitionTask`: indicating that this is a task that has been “built”, and that the runner should run this task.
MaterializedPhysicalPlan
is very similar, but only emits:
• None
• PartitionTask
• MaterializedResult
: indicates that this is a final result that is completed, and the runner should store this in the final set of partitions
Essentially the materialize(plan: InProgressPhysicalPlan) -> MaterializedPhysicalPlan
function will just finalize any PartitionTaskBuilders
that were emitted, and emit those as PartitionTasks
. This way the runner can just run any PartitionTasks
that are emitted from this plan.Chuanlei Ni
07/25/2024, 1:56 AMmaterialize
can also generate MaterializedResult[PartitionT]
?jay
07/25/2024, 1:56 AMMaterializedPhysicalPlan
that it can now call next()
on.
It will receive either a
• None
(this is the plan’s way of saying — “please do more work so I can proceed”)
• PartitionTask
(this is the plan’s way of saying — “please run this task”)
• Or a StopIteration
(this is the plan’s way o saying — no more work left)
• MaterializedResult
: indicates that this is a final result that is completed, and the runner should store this in the final set of partitionsjay
07/25/2024, 1:57 AMMaterializedResult[PartitionT]
is the plan’s way of saying “this is a FINAL result”jay
07/25/2024, 1:57 AMjay
07/25/2024, 2:00 AMPartitionTasks
from the left/right children”, keep a buffer of the left/right tasks, and then start emitting new PartitionTaskBuilders
to kickstart the downstream generators as results become availableChuanlei Ni
07/25/2024, 2:12 AMChuanlei Ni
07/25/2024, 2:17 AMNone
(this is the plan’s way of saying — “please do more work so I can proceed”). how to do more work to callback the process of generating partition tasks?Chuanlei Ni
07/25/2024, 2:17 AMjay
07/25/2024, 2:17 AMPartitionTasks
that it received)Chuanlei Ni
07/25/2024, 1:38 PMjay
07/25/2024, 4:11 PMChuanlei Ni
07/26/2024, 1:12 AM