Chuanlei Ni
07/25/2024, 1:31 AMdef to_partition_tasks(self, psets: dict[str, list[PartitionT]]) -> physical_plan.MaterializedPhysicalPlan:
return physical_plan.materialize(self._scheduler.to_partition_tasks(psets))
could you please walk through this code snippet? i stuck on this logic.... help me @Colin Ho @jayjay
07/25/2024, 1:54 AMInProgressPhysicalPlan
2. MaterializedPhysicalPlan
InProgressPhysicalPlan is a generator that can emit:
• `None`: indicating that the plan is waiting on more work to be done before it can proceed
• `PartitionTaskBuilder`: indicating that this is a pipeline of instructions that can be further appended to (allowing us to do fusing of operations such as Project -> Filter -> …)
• `PartitionTask`: indicating that this is a task that has been “built”, and that the runner should run this task.
MaterializedPhysicalPlan is very similar, but only emits:
• None
• PartitionTask
• MaterializedResult : indicates that this is a final result that is completed, and the runner should store this in the final set of partitions
Essentially the materialize(plan: InProgressPhysicalPlan) -> MaterializedPhysicalPlan function will just finalize any PartitionTaskBuilders that were emitted, and emit those as PartitionTasks. This way the runner can just run any PartitionTasks that are emitted from this plan.Chuanlei Ni
07/25/2024, 1:56 AMmaterialize can also generate MaterializedResult[PartitionT] ?jay
07/25/2024, 1:56 AMMaterializedPhysicalPlan that it can now call next() on.
It will receive either a
• None (this is the plan’s way of saying — “please do more work so I can proceed”)
• PartitionTask (this is the plan’s way of saying — “please run this task”)
• Or a StopIteration (this is the plan’s way o saying — no more work left)
• MaterializedResult : indicates that this is a final result that is completed, and the runner should store this in the final set of partitionsjay
07/25/2024, 1:57 AMMaterializedResult[PartitionT] is the plan’s way of saying “this is a FINAL result”jay
07/25/2024, 1:57 AMjay
07/25/2024, 2:00 AMPartitionTasks from the left/right children”, keep a buffer of the left/right tasks, and then start emitting new PartitionTaskBuilders to kickstart the downstream generators as results become availableChuanlei Ni
07/25/2024, 2:12 AMChuanlei Ni
07/25/2024, 2:17 AMNone (this is the plan’s way of saying — “please do more work so I can proceed”). how to do more work to callback the process of generating partition tasks?Chuanlei Ni
07/25/2024, 2:17 AMjay
07/25/2024, 2:17 AMPartitionTasks that it received)Chuanlei Ni
07/25/2024, 1:38 PMjay
07/25/2024, 4:11 PMChuanlei Ni
07/26/2024, 1:12 AM