Cory Grinstead
05/30/2024, 7:01 PM
collect
polars and datafusion are both pretty similar, but it seems like daft is doing some extra things that I'm having a hard time following. It seems to pass it off to a context object to materialize the results, but I'm not really able to follow what happens after the context object. There seems to be a lot of back and forth between py & rust, making it hard to follow the logic.
--
alternatively, if anyone has some free time to jump on a zoom/google meet and walk me through some parts, that would be extremely helpful!
Cory Grinstead
05/30/2024, 7:51 PM
Micropartition.read_parquet called? Every path seems to go through the GlobScan and Micropartition._from_scan_task.
Is this just old code that was never removed?
jay
05/30/2024, 8:09 PM
jay
05/30/2024, 8:11 PM
Micropartition.read_parquet: I think this is actually mostly only called for testing, from when we were developing the MicroPartition abstraction
> Every path seems to go through the GlobScan and Micropartition._from_scan_task.
That is correct! Most of the actual reading in a Daft dataframe happens through the ScanTask abstraction
Cory Grinstead
05/30/2024, 8:12 PM
Micropartition.read_parquet be removed then?
jay
05/30/2024, 8:15 PM
tests/integrations/io/parquet/…)
2. Our old Python-based read path (daft.table.table_io.read_parquet), which I think only gets called when we call our read APIs using our old readers with the kwarg flag use_native_downloader=False
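For anyone following along, here is a rough toy sketch of the flow described above: a glob scan yields scan tasks, each scan task is materialized into a micropartition, and a flag picks between a native and a legacy reader. Every name prefixed with toy/Toy is hypothetical and none of this is Daft's actual code. In particular, where the use_native_downloader flag is consulted is an assumption; the thread only says the legacy reader is used when that flag is False.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]
Reader = Callable[[str], List[Row]]


@dataclass(frozen=True)
class ToyScanTask:
    """Describes what to read (a file path); holds no data yet."""
    path: str


@dataclass
class ToyMicroPartition:
    """A materialized chunk of rows."""
    rows: List[Row]

    @classmethod
    def from_scan_task(cls, task: ToyScanTask, reader: Reader) -> "ToyMicroPartition":
        # Data is only read here, when the scan task is materialized.
        return cls(rows=reader(task.path))


def toy_glob_scan(paths: List[str]) -> Iterator[ToyScanTask]:
    """Stand-in for a glob scan: emits one scan task per matched file."""
    for path in sorted(paths):
        yield ToyScanTask(path)


def toy_native_reader(path: str) -> List[Row]:
    # Placeholder for a native (Rust-backed) reader; returns fake rows.
    return [{"path": path, "reader": "native"}]


def toy_legacy_reader(path: str) -> List[Row]:
    # Placeholder for an older Python-based reader; returns fake rows.
    return [{"path": path, "reader": "legacy-python"}]


def toy_read_parquet(paths: List[str], use_native_downloader: bool = True) -> List[ToyMicroPartition]:
    # Assumption: both readers sit behind the same scan-task path,
    # and the flag only selects which reader materializes each task.
    reader = toy_native_reader if use_native_downloader else toy_legacy_reader
    return [ToyMicroPartition.from_scan_task(task, reader) for task in toy_glob_scan(paths)]
```

Calling toy_read_parquet(["a.parquet"], use_native_downloader=False) goes through the same scan-task path but materializes with the legacy reader, which is one way to read "every path goes through the scan task" together with the old reader still being reachable.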
jay
05/30/2024, 8:16 PM
jay
05/30/2024, 9:30 PM
Cory Grinstead
05/30/2024, 11:55 PM
jay
05/31/2024, 4:50 AM
Cory Grinstead
05/31/2024, 2:23 PM
jay
05/31/2024, 4:10 PM
Cory Grinstead
05/31/2024, 4:15 PM
jay
05/31/2024, 6:29 PM