# daft-dev
c
are there any publicly available docs/diagrams on daft architecture? I'm having a hard time following what actually happens on a `collect`. polars and datafusion are both pretty similar, but it seems like daft is doing some extra things that I'm having a hard time following. It seems to pass it off to a `context` object to materialize the results, but I'm not really able to follow what happens after the context object. There seems to be a lot of back and forth between py & rust, making it hard to follow the logic. Alternatively, if anyone has some free time to jump on a zoom/google meet and walk me through some parts, that would be extremely helpful!
something else that I'm having a hard time following: when is `Micropartition.read_parquet` called? Every path seems to go through the `GlobScan` and `Micropartition._from_scan_task`. Is this just old code that was never removed?
j
Architecture docs: https://www.getdaft.io/projects/docs/en/latest/faq/technical_architecture.html It’s slightly outdated; most notably, the planning layer (Logical/Optimizer/Physical Plan) now happens all in Rust.
`Micropartition.read_parquet`: I think this is mostly only called for testing, from when we were developing the MicroPartition abstraction.
> Every path seems to go through the `GlobScan` and `Micropartition._from_scan_task`.
That is correct! Most of the actual reading in a Daft dataframe happens through the ScanTask abstraction.
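For anyone following along, here is a toy sketch of the general idea of deferring reads behind scan tasks until `collect`. Everything in it (`ToyScanTask`, `ToyDataFrame`, `fake_parquet_reader`, `READ_LOG`) is a hypothetical stand-in made up for illustration. It is not Daft's actual ScanTask or MicroPartition code, and it does no real Parquet I/O.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Toy model only: hypothetical stand-ins, not Daft's real types or API.

READ_LOG: List[str] = []  # records which paths were actually "read"


def fake_parquet_reader(path: str) -> List[dict]:
    # Stand-in for a real file read; logs the path it was asked to touch.
    READ_LOG.append(path)
    return [{"source": path, "value": len(path)}]


@dataclass
class ToyScanTask:
    """Describes a read without performing it."""
    path: str
    reader: Callable[[str], List[dict]]

    def materialize(self) -> List[dict]:
        # The (fake) I/O happens only when this is called.
        return self.reader(self.path)


@dataclass
class ToyDataFrame:
    """Holds pending scan tasks; nothing is read at construction time."""
    tasks: List[ToyScanTask] = field(default_factory=list)

    def collect(self) -> List[dict]:
        # Materialize every pending task and concatenate the rows.
        rows: List[dict] = []
        for task in self.tasks:
            rows.extend(task.materialize())
        return rows


paths = ["a.parquet", "bb.parquet"]
df = ToyDataFrame([ToyScanTask(p, fake_parquet_reader) for p in paths])
reads_before_collect = list(READ_LOG)  # snapshot: no reads have happened yet
rows = df.collect()                    # reads happen here, one per task
```

Building the dataframe only records what should be read; the reader is invoked once per task inside `collect()`.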
c
so could `Micropartition.read_parquet` be removed then?
j
Looks like we still use it in:
1. Testing (`tests/integrations/io/parquet/…`)
2. Our old Python-based read path (`daft.table.table_io.read_parquet`), which I think only gets called when we call our read APIs using our old readers with the kwarg flag `use_native_downloader=False`
But yeah, if those go away then it could be removed I think
(I can also hop on a call later at 3:30PM PST if you’d like someone to pair up with)
c
oh hey sorry I missed this. I could jump on a call tomorrow if you're free!
j
Yup! 10AM PST could work
c
I could do 11. 10 is my weekly meeting with lance.
j
Ah I’m booked at 11… how’s 3PM?
c
yeah I can do 3pm.
j
Sent a zoom invite!