# daft-dev
c
PR for huggingface object store https://github.com/Eventual-Inc/Daft/pull/2701 I wasn't sure of the best way to integration test this yet, so we'll still want to add integration tests, but otherwise I believe it's ready for review.
Anyone know why I'm seeing this error in my CI pipeline? https://github.com/Eventual-Inc/Daft/actions/runs/10478991314/job/29023615204?pr=2701#step:6:788 I didn't go anywhere near that file.
k
merge main in again
it was an error with a previous PR that should be fixed now
Actually it hasn't been merged yet https://github.com/Eventual-Inc/Daft/pull/2700
c
Right now this uses the `HTTPConfig` and is essentially a wrapper over the HTTP object store. However, I'm wondering if we would eventually want to separate this into its own `HFConfig` so we could support things like huggingface's `split` parameter:
read_parquet(
  'hf://..', 
  config=HFConfig(split='train')
)
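To make the idea a bit more concrete, here's a rough sketch of what a split-aware config could look like. This is purely hypothetical: `HFConfig` and `select_files` are illustrative names, not part of Daft today, and the input mapping (split name to a list of parquet file paths) is an assumed shape.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class HFConfig:
    """Hypothetical config object; NOT an existing Daft class."""

    # Name of the dataset split to read, e.g. "train". None means "all splits".
    split: Optional[str] = None


def select_files(files_by_split: Dict[str, List[str]], config: HFConfig) -> List[str]:
    """Return the parquet files for the configured split, or every file if no split is set."""
    if config.split is None:
        # No split requested: flatten all splits, preserving mapping order.
        return [f for files in files_by_split.values() for f in files]
    if config.split not in files_by_split:
        raise KeyError(
            f"unknown split {config.split!r}; available: {sorted(files_by_split)}"
        )
    return list(files_by_split[config.split])
```

The point of keeping `split` on a dedicated config rather than on `HTTPConfig` is that it only makes sense for huggingface paths, so plain HTTP reads wouldn't carry an option they can't use.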
Just made a pretty big change to this PR so we can support reading many more datasets (anything that supports the hf parquet api).
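For anyone unfamiliar with the parquet api, a minimal sketch of the path-to-URL mapping is below. It is an illustration, not the code in the PR: the `hf://datasets/<repo id>` path layout and the `https://huggingface.co/api/datasets/<repo id>/parquet` endpoint shape are assumptions here, and `hf_parquet_api_url` is a made-up helper name.

```python
def hf_parquet_api_url(path: str) -> str:
    """Map an hf:// dataset path to a parquet-listing URL (assumed endpoint shape)."""
    prefix = "hf://datasets/"
    if not path.startswith(prefix):
        raise ValueError(f"expected a path starting with {prefix!r}, got {path!r}")
    # Assumption: everything after the prefix is the repository id,
    # e.g. "user/repo", with no file path appended.
    repo_id = path[len(prefix):].strip("/")
    if not repo_id:
        raise ValueError("missing dataset repository id")
    return f"https://huggingface.co/api/datasets/{repo_id}/parquet"
```

The sketch only builds the URL; fetching it and reading the listed files is left out.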
Hoping to get a review on this soon. I want to test this out with Conor's parquet performance fix, as it currently seems like there's a partial materialization happening when calling `explain(show_all=True)`.
It took nearly 10 minutes (albeit in non-release mode) to produce this physical plan.
k
gave it a review!