Realized that our `.url.download()` don’t schedule...
# daft-dev
j
Realized that our `.url.download()` calls don't get scheduled with a SPREAD strategy, unlike our ScanTasks/ReduceTasks. @Sammy Sidhu do you think we should special-case projections with URL downloads to also run with SPREAD?
This would be useful for when we do something like:
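import daft

# List the matching files; each row carries a "path" column
df = daft.from_glob_paths(...)
# Split the listing into 32 partitions so the downloads can run in parallel
df = df.into_partitions(32)
# Download the bytes behind each path into a new "data" column
df = df.with_column("data", df["path"].url.download())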
s
Yeah we should recognize it as a scan TBH
So we can spread and also make sure we account for the memory inflation
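For anyone skimming the thread, here is a toy sketch of why SPREAD matters for download-heavy tasks. It is not Daft's or Ray's actual scheduler: `place_tasks`, the node names, and the slot counts are all hypothetical, and "PACK" just stands in for a fill-one-node-first default.

```python
from collections import Counter


def place_tasks(num_tasks, nodes, slots_per_node, strategy):
    """Toy placement: return the node assigned to each task.

    Hypothetical helper for illustration only; not a real scheduler.
    """
    load = Counter({node: 0 for node in nodes})
    placement = []
    for _ in range(num_tasks):
        free = [n for n in nodes if load[n] < slots_per_node]
        if not free:
            raise RuntimeError("no free slots left")
        if strategy == "SPREAD":
            # Pick the least-loaded node so tasks fan out across the cluster.
            node = min(free, key=lambda n: load[n])
        else:
            # Packing: keep filling the first node that still has a slot.
            node = free[0]
        load[node] += 1
        placement.append(node)
    return placement


nodes = ["node-0", "node-1", "node-2", "node-3"]
packed = Counter(place_tasks(8, nodes, slots_per_node=8, strategy="PACK"))
spread = Counter(place_tasks(8, nodes, slots_per_node=8, strategy="SPREAD"))
print("packed:", dict(packed))
print("spread:", dict(spread))
```

With packing, all eight download tasks (and all of their downloaded bytes) land on one node; with spreading, each node gets two.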
j
Yeah, I’m running into OOMs now for the workload above, or really for any small-Parquet + URL download use case, because the Parquet files get coalesced too aggressively
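A back-of-the-envelope sketch of the memory inflation being described. Every number here is made up for illustration (file counts, row counts, path and object sizes), and `estimate_partition_bytes` is a hypothetical helper, not a Daft API. The point is only that a partition sized by its path column can grow by orders of magnitude once the downloaded bytes are attached.

```python
def estimate_partition_bytes(rows, path_bytes_per_row, download_bytes_per_row):
    """Rough in-memory size of one partition before and after the download."""
    before = rows * path_bytes_per_row
    after = rows * (path_bytes_per_row + download_bytes_per_row)
    return before, after


# Assumed workload: 1,000 small Parquet files with 1,000 URL rows each,
# all coalesced into a single partition because the files look tiny.
rows = 1_000 * 1_000
# Assumed sizes: ~100 bytes per path string, ~500 KB per downloaded object.
before, after = estimate_partition_bytes(rows, 100, 500_000)

print(f"before download: {before / 1e6:.0f} MB")
print(f"after download:  {after / 1e9:.1f} GB")
print(f"inflation:       {after / before:.0f}x")
```

Coalescing decisions based on the pre-download size would treat this partition as roughly 100 MB, even though materializing the `data` column needs about 5,000 times that.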