Community for the Daft project and all things distributed data

Distributed Data Community

Realized that our `.url.download()` doesn’t schedule with a SPREAD strategy, unlike our <https://github.com/Eventual-Inc/Daft/pull/1950/files#diff-f8f263d5eac62dc64c36d65bcd2ef6ea29d122cbe123a6ca6dcde8bc1f5dc0fcR659-R663|ScanTasks/ReduceTasks>
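To see why SPREAD matters here, below is a toy placement model (not Ray's or Daft's actual scheduler). It contrasts round-robin SPREAD placement with a simplified pack-first policy that fills one node's slots before moving to the next. The function `place` and the "PACK" policy are hypothetical illustrations. In Ray itself, the real knob is `scheduling_strategy="SPREAD"` on `@ray.remote(...)` or `.options(...)`.

```python
from collections import Counter


def place(num_tasks: int, nodes: list[str], slots_per_node: int, strategy: str) -> list[str]:
    """Toy model (hypothetical): return the node that each task lands on."""
    if strategy == "SPREAD":
        # Round-robin across all nodes so every node gets a similar share
        return [nodes[i % len(nodes)] for i in range(num_tasks)]
    if strategy == "PACK":
        # Fill one node's slots completely before moving on to the next node
        return [nodes[(i // slots_per_node) % len(nodes)] for i in range(num_tasks)]
    raise ValueError(f"unknown strategy: {strategy}")


nodes = ["node-0", "node-1", "node-2", "node-3"]
print(Counter(place(8, nodes, slots_per_node=4, strategy="PACK")))
print(Counter(place(8, nodes, slots_per_node=4, strategy="SPREAD")))
```

With 8 tasks on 4 nodes, the pack-first policy puts all of them on two nodes, while SPREAD puts two on each node. For I/O-heavy work like URL downloads, that means more network bandwidth and memory are available in aggregate.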

<@U041QSEF2H2> do you think we should special-case projections with URL downloads so they also run with SPREAD?

This would be useful for when we do something like:

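```python
import daft

# List the files matching a glob; yields one row per file with a "path" column
df = daft.from_glob_paths(...)
# Repartition so the downloads are parallelized across 32 partitions
df = df.into_partitions(32)
# Fetch the bytes behind each path/URL into a new "data" column
df = df.with_column("data", df["path"].url.download())
```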
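A minimal sketch of the special-casing idea follows. A planner rule walks a projection's expressions, and if any of them contains a URL download, it treats the projection like a scan and requests SPREAD. The `Expr` class, the op names, and both functions are hypothetical stand-ins, not Daft's actual planner API.

```python
from dataclasses import dataclass, field


@dataclass
class Expr:
    """Hypothetical stand-in for a planner expression node."""
    op: str  # e.g. "col", "alias", "url.download"
    children: list["Expr"] = field(default_factory=list)


def contains_url_download(expr: Expr) -> bool:
    # Depth-first search for a URL download anywhere in the expression tree
    if expr.op == "url.download":
        return True
    return any(contains_url_download(child) for child in expr.children)


def scheduling_strategy_for_projection(exprs: list[Expr]) -> str:
    # Treat a projection that downloads URLs like a scan: spread it across nodes
    return "SPREAD" if any(contains_url_download(e) for e in exprs) else "DEFAULT"


# Roughly mirrors: df.with_column("data", df["path"].url.download())
projection = [Expr("col"), Expr("alias", [Expr("url.download", [Expr("col")])])]
print(scheduling_strategy_for_projection(projection))
```

Checking the whole expression tree rather than only the top-level op matters, because the download is usually wrapped in an alias or other expression.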


Yeah, we should recognize it as a scan TBH

That way we can schedule it with SPREAD and also make sure we account for the memory inflation from the downloads

Yeah, I’m running into OOMs now with the workload above, or really with any small-Parquet + URL-download use case, because the Parquet files get coalesced too aggressively
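The sketch below illustrates why coalescing by on-disk size can OOM here and how accounting for inflation changes the packing. It is a hypothetical greedy grouping, not how Daft actually coalesces scan tasks. The function `coalesce` and the `inflation` multiplier (a guessed ratio of in-memory size after download to Parquet file size) are assumptions for illustration.

```python
def coalesce(file_sizes: list[int], target_bytes: int, inflation: float = 1.0) -> list[list[int]]:
    """Hypothetical greedy grouping of consecutive files into tasks.

    Each task's *estimated in-memory* size (file size * inflation) stays under
    target_bytes, except when a single file alone already exceeds it.
    """
    tasks: list[list[int]] = []
    current: list[int] = []
    current_est = 0.0
    for size in file_sizes:
        est = size * inflation
        # Close the current task if adding this file would exceed the budget
        if current and current_est + est > target_bytes:
            tasks.append(current)
            current, current_est = [], 0.0
        current.append(size)
        current_est += est
    if current:
        tasks.append(current)
    return tasks


MiB = 1024 * 1024
small_parquet_files = [1 * MiB] * 64
# Ignoring inflation: all 64 files land in one task
print(len(coalesce(small_parquet_files, target_bytes=64 * MiB)))
# Assuming each file inflates ~8x after URL download: 8 tasks of 8 files
print(len(coalesce(small_parquet_files, target_bytes=64 * MiB, inflation=8.0)))
```

If the planner only sees 1 MiB files, it packs all 64 into a single task, and that one task then has to hold every downloaded blob at once. An inflation-aware estimate splits the work into more, smaller tasks, which SPREAD can then distribute across nodes.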