Cory Grinstead
08/15/2024, 4:28 PMfrom
> select 1;
> select upper('hello');
I'd expect this to be the equivalent of selecting from an empty dataframe
> daft.from_pydict({}).select(daft.lit(1)).collect()
> daft.from_pydict({}).select(daft.lit('hello').str.upper()).collect()
however this applies the schema change, but does not add the data.
╭─────────╮
│ literal │
│ --- │
│ Utf8 │
╞═════════╡
╰─────────╯
(No data to display: Materialized dataframe has no rows)
is this expected, or a bug?jay
08/15/2024, 5:02 PMlit gets broadcasted across the existing dataframe…. So if you have an empty dataframe you get another empty dataframe in returnjay
08/15/2024, 5:02 PMselect 1; and select 1 from my_table in SQL might give the same distinction?jay
08/15/2024, 5:03 PMmy_table was empty)Cory Grinstead
08/15/2024, 5:11 PMselect 1 and select 1 from my_table will both return the same resultCory Grinstead
08/15/2024, 6:34 PMdf = daft.from_pydict({
'a': [1,2,3],
'b': ['a', 'b', 'c']
}).select(daft.lit(1)).collect()
I'd expect this to return a single record, but it instead broadcasts it across n rows and gives me
literal: [1,1,1]jay
08/15/2024, 6:39 PMjay
08/15/2024, 6:42 PM.eval_expression(lit(1)) on every partition in the distributed dataframe
This can either:
• [CURRENT BEHAVIOR] Broadcast lit(1) to the length of every partition (hence performing a global broadcast)
• Return a new partition with just 1 value per-partition
The second option would unfortunately mean that our new dataframe would have N_PARTITION number of rows, which is quite weird
Alternatively we can detect any projections with only literals and make special-case those to produce only 1 partition with 1 value.Cory Grinstead
08/15/2024, 7:04 PMCory Grinstead
08/15/2024, 7:07 PMselect 1 and select 1 from my_tableDesmond Cheong
08/15/2024, 8:18 PMselect 1 from my_table produces a 1 for every row in the tableDesmond Cheong
08/15/2024, 8:19 PMSammy Sidhu
08/15/2024, 8:29 PMCory Grinstead
08/15/2024, 8:32 PMCory Grinstead
08/15/2024, 8:38 PMselect <literal> but not select <literal> from <tbl>