Peter
04/22/2024, 2:25 PM
… `daft.DataType.python()` as the return type.
Minimal repro:
```python
import daft

df = daft.from_pylist(
    [
        {'a': 1},
        {'a': 2},
        {'a': 3},
        {'a': 4},
        {'a': 5},
    ]
)

@daft.udf(return_dtype=daft.DataType.python())
def my_udf(a: daft.Series) -> daft.Series:
    values = [{'b': i**i} for i in a.to_pylist()]
    return daft.Series.from_pylist(values)

df.with_column('b', my_udf(df['a'])).to_pandas()
```
Error I get:
```
PanicException: ('not implemented: Daft casting from Struct[b: Int64] to Python not implemented',)
```
Of course, the UDF works if I explicitly specify its return type as `Struct[b: Int64]`.
Interestingly, the UDF also works if I force the dicts to have an "imperfect" structure, alternating between `int` and `str` as the value type, by changing the `values` assignment to:
```python
values = [{'b': i**i if i % 2 else str(i**i)} for i in a.to_pylist()]
```
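A simplified model may help explain why the two versions behave differently. The sketch below assumes that a column built from these dicts only gets a concrete Struct type when every field holds one consistent type, and otherwise falls back to plain Python objects. `infer_column_kind` is a hypothetical helper, not a Daft API, and real Arrow inference is more lenient than this model (it can, for example, widen `int` and `float` to `double`).

```python
from __future__ import annotations

from typing import Any


def infer_column_kind(rows: list[dict[str, Any]]) -> str:
    """Hypothetical, simplified stand-in for "try a typed column, else keep Python objects".

    Returns "struct" when every field holds values of a single Python type
    (ignoring None), and "python" otherwise.
    """
    field_types: dict[str, set[type]] = {}
    for row in rows:
        for key, value in row.items():
            if value is not None:
                field_types.setdefault(key, set()).add(type(value))
    # One type per field -> a uniform Struct column can be inferred.
    if all(len(types) == 1 for types in field_types.values()):
        return "struct"
    # Conflicting types (e.g. int and str) -> fall back to Python objects.
    return "python"


uniform = [{'b': i**i} for i in range(1, 6)]
mixed = [{'b': i**i if i % 2 else str(i**i)} for i in range(1, 6)]
print(infer_column_kind(uniform))
print(infer_column_kind(mixed))
```

Under this model the uniform dicts come back as a Struct column, which would then need the unimplemented Struct-to-Python cast, while the mixed version is already made of Python objects, so no cast is needed.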
I would have assumed that no error is raised when the return dtype of my UDF is declared as an arbitrary Python object.
Is anything wrong with this assumption?
Peter
04/22/2024, 2:26 PM
`getdaft==0.2.21`
jay
04/22/2024, 5:02 PM
… `Series.from_pylist([…], pyobj="force")`
This will force it to create a Series of Python objects, without attempting to coerce the data into an Arrow array first.
In fact, we should probably apply this behavior automatically when we see that the return dtype is Python! Feel free to file a ticket or submit a PR; that should be pretty simple to add :)
jay
04/22/2024, 5:31 PM
… `python()`, and we do `pyobj="force"` already:
https://github.com/Eventual-Inc/Daft/blob/main/daft/udf.py#L111-L112
The issue here is that you are explicitly returning a Series object, which is already of Struct type, so we have to cast it to the Python dtype on return. That Struct-to-Python cast is exactly what raises the "not implemented" error.
In your case, all you have to do is change the return statement to:
```python
return daft.Series.from_pylist(values, pyobj="force")
```
Peter
04/22/2024, 6:11 PM