# daft-dev
p
Hey team, I ran into a strange bug. It seems like I can't return a Python dictionary from a UDF when specifying daft.DataType.python() as the return type. Minimal repro:
import daft

df = daft.from_pylist(
    [
        {'a': 1},
        {'a': 2},
        {'a': 3},
        {'a': 4},
        {'a': 5},
    ]
)


@daft.udf(return_dtype=daft.DataType.python())
def my_udf(a: daft.Series) -> daft.Series:
    values = [{'b': i**i} for i in a.to_pylist()]
    return daft.Series.from_pylist(values)


df.with_column('b', my_udf(df['a'])).to_pandas()
Error I get:
PanicException: ('not implemented: Daft casting from Struct[b: Int64] to Python not implemented',)
The UDF works, of course, if I explicitly specify the UDF return type as Struct[b: Int64]. Interestingly, the UDF also works when I force my dict to have an 'imperfect' structure by alternating between int and str as the mapped type, changing the values assignment to:
values = [{'b': i**i if i % 2 else str(i**i)} for i in a.to_pylist()]
I would have assumed that no errors get raised when specifying the return datatype of my UDF to be an arbitrary Python object. Anything wrong with this assumption?
I am on getdaft==0.2.21.
j
Ah this is because when you return a list of Python dicts, it first gets read into a PyArrow array (which gets coerced into structs) and then attempts a cast, which we haven’t implemented. To work around this, you can instead return a Daft series explicitly:
Series.from_pylist([…], pyobj="force")
This will force it to create a Series of Python objects, without attempting to coerce to an Arrow array first. In fact, we should probably do this automatically for the user if we see that the return dtype is Python! Feel free to file a ticket or submit a PR, that should be pretty simple to add :)
Ahh I see the issue! We actually do already have logic that checks whether the return dtype is python() and applies pyobj="force": https://github.com/Eventual-Inc/Daft/blob/main/daft/udf.py#L111-L112. The issue here is that you are returning a Series object explicitly, which is already of struct type, so we have to cast it to a Python type on return, and that cast is the part that isn't implemented. In your case, all you have to do is change the return statement to:
return daft.Series.from_pylist(values, pyobj="force")
p
Indeed, that did the trick. Thanks so much for the quick help, @jay!