# general
j
Hey team - loving Daft so far. I noticed a discrepancy between the docs and the actual API implementation, however. https://www.getdaft.io/projects/docs/en/latest/user_guide/daft_in_depth/udf.html
Here in the UDF guide you state that the returned value can be one of the following types:
1. Numpy Arrays (np.ndarray)
2. PyArrow Arrays (pa.Array)
3. Python lists (list)
However, the actual code (below) only seems to work for Series, lists and numpy arrays - not native pyarrow arrays (I found this out when attempting to return a pyarrow large string array). https://github.com/Eventual-Inc/Daft/blob/3e9dcd45945c75e3c8c3a30d721fc0d2727eb8b7/daft/udf.py#L108
# Post-processing of results into a Series of the appropriate dtype
        if isinstance(result, Series):
            return result.rename(name).cast(self.udf.return_dtype)._series
        elif isinstance(result, list):
            if self.udf.return_dtype == DataType.python():
                return Series.from_pylist(result, name=name, pyobj="force")._series
            else:
                return Series.from_pylist(result, name=name, pyobj="allow").cast(self.udf.return_dtype)._series
        elif _NUMPY_AVAILABLE and isinstance(result, np.ndarray):
            return Series.from_numpy(result, name=name).cast(self.udf.return_dtype)._series
        else:
            raise NotImplementedError(f"Return type not supported for UDF: {type(result)}")
🔥 1
j
Ah good catch! Could you create an issue? It's also probably a pretty easy fix if you wanna take a stab at a first contribution!
We were about to cut a release, so I made a quick little fix 😛 https://github.com/Eventual-Inc/Daft/pull/2252 The new release should be cut by EOD!
j
Thanks Jay! Next time I'll cut a ticket and implement it myself 🙂
🙌 1