Jake Waller
05/07/2024, 9:49 AM
1. Numpy arrays (np.ndarray)
2. PyArrow Arrays (pa.Array)
3. Python lists (list)
However, the actual code (below) only seems to work for Series, lists, and numpy arrays, not native pyarrow arrays (I found this out when attempting to return a pyarrow large string array): https://github.com/Eventual-Inc/Daft/blob/3e9dcd45945c75e3c8c3a30d721fc0d2727eb8b7/daft/udf.py#L108
# Post-processing of results into a Series of the appropriate dtype
if isinstance(result, Series):
    return result.rename(name).cast(self.udf.return_dtype)._series
elif isinstance(result, list):
    if self.udf.return_dtype == DataType.python():
        return Series.from_pylist(result, name=name, pyobj="force")._series
    else:
        return Series.from_pylist(result, name=name, pyobj="allow").cast(self.udf.return_dtype)._series
elif _NUMPY_AVAILABLE and isinstance(result, np.ndarray):
    return Series.from_numpy(result, name=name).cast(self.udf.return_dtype)._series
else:
    raise NotImplementedError(f"Return type not supported for UDF: {type(result)}")
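A possible stopgap, sketched under assumptions: since the snippet above has no branch for pa.Array, the UDF body could convert an Arrow result to a plain list before returning it, which would then take the list branch. The helper name normalize_udf_result is hypothetical and not part of Daft. It only relies on to_pylist(), which pyarrow arrays provide, and duck-types on that method so the sketch runs without pyarrow installed.

```python
from typing import Any


def normalize_udf_result(result: Any) -> Any:
    """Hypothetical helper (not a Daft API): turn Arrow-like results into a list.

    Anything exposing a callable to_pylist() is converted; everything else
    (lists, numpy arrays, ...) is returned unchanged.
    """
    to_pylist = getattr(result, "to_pylist", None)
    if callable(to_pylist):
        # pyarrow arrays expose to_pylist(); note that any other object
        # with the same method would be converted here as well.
        return to_pylist()
    return result
```

Inside a UDF this would look like `return normalize_udf_result(arr)`, where `arr` is the pyarrow array being returned. The conversion copies the data into Python objects, so it trades performance for compatibility and is only meant as a workaround until pa.Array is handled directly.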
jay
05/07/2024, 4:18 PM
jay
05/07/2024, 11:18 PM
Jake Waller
05/08/2024, 8:51 AM