Can a UDF return a DataFrame directly? For example...
# general
k
Can a UDF return a DataFrame directly? For example, I wanted to apply a UDF to each group, i.e. groupedDF.map_groups(udf(df)). That would let me easily keep all the columns, since I want to save the result later on without another join, and it would also help distribute the load across known groups.
j
Currently we only support returning one column from a UDF. Note though that this does let you return Struct columns, which are in effect a dataframe! This means that you can then just select on that one column and “splat” it:
df = df.select(MyUDF(…).alias("structcol"))
Then:
df = df.select("structcol.*")
Does this solve your issue here?
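To make the pack-then-splat idea concrete, here's a rough library-agnostic sketch in plain Python. It is not the dataframe library's API: pack_row and splat are hypothetical helpers, and the rows and field names are made up for illustration. The "UDF" returns one struct (a dict) per row that carries every column to keep, and the splat step expands that struct's fields back into top-level columns.

```python
# Plain-Python illustration only. pack_row and splat are hypothetical helpers,
# not functions from any dataframe library.

def pack_row(group, values):
    """Per-row 'UDF': return ONE struct (a dict) holding every field to keep."""
    return {"group": group, "total": sum(values), "count": len(values)}


def splat(rows, struct_col):
    """Expand the struct stored under struct_col into top-level columns."""
    return [dict(row[struct_col]) for row in rows]


# Made-up input rows for the sketch.
rows = [
    {"group": "a", "values": [1, 2, 3]},
    {"group": "b", "values": [10, 20]},
]

# Step 1: a single struct column, analogous to select(MyUDF(...).alias("structcol")).
with_struct = [{"structcol": pack_row(r["group"], r["values"])} for r in rows]

# Step 2: splat the struct fields, analogous to select("structcol.*").
flat = splat(with_struct, "structcol")
print(flat)
```

Because the group key is packed into the struct alongside the computed fields, it comes back out in the splat, so no extra join is needed afterwards.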
k
Yup! I'll try to collect all the columns into one first then splat them back out later!
j
Yeah, lmk also if there’s maybe a better API for this. Or maybe we just need better docs.