One more question, can UDFs return multi-type list...
# daft-dev
m
One more question: can UDFs return multi-type lists such as ["String", 8]? And if so, how can I define this correctly in the return_dtype of the function? (I tried Union[...] but it does not work.)
j
We don’t support multi-type returns in Daft at the moment, unfortunately! You have a few options here:
1. You can use a Struct type, if you know that it’s always going to be the same number of elements, each with the same type every time.
2. You can use a Python type and just return a Python list… it’s not ideal, but that’s always a possible fallback option!
m
Python type works indeed! Thank you.
j
Nice! The Python types are our escape hatch, but note that we aren’t able to do many intelligent optimizations on data that is just kept as Python objects (to Daft, the column is just an opaque list of some Python objects).
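For reference, the Python-type fallback discussed above might look roughly like the sketch below. The names `mixed_pairs`, `mixed_pairs_udf`, and `run_with_daft` are hypothetical and chosen only for illustration; the sketch assumes a Daft version that exposes the `daft.udf` decorator and `daft.DataType.python()`.

```python
def mixed_pairs(names):
    """Build one mixed-type list [str, int] per input value (pure Python)."""
    return [[name, len(name)] for name in names]


def run_with_daft(names):
    """Wrap mixed_pairs in a Daft UDF whose return type is an opaque Python object."""
    # Imported lazily so the pure-Python helper above works without Daft installed.
    import daft

    # Assumption: with DataType.python(), Daft stores each returned list as-is
    # and does not try to infer a single element type.
    @daft.udf(return_dtype=daft.DataType.python())
    def mixed_pairs_udf(col):
        return mixed_pairs(col.to_pylist())

    df = daft.from_pydict({"name": names})
    return df.select(mixed_pairs_udf(df["name"]).alias("pair")).to_pydict()
```

Calling `run_with_daft(["a", "bcd"])` should yield a single `pair` column of Python lists. As noted in the thread, Daft treats that column as opaque objects, so it cannot optimize operations on it.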
m
Too bad we can't store these objects with json.dumps and then cast them to lists. Such a cast creates a 1-element list containing the whole string instead of properly detecting each element in the list.
j
Yes, although you might be interested in our functionality for json that will work over string columns: https://www.getdaft.io/projects/docs/en/latest/api_docs/doc_gen/expression_methods/daft.Expression.json.query.html
I would be hesitant to support a naive cast from string to list, but a from_json expression to convert string columns could be interesting to explore!
m
That's the problem. My json comes from a dumps but the cast does not recognize the values. Try this:
import json
import pandas as pd
import daft

ddf = daft.from_pandas(pd.DataFrame([[json.dumps(["42","45"])]], columns=["Test"]))
ddf.select(ddf["Test"].cast(daft.DataType.list(daft.DataType.string()))).collect()
When I run my json.query it gives me a stringified list of strings, and I am having a hard time converting it back, unless maybe I run a UDF that does a json.loads... but that seems like overkill.
And the whole stringified JSON is there because json.query needs the JSON to be a string, if I am not mistaken?
👍 1
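The UDF workaround mentioned above (running json.loads over a string column) could be sketched as follows. The names `decode_json_lists`, `from_json_list`, and `decode_with_daft` are hypothetical; the sketch assumes `daft.udf` accepts a list-of-strings `return_dtype` and converts the returned Python lists accordingly, which should be checked against the Daft version in use.

```python
import json


def decode_json_lists(values):
    """Parse each JSON-encoded string into a Python list; None stays None."""
    return [None if v is None else json.loads(v) for v in values]


def decode_with_daft(rows):
    """Apply decode_json_lists to a Daft string column through a UDF."""
    # Imported lazily so the pure-Python helper above works without Daft installed.
    import daft

    # Assumption: every JSON value in the column is a list of strings,
    # so a list[string] return type is appropriate.
    @daft.udf(return_dtype=daft.DataType.list(daft.DataType.string()))
    def from_json_list(col):
        return decode_json_lists(col.to_pylist())

    df = daft.from_pydict({"Test": rows})
    return df.select(from_json_list(df["Test"])).to_pydict()
```

For example, `decode_with_daft([json.dumps(["42", "45"])])` would parse the single row into a two-element list rather than a 1-element list holding the whole string.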
j
Yes, casting a string to a list doesn’t work right now because it’s quite an ambiguous operation: for example, in Python, if you do list("[1, 'x']") you actually get a list of the characters rather than a list of values.
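The ambiguity is easy to reproduce in plain Python with only the standard library. This small sketch contrasts a naive list() call with explicit JSON decoding; the variable names are illustrative only.

```python
import json

# A JSON-encoded list, as produced by json.dumps in the example from this thread.
raw = json.dumps(["42", "45"])

# Naive conversion: list() iterates over the string, yielding one entry per character.
as_chars = list(raw)

# Explicit JSON decoding: recovers the original two string values.
as_values = json.loads(raw)

print(as_chars[:3])  # the first few single characters of the JSON text
print(as_values)     # the decoded list of two strings
```

The naive result has one entry per character of the JSON text, while the decoded result has exactly the two original elements.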
m
If you'd support the json format natively then all my problems would go away :)
j
But I think it could be interesting to enable JSON-specific decoding, so a user can specify “I know this data is in JSON format, and I would like you to parse it into this specific Daft DataType”
🙏 1
m
Granted, I could use a struct, but my JSON objects don't all have the same keys.
j
Makes sense!
Let me bring it up to the team. We did have some ideas around supporting a JSON type
Would you mind making an issue for us? Some example use-cases could help us shape the feature better
m
Yes I can create this today
🎉 1
j
Thanks! Very exciting stuff :)