# general
a
Also, I discovered that it is not possible to sort on a boolean column or expression, unlike in most databases. Is there a reason for that? If it's because of "which one should come first, true or false?", is that really a problem? 😄
df = daft.from_pydict({'my_bool': [True, False, True]})
df.sort('my_bool').collect()
DaftCoreException: DaftError::External Unable to create logical plan node.
Due to: DaftError::ValueError Cannot sort on expression col(my_bool) with type: Boolean
My use case is that I want to generate stats on all columns of a dataset, so I need to sort every column, and that crashes when I encounter a boolean one. For now I do:
sorting = c.if_else(0, 1) if dtype == DataType.bool() else c
Thinking about it, I'm losing null values here, so I should prefer:
c.is_null().if_else(lit(2), c.if_else(lit(0), lit(1)))
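To make the intended ordering concrete, here is a minimal plain-Python sketch of the same mapping (no Daft involved; `bool_sort_key` is a hypothetical helper name, not part of any library): True maps to 0, False to 1, and null to 2, so nulls are kept and sort last.

```python
from typing import Optional


def bool_sort_key(value: Optional[bool]) -> int:
    """Hypothetical helper: True -> 0, False -> 1, None (null) -> 2."""
    if value is None:
        return 2  # nulls get their own key instead of being lost
    return 0 if value else 1


values = [True, None, False, True]
# sorted() is stable, so equal keys keep their original relative order
sorted_values = sorted(values, key=bool_sort_key)
print(sorted_values)  # [True, True, False, None]
```

With this mapping True comes before False; swapping the 0 and 1 would give the more common False-first order.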
c
That's a nice workaround! Let me check and get back to you on why we currently don't support boolean sorts
Looks like we just didn't fully implement boolean sorting. I have a fix for it here: https://github.com/Eventual-Inc/Daft/pull/2529
a
Awesome, thanks a lot, that will make the code cleaner!