# general
a
Hello everyone! Thanks for making that library possible, I love it! I'm running into a case where I need to know if a column is temporal (date, time or timestamp), and I'm struggling to test the column datatype. Indeed, I discovered this morning that the timezone of a Timestamp is part of the datatype itself, making my naive function
is_datatype_temporal
impossible to implement by comparing datatypes directly. I did not find any simple way to get the "timestamp" information from the datatype. How would you implement such a function? Thank you!
c
We currently have a few private functions to identify the 'type' of a datatype; you can find them here: https://github.com/Eventual-Inc/Daft/blob/main/daft/datatype.py#L501. Example usage:
daft.DataType.date()._is_temporal_type()
although do note that the
time
type is currently not categorized as a 'temporal' type, so you may have to add some custom code. (I'll look into enabling it)
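A minimal sketch of the "custom code" this might take, wrapping the private helper so that the time type is covered as well. The name `is_datatype_temporal` comes from the question; the string check is an assumption (it presumes the time dtype's string form starts with `Time`), and because `_is_temporal_type()` is private it may change between versions.

```python
def is_datatype_temporal(dtype) -> bool:
    """Return True for date, time, and timestamp dtypes (hedged sketch)."""
    # Private Daft helper mentioned above; getattr guards against it
    # being renamed or removed in a later version.
    helper = getattr(dtype, "_is_temporal_type", None)
    if callable(helper) and helper():
        return True
    # Assumption: the time dtype's string form starts with "Time".
    # "Timestamp(...)" also matches this prefix, which is fine because
    # timestamps are temporal too.
    return str(dtype).startswith("Time")
```

It would be called as `is_datatype_temporal(daft.DataType.date())`.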
j
I wonder if we should expose these as public methods on
DataType
a
I would love to! At least a simple helper to get the "raw" datatype, without the timeunit or timezone details. Same for Decimal, it's not possible to check if we have a decimal datatype because to construct this dtype instance we need to pass two integers, which makes for a ton of possible combinations.
j
yeah interesting… What API do you think would be useful here?
a
When I think about it, in DuckDB, dtypes are exposed as strings, so we can parse them to extract any kind of information, details and base types. Maybe I can do just the same. Converting a daft DataType to a string returns
'Timestamp(Milliseconds, None)'
, so I could theoretically just test for the
Timestamp(
prefix in this case, and do the same for other dtypes.
It's not perfectly clean, but it works as long as the string representation stays consistent across versions, and I can unit test that to be sure.
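A rough sketch of that string-prefix approach, working on the output of `str(dtype)`. The helper names `base_name` and `is_temporal_repr` are hypothetical, and only the `'Timestamp(Milliseconds, None)'` form is taken from this thread; the base names assumed for date and time (`Date`, `Time`) would need checking against the installed version.

```python
# Base names treated as temporal. "Date" and "Time" are assumed spellings.
TEMPORAL_BASES = frozenset({"Date", "Time", "Timestamp"})


def base_name(dtype_repr: str) -> str:
    """Return the part of a dtype string before its first '(' (or the whole string)."""
    return dtype_repr.split("(", 1)[0].strip()


def is_temporal_repr(dtype_repr: str) -> bool:
    """True if the dtype string's base name is one of the temporal bases."""
    return base_name(dtype_repr) in TEMPORAL_BASES


print(base_name("Timestamp(Milliseconds, None)"))  # Timestamp
```

The same `base_name` call would cover Decimal without enumerating precision and scale combinations, provided its string form also puts the parameters in parentheses.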
However, having some official helpers would be nice for extracting the different components of a datatype. For instance:
• dtype.base would return the string 'Timestamp', and be present on all DataType instances
• dtype.unit would return an instance of TimeUnit.ms(), and only be present on Timestamp and Time dtypes
• dtype.timezone would do the same for timezones, on Timestamp dtypes only, etc.
For that
.base
property, I cannot see anything other than a string as the return type, as there is no class for the "base" dtypes, and creating some would maybe be overkill. The per-datatype properties would be very useful for accessing those details, since once the dtype is built, there is no (public) way to access this information anymore.
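To make the proposal concrete, here is a small sketch of what `.base`, `.unit` and `.timezone` could look like, emulated today by parsing the string form. `DTypeInfo` is a hypothetical class, not part of Daft; it returns the unit and timezone as plain strings rather than `TimeUnit` instances, and the naive comma split would not handle nested dtypes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DTypeInfo:
    """Hypothetical holder for the components of a dtype string."""

    base: str
    params: Tuple[str, ...] = ()

    @classmethod
    def parse(cls, dtype_repr: str) -> "DTypeInfo":
        # Split "Base(p1, p2)" into the base name and its parameter list.
        base, sep, rest = dtype_repr.partition("(")
        if not sep:
            return cls(base.strip())
        inner = rest.rsplit(")", 1)[0]
        params = tuple(p.strip() for p in inner.split(",") if p.strip())
        return cls(base.strip(), params)

    @property
    def unit(self) -> Optional[str]:
        # Only Timestamp and Time carry a time unit (first parameter).
        if self.base in ("Timestamp", "Time") and self.params:
            return self.params[0]
        return None

    @property
    def timezone(self) -> Optional[str]:
        # Only Timestamp carries a timezone (second parameter); "None" means unset.
        if self.base == "Timestamp" and len(self.params) > 1:
            tz = self.params[1]
            return None if tz == "None" else tz
        return None


info = DTypeInfo.parse("Timestamp(Milliseconds, None)")
print(info.base, info.unit, info.timezone)  # Timestamp Milliseconds None
```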