# general
j
Hi @Joshua Pedrick! We don’t yet have a native implementation of .datetime_range in Daft (it seems to convert 2 timestamp columns into a list[timestamp] column according to the provided interval). However, this is actually pretty easy to do via a UDF as an escape hatch:
import daft
import polars as pl
import datetime

@daft.udf(return_dtype=daft.DataType.list(daft.DataType.timestamp("us")))
def my_datetime_range(start: daft.Series, end: daft.Series, interval: str) -> daft.Series:
    polars_df = pl.DataFrame({"start": pl.Series(start.to_arrow()), "end": pl.Series(end.to_arrow())})
    polars_results = polars_df.select(range=pl.datetime_ranges(start="start", end="end", interval=interval))
    return daft.Series.from_arrow(polars_results["range"].to_arrow())

df = daft.from_pydict({
    "start": [datetime.date(2023, 1, 1), datetime.date(2023, 1, 2)],
    "end": [datetime.date(2023, 1, 5), datetime.date(2023, 1, 7)],
})
df = df.with_column("range", my_datetime_range(df["start"], df["end"], "1d"))

df.show()
We can make this a little cleaner too by supporting easier conversions between Daft Series and Polars Series. Let me know if that’s something you might want!
Actually this might be slightly wrong, I’m a little confused by Polars’ API. Looks like start must only have one value for some reason?
j
The Polars function generates a single series from the given inputs, roughly:
[t for t in range(start_time, end_time, interval)]
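The one-liner above is pseudocode, since Python's built-in range does not accept datetimes. A minimal standard-library sketch of the same idea follows. The helper name datetime_range_sketch is hypothetical, and it assumes the end point is included, which I believe matches Polars' default closed="both".

```python
from datetime import datetime, timedelta


def datetime_range_sketch(start: datetime, end: datetime, step: timedelta) -> list[datetime]:
    """Hypothetical helper: every timestamp from start to end, spaced by step.

    Assumption: the end point is inclusive.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    values = []
    current = start
    while current <= end:
        values.append(current)
        current += step
    return values


# One start and one end produce one flat sequence of timestamps.
single = datetime_range_sketch(datetime(2023, 1, 1), datetime(2023, 1, 5), timedelta(days=1))
```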
j
Ahh got it 🙂
Ok, modified the example. It seems there’s a difference between pl.datetime_ranges and pl.datetime_range, where the former works on a dataframe (and expressions) but the latter only works on Series (and scalar values). Daft’s primary abstraction is the DataFrame, and so my example uses pl.datetime_ranges. Does this answer your question? 😮
j
Greater support for Polars conversion would be great just so I don't have to use my own helper functions!