jay
04/26/2024, 8:25 PM
.datetime_range in Daft (seems like it converts 2 timestamp columns into a list[timestamp] column according to the provided interval)
However, this is actually pretty easy to do via a UDF as an escape hatch:
import datetime

import daft
import polars as pl

@daft.udf(return_dtype=daft.DataType.list(daft.DataType.timestamp("us")))
def my_datetime_range(start: daft.Series, end: daft.Series, interval: str) -> daft.Series:
    # Hand both Daft Series to Polars via Arrow
    polars_df = pl.DataFrame({"start": pl.Series(start.to_arrow()), "end": pl.Series(end.to_arrow())})
    # Build one list of timestamps per row
    polars_results = polars_df.select(range=pl.datetime_ranges(start="start", end="end", interval=interval))
    # Convert the result back into a Daft Series via Arrow
    return daft.Series.from_arrow(polars_results["range"].to_arrow())

df = daft.from_pydict({
    "start": [datetime.date(2023, 1, 1), datetime.date(2023, 1, 2)],
    "end": [datetime.date(2023, 1, 5), datetime.date(2023, 1, 7)],
})
df = df.with_column("range", my_datetime_range(df["start"], df["end"], "1d"))
df.show()
We can make this a little cleaner too by supporting easier conversions between Daft Series and Polars Series. Let me know if that’s something you might want!
jay
04/26/2024, 8:32 PM
start must only have one value for some reason?
Joshua Pedrick
04/26/2024, 8:45 PM
Joshua Pedrick
04/26/2024, 8:46 PM
[ t for t in range(start_time, end_time, interval) ]
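The built-in range() only accepts integers, so the comprehension above is pseudocode. A minimal runnable sketch of the same idea in plain Python follows, assuming datetime inputs and a timedelta interval; the helper name datetime_range is hypothetical, and it is half-open like range(), whereas Polars includes both endpoints by default.

```python
from datetime import datetime, timedelta


def datetime_range(start, end, interval):
    """Yield timestamps from start (inclusive) to end (exclusive), like range()."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    current = start
    while current < end:
        yield current
        current += interval


# Hypothetical inputs, for illustration only
start_time = datetime(2023, 1, 1)
end_time = datetime(2023, 1, 5)
stamps = [t for t in datetime_range(start_time, end_time, timedelta(days=1))]
```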
jay
04/26/2024, 8:46 PM
jay
04/26/2024, 9:00 PM
pl.datetime_ranges and pl.datetime_range, where the former works on a dataframe (and expressions) but the latter only works on Series (and scalar values).
Daft’s primary abstraction is the DataFrame, and so my example uses pl.datetime_ranges.
Does this answer your question? 😮
Jake Waller
05/07/2024, 9:50 AM