Hey, so I see that we have a `count` method with s...
# daft-dev
r
Hey, so I see that we have a
count
method with some existing `CountMode`s. For
count_distinct
, would it be better if instead of a new expression, I just created a new
CountMode
? I.e., something like
CountMode::Distinct
? The current `CountMode`s are
CountMode::All
,
CountMode::Valid
, and
CountMode::Null
.
k
Hm good question. If we were following SQL semantics (
COUNT(DISTINCT col)
) it would make sense to add it to
count
, but if we were to be similar to pyspark, they have a specific
countDistinct
function. My opinion is we should have a separate function since counting distinct is functionally pretty different from the other count modes, but I could go both ways.
r
That's fair. So the "counting" APIs would look something similar to this then:
count(count_mode = ...)
count_distinct()
count_distinct_approx()
👍 1
k
We should probably also decide if the approx goes before or after, since we already have
approx_percentiles
. We could rename that expression and set the original name as an alias with a deprecation warning if we wanted to change it
j
I think we should do after, easier to find 😬
d
Fwiw,
approx_count_distinct
is the more conventional name
r