# general
k
If I want to get the first row (or any row, but not a mix of values from multiple rows) of every group how can I implement that? Alternatively, what's the best way to implement distinct/drop_duplicates based only on a subset of columns?
k
@Sammy Sidhu I tried that, and empirically it seems to grab the first value, but I also noticed that one of the docs says the function does not guarantee that it grabs from the same row across columns. Is that the right understanding?
j
@Kevin Wang maybe we should add an any_value wrapper that does a row-wise any_value across columns?
k
we could probably enforce that it grabs the same row. i think it already does but we just don’t promise that it will always do that
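To make the concern concrete, here is a contrived pure-Python sketch (not the library's implementation) of how a per-column pick can differ from a per-row pick. The helper names `any_value_per_column` and `any_row` are hypothetical, and the "skip nulls per column" rule is only an assumption chosen to show one way values from different rows could end up mixed.

```python
from collections import defaultdict

rows = [
    {"group": "a", "x": 1, "y": None},
    {"group": "a", "x": None, "y": 2},
    {"group": "b", "x": 3, "y": 4},
]

def any_value_per_column(rows, key, columns):
    """Hypothetical: each column independently keeps its first non-null value."""
    out = defaultdict(dict)
    for row in rows:
        picked = out[row[key]]
        for col in columns:
            # Assumption for illustration: a column is filled only once,
            # and a null does not count as filled.
            if picked.get(col) is None:
                picked[col] = row[col]
    return dict(out)

def any_row(rows, key):
    """Hypothetical: keep the first whole row seen for each group."""
    out = {}
    for row in rows:
        out.setdefault(row[key], row)
    return out

mixed = any_value_per_column(rows, "group", ["x", "y"])
whole = any_row(rows, "group")

print(mixed["a"])  # x comes from the first row, y from the second
print(whole["a"])  # every value comes from the first row
```

A row-wise guarantee amounts to choosing one row per group first and only then reading columns from it.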
j
Yeah, I was thinking we could maybe expose a `.groupby().any_row()`, which would actually give that guarantee. In the future I’d also love to have `.groupby().first(sort_group_by=…)`, which seems like a pretty common ask.
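As a rough sketch of the semantics being discussed (first whole row per group, optionally after sorting, which is also what drop_duplicates on a subset of columns does), here is a pure-Python version. The function `first_row_per_group` and its `subset` and `sort_key` parameters are hypothetical names for illustration, not an existing or planned API.

```python
def first_row_per_group(rows, subset, sort_key=None):
    """Keep one whole row per distinct combination of the `subset` columns.

    If `sort_key` is given, rows are sorted first (sorted() is stable),
    so "first" means "smallest by sort_key" within each group.
    """
    if sort_key is not None:
        rows = sorted(rows, key=sort_key)
    seen = {}
    for row in rows:
        group = tuple(row[col] for col in subset)
        # setdefault keeps the first row seen for this group and ignores later ones
        seen.setdefault(group, row)
    return list(seen.values())

events = [
    {"user": "u1", "ts": 3, "action": "click"},
    {"user": "u2", "ts": 1, "action": "view"},
    {"user": "u1", "ts": 2, "action": "view"},
]

# First row per user in input order (drop_duplicates on the "user" column).
first_seen = first_row_per_group(events, ["user"])

# Earliest row per user by timestamp.
earliest = first_row_per_group(events, ["user"], sort_key=lambda r: r["ts"])
```

Because each kept entry is an entire row, values never mix across rows, which is the guarantee asked for above.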