avril
04/10/2024, 10:00 AM
limit() vs show()
just to double-check my understanding
As a Dask user I'm used to calling df.head(n) to see the first n rows and having that return a pandas df (eager) that I can further work with, usually for interactive testing/demo in a notebook. In Daft, df.show() is eager but returns None, whereas df.limit() also returns n rows but is lazy. Is the thinking here to discourage using df.show() as part of pipelines/transformations and instead recommend using limit with an explicit collect in cases where we want to materialize only n rows of a dataframe and manipulate that sample further?
jay
04/10/2024, 4:13 PM
.limit(100000)
avril
04/10/2024, 4:17 PM