avril
04/10/2024, 10:00 AM
limit() vs show(): just to double-check my understanding
As a Dask user, I'm used to calling df.head(n) to see the first n rows and getting back an eager pandas DataFrame that I can keep working with, usually for interactive testing or demos in a notebook. In Daft, df.show() is eager but returns None, whereas df.limit(n) also selects n rows but is lazy. Is the thinking here to discourage using df.show() as part of pipelines/transformations, and instead recommend limit() with an explicit collect() when we want to materialize only n rows of a dataframe and manipulate that sample further?
jay
04/10/2024, 4:13 PM
.limit(100000)
avril
04/10/2024, 4:17 PM