Is Iceberg table predicate pushdown applicable to ...
# daft-dev
a
Is Iceberg table predicate pushdown applicable to the `is_in` expression? More specifically, my use case needs the equivalent of `pandas_df[pandas_df.column_a.isin(python_list)]` on an Iceberg table bucketed by `column_a`.
j
Hmm, interesting. I’m actually not sure 😛 @Sammy Sidhu any idea if this is the case?
I think if you do this it should work though:
```python
import functools
from daft import col

filter_expr = functools.reduce(lambda x, y: x | y, [col("a") == i for i in python_list])
```
Essentially, this decomposes your `is_in` into a series of OR'ed equality matches.
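A plain-Python model of what that expression computes, for intuition only: lists stand in for Daft columns, and the helper names (`eq_mask`, `or_masks`, `is_in_via_or`) are hypothetical, not Daft APIs. Each `col("a") == i` yields a boolean mask, and reducing with `|` ORs the masks together, which gives the same mask as an `is_in` check. Note that `functools.reduce` raises `TypeError` on an empty list, so the workaround needs a guard if `python_list` can be empty.

```python
from functools import reduce


def eq_mask(column, value):
    # Elementwise `column == value`: the role `col("a") == i` plays in the workaround.
    return [x == value for x in column]


def or_masks(left, right):
    # Elementwise `left | right`: the role `lambda x, y: x | y` plays in the reduce.
    return [l or r for l, r in zip(left, right)]


def is_in_via_or(column, values):
    # One equality mask per value, OR'ed together; raises TypeError if `values` is empty.
    return reduce(or_masks, [eq_mask(column, v) for v in values])


column_a = [1, 2, 3, 4, 5]
python_list = [2, 4]
mask = is_in_via_or(column_a, python_list)
print(mask)  # [False, True, False, True, False]
assert mask == [x in python_list for x in column_a]  # same result as an is_in check
```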
s
We currently don't have that as a rule, but it should be straightforward to add here! Would you be interested in making a PR to add this?
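For anyone picking up that PR, here is a dependency-free sketch of what such a rule might do. The node classes (`Eq`, `IsIn`, `Or`, `And`) and the functions `rewrite_is_in` and `buckets_to_scan` are hypothetical and are not Daft's optimizer API, and `hash(value) % num_buckets` is only a stand-in for Iceberg's Murmur3-based bucket transform. The point it illustrates: a toy scan that only understands equalities cannot prune on `IsIn`, but after the rewrite it can narrow a bucketed table to the buckets that may hold the listed values.

```python
from dataclasses import dataclass
from functools import reduce


# Toy expression nodes (hypothetical; not Daft's internal classes).
@dataclass(frozen=True)
class Eq:
    column: str
    value: object


@dataclass(frozen=True)
class IsIn:
    column: str
    values: tuple


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


def rewrite_is_in(expr):
    """Recursively replace IsIn(c, (v1, v2, ...)) with Eq(c, v1) | Eq(c, v2) | ..."""
    if isinstance(expr, IsIn) and expr.values:
        return reduce(Or, [Eq(expr.column, v) for v in expr.values])
    if isinstance(expr, (And, Or)):
        return type(expr)(rewrite_is_in(expr.left), rewrite_is_in(expr.right))
    return expr  # Eq, an empty IsIn, or any other node is left unchanged


def buckets_to_scan(expr, column, num_buckets):
    """Buckets of `column` that may contain matching rows (conservative)."""
    all_buckets = set(range(num_buckets))
    if isinstance(expr, Eq) and expr.column == column:
        # Stand-in for Iceberg's bucket transform, NOT the real Murmur3 hash.
        return {hash(expr.value) % num_buckets}
    if isinstance(expr, Or):
        return (buckets_to_scan(expr.left, column, num_buckets)
                | buckets_to_scan(expr.right, column, num_buckets))
    if isinstance(expr, And):
        return (buckets_to_scan(expr.left, column, num_buckets)
                & buckets_to_scan(expr.right, column, num_buckets))
    return all_buckets  # IsIn or unknown predicate: no pruning in this toy model


pred = IsIn("column_a", (3, 7, 19))
print(len(buckets_to_scan(pred, "column_a", 16)))  # 16: every bucket is scanned
print(sorted(buckets_to_scan(rewrite_is_in(pred), "column_a", 16)))  # [3, 7]
```

Treating unknown nodes as "scan everything" keeps the sketch safe: pruning can only ever skip buckets that provably cannot match.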
a
Yes, thanks for the workaround. Is there a way to support PyIceberg's `in` clause in Daft as well? My workaround is to use PyIceberg directly for `in` clause use cases:
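```python
import daft

# Assumes `catalog` (a loaded PyIceberg catalog) and `lookup_list_string` (e.g. "1, 2, 3") exist.
inter_table = catalog.load_table("mydb.mytable_iceberg")
inter_df = daft.from_pandas(
    inter_table.scan(row_filter=f"lookup_id in ({lookup_list_string})").to_pandas()
)
```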
j
Is the above example different from `df.where(df["customer_id"].is_in(customer_list_string))`?
Or, if you’re referring to predicate pushdown, then see Sammy’s suggestion 😄 https://dist-data.slack.com/archives/C052CA6Q9N1/p1720488699210739?thread_ts=1720488320.232289&cid=C052CA6Q9N1
a
It is another example.