Hey all - qq, does Daft have join based filtering ...
# general
j
Hey all - qq, does Daft have join based filtering on read?
j
Are you referring to Dynamic Partition Pruning, like Spark has?
j
When I was reading the Apache DataFusion paper, they mentioned an issue submitted this year that highlighted the use of join-based filtering on read. Since you’ve done a lot of work on the IO side, I was wondering whether it was something Daft had.
j
We do have it as part of an AQE rule, but AQE hasn’t been a feature that we’ve focused on yet! Interestingly, the work isn’t on the IO side; our IO already does a lot of good filter pushdown. The work is more about the engine knowing (using statistics or sampling) which side of a join it wants to materialize first, and then using the materialized data to construct a filter to push down! @Sammy Sidhu might be able to add more here
j
What does AQE stand for here? Also, that’s super interesting. I assume it would show up as part of the query plan as well? I didn’t realise it wouldn’t be part of the IO, so I’ll see if I can read up on it more!
j
Adaptive Query Execution! It wouldn’t show up as part of the initial plan, but as the plan executes (and at various stages of execution) Daft gets more up-to-date information about the data coming out of each stage. For example, if it runs the left side of a join, it can say: oh, it’s actually pretty small, let me convert the data into a filter and push it down the right side. That’s DPP (Dynamic Partition Pruning) in a nutshell!
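To make the idea concrete, here is a rough sketch in plain Python of "run the small side, turn its keys into a filter, push that filter into the other side's scan". This is not Daft's implementation or API. `scan`, `join_with_runtime_filter`, and `SMALL_SIDE_THRESHOLD` are hypothetical names made up for illustration, and a real engine would decide based on statistics or sampling rather than a fixed row count.

```python
SMALL_SIDE_THRESHOLD = 1_000  # hypothetical cutoff for "small enough to build a filter from"


def scan(rows, predicate=None):
    """Stand-in for a table scan that accepts a pushed-down filter."""
    for row in rows:
        if predicate is None or predicate(row):
            yield row


def join_with_runtime_filter(left_rows, right_rows, key):
    """Inner join that filters the right-side scan using keys seen on the left."""
    # 1. Materialize the left side first.
    left = list(scan(left_rows))

    # 2. If it turned out to be small, build a filter from its join keys.
    predicate = None
    if len(left) <= SMALL_SIDE_THRESHOLD:
        keys = {row[key] for row in left}
        predicate = lambda row: row[key] in keys

    # 3. Push the filter into the right-side scan, so non-matching rows
    #    never leave the scan.
    right = list(scan(right_rows, predicate))

    # 4. Ordinary hash join on what is left.
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    joined = [{**l, **r} for r in right for l in index.get(r[key], [])]
    return joined, len(right)


left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
right = [{"id": i, "value": i * 10} for i in range(100)]
joined, right_rows_scanned = join_with_runtime_filter(left, right, "id")
print(joined)
print(right_rows_scanned)
```

In this toy run only 2 of the 100 right-side rows make it out of the scan, which is the whole point: the join result is unchanged, but the large side reads far less data.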