# daft-dev
c
@Sammy Sidhu PR to add anti/semi/left/right joins to the native executor ready for review: https://github.com/Eventual-Inc/Daft/pull/2743
One thing to note, though: I tried this semi join implementation with @Kevin Wang’s new Q4. Both Python and native see a significant improvement, but native now trails Python by about 40%. I attached a bunch of the traces/explains etc. below. I believe the issue is that building the probe table for the semi join takes longer on native and also allocates more memory. I'll be looking into why this is the case.
Ok, I figured it out. For Python, semi join probe tables are `HashMap<IndexHash, (), IdentityBuildHasher>`, i.e. we don't store indices. So for native I thought, let's just not store the indices either: keep the `HashMap<IndexHash, Vec<u64>, IdentityBuildHasher>`, leave an empty vec in each entry, and never push anything into it. BUT a `HashMap<IndexHash, Vec<u64>>` will allocate more memory than a `HashMap<IndexHash, ()>` upon resizing, even if the vecs are empty. Changing the value type to `()` completely does the trick. Everything's good now: native is on par with Python memory-wise, though it's still roughly 5-10% slower, which is due to the dyn array comparator that native uses. I'll update the PR accordingly.
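To make the memory difference concrete, here is a minimal Rust sketch. It assumes a hypothetical stand-in `IndexHash` with two `u64` fields, plain `u64` keys for the join itself, and the default std hasher in place of Daft's `IndexHash` and `IdentityBuildHasher`; `semi_join` is a hypothetical helper name. A semi join only has to check whether a probe key exists on the build side, so the map value can be `()`, which is zero-sized. Because the table stores keys and values inline in its buckets, an empty `Vec<u64>` still costs its pointer, capacity and length (24 bytes on a 64-bit target) in every bucket, even though an empty `Vec` never heap-allocates, so every resize allocates a larger buffer.

```rust
use std::collections::HashMap;
use std::mem::size_of;

// Hypothetical stand-in for Daft's IndexHash: a row index plus a precomputed hash.
#[allow(dead_code)]
struct IndexHash {
    idx: u64,
    hash: u64,
}

/// Semi join: return the indices of probe-side rows whose key exists on the build side.
/// Only key existence matters, so the map value is the zero-sized `()`.
fn semi_join(build_keys: &[u64], probe_keys: &[u64]) -> Vec<usize> {
    let mut table: HashMap<u64, ()> = HashMap::new();
    for &k in build_keys {
        table.insert(k, ());
    }
    probe_keys
        .iter()
        .enumerate()
        .filter(|&(_, k)| table.contains_key(k))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Inline entry size per bucket: `()` adds nothing, an empty Vec still adds 3 words.
    println!("(IndexHash, ())       = {} bytes", size_of::<(IndexHash, ())>());
    println!("(IndexHash, Vec<u64>) = {} bytes", size_of::<(IndexHash, Vec<u64>)>());

    // Each matching probe row is emitted once; non-matching rows are dropped.
    println!("{:?}", semi_join(&[1, 3, 5], &[5, 2, 3, 3])); // prints [0, 2, 3]
}
```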
s
Yeah that makes sense!
dropped a quick review! Took a bit of time to dig in fully 🙂
c
Yay, thanks! Also, a random question I've been thinking about: would it make sense to instantiate the hashmap with capacity = total length? It would then never need to resize, although it would likely over-allocate (but since we're just storing pointers, I'm thinking this shouldn't be a significant memory cost?)
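On the pre-sizing question, a minimal sketch using std's `HashMap::with_capacity`, again with plain `u64` keys standing in for Daft's `IndexHash` and the default hasher in place of `IdentityBuildHasher`; `build_presized` and `build_len` are hypothetical names. `with_capacity(n)` guarantees room for at least `n` entries without reallocating, so sizing by the total number of build rows rules out resizes entirely. The trade-off shows up with duplicate keys: the table is sized for every row but stores only one entry per distinct key. The cost of each spare bucket is the inline entry size, `size_of::<(K, V)>()`, plus per-bucket control metadata, so it stays small when the value is `()` and the key is compact.

```rust
use std::collections::HashMap;

/// Build a semi-join probe table sized for the whole build side up front.
/// `with_capacity(n)` guarantees at least `n` inserts without reallocating.
fn build_presized(build_keys: &[u64]) -> HashMap<u64, ()> {
    let build_len = build_keys.len(); // hypothetical: total rows on the build side
    let mut table: HashMap<u64, ()> = HashMap::with_capacity(build_len);
    let cap_before = table.capacity();
    for &k in build_keys {
        table.insert(k, ());
    }
    // Distinct keys never exceed build_len, so no resize happened.
    debug_assert_eq!(table.capacity(), cap_before);
    table
}

fn main() {
    // Many duplicate keys: capacity covers 6 rows, but only 2 distinct keys are stored.
    let table = build_presized(&[7, 7, 7, 9, 9, 9]);
    println!("len = {}, capacity = {}", table.len(), table.capacity());
}
```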