# daft-dev
Rushikesh Padia
Hello Team, I'm currently looking into migrating my PySpark job to Daft and evaluating whether it fits my use case. I've gone through the docs on the website but didn't find any detailed resources on Daft's query optimizer or join algorithms. Is there a detailed research paper on Daft, or any other publication on its internals? I've also gone through the benchmark results and they look really amazing, but I couldn't find an explanation of why Daft is faster than Spark. Are there any trade-offs? I'd appreciate any resources on this. Thanks in advance!
Jay
Hi @Rushikesh Padia! We don’t currently have any resources on our query optimizer or join algorithms, but I'm happy to answer any questions you have here!
• The types of joins we support are listed under the `strategy` keyword argument: https://www.getdaft.io/projects/docs/en/latest/api_docs/doc_gen/dataframe_methods/daft.DataFrame.join.html
• Daft’s speedups over Spark come from:
  • Vectorized execution
  • Much faster, optimized reads from cloud storage (specifically AWS S3), written in async Rust
  • Lower overhead than running on the JVM
As for trade-offs, you’ll find that Spark has more functionality (e.g. SQL support and a wider range of functions), but we’re constantly adding to Daft’s suite of functions 😄
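For readers unfamiliar with the join algorithms that a `strategy` option chooses between, here is a minimal, single-machine Python sketch of two textbook families, a hash join and a sort-merge join. This is not Daft's actual implementation, which is distributed and not described in this thread. The function names `hash_join` and `sort_merge_join` and the rows-as-dicts representation are hypothetical choices made purely for illustration.

```python
from collections import defaultdict
from typing import Any, Hashable

# Hypothetical row representation: one dict per row.
Row = dict[str, Any]


def hash_join(left: list[Row], right: list[Row], key: str) -> list[Row]:
    """Textbook inner hash join: build a hash table on one side, probe with the other."""
    # Build phase: index the right input by join key.
    # In practice the smaller input is usually chosen as the build side.
    table: dict[Hashable, list[Row]] = defaultdict(list)
    for rrow in right:
        table[rrow[key]].append(rrow)

    # Probe phase: look up each left row's key and emit one merged row per match.
    out: list[Row] = []
    for lrow in left:
        for rrow in table.get(lrow[key], []):
            out.append({**lrow, **rrow})
    return out


def sort_merge_join(left: list[Row], right: list[Row], key: str) -> list[Row]:
    """Textbook inner sort-merge join: sort both inputs by key, then advance two cursors."""
    ls = sorted(left, key=lambda r: r[key])
    rs = sorted(right, key=lambda r: r[key])
    out: list[Row] = []
    i = j = 0
    while i < len(ls) and j < len(rs):
        lk, rk = ls[i][key], rs[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key...
            j_end = j
            while j_end < len(rs) and rs[j_end][key] == lk:
                j_end += 1
            # ...and pair it with every left row that shares the same key.
            while i < len(ls) and ls[i][key] == lk:
                for rrow in rs[j:j_end]:
                    out.append({**ls[i], **rrow})
                i += 1
            j = j_end
    return out


if __name__ == "__main__":
    orders = [{"id": 1, "item": "a"}, {"id": 2, "item": "b"}, {"id": 1, "item": "c"}]
    users = [{"id": 1, "name": "x"}, {"id": 3, "name": "z"}]
    print(hash_join(orders, users, "id"))
    print(sort_merge_join(orders, users, "id"))
```

The trade-off between the two is the usual one. A hash join needs memory for the build-side table but no sorting. A sort-merge join pays for sorting but streams through both inputs in key order, which can help when the data is large or already sorted.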
Rushikesh Padia
Thanks Jay! 😄