# daft-dev
k
I am quite fond of Ibis and not an expert 😁, but at this point it doesn't make sense to me to use Ibis as a backend that in turn uses other backends like Polars and DuckDB to do the job?! And you mention that Polars and DuckDB should be added as dependencies...🤔
i
What I am saying is that Ibis should be your primary API, and that DuckDB and Polars should be added as optional dependencies so that you inherit all their scalar functions overnight. Re-implementing them all natively in Daft will take a long time and a lot of effort. DuckDB is a tiny dependency that Daft can easily afford. We've already done an integration with Daft, in a totally zero-copy manner, thanks to Arrow. As a result, we get all DuckDB scalar functions and all its extensions (the geospatial extension is amazing), for free, right now. Furthermore, it gives us access to the DuckDB file format, which supports incremental writes (something that Parquet cannot do and will never do).
k
For Polars and DuckDB it makes sense if you want to fast-forward and get some functions. But for Ibis I still can't see the real benefit. I just want to understand this out of curiosity 😉. Do you keep the existing Rust goodies like the execution layer with the high-performance tabular abstraction? Maybe @jay, @Sammy Sidhu and others can help.
According to the Ibis docs you get:
Ibis for library developers
Python developers creating libraries can use Ibis to:
• instantly support 20+ data backends
• instantly support pandas, PyArrow, and Polars objects
• read and write from all common file formats (depending on the backend)
• trace column-level lineage through Ibis expressions
• compile Ibis expressions to SQL or Substrait
• perform cross-dialect SQL transpilation (powered by SQLGlot)
Maybe adopting Substrait in the future, when more engines support it, could make sense?!
j
We actually talked to some of the folks who work on Ibis earlier this week! The context there is we were trying to scope out how much work it’d be + what benefits Daft would reap from building out a Daft backend for Ibis. I think when @Ismael Ghalimi is suggesting to use Ibis as a backend for Daft, it’s pretty localized to operations that would happen per-row/per-partition. Daft will still perform the higher-level distributed dataframe operations, but for example we don’t yet have a string-left-pad expression natively implemented, and if an Ibis user wanted to use the Daft backend, it could be pretty simple for Daft to route the Ibis string-left-pad operation to an Ibis-fronted DuckDB string-left-pad operation when we execute work on each row. This is interesting for us particularly because we could perhaps gain a lot of (largely scalar operation) functionality from the outset by leveraging other libraries for some of these operations (and incurring a memory copy to/from these libraries), and have pretty good support for the Ibis API.
👍 1
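The routing idea above can be sketched in plain Python. Everything here is hypothetical and for illustration only: `NATIVE_OPS`, `FALLBACK_OPS`, `run_scalar_op`, and `apply_per_partition` are invented names, not Daft or Ibis APIs, and the fallback is a pure-Python stand-in for a call that would be delegated to another engine such as DuckDB.

```python
from typing import Callable, Dict, List

Column = List[str]

# Hypothetical registry of scalar ops the engine implements natively.
NATIVE_OPS: Dict[str, Callable[..., Column]] = {
    "upper": lambda col: [v.upper() for v in col],
}


def _delegated_lpad(col: Column, width: int, fill: str) -> Column:
    # Stand-in for a call delegated to another library (e.g. DuckDB).
    # Truncates to `width` first, then pads on the left.
    return [v[:width].rjust(width, fill) for v in col]


# Hypothetical registry of ops borrowed from another library.
FALLBACK_OPS: Dict[str, Callable[..., Column]] = {
    "lpad": _delegated_lpad,
}


def run_scalar_op(name: str, partition: Column, *args) -> Column:
    """Prefer the native implementation; otherwise route to the fallback."""
    if name in NATIVE_OPS:
        return NATIVE_OPS[name](partition, *args)
    if name in FALLBACK_OPS:
        return FALLBACK_OPS[name](partition, *args)
    raise NotImplementedError(f"no implementation for scalar op {name!r}")


def apply_per_partition(name: str, partitions: List[Column], *args) -> List[Column]:
    # The distributed layer still decides how data is partitioned;
    # only the per-partition scalar work is routed.
    return [run_scalar_op(name, p, *args) for p in partitions]


padded = apply_per_partition("lpad", [["7", "42"], ["123"]], 5, "0")
print(padded)  # [['00007', '00042'], ['00123']]
```

Checking the native registry first means a delegated op can be replaced by a native one later without changing any caller.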
i
This is an option as well, but what we were thinking about is a bit different. For scalar functions, we really want a zero-copy integration. This could be done with or without Ibis, but doing it with Ibis would provide a cleaner API. That being said, our main requirement regarding Ibis is to use Daft as a back-end for Ibis so that we can build applications on top of Daft that could also be deployed on top of any Ibis back-end. The two projects are totally decoupled though, and the former is a lot less work than the latter.
👍 2
We believe that Ibis will become a major force in the industry, and a lot of applications will be built on top of Ibis. This creates a fantastic opportunity for Daft to become the best distributed back-end for it. And one should keep in mind something super important: the main benefit of Ibis is not that it supports 20 back-ends (21 if we add Daft), it's that it provides an abstraction for SQL and DataFrames. This, in our opinion, is a really, really big deal.
👍 1
j
Submitted an issue in the Ibis repo: https://github.com/ibis-project/ibis/issues/8904
👍 2
k
Thanks, starting to make sense to use Daft as a backend for Ibis 😁
👍 1