Ismael Ghalimi
03/12/2024, 2:10 PM
[DuckDB|Polars]
• I want it on top of Ray
• I want it with support for rich datatypes (geometries, graphs, series)
• I want it with support for both CPU and GPU
• I want it with support for Iceberg and Delta Lake
In other words, users won't come to Daft for Daft itself, because they don't really know what Daft is yet. They'll come to Daft because they want a distributed version of [DuckDB|Polars].
What's critical in that story is that it's not a distributed version of DuckDB or a distributed version of Polars. One could build one or the other, but it would be ten times less valuable than a distributed version of [DuckDB|Polars]. The reason is simple: in a team, you'll find people who want SQL tables, and others who want DataFrames. You don't want to ask them to choose. You want to offer them both.
And this is why Ibis will be so critical to Daft's success, in my opinion.
And this is why having your own native engine, instead of being entirely based on Polars, is so critical. It gives you the right level of abstraction at the bottom.
Now, this is a big list:
• Distributed version of [DuckDB|Polars]
• Powered by Ray
• Rich datatypes (geometries, graphs, series)
• CPU × GPU
• Iceberg + Delta Lake
And that means that you need to be really smart about how you use your limited engineering resources.
Most importantly, the window of opportunity is now. If you don't build this, someone else will for sure. Nature abhors a vacuum.