Distributed Data Community #daft-dev


Ismael Ghalimi

03/12/2024, 2:10 PM

Adding some context to this: https://app.slack.com/client/T041ND9T998/C052CA6Q9N1

In order to better understand what we are proposing, you could ask yourself the following question: Why should I use Daft? For most new users, I am willing to bet that the answer will be along these lines:

• I want a distributed version of [DuckDB|Polars]
• I want it on top of Ray
• I want it with support for rich datatypes (geometries, graphs, series)
• I want it with support for both CPU and GPU
• I want it with support for Iceberg and Delta Lake

In other words, users won't come to Daft for Daft itself, because they don't really know what Daft is yet. They'll come to Daft because they want a distributed version of [DuckDB|Polars]. What's critical in that story is that it's not a distributed version of DuckDB or a distributed version of Polars. One could build one or the other, but it would be ten times less valuable than a distributed version of [DuckDB|Polars].

The reason for it is simple: in a team, you'll find people who want SQL tables, and others who want DataFrames. You don't want to ask them to choose. You want to offer them both. And this is why Ibis will be so critical to Daft's success in my opinion. And this is why having the native engine that you have instead of being entirely based on Polars is so critical. It gives you the right level of abstraction at the bottom.

Now, this is a big list:

• Distributed version of [DuckDB|Polars]
• Powered by Ray
• Rich datatypes (geometries, graphs, series)
• CPU × GPU
• Iceberg + Delta Lake

And that means that you need to be really smart about how you use your limited engineering resources. Most importantly, the window of opportunity is now. If you don't build this, someone else will for sure. Nature abhors a vacuum.
