# daft-dev
c
And I noticed that the format of `MicroPartition` is different from Arrow. I'd like to understand why we don't use Arrow directly as the local format; Modin and Ray Data both use Arrow as their local format. @Colin Ho @jay thx
j
Yes! We do this because we have our own types and internal representations of data. For example, we allow columns of Python objects. Also, having our own implementations of nested types makes them much easier to work with.
This also lets us innovate without relying on Arrow, for example by implementing Umbra strings.
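To make the Python-object point concrete, here is a minimal, hypothetical sketch (the `Column` class and its methods are illustrative assumptions, not Daft's actual `Series` or `DataType` API) of a column that keeps plain integers in a contiguous int64 buffer but falls back to a list of arbitrary Python objects otherwise. Arrow has no built-in type for arbitrary Python objects, so only the primitive variant can be handed out as an Arrow-style buffer.

```python
from array import array
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass
class Column:
    """Hypothetical column type for illustration; not Daft's real API."""
    dtype: str                      # "int64" or "python"
    data: Union[array, List[Any]]   # contiguous int64 buffer, or a plain list

    @staticmethod
    def from_values(values: List[Any]) -> "Column":
        # Use a compact int64 buffer when every value is a plain int;
        # otherwise keep arbitrary Python objects as-is.
        if all(type(v) is int for v in values):
            return Column("int64", array("q", values))
        return Column("python", list(values))

    def __len__(self) -> int:
        return len(self.data)

    def to_arrow_buffer(self) -> memoryview:
        # Only the primitive variant maps onto an Arrow-style fixed-width buffer.
        if self.dtype != "int64":
            raise TypeError("Python-object columns have no Arrow equivalent")
        return memoryview(self.data)


ints = Column.from_values([1, 2, 3])
objs = Column.from_values([{"a": 1}, object(), None])
print(ints.dtype, objs.dtype)  # int64 python
```

The trade-off in this sketch: the `python` variant can hold anything, but it is opaque to vectorized, buffer-based kernels.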
c
Understood. But that means we can't use DataFusion or other existing libraries for operators.
btw, what are Umbra strings?
And while we store data in the Plasma store, can `Table` support in-place computation? @jay
j
Umbra strings: https://cedardb.com/blog/german_strings/
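For readers who don't follow the link, here is a minimal Python sketch of the 16-byte string header the post describes: a 4-byte length followed either by the whole string inline (when it fits in 12 bytes) or by a 4-byte prefix plus a reference to the full bytes. As an assumption for illustration, the reference is an 8-byte offset into a separate byte buffer rather than a raw pointer, other details from the post are omitted, and none of this reflects how Daft actually implements it.

```python
import struct

INLINE_MAX = 12  # strings up to 12 bytes fit entirely in the 16-byte header


def encode(strings):
    """Encode strings into 16-byte headers plus one shared data buffer."""
    headers, heap = [], bytearray()
    for s in strings:
        b = s.encode("utf-8")
        if len(b) <= INLINE_MAX:
            # Short string: 4-byte length + bytes inline, zero-padded to 12.
            headers.append(struct.pack("<I12s", len(b), b))
        else:
            # Long string: 4-byte length + 4-byte prefix + 8-byte offset.
            headers.append(struct.pack("<I4sQ", len(b), b[:4], len(heap)))
            heap += b
    return headers, bytes(heap)


def decode(header, heap):
    (length,) = struct.unpack_from("<I", header, 0)
    if length <= INLINE_MAX:
        return header[4:4 + length].decode("utf-8")
    (offset,) = struct.unpack_from("<Q", header, 8)
    return heap[offset:offset + length].decode("utf-8")


def equal(a, b, heap):
    # Bytes 0..8 hold the length and the first 4 bytes of the string in both
    # layouts, so most mismatches are caught without touching the heap.
    if a[:8] != b[:8]:
        return False
    return decode(a, heap) == decode(b, heap)


headers, heap = encode(["hi", "hello, umbra strings!", "hello, world!!"])
print([len(h) for h in headers])            # [16, 16, 16]
print(decode(headers[1], heap))             # hello, umbra strings!
print(equal(headers[1], headers[2], heap))  # False
```

The design point is that fixed-size headers keep the string column easy to scan, and the inline prefix lets many comparisons finish without dereferencing the out-of-line bytes.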
> while we store data in plasma store, can `Table` support in-place computation?
We don't do in-place computation; it's really difficult to reason about in distributed data processing. Instead, we are moving to a streaming-based execution model for memory stability.
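As a rough illustration of why streaming helps memory stability, here is a minimal sketch using Python generators. The operators (`scan`, `filter_even`, `square`) are hypothetical and this is not Daft's executor; it only shows the general idea that each stage pulls bounded batches from the previous one and emits new batches instead of mutating its input, so memory stays proportional to batch size rather than dataset size.

```python
from typing import Iterable, Iterator, List, Tuple

Batch = List[int]


def scan(n_rows: int, batch_size: int) -> Iterator[Batch]:
    # Source: yields bounded batches instead of materializing every row.
    for start in range(0, n_rows, batch_size):
        yield list(range(start, min(start + batch_size, n_rows)))


def filter_even(batches: Iterable[Batch]) -> Iterator[Batch]:
    for batch in batches:
        # Each operator produces a new batch; inputs are never mutated in place.
        yield [x for x in batch if x % 2 == 0]


def square(batches: Iterable[Batch]) -> Iterator[Batch]:
    for batch in batches:
        yield [x * x for x in batch]


def run(n_rows: int, batch_size: int) -> Tuple[int, int]:
    """Sum of squares of even numbers, plus the largest batch seen at the sink."""
    total, largest = 0, 0
    # Generators are lazy, so roughly one batch per stage is alive at a time.
    for batch in square(filter_even(scan(n_rows, batch_size))):
        largest = max(largest, len(batch))
        total += sum(batch)
    return total, largest


print(run(10, 4))  # (120, 2)
```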
> we cannot use datafusion or other existing library for operators.
Not really. It's pretty easy to export Arrow-shaped data zero-copy if we need other libraries to work with our data. Our primitives are still Arrow-based!
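Zero-copy export means handing another library views of buffers you already own, laid out the way Arrow expects, instead of converting the data. In practice this kind of handoff usually goes through Arrow's C Data Interface; the stdlib-only sketch below just mimics the idea with Python's buffer protocol. `Int64Column` and `export_buffers` are hypothetical names, not Daft classes: the column holds an Arrow-style validity bitmap plus a contiguous values buffer, and exports both as `memoryview`s over the same memory.

```python
from array import array


class Int64Column:
    """Hypothetical column laid out like an Arrow int64 array:
    a validity bitmap plus a contiguous buffer of 64-bit values."""

    def __init__(self, values, valid):
        self.values = array("q", values)
        # One bit per row, least-significant bit first (Arrow's bitmap order).
        self.validity = bytearray((len(valid) + 7) // 8)
        for i, ok in enumerate(valid):
            if ok:
                self.validity[i // 8] |= 1 << (i % 8)

    def export_buffers(self):
        # Return views over the existing memory; no bytes are copied.
        return memoryview(self.validity), memoryview(self.values)


col = Int64Column([10, 20, 30], [True, False, True])
validity, values = col.export_buffers()
print(values.obj is col.values)            # True: the view wraps the original buffer
print(values.tolist(), bin(validity[0]))   # [10, 20, 30] 0b101
```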