# daft-dev
m
Hi, I'm new to both Rust and Daft, so this is a pretty beginner question. How can I access the inner `DataArray` of a `Series` entry? I am trying something very simple: create a `TensorArray` with a single tensor and print its values.
j
For printing, our `Series` object implements `Debug`, which means that you can print it like so:
dbg!(my_series);
Or as part of a print statement:
println!("My Series: {:?}", my_series);
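To see what `Debug` output looks like without pulling in Daft, here is a minimal sketch. `ToySeries` and `debug_string` are hypothetical stand-ins invented for illustration, not Daft's actual `Series` or any Daft API:

```rust
// Hypothetical stand-in for a named column; NOT Daft's actual `Series`.
#[derive(Debug)]
struct ToySeries {
    name: String,
    values: Vec<i64>,
}

/// Formats any `Debug` value the same way `println!("{:?}", value)` would.
fn debug_string<T: std::fmt::Debug>(value: &T) -> String {
    format!("{:?}", value)
}
```

Anything that implements `Debug` can be passed to `dbg!`, `{:?}`, or the pretty-printing `{:#?}` in the same way.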
So our `Series` is “dynamically typed”, meaning that it holds a runtime instance of the `DataType` enum which tells you what kind of array it contains. In this case you should be able to see that the `Series` holds a `DataType::Tensor(…)`. Using this information, you can then downcast the series into a `TensorArray` using `series.downcast::<TensorArray>()`. This is essentially you saying “I know that this is actually a TensorArray concrete type under the hood, please downcast it to that type”.
Now that you have a `TensorArray`: this is actually just a logical wrapper on top of a `StructArray`. You can grab the `StructArray` by accessing the `.physical` struct member of your `TensorArray`. Now that you have the `StructArray`, you can play around with it as you wish. Here is the implementation of a `StructArray`: https://github.com/Eventual-Inc/Daft/blob/main/src/daft-core/src/array/struct_array.rs#L10-L16
(under the hood, a StructArray just holds one `Series` per nested child)
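The "runtime type tag plus downcast" idea can be sketched in plain Rust with `std::any::Any`. Every type here (`ToyDataType`, `ToySeries`, `ToyInt64Array`, `ToyUtf8Array`) is a hypothetical simplification for illustration; Daft's real `Series` and `downcast` are implemented differently and return a `Result` rather than an `Option`:

```rust
use std::any::Any;

// Hypothetical, simplified stand-ins; Daft's real types are richer than this.
#[derive(Debug, Clone, PartialEq)]
enum ToyDataType {
    Int64,
    Utf8,
}

#[derive(Debug, PartialEq)]
struct ToyInt64Array(Vec<i64>);

#[derive(Debug, PartialEq)]
struct ToyUtf8Array(Vec<String>);

/// A "dynamically typed" column: a runtime type tag plus a type-erased array.
struct ToySeries {
    dtype: ToyDataType,
    inner: Box<dyn Any>,
}

impl ToySeries {
    fn from_i64(values: Vec<i64>) -> Self {
        ToySeries { dtype: ToyDataType::Int64, inner: Box::new(ToyInt64Array(values)) }
    }

    fn from_strings(values: Vec<String>) -> Self {
        ToySeries { dtype: ToyDataType::Utf8, inner: Box::new(ToyUtf8Array(values)) }
    }

    /// The runtime tag that tells the caller which concrete array is stored.
    fn data_type(&self) -> &ToyDataType {
        &self.dtype
    }

    /// Returns the concrete array if `T` matches what is stored, else `None`.
    fn downcast<T: 'static>(&self) -> Option<&T> {
        self.inner.downcast_ref::<T>()
    }
}
```

The caller inspects `data_type()` first and then asks for the matching concrete type; asking for the wrong type fails at runtime instead of at compile time.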
m
Thanks for the elaborate answer. I am trying to iterate over the data of the tensor, which is of type `ListArray`. When debugging, I can see that it contains a `DataArray` with the correct values. I am trying to iterate over them and check which indices contain zeros, for the COO sparse tensor implementation.
j
Yes, so it looks like the Tensor type is backed by a StructType:
impl_daft_logical_data_array_datatype!(TensorType, Unknown, StructType);
One of those struct children is the data (`tensor_array.data_array() -> ListArray`) and the other is the shape data (`tensor_array.shape_array() -> ListArray`). This is because the `Tensor` type (unlike the FixedSizeTensor type) has a variable shape per row. I think to accomplish what you’re trying to do, you will have to iterate on both at the same time. Something like this:
let shape_iterator = tensor_array.shape_array().into_iter();
let data_iterator = tensor_array.data_array().into_iter();
let zipped_iterator = data_iterator.zip(shape_iterator);

// I think the syntax is slightly wrong here, but basically create a unit-length series with 1 element, zero
let zero_series = Int64Array::from(vec![0]).into_series();

// This iterates over each row as a tuple of (data, shape)
for (data_series, shape_series) in zipped_iterator {
    // Shapes are u64
    let shape_array = shape_series.u64();

    // I think syntax is slightly wrong here as well... But basically this assertion should hold true at this point
    assert!(data_series.len() == shape_array.iter().product(), "The length of the data must equal the product of all shape dims");

    // Create a new boolean Series for 0 equality (borrow zero_series so it can be reused on every row)
    let is_zero = data_series.eq(&zero_series);

    ...
}
LMK if this helps! I didn’t run this code through a compiler so it’s more like pseudo-code
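For the COO part itself, here is a rough standalone sketch of the per-row logic using plain slices instead of Daft arrays. `CooTensor`, `unravel_index`, `dense_to_coo`, and `rows_to_coo` are all hypothetical names, and it assumes each row's data is a flat, row-major buffer of `i64`; none of this is Daft's API:

```rust
/// Coordinates and values of the non-zero entries of one dense tensor.
#[derive(Debug, PartialEq)]
struct CooTensor {
    shape: Vec<u64>,
    indices: Vec<Vec<u64>>, // one coordinate vector per non-zero element
    values: Vec<i64>,
}

/// Converts a flat row-major offset into per-dimension coordinates.
fn unravel_index(mut flat: u64, shape: &[u64]) -> Vec<u64> {
    let mut coords = vec![0; shape.len()];
    // Walk dimensions from last (fastest varying) to first.
    for (coord, &dim) in coords.iter_mut().zip(shape.iter()).rev() {
        *coord = flat % dim;
        flat /= dim;
    }
    coords
}

/// Builds a COO representation from one row's flat data and its shape.
fn dense_to_coo(data: &[i64], shape: &[u64]) -> CooTensor {
    let expected_len: u64 = shape.iter().product();
    assert_eq!(
        data.len() as u64,
        expected_len,
        "data length must equal the product of all shape dims"
    );
    let mut indices = Vec::new();
    let mut values = Vec::new();
    for (flat, &value) in data.iter().enumerate() {
        // Zeros are skipped; only non-zero entries are stored.
        if value != 0 {
            indices.push(unravel_index(flat as u64, shape));
            values.push(value);
        }
    }
    CooTensor { shape: shape.to_vec(), indices, values }
}

/// Zips per-row data with per-row shapes, mirroring the (data, shape) loop.
fn rows_to_coo(data_rows: &[Vec<i64>], shape_rows: &[Vec<u64>]) -> Vec<CooTensor> {
    data_rows
        .iter()
        .zip(shape_rows.iter())
        .map(|(data, shape)| dense_to_coo(data, shape))
        .collect()
}
```

For example, a row with data `[0, 5, 0, 0, 7, 0]` and shape `[2, 3]` keeps the values `5` and `7` at coordinates `[0, 1]` and `[1, 1]`. In Daft you would feed each row's data and shape from the zipped iterators into something like `dense_to_coo`.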