MichaelV
08/20/2024, 7:06 PM
DataArray of a Series entry. I am trying something very simple: create a TensorArray with a single tensor and print its values.
jay
08/20/2024, 7:13 PM
Series object implements Debug, which means that you can print it like so:
dbg!(my_series);
Or as part of a print statement:
println!("My Series: {:?}", my_series);
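For illustration, here is a minimal self-contained sketch of the same two printing styles. `MySeries` is a hypothetical stand-in struct that derives `Debug`, not Daft's actual `Series` type, and `debug_string` is a helper invented for this example.

```rust
// Hypothetical stand-in for illustration only; Daft's real Series is a different type.
#[derive(Debug)]
struct MySeries {
    name: String,
    values: Vec<i64>,
}

// Render a value through its Debug implementation, the same path `{:?}` uses.
fn debug_string(series: &MySeries) -> String {
    format!("{:?}", series)
}

fn main() {
    let my_series = MySeries {
        name: "x".to_string(),
        values: vec![1, 2, 3],
    };

    // dbg! takes ownership of its argument, so pass a reference to keep using the value.
    dbg!(&my_series);

    // Single-line Debug output.
    println!("My Series: {:?}", my_series);

    // `{:#?}` pretty-prints the same Debug output across multiple lines.
    println!("{:#?}", my_series);

    println!("{} has {} values", my_series.name, my_series.values.len());
    println!("{}", debug_string(&my_series));
}
```

Note that `dbg!` writes to stderr while `println!` writes to stdout, which matters if you are capturing output in a test.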
---
So our Series is “dynamically typed”, meaning that it holds a runtime instance of the DataType enum which tells you what kind of Array it contains.
In this case you should be able to see that the Series holds a DataType::Tensor(…). Using this information, you can then downcast the series into a TensorArray using series.downcast::<TensorArray>(). In effect you are saying “I know that this is actually a TensorArray concrete type under the hood, please downcast it to that type”.
A TensorArray is actually just a logical wrapper on top of a StructArray. You can grab the StructArray by accessing the .physical struct member of your TensorArray.
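The general pattern described above (a dynamically typed container, a checked downcast to a concrete type, then access to the physical storage) can be sketched in plain Rust with `std::any::Any`. Every type below is a hypothetical stand-in written for this example; none of it is Daft's actual implementation, and Daft's real `downcast` has its own signature and error type.

```rust
use std::any::Any;

// Hypothetical stand-ins, NOT Daft's types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Tensor,
}

// Physical storage: here just the names of the struct children.
struct StructArray {
    child_names: Vec<String>,
}

// Logical wrapper over the physical StructArray.
struct TensorArray {
    physical: StructArray,
}

#[allow(dead_code)]
struct Int64Array {
    values: Vec<i64>,
}

// A "dynamically typed" series: a runtime type tag plus a type-erased array.
struct Series {
    dtype: DataType,
    inner: Box<dyn Any>,
}

impl Series {
    fn data_type(&self) -> &DataType {
        &self.dtype
    }

    // Checked downcast: succeeds only if the erased value really is a `T`.
    fn downcast<T: 'static>(&self) -> Result<&T, String> {
        self.inner
            .downcast_ref::<T>()
            .ok_or_else(|| format!("series of type {:?} is not the requested array type", self.dtype))
    }
}

fn make_tensor_series() -> Series {
    let physical = StructArray {
        child_names: vec!["data".to_string(), "shape".to_string()],
    };
    Series {
        dtype: DataType::Tensor,
        inner: Box::new(TensorArray { physical }),
    }
}

fn main() {
    let series = make_tensor_series();
    println!("dtype: {:?}", series.data_type());

    // Correct downcast, then reach through the logical wrapper to the physical array.
    let tensor_array = series.downcast::<TensorArray>().expect("tensor series");
    println!("struct children: {:?}", tensor_array.physical.child_names);

    // Wrong downcast is reported as an error instead of panicking.
    assert!(series.downcast::<Int64Array>().is_err());
}
```

Checking the runtime type tag first and only then downcasting keeps the failure mode explicit: a wrong guess comes back as an `Err` rather than undefined behavior.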
Now that you have the StructArray, you can play around with it as you wish. Here is the implementation of a `StructArray`: https://github.com/Eventual-Inc/Daft/blob/main/src/daft-core/src/array/struct_array.rs#L10-L16
jay
08/20/2024, 7:14 PM
Series per nested child)
MichaelV
08/20/2024, 8:06 PM
ListArray, I can see when debugging that it contains a DataArray with the correct values. I am trying to iterate over them and check which indices contain zeros, for the COO sparse tensor implementation.
jay
08/20/2024, 10:10 PM
impl_daft_logical_data_array_datatype!(TensorType, Unknown, StructType);
One of those struct children is the data (tensor_array.data_array() -> ListArray) and the other is the shape data (tensor_array.shape_array() -> ListArray). This is because the Tensor type (unlike the FixedSizeTensor type) has a variable shape per-row.
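To make the variable-shape layout concrete, here is a small self-contained sketch. `TensorColumn`, `expected_len`, and `example_column` are hypothetical names made up for this example (plain `Vec`s standing in for the two list children), not Daft's API. Each row keeps a flattened data list and its own shape list, and the data length should equal the product of the shape dimensions.

```rust
// Hypothetical stand-in for a variable-shape tensor column, NOT Daft's TensorArray.
struct TensorColumn {
    data: Vec<Vec<i64>>,   // one flattened list of values per row
    shapes: Vec<Vec<u64>>, // one shape list per row; rows may differ in shape
}

// Number of elements implied by a shape: the product of its dimensions.
// An empty shape yields 1, i.e. a scalar.
fn expected_len(shape: &[u64]) -> u64 {
    shape.iter().product()
}

impl TensorColumn {
    // Every row's data length must match the product of that row's shape.
    fn is_consistent(&self) -> bool {
        self.data.len() == self.shapes.len()
            && self
                .data
                .iter()
                .zip(&self.shapes)
                .all(|(values, shape)| values.len() as u64 == expected_len(shape))
    }
}

fn example_column() -> TensorColumn {
    TensorColumn {
        // Row 0 is a 2x2 tensor, row 1 is a 3x2 tensor.
        data: vec![vec![1, 0, 0, 4], vec![0, 5, 6, 0, 0, 7]],
        shapes: vec![vec![2, 2], vec![3, 2]],
    }
}

fn main() {
    let column = example_column();
    for (values, shape) in column.data.iter().zip(&column.shapes) {
        println!("shape {:?} -> {} values: {:?}", shape, values.len(), values);
    }
    assert!(column.is_consistent());
}
```

A fixed-shape tensor type would not need the per-row `shapes` list at all, since one shape could be stored once for the whole column.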
I think to accomplish what you’re trying to do, you will have to iterate on both at the same time. Something like this:
let shape_iterator = tensor_array.shape_array().into_iter();
let data_iterator = tensor_array.data_array().into_iter();
let zipped_iterator = data_iterator.zip(shape_iterator);
// I think the syntax is slightly wrong here, but basically create a unit-length series with 1 element, zero
let zero_series = Int64Array::from(vec![0]).into_series();
// This iterates over each row as a tuple of (data, shape)
for (data_series, shape_series) in zipped_iterator {
    // Shapes are u64
    let shape_array = shape_series.u64();
    // I think syntax is slightly wrong here as well... But basically this assertion should hold true at this point
    assert!(data_series.len() == shape_series.iter().product(), "The length of the data must equal the product of all shape dims");
    // Create a new boolean Series for 0 equality
    let is_zero = data_series.eq(zero_series);
    ...
}
jay
08/20/2024, 10:11 PM