For the read_parquet_into_pyarrow_bulk api, it see...
# general
t
For the read_parquet_into_pyarrow_bulk API, it seems that the smallest granularity is a page? For example, when I read 10 row groups it might return 30 Arrow arrays. How do I know which Arrow arrays come from which row groups? Is there a way to do that today? @Sammy Sidhu
s
We don’t provide a mechanism for that currently! If you have access to the Parquet metadata, you can work out which row ranges belong to which row group.
Happy to take a contribution if you’re interested!
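A rough sketch of the metadata approach, in case it helps. It assumes the returned arrays come back in row order and that no single array spans a row-group boundary; neither is confirmed in this thread. The helper names (`row_counts_from_file`, `row_group_ranges`, `assign_chunks_to_row_groups`) are made up for illustration and are not part of any library. If only a subset of row groups is read, pass the row counts for just those groups, in the order they were read.

```python
import bisect
import itertools


def row_counts_from_file(path):
    """Read the number of rows in each row group from a Parquet footer."""
    # Imported lazily so the pure-Python helpers below work without pyarrow.
    import pyarrow.parquet as pq

    metadata = pq.read_metadata(path)
    return [metadata.row_group(i).num_rows for i in range(metadata.num_row_groups)]


def row_group_ranges(row_counts):
    """Turn per-row-group row counts into (start, end) row ranges, end exclusive."""
    ends = list(itertools.accumulate(row_counts))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def assign_chunks_to_row_groups(chunk_lengths, ranges):
    """Return, for each chunk, the index of the row group holding its first row.

    Assumes chunks are in row order and never cross a row-group boundary.
    """
    ends = [end for _, end in ranges]
    owners = []
    offset = 0  # global row index of the current chunk's first row
    for length in chunk_lengths:
        owners.append(bisect.bisect_right(ends, offset))
        offset += length
    return owners


# Hypothetical example: three row groups of 4, 4 and 2 rows,
# returned as five arrays of lengths 2, 2, 3, 1 and 2.
ranges = row_group_ranges([4, 4, 2])
owners = assign_chunks_to_row_groups([2, 2, 3, 1, 2], ranges)
print(ranges)  # [(0, 4), (4, 8), (8, 10)]
print(owners)  # [0, 0, 1, 1, 2]
```

In practice the chunk lengths would be `len(arr)` for each returned array of a given column, and the row counts would come from `row_counts_from_file(path)`.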