Hello! Does Daft parquet file read support row num...
# daft-dev
m
Hello! Does Daft's Parquet file reader support row numbers? I need to query lots of Parquet files with a WHERE clause on the row number.
j
Hey @Mehdi Zhiri! We don’t support an exact row number at the moment. Is this a global row number, or a local row number per Parquet file? If global, how do you consider ordering between Parquet files in this workload?
m
Local to each Parquet file
j
I see. Is this something like “read this folder of Parquet files, but only the first 10 rows of each file”? I don’t think we have such an API at the moment, and it would have to be a pretty custom implementation!
Could you also maybe give some pseudo-code of what you’re attempting here? I might still be misunderstanding
m
No, I have a bunch of filenames and row ids within those files, and I want to select all rows that match both conditions: filename and row id
so like I need to do a
SELECT * FROM parquet_file WHERE row_id IN row_ids
where parquet_file and row_ids are from an array like this:
[[file1, [1, 4, 6, 7898]], [file2, [4, 8, 33, 56]], ...]
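For reference, the lookup described above could be sketched roughly as follows. This is not a Daft API (the thread says Daft has no row-number support at the moment); it is a minimal illustration that assumes PyArrow is installed and that the row ids are 0-based positions within each file. The names `select_rows` and `_read_with_pyarrow` are hypothetical.

```python
def _read_with_pyarrow(path):
    # Assumption: PyArrow is available. Imported lazily so the selection
    # logic below can also be used with a different reader.
    import pyarrow.parquet as pq
    return pq.read_table(path)


def select_rows(requests, read_table=_read_with_pyarrow):
    """Return {path: rows} for an iterable of [path, row_ids] pairs.

    Assumption: row_ids are 0-based positions local to each file.
    """
    selected = {}
    for path, row_ids in requests:
        table = read_table(path)
        # pyarrow.Table.take picks rows by position, in the order given.
        selected[path] = table.take(list(row_ids))
    return selected
```

With hypothetical paths, `select_rows([["file1.parquet", [1, 4, 6, 7898]], ["file2.parquet", [4, 8, 33, 56]]])` would return one PyArrow table per file. Note that this sketch reads each file in full before selecting rows, so it may be slow for very large files.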